DATA MINING MODEL CLASSIFICATION USING ALGORITHM K-NEAREST NEIGHBOR WITH NORMALIZATION FOR DIABETES PREDICTION
https://doi.org/10.36342/teika.v12i02.2911
Keywords:
Model, data mining, normalization, kNN
Abstract
A model built through a data mining process can be used to make predictions from data. Such a model is built from a dataset containing data prepared for that process. One application of data mining models is the prediction of diseases such as diabetes. In this study, a data mining model was developed using the k-NN algorithm together with data normalization. The normalization methods used are Z-Score and Min-Max. The research methodology consists of determining the dataset, selecting the data mining model, dividing the dataset into training data and testing data, and evaluating the performance of the resulting model. The modeling was carried out in Python. The data mining process uses a classification model based on the k-NN algorithm. The dataset is a public diabetes dataset consisting of 768 records and 8 attributes. The results show that normalization can improve accuracy. The model developed without normalization yields k=5 with an accuracy of 70%, normalization with the Z-Score method yields k=21 with an accuracy of 72%, and normalization with Min-Max yields k=3 with an accuracy of 74%. The recommended model is the k-NN model with k=3.
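The workflow summarized in the abstract (normalize, split into training and testing data, classify with k-NN, compare accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the use of Euclidean distance and the population standard deviation, and the tiny two-feature toy data are all hypothetical and are not taken from the diabetes dataset.

```python
import math
import statistics
from collections import Counter


def _rescale(rows, offsets, scales):
    """Apply (value - offset) / scale column by column."""
    return [[(v - o) / s for v, o, s in zip(row, offsets, scales)] for row in rows]


def min_max_scale(train, test):
    """Min-Max: map each training column to [0, 1]; reuse the same bounds for test."""
    columns = list(zip(*train))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid dividing by zero
    return _rescale(train, lows, spans), _rescale(test, lows, spans)


def z_score_scale(train, test):
    """Z-Score: subtract the training mean, divide by the training std. deviation."""
    columns = list(zip(*train))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # population std (assumed)
    return _rescale(train, means, stds), _rescale(test, means, stds)


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y))
    return hits / len(test_y)


# Invented toy data: the first feature separates the classes, the second feature
# has a much larger numeric range and carries little class information.
train_x = [[0.1, 150.0], [0.2, 100.0], [0.3, 200.0],
           [0.8, 110.0], [0.9, 190.0], [1.0, 140.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.85, 152.0], [0.15, 148.0]]
test_y = [1, 0]

variants = {
    "no normalization": (train_x, test_x),
    "z-score": z_score_scale(train_x, test_x),
    "min-max": min_max_scale(train_x, test_x),
}
for name, (tr, te) in variants.items():
    for k in (1, 3):
        print(f"{name:>16}  k={k}  accuracy={accuracy(tr, train_y, te, test_y, k):.2f}")
```

On these toy points, the unnormalized distance is dominated by the large-range second feature, which is the effect normalization is meant to remove. The scaling statistics are computed on the training rows only and then reused for the test rows, so no information from the test data leaks into the model. The accuracy figures reported in the abstract come from the study itself and cannot be reproduced with this sketch.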
Copyright (c) 2022 TeIKa
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution-ShareAlike Licence (CC BY-SA). This gives readers more legal certainty about what they can do with published articles, and thus enables wider dissemination and archiving, which in turn makes publishing with this journal more valuable for you, the authors.
By submitting an article, the author grants this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.