Extraction of association rules in a diabetic dataset using parallel FP-growth algorithm under apache spark

Youssef Fakir; Salim Khalil; Mohamed Fakir

doi:10.11591/ijict.v13i3.pp445-452

Abstract

This paper enhances the frequent pattern growth (FP-growth) algorithm, which mines frequent itemsets without the candidate generation step required by the Apriori algorithm, by parallelizing it on the Apache Spark framework. Association rule mining on healthcare data, particularly for predicting and diagnosing diabetes, requires handling large datasets that traditional methods may not process efficiently. Our method improves the FP-growth algorithm’s scalability and processing efficiency by leveraging the distributed computing capabilities of Apache Spark. We analyzed diabetes data, extracting frequent itemsets and association rules to predict diabetes onset. The results indicate that the parallelized FP-growth (PFP-growth) algorithm improves both prediction accuracy and processing speed over traditional methods. These findings provide insight into disease progression and management and suggest a scalable solution for large-scale healthcare analytics.
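To make the terms "frequent itemset" and "association rule" concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and it is not FP-growth: it enumerates itemsets by brute force, which is only workable on toy data. The transactions, item names (`high_glucose`, `high_bmi`, `diabetic`), and thresholds are hypothetical and do not come from the paper's dataset.

```python
from itertools import combinations

# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"high_glucose", "high_bmi", "diabetic"},
    {"high_glucose", "diabetic"},
    {"high_bmi"},
    {"high_glucose", "high_bmi", "diabetic"},
    {"high_glucose"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration (NOT FP-growth); suitable for toy data only."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def association_rules(freq, min_confidence):
    """Return (antecedent, consequent, confidence) triples.

    confidence(A -> B) = support(A ∪ B) / support(A). Every subset of a
    frequent itemset is itself frequent, so freq[antecedent] always exists.
    """
    rules = []
    for itemset, s in freq.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                confidence = s / freq[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

On a Spark cluster, the same support and confidence thresholds correspond to the `minSupport` and `minConfidence` parameters of `pyspark.ml.fpm.FPGrowth`, which provides a parallel FP-growth implementation; whether the authors used that class or their own implementation is not stated in the abstract.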

Keywords

Apache Spark; Association rules; Diabetes prediction; FP-growth; Parallel FP-growth

DOI: http://doi.org/10.11591/ijict.v13i3.pp445-452

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Intelektual Pustaka Media Utama (IPMU).
