Detection of myocardial infarction on recent dataset using machine learning

ABSTRACT


INTRODUCTION
The mortality rates of cancer and myocardial infarction (MI) are very high nowadays. MI is the clinical term for a heart attack caused by a lack of oxygenated blood reaching heart tissue because of a clogged artery. Patients who have survived an MI are at greater risk of other heart-related health problems later in life. Among all life-threatening diseases, heart attacks are considered the most prevalent. Medical practitioners conduct many surveys on heart disease and collect records of heart patients, their disease progression, and their symptoms. Every year heart disease causes tens of millions of deaths globally. Many techniques and tools have been developed to help medical doctors predict heart disease. Researchers have made efforts to develop automated diagnosis systems so that accurate diagnosis can take place. Among these, automated systems based on data mining and artificial intelligence (AI) are the most recent approach to automated diagnosis. The motivation for this work is the lack of freely available data and the difficulty of accessing patient data from hospitals. Large datasets are required to train a model accurately. It is also important to predict MI early in order to save lives.

In this research, actual datasets are collected from hospitals. This data alone is not sufficient to train the model, because limited data restricts training and leads to compromised results through overfitting. To overcome this problem, a synthetic dataset is created to provide data in bulk to the model. For this, continuous discussions with experts and a rigorous study were carried out, and ranges of the various parameters were determined for early MI, MI, and non-MI. The datasets available on Kaggle are not recent and are not Indian datasets, so it is essential to collect a recent dataset. Data on around 2149 patients were collected from three hospitals in rural areas of Nagpur. Machine learning models learn well when large datasets are available. Therefore, the idea of a synthetic dataset is proposed, and synthetic datasets are generated based on the actual dataset. The accuracy of the resulting models is extremely high.
Figure 1 shows a myocardial infarction. A heart attack occurs when one of the heart's coronary arteries is suddenly blocked or has extremely slow blood flow. The most common MI is associated with the bifurcation of the left coronary artery. The usual cause of sudden blockage in a coronary artery is the formation of a thrombus (blood clot). The clot typically forms inside a coronary artery that has already been narrowed by atherosclerosis, a condition in which fatty deposits (plaques) build up along the walls of blood vessels [1]. Risk factors that can be controlled are high cholesterol, high blood pressure, diabetes, weight, family history, smoking, unhealthy diet, lack of physical activity, and metabolic syndrome.
Risk factors that cannot be controlled include age, which is taken as greater than 45 years in men and greater than 55 years in women. Family history is another such factor: a father or brother diagnosed with a heart attack before 55 years of age, or a mother or sister diagnosed before 65 years of age, raises the risk of MI [2]. Another factor is preeclampsia, a condition that can develop during pregnancy. The two main signs of preeclampsia are a rise in blood pressure and excess protein in the urine [3]. The main purpose of this research is to detect MI at an early stage using the above risk factors, which can save lives.
Figure 2 shows a diagrammatic representation of the research idea. Diagnosis relies on many different kinds of (accurate) data, from patient history to physical examination, lab data, past medical records, and radiographic findings. Each patient's lifestyle, body, and history are different. It is important to note that if early prediction is possible, the death rate from MI will decrease and life expectancy will improve. The most important task is to consider those parameters of MI that were not included in earlier research but are strongly associated with MI in today's lifestyle.
There is always scope to depart from the prevailing approach and explore beyond the limits of earlier findings. Therefore, there is a need to design a model that can predict MI early based on the parameters fed to it. To improve the accuracy of MI diagnosis for clinicians and clinical scientists, in our system the input is gathered personally from many doctors, and data on patients with a history of MI is obtained through the proper channels. This dataset is given to the predictive model and is then used to verify and validate the proposed model. Early detection of MI will save lives. This technique will help doctors' assistants and nurses take timely action if the doctor is not available in the hospital [4].

RESEARCH METHOD
Timely hospital reporting and diagnosis are critical in myocardial infarction. Prehospital delay can be a significant cause of increased morbidity and mortality in myocardial infarction. A study of prehospital delay finds that lack of awareness and poor transportation facilities are the main contributors to the delay in the management of myocardial infarction. Misjudgment of symptoms and transport delays still contribute most to prehospital delays. Systems of ST-segment-elevation myocardial infarction (STEMI) care will need to focus on these variables to make a substantial impact on patient outcomes [5]. Abnormal lipids, smoking, high blood pressure, diabetes, abdominal obesity, psychosocial factors, consumption of fruits, vegetables, and alcohol, and regular physical activity account for most of the risk of myocardial infarction worldwide in both sexes and at all ages in all regions. This finding suggests that approaches to prevention can be based on similar principles worldwide and have the potential to prevent most premature cases of myocardial infarction [6].
Cardiologists Dr. Ashar Khan (DM) and Dr. Tamim Fazil (Medicine) and other experts have given extensive input to this research. All aspects of MI were discussed with the experts. The parameters responsible for MI are shown in Table 1. First, MI features were extracted through a rigorous literature review. Based on the literature review, a survey was conducted and the opinions of 20 experts were collected. This survey revealed the most important factors to be considered in the research, such as diabetes, patient history, diet, and stress. Nevertheless, smoking, eating habits, and stress could not be included in this work, even though they are vital features. The reason is that this information was unavailable at the time of the patients' admission, and missing values affect the performance of the model. The experts advised against filling missing values with the mean or median, because wrong values can cause the model to misclassify.

Parameters extracted from the survey
The input features and their values, shown in Table 2, are extracted from the survey conducted during the research.

Statistical analysis
This was an observational study conducted at two hospitals located in Nagpur (Kamptee). Data were collected prospectively from patients admitted to the hospitals and treated for MI from March 2018 to December 2020. The patient information was collected from the hospitals in person and then analyzed. Using a standard questionnaire, information was sought regarding history of ischemic heart disease, coronary risk factors, time of onset of pain, pain type, patient's history, cholesterol, and blood pressure (BP). All parameters were considered, their importance was discussed with experts, and they were included in this research. According to the experts, smoking and stress are the most important factors responsible for MI, though they are not included in the research because the correct information was not provided by the patients or was not known to the relatives admitting them to the hospital.
Data are gathered from the patients' hospital reports. Patients are evaluated on age, sex, ECG changes, biomarkers (CK-MB, TROP-I), angiography (LAD, LCA, RCA), cholesterol, BP (systolic, diastolic), chest pain type (acute, chronic), diabetes, chronic kidney disease (CKD), autoimmune condition (AC), family history (FH), hormone replacement therapy (HRT), thyroid dysfunction (TD), and acute kidney injury (AKI). The evaluation is carried out with the help of experts. Statistical analysis is done using Google Forms and the graphs generated during the survey for extracting the MI parameters. Patients' data are collected and transformed into the required format. In this proposal, the experience and knowledge of experts are used. The use of data to answer queries, the study of various algorithms such as support vector machine (SVM), naive Bayes (NB), decision tree (DT), logistic regression (LR), k-nearest neighbors (KNN), ensemble methods, and neural networks (NN), and expert opinion are all taken into account. Data pre-processing techniques such as data cleaning, pruning, and normalization are important steps to apply before feeding input to the model. The steps involved are as follows:

-Bucketization
It creates buckets for sub-features by splitting the main features into sub-features.

-Normalization
Data are normalized and converted into numeric form with the help of experts.

-Data cleaning and pruning
Data cleaning and pruning are performed on the selected data, since a correctly cleaned and pruned dataset provides far better precision than an unclean one with missing values. Data cleaning is the process of preparing data for the model by removing or modifying data that is incorrect, incomplete, inconsistent, redundant, or improperly formatted [18]-[20].
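The three pre-processing steps listed above can be illustrated with a minimal Python sketch. The record layout, the age bucket boundaries, and the helper names (`bucketize`, `clean`, `min_max_normalize`) are assumptions made for illustration and are not taken from the study. Records with missing values are dropped rather than imputed, in line with the expert advice reported earlier.

```python
from bisect import bisect_right

# Hypothetical age bucket boundaries (years); the study does not list its cut points.
AGE_EDGES = [30, 45, 55, 65]
AGE_LABELS = ["<30", "30-44", "45-54", "55-64", "65+"]


def bucketize(value, edges, labels):
    """Map a numeric value to the label of the bucket it falls in."""
    return labels[bisect_right(edges, value)]


def clean(records, required):
    """Drop records with a missing required field, then drop exact duplicates."""
    seen, kept = set(), []
    for rec in records:
        if any(rec.get(key) is None for key in required):
            continue  # missing values are removed, not imputed
        fingerprint = tuple(rec[key] for key in required)
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(rec)
    return kept


def min_max_normalize(values):
    """Scale a list of numbers linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy records; the values are invented for the example and are not patient data.
records = [
    {"age": 52, "systolic_bp": 150, "cholesterol": 240},
    {"age": 38, "systolic_bp": 120, "cholesterol": None},  # missing value
    {"age": 67, "systolic_bp": 170, "cholesterol": 280},
    {"age": 52, "systolic_bp": 150, "cholesterol": 240},   # duplicate
]

fields = ["age", "systolic_bp", "cholesterol"]
cleaned = clean(records, fields)
age_buckets = [bucketize(r["age"], AGE_EDGES, AGE_LABELS) for r in cleaned]
bp_scaled = min_max_normalize([r["systolic_bp"] for r in cleaned])
```

In this toy run two of the four records survive cleaning; their ages fall into the "45-54" and "65+" buckets, and their systolic values scale to 0.0 and 1.0.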

RESULTS AND DISCUSSION
In Figure 3 to Figure 21, graphs are plotted for each parameter against the total patient count. Data on a total of 565 patients were collected from two hospitals. Of these, the records of 65 patients have missing values and are therefore not included in the research. Out of 500 records, there were 147 patients with angina, 150 non-MI patients, and 303 MI patients. To balance the data, approximately 150 records from each class are used, so a total of 450 records is given to the model. The data analysis is presented in Table 3.
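The balancing step described above can be sketched in Python as follows. The text does not say how the records of the larger class were selected, so random downsampling with a per-class cap is an assumption, and the function name `balance_by_downsampling` and the label strings are hypothetical.

```python
import random
from collections import Counter


def balance_by_downsampling(labels, cap, seed=0):
    """Return indices that keep at most `cap` randomly chosen samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    kept = []
    for indices in by_class.values():
        if len(indices) > cap:
            indices = rng.sample(indices, cap)  # shrink only the oversized classes
        kept.extend(indices)
    return sorted(kept)


# Class counts as reported in the text; the label strings are illustrative.
labels = ["early_mi"] * 147 + ["non_mi"] * 150 + ["mi"] * 303
kept = balance_by_downsampling(labels, cap=150)
balanced_counts = Counter(labels[i] for i in kept)
```

With a cap of 150, the MI class is reduced to 150 records while the angina class keeps its 147, which matches the "approximately 150 per class" described above.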

Experimental result
The dataset from two hospitals situated in Nagpur (Kamptee) is used to classify three categories: early MI (angina), non-MI, and MI. Various algorithms are applied to this dataset, which contains the information of 450 patients. The best results were achieved using a multilayer perceptron (MLP) with alpha=0.7. The other algorithms also give good accuracy in the training and testing phases. The output of the algorithms can be seen in Table 4. Although the results are encouraging, it is suggested that more patient records be added to further check the accuracy of the model, because the data come mainly from one region. Results may vary from region to region as lifestyle, eating habits, and stress levels change. These parameters are not included in the research because the information was unavailable, although the experts emphasized their importance. Therefore, it is suggested that more datasets be considered in order to predict accurately. For this, a novel idea is proposed: generating synthetic datasets. The following steps are applied to create a synthetic dataset.

Function for generation of synthetic datasets
To generate synthetic datasets, a histogram of every feature is first generated, i.e., the distribution of the data. The histogram is then normalized by scaling it between zero and one. This distribution is then passed to the function used to prepare the synthetic datasets. Here, l is the lower limit of the data, u is the upper limit of the data, n is the number of samples to be generated, and d is the distribution based on the actual dataset.
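A minimal Python sketch of such a generator is given below. It assumes that d is a list of histogram heights over equal-width bins on [l, u], and that each sample is drawn by choosing a bin with probability proportional to its height and then a uniform point inside that bin. The function names, the bin count, and the choice of dividing by the peak count (one possible reading of "scaling between zero and one") are illustrative assumptions, not the function used in the study.

```python
import random


def histogram(values, l, u, bins):
    """Count values in `bins` equal-width bins over [l, u], scaled so the peak is 1."""
    width = (u - l) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - l) / width), bins - 1)  # place v == u in the last bin
        counts[index] += 1
    peak = max(counts)
    return [c / peak for c in counts]


def generate_synthetic(l, u, n, d, seed=0):
    """Draw n values in [l, u) whose bin frequencies follow the histogram d."""
    rng = random.Random(seed)
    width = (u - l) / len(d)
    # Choose a bin with probability proportional to its height in d ...
    chosen = rng.choices(range(len(d)), weights=d, k=n)
    # ... then pick a uniform point inside that bin.
    return [l + (b + rng.random()) * width for b in chosen]


# Invented systolic blood pressure readings, used only to demonstrate the call.
actual = [110, 118, 121, 125, 130, 132, 135, 138, 140, 145, 150, 160, 175]
l, u = 100, 180
d = histogram(actual, l, u, bins=8)
synthetic = generate_synthetic(l, u, n=1000, d=d)
```

Because empty bins have zero weight, no synthetic value falls in a range where the actual data had no observations; in this toy case nothing is generated below 110.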

Graph for synthetic dataset
The distribution of the actual dataset is passed to the function to obtain synthetic datasets. A total of 45000 patient reports are generated from the 2149 actual records gathered from patients' reports. The value of n is increased from 1k to 15k. The values 1k, 2k, 4k, 6k, 8k, 9k, 11k, and 12k give NaN values. Beyond 15k, model accuracy either stays constant or decreases. Therefore, the creation of synthetic data is stopped at 45000 samples.

The result on synthetic datasets
Table 5 lists the accuracy of the models on 15000 samples of synthetic data in the training and testing phases. Here, KNN and RF give the highest accuracy.
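The kind of train/test comparison reported in the tables can be sketched as follows. The sketch assumes scikit-learn, which the text does not name, and runs on a generated stand-in dataset rather than the study's data, so the printed accuracies say nothing about the reported results. The value alpha=0.7 is taken from the text and is interpreted here as the L2 regularization strength of `MLPClassifier`, which is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Generated stand-in for a three-class dataset (early MI, non-MI, MI); not patient data.
X, y = make_classification(
    n_samples=1500, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "RF": RandomForestClassifier(random_state=0),
    # alpha=0.7 is the value quoted in the text, used here as the L2 penalty.
    "MLP": make_pipeline(
        StandardScaler(), MLPClassifier(alpha=0.7, max_iter=2000, random_state=0)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={results[name][0]:.3f}, test={results[name][1]:.3f}")
```

Reporting both training and testing accuracy side by side makes overfitting visible: a large gap between the two for a given model suggests it has memorized the training samples.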

CONCLUSION
This study has attempted to analyze the dataset with respect to the input features and common causes of early MI in patients presenting to hospitals in the urban area of Nagpur (Kamptee). Previous studies address only MI and do not include early MI. The available data also lag behind and are not recent, and Indian data are not available. This research has therefore been done from scratch. The dataset is collected from the two hospitals, and expert assistance is taken to incorporate important features for early MI. After collection, the data are analyzed, and it is found that across the 450 patients there is almost no variation in the AC, Hor_Repl, Thy_Dys, and AKI parameters. It may be that these pathological tests are not ordered in this area because they are expensive, or that these factors are not major contributors to MI in this region. According to expert opinion, these parameters can be eliminated.
Feature selection is performed on the data of 450 patients. More data are collected for the creation of synthetic datasets: information on 2149 patients is collected, and data cleaning and pruning are applied. A distribution graph is generated from this dataset and passed to the function that creates the synthetic datasets. This is done to create a realistic dataset. Expert opinion is also taken at each step. Further work can be carried out by considering these expert opinions. It is also suggested that more data be collected from various regions of India to validate this work.

Figure 3. Graph of age vs total patient count

Table 3. Description of graph

Table 4. Output of algorithms

Table 5. Algorithm accuracy at 15000 samples