Page 86 - The-5th-MCAIT2021-eProceeding

P. 86

patient’s quality of life. However, PCI still has the risk of complications like others surgical procedures such
as blood clotting, heart attack, bleeding and death after procedure or within 30 days of post procedure.
Therefore, machine learning will be implemented to improve predictive effectiveness, prognosis and improve
patient care, as suggested by Gui & Chan (2017). Thus, the main purpose of this study was to develop a PCI
mortality prediction model and determine the significant factors of contributing to it. The main challenge
however, is the naturally imbalanced data as only 1% death from 2007 to 2016 patients.
Previously, Mohamad & Bee Wah (2019) have developed a post PCI survival model using IJN datasets with
the best performance of Naives Bayes algorithm with accuracy was 79.13%, sensitivity was 75.73%, specificity
was 82.52%, precision was 81.25% and error rate was 20.87%. Random Under-sampling (RUS) was used as
sampling method for imbalance class problem with 300 (1.06%) data used from total 28407 row provided. There
are 12 attributes selected based on previous literature review selection and consisted of demography, life style,
lipid profile, comorbidities and physical measurements. Therefore, this study aims to make an improvements
and use the best approach to develop the best model performance.

1.1. Dataset
Data were taken from IJN with the ethical permission (IJNREC/457/2020), Institutional Review Board and
fulfilled the Helsinki Declaration. It consists of 23638 patients with total of 28407 PCI procedures that involving
40244 lesion records since the year 2007 until 2016. Number of attributes were 466 with 44 are demographics
data (Dataset A), 126 attributes were from the intra-procedure (Dataset B) and the rest are from post-procedure
(Dataset C). Dataset are severely imbalanced as only 1% of death were recorded.

2. Methodology
2.1. Pre-process

Preprocessing phase involving data cleaning that encompasses of elimination of meaningless feature,
elimination of feature with more 50% missing value, missing value identification, extraction of useful mining
lesion data, data consolidation for table lesion with other table and merged with unique ID, outliers value
elimination and missing value and elimination of features with the same meaning. Transformation phase
involves the generation of new features, replacing missing data with average values and One-Hot Encoding
which gives value of 0 and 1. This phase requires approximately 80% of time and effort.

Table 1 Summary Result for Features Selection for Dataset A, B and C. There are Two Type of Features; Main and One Hot Encoding
(OHC) with Total of Feature Selection IG and Matrix Correlation.

Feature Selection Dataset A Dataset B Dataset C
Type of Feature Main OHC Main OHC Main OHC
Original Number 44 69 126 239 157 286
IG Feature Selection 15 20 53 75 69 96
Main IG Feature Systolic Group IABP Complication
Selection Heart Rate Group PCI Status Cardiogenic Shock
MDRD Group STEMI IABP
Total Feature 12 13 45 58 60 75
Matrix Correlation
Feature Reduced 72.23% 64.26% 61.78%

E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [73]
Artificial Intelligence in the 4th Industrial Revolution

81 82 83 84 85 86 87 88 89 90 91