
        patient’s quality of life. However, like other surgical procedures, PCI still carries a risk of complications
        such as blood clotting, heart attack, bleeding and death, either after the procedure or within 30 days of it.
           Therefore, machine learning will be implemented to improve predictive effectiveness, prognosis and
        patient care, as suggested by Gui & Chan (2017). Thus, the main purpose of this study was to develop a PCI
        mortality prediction model and determine the significant factors contributing to it. The main challenge,
        however, is the naturally imbalanced data, as only 1% of patients from 2007 to 2016 died.
           Previously, Mohamad & Bee Wah (2019) developed a post-PCI survival model using IJN datasets. Their
        best-performing algorithm was Naive Bayes, with an accuracy of 79.13%, sensitivity of 75.73%, specificity
        of 82.52%, precision of 81.25% and an error rate of 20.87%. Random Under-sampling (RUS) was used as the
        sampling method for the class imbalance problem, with 300 records (1.06%) used from the total of 28407 rows
        provided. Twelve attributes were selected based on the previous literature, covering demography, lifestyle,
        lipid profile, comorbidities and physical measurements. Therefore, this study aims to improve on that work
        and use the best approach to develop the best-performing model.
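
        The Random Under-sampling (RUS) step mentioned above can be illustrated with a minimal sketch. It is not
        the implementation used by Mohamad & Bee Wah (2019) or by this study; the function name, the `died` label
        and the toy data are hypothetical and chosen only to mimic a roughly 1% minority class.

```python
import random
from collections import Counter


def random_undersample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # sample without replacement, n_min rows per class
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data (not from the study): 1000 procedures, 1% deaths.
toy = [{"id": i, "died": 1 if i < 10 else 0} for i in range(1000)]
balanced = random_undersample(toy, "died")
class_counts = Counter(row["died"] for row in balanced)
```

        The sketch makes the cost of RUS visible: the balanced set is only as large as twice the minority class,
        so most majority-class records are discarded.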


        1.1. Dataset
        Data were obtained from IJN with ethical permission from the Institutional Review Board (IJNREC/457/2020)
        and in compliance with the Helsinki Declaration. The data consist of 23638 patients with a total of 28407 PCI
        procedures involving 40244 lesion records from 2007 to 2016. There were 466 attributes: 44 are demographic
        data (Dataset A), 126 are from the intra-procedure stage (Dataset B) and the rest are from the post-procedure
        stage (Dataset C). The dataset is severely imbalanced, as a death was recorded in only 1% of cases.


        2. Methodology
        2.1. Pre-process


           The preprocessing phase involves data cleaning, which encompasses the elimination of meaningless features,
        the elimination of features with more than 50% missing values, missing value identification, the extraction
        of lesion data useful for mining, the consolidation of the lesion table with the other tables (merged on a
        unique ID), the elimination of outliers and missing values, and the elimination of features with the same
        meaning. The transformation phase involves the generation of new features, the replacement of missing data
        with average values, and One-Hot Encoding, which produces values of 0 and 1. This phase requires
        approximately 80% of the time and effort.
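
        Three of the steps described above (dropping features with more than 50% missing values, replacing missing
        data with the average, and One-Hot Encoding) can be sketched as follows. This is an illustrative sketch
        only: the helper names, the feature names and the category values are hypothetical, and the study's
        actual tooling is not stated.

```python
def drop_sparse_features(rows, max_missing=0.5):
    """Remove features whose share of missing values (None) exceeds max_missing."""
    features = list(rows[0].keys())
    keep = [f for f in features
            if sum(r[f] is None for r in rows) / len(rows) <= max_missing]
    return [{f: r[f] for f in keep} for r in rows]


def impute_mean(rows, feature):
    """Replace missing values of a numeric feature with the mean of the observed values."""
    observed = [r[feature] for r in rows if r[feature] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, feature: mean if r[feature] is None else r[feature]} for r in rows]


def one_hot(rows, feature):
    """Replace a categorical feature with one 0/1 column per category."""
    categories = sorted({r[feature] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != feature}
        for c in categories:
            new_row[f"{feature}_{c}"] = 1 if r[feature] == c else 0
        encoded.append(new_row)
    return encoded


# Hypothetical records (not from the study); None marks a missing value.
rows = [
    {"heart_rate": 72, "pci_status": "elective", "notes": None},
    {"heart_rate": None, "pci_status": "urgent", "notes": None},
    {"heart_rate": 90, "pci_status": "elective", "notes": "x"},
    {"heart_rate": 78, "pci_status": "urgent", "notes": None},
]
clean = one_hot(impute_mean(drop_sparse_features(rows), "heart_rate"), "pci_status")
```

        Here `notes` is missing in 75% of rows and is dropped, the missing `heart_rate` becomes the mean of the
        three observed values, and `pci_status` expands into two 0/1 columns.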

        Table 1   Summary of feature selection results for Datasets A, B and C. There are two types of features, Main and One-Hot Encoding
        (OHC), with feature totals after IG feature selection and after the correlation matrix.

            Feature Selection           Dataset A            Dataset B            Dataset C
            Type of Feature             Main      OHC        Main      OHC        Main      OHC
            Original Number             44        69         126       239        157       286
            IG Feature Selection        15        20         53        75         69        96
            Main IG Feature Selection   Systolic Group       IABP                 Complication
                                        Heart Rate Group     PCI Status           Cardiogenic Shock
                                        MDRD Group           STEMI                IABP
            Total Feature Matrix        12        13         45        58         60        75
            Correlation
            Feature Reduced             72.23%               64.26%               61.78%
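
        The two-stage selection summarised in Table 1 can be sketched as below, assuming that IG denotes
        information gain and that the correlation matrix is used to discard features that are highly correlated
        with a feature already kept. The thresholds, function names and toy data are hypothetical; the source
        does not state the cut-offs or the exact procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditional on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(columns, labels, min_gain=0.0, max_corr=0.9):
    """Keep features with gain above min_gain, ranked by gain, then skip any
    feature whose absolute correlation with an already kept feature is >= max_corr."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted((n for n in gains if gains[n] > min_gain), key=gains.get, reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < max_corr for k in kept):
            kept.append(name)
    return kept, gains


# Hypothetical 0/1 features for eight procedures (not from the study).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "shock": [1, 1, 1, 0, 0, 0, 0, 0],       # perfectly informative
    "shock_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # duplicate, removed by the correlation step
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],       # weakly informative
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],    # zero gain, removed by the IG step
}
kept, gains = select_features(columns, labels)
```

        In this toy case the constant feature is removed by the gain threshold and the duplicate is removed by
        the correlation check, mirroring the two reductions reported per dataset in Table 1.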








        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [73]
        Artificial Intelligence in the 4th Industrial Revolution