
relationship strength is shown in the form of numerical values. The relationship between each attribute and the
        house price attribute is shown in Table 2 below. The Pearson correlation coefficient takes values between -1 and 1.

        Table 2    Level of Attributes Relationship with Home Price Attribute

                  Attribute            Correlation Coefficient       Level of Relationship
                  b_tingkat                     0.34                     Quite strong
                  luas_lot                      0.16                     Moderate
                  luas_lot_bgn                  0.16                     Moderate
                  thn_perjanjian                -0                       No correlation
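
        The coefficient values of the kind reported in Table 2 can be illustrated with a minimal sketch of the
        Pearson calculation. The function name `pearson_r` and the toy values below are assumptions made for
        illustration only; they are not the study's code and are not taken from the NAPIC data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator of the coefficient).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (number of floors vs. price), for illustration only.
floors = [1, 2, 2, 3, 1, 2]
prices = [300, 450, 420, 600, 310, 500]
r = pearson_r(floors, prices)
```

        The result always lies between -1 and 1: values near 0 indicate no linear relationship, while values
        near -1 or 1 indicate a strong negative or positive linear relationship.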

           In the experiments, the Random Forest (RF) algorithm gave the lowest MSE value of 0.00017549, followed
        by the Gradient Boosting Decision Tree (GBDT) algorithm and k-Nearest Neighbors (k-NN), with values of
        0.00020321 and 0.00022385 respectively. The result for each algorithm is shown in Table 3 below.

        Table 3    MSE of Each Prediction Model

                        Prediction Model                                 MSE
                              RF                                      0.00017549
                             GBDT                                     0.00020321
                             k-NN                                     0.00022385
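
        As a minimal sketch of how the models are ranked, the snippet below defines the mean squared error and
        picks the model with the lowest reported value. The function `mean_squared_error` and the dictionary
        name `reported_mse` are assumptions for illustration; only the three MSE figures come from Table 3.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# MSE values as reported in Table 3 (not recomputed here).
reported_mse = {"RF": 0.00017549, "GBDT": 0.00020321, "k-NN": 0.00022385}

# A lower MSE means predictions are closer to the actual values on average.
best_model = min(reported_mse, key=reported_mse.get)
```

        Because the errors are squared before averaging, a few large misses raise the MSE more than many
        small ones.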

        5.  Conclusion

           This study employs machine learning techniques to develop a price prediction model for houses in Selangor.
        It uses a dataset of residential housing for a 5-year period from 2015 until 2020. The performances of the
        regression models are compared with one another, and the accuracy of the predictions is assessed by the
        mean squared error (MSE) score of each trained model. The data was pre-processed before testing and
        divided into two parts, the training set and the test set. Three machine learning regression algorithms are
        used in this study: Random Forest (RF), Gradient Boosting Decision Tree (GBDT) and k-Nearest Neighbors
        (k-NN). The correlation between the dependent and independent variables was determined through the
        Pearson coefficient. This relationship is presented in a correlation matrix, where its strength is expressed
        in coefficient values. From the results of the experiments conducted, the numerical variables have a
        moderate correlation with the dependent variable, house price. Based on the experimental results, the
        Random Forest (RF) model is the best predictive model, with higher accuracy (lower MSE) than the Gradient
        Boosting Decision Tree (GBDT) and k-Nearest Neighbors (k-NN) models.
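
        The evaluation procedure summarised above (split the data, fit a model on the training set, score it on
        the test set by MSE) can be sketched in plain Python for the simplest of the three models, k-NN. This is
        an illustrative sketch, not the study's implementation: the helper names, the 80/20 split, k = 3 and the
        synthetic rows are all assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, features, k=3):
    """Average the targets of the k training rows nearest to `features`
    (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic rows of the form ((floors, lot_area), price); hypothetical values only.
rows = [((f, a), 100.0 * f + 2.0 * a) for f in (1, 2, 3) for a in (50, 80, 110, 140, 170)]

train, test = train_test_split(rows)
predictions = [knn_predict(train, features) for features, _ in test]
test_mse = mse([price for _, price in test], predictions)
```

        Scoring on rows held out from training is what makes the MSE comparison between models meaningful;
        the same split would be reused for each model being compared.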


        Acknowledgements

           I gratefully acknowledge my supervisor, Dr Nor Samsiah, for her guidance throughout this work. Special
        thanks to the National Property Information Center (NAPIC) for the data set used in the experiment.









        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [126]
        Artificial Intelligence in the 4th Industrial Revolution