
               House Price Prediction in Selangor Using Machine Learning Algorithms


                              Azwanis Abdosamad a*, Nor Samsiah Sani b

         a,b Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor
                                         *Email: P95394@siswa.ukm.edu.my

        Abstract

        House prices rise every year, which is a growing concern, especially for buyers in urban areas. Selangor
        has areas of high population density and high house prices, which prompted this study. Housing price
        prediction helps house buyers, sellers and real estate agents make informed decisions. The purpose of this
        study is to identify the important features that influence housing prices in Selangor. The housing dataset
        was obtained from the National Property Information Centre (NAPIC) and contains 64982 records with 23
        attributes, covering the residential sector in Selangor from 2015 through 2020. The house price attribute
        was selected as the dependent variable, that is, the target value in this study. After feature selection,
        several parameters were optimized to model house price prediction. Three (3) machine learning models were
        developed: Random Forest (RF), Gradient Boost Decision Tree (GBDT) and k-Nearest Neighbors (k-NN). The Mean
        Squared Error (MSE) of each algorithm was computed and compared to find the most accurate algorithm. The
        findings show that the RF algorithm performs best, with an MSE of 0.00017549.

        Keywords: Machine Learning; Random Forest; Gradient Boost Decision Tree; k-Nearest Neighbors; Mean Square Error


        1.  Introduction

           A house is defined as a home that meets other basic needs (UN-Habitat, 2011). In an era of advanced
        technology, homes are not only shelter for people but also a long-term asset and investment. However, the
        increasingly worrying rise in house prices in Malaysia leaves many people in the country unable to afford
        their own house (Azima Abdul Manaf, 2019). Many factors contribute to the sharp increase in house prices,
        among them demand, housing supply and pricing by developers. A good forecast model is needed to predict
        house prices. Thus, this study develops house price prediction models using different machine learning
        algorithms to produce high-accuracy forecasts.
           The main objectives of this paper are to identify the important features that influence the price of a
        house in Selangor, to develop and compare three (3) models using machine learning techniques, and to
        identify the best house price model among the three. The data set was obtained from the National Property
        Information Centre (NAPIC). The data cover the residential category, which includes terraced and
        multi-floor houses that do not exceed three floors. The data set was collected in Selangor from 2015 until
        mid-2020 and has a total of 64982 records with 23 attributes.
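
           The comparison step named in the objectives, ranking candidate models by their Mean Squared Error,
        can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the
        function names, the placeholder model names and all numbers are made up for the example and do not come
        from the NAPIC data or from the results reported in this paper.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_model(actual, predictions_by_model):
    """Return the name of the model with the lowest MSE, plus all scores."""
    scores = {
        name: mean_squared_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical toy values (assumption: prices already scaled to [0, 1]).
actual = [0.30, 0.45, 0.50, 0.80]
predictions = {
    "model_A": [0.32, 0.44, 0.52, 0.79],
    "model_B": [0.33, 0.41, 0.55, 0.76],
    "model_C": [0.25, 0.50, 0.60, 0.70],
}

best, scores = best_model(actual, predictions)
for name, score in scores.items():
    print(f"{name}: MSE = {score:.6f}")
print("Lowest MSE:", best)
```

           A lower MSE means the predictions lie closer to the observed prices on average, so the model with
        the smallest value is taken as the most accurate one.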

        2.  Related Work

           Winky K.O et al. (2020) used RF, GBDT and Support Vector Machine (SVM) to predict house prices in
        Hong Kong. They conclude that RF and GBDT produce more accurate price estimates than SVM.







        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [124]
        Artificial Intelligence in the 4th Industrial Revolution