Page 138 - The-5th-MCAIT2021-eProceeding

Sifei Lu et al. (2017) estimated house prices in Ames using Lasso and Gradient Boosting Decision Tree
        (GBDT). House prices were forecast from house characteristics and location, and GBDT was found to produce
        a better model than Lasso. A study by Nissan Pow, Emil Janulewicz & Liu (2015) showed that k-NN and
        Random Forest perform well compared to linear regression, with k-NN being the best model with the lowest
        squared error. Their study used a data set extracted from the centris.ca website.

        3.  Material and Method

           Random Forest (RF), Gradient Boosting Decision Tree (GBDT) and k-Nearest Neighbors (k-NN) were
        applied as regression models and compared to find the most accurate one. After the analysis, the
        conclusions and recommendations were written up to present the findings.
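
A minimal sketch of how such a three-model comparison could be set up in Scikit-Learn is given here. It is not the authors' code: the synthetic data, the 80/20 split, the hyperparameters and the use of RMSE as the comparison metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the housing data (assumption: the real data set
# is not reproduced here).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 4))
y = 100.0 * X[:, 0] + 50.0 * X[:, 1] ** 2 + rng.normal(0.0, 1.0, size=300)

# Assumed 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters are illustrative defaults, not values from the study.
models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "k-NN": KNeighborsRegressor(n_neighbors=5),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each regressor and return its RMSE on the test set."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    return scores


scores = compare_models(models, X_train, y_train, X_test, y_test)
best = min(scores, key=scores.get)  # model with the lowest RMSE
print(scores, "best:", best)
```

Which model comes out best depends on the data and settings, so the ranking on this synthetic data says nothing about the ranking on the housing data.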

        3.1  Data Pre-Processing

           In this study, Scikit-Learn and Microsoft Excel are used as tools to perform the pre-processing
        tasks. During data cleaning, some attributes were found to have missing values in certain rows. These
        values were filled in manually in Excel using the median value. The attributes that required median filling
        are luas_lot, luas_lot_bgn and b_tingkat. One attribute, b_tingkat, also contains noisy data, that is,
        values that are incorrect or unreasonable. Data reduction is applied so that only correct values remain.
        Some attributes, such as keadaan_bgn, have no value in some rows and no relation to the other attributes;
        the affected records are likewise deleted. In total, 44 records of noisy and irrelevant data were deleted.
        The data set also undergoes data transformation, in which categorical data are converted to numeric. After
        pre-processing, only 9 attributes remain; these are called the essential attributes and are listed in
        Table 1 below.

        Table 1    Essential Attributes with Data Type

                                Essential Attribute                  Data Type
                      daerah1                                         nominal
                      jenis_pegangan                                  nominal
                      pro_type                                        nominal
                      b_tingkat                                       numeric
                      luas_lot_bgn                                    numeric
                      luas_lot                                        numeric
                      harga_b                                         numeric
                      keadaan_bgn                                     nominal
                      thn_perjanjian                                  numeric
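
The cleaning steps in Section 3.1 can be sketched with pandas, as a rough illustration. The study filled missing values manually in Excel, so pandas is a swapped-in tool here. The toy records, the category labels and the `max_storeys` threshold used to flag unreasonable b_tingkat values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; only the column names come from Table 1.
df = pd.DataFrame({
    "daerah1": ["A", "B", "A", "C", "B"],
    "b_tingkat": [1.0, 2.0, np.nan, 99.0, 2.0],
    "luas_lot": [120.0, np.nan, 150.0, 130.0, 140.0],
    "keadaan_bgn": ["good", "good", "fair", "good", None],
    "harga_b": [300000.0, 350000.0, 320000.0, 310000.0, 400000.0],
})


def preprocess(df, max_storeys=10):
    """Median fill, noise removal, record deletion and numeric encoding."""
    out = df.copy()
    # 1. Fill missing numeric values with the column median.
    for col in ["b_tingkat", "luas_lot"]:
        out[col] = out[col].fillna(out[col].median())
    # 2. Drop records with unreasonable b_tingkat values
    #    (max_storeys is an assumed threshold, not taken from the paper).
    out = out.loc[out["b_tingkat"] <= max_storeys].copy()
    # 3. Drop records that have no value for keadaan_bgn.
    out = out.dropna(subset=["keadaan_bgn"]).copy()
    # 4. Convert categorical (nominal) attributes to integer codes.
    for col in ["daerah1", "keadaan_bgn"]:
        out[col] = out[col].astype("category").cat.codes
    return out.reset_index(drop=True)


clean = preprocess(df)
print(clean)
```

The steps follow the order in which the text describes them. Removing noisy values before computing the median would be an equally defensible choice, since extreme values can shift the median on small samples.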

        4.  Results and Discussion

           The nine essential attributes are first tested using the feature selection method and then using the
        correlation matrix method to determine the strength of the relationship between each attribute and the house
        price attribute. Figure 4 shows the heat map of the correlation matrix computed on the numerical attributes.
        The heat map shows the correlation between the attributes and the house price where the







        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [125]
        Artificial Intelligence in the 4th Industrial Revolution