House Price Prediction in Selangor Using Machine Learning Algorithms
Azwanis Abdosamad^a*, Nor Samsiah Sani^b
^a,b Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor
*Email: P95394@siswa.ukm.edu.my
Abstract
House prices rise every year, which is a particular concern for buyers in urban areas. Selangor has areas
with high population density and high house prices, which prompted this study. Housing price prediction
can help house buyers, sellers and real estate agents make informed decisions. The purpose of this study is
to identify the important features that influence housing prices in Selangor. The housing dataset was
obtained from the National Property Information Centre (NAPIC) and has a total of 64982 records and 23
attributes. It contains data on the residential sector in Selangor from 2015 through 2020. The house price
attribute was selected as the dependent variable, which is the target value in this study. After feature
selection, several parameters were optimized to model the house price prediction. Three (3) machine
learning algorithms were used to develop the models: Random Forest (RF), Gradient Boost Decision Tree
(GBDT) and k-Nearest Neighbors (k-NN). The Mean Squared Error (MSE) of each algorithm was determined
and compared to find the best algorithm in terms of accuracy. The findings show that the RF algorithm
achieves the best performance, with an MSE of 0.00017549.
Keywords: Machine Learning; Random Forest; Gradient Boost Decision Tree; k-Nearest Neighbors; Mean Square Error
1. Introduction
A house is defined as a home that meets other basic needs (UN-Habitat, 2011). In an era of advanced
technology, homes are not only shelter for people but also a long-term asset and investment. However, the
increasingly worrying rise in house prices in Malaysia has left many people in the country unable to afford
their own house (Azima Abdul Manaf, 2019). Many factors contribute to the sharp increase in house prices,
among them demand, supply and pricing by developers. A good forecast model is needed to predict house
prices. Thus, this study develops house price prediction models using different machine learning algorithms
to produce high-accuracy forecasts.
The main objectives of this paper are to identify the important features that influence the price of a house
in Selangor, to develop and compare three (3) models using machine learning techniques, and to identify the
best house price model among the three. In this study, the data set was obtained from the National Property
Information Centre (NAPIC). The data belong to the residential category, which includes terraced and
multi-floor houses that do not exceed three floors. The data set was collected in Selangor from 2015 until
mid-2020 and has a total of 64982 records with 23 attributes.
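
The comparison described in the objectives can be illustrated with a minimal sketch, assuming scikit-learn as the toolkit (this section does not name the software used). The data below are synthetic, and the feature names (`land_area`, `floor_area`, `n_floors`), the price formula and all hyperparameters are hypothetical stand-ins, not NAPIC attributes or the settings tuned in this study. The sketch fits the three regressors, compares their test MSE, and reads feature importances from the RF model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (hypothetical features, NOT NAPIC attributes).
rng = np.random.default_rng(0)
n = 500
feature_names = ["land_area", "floor_area", "n_floors"]
X = np.column_stack([
    rng.uniform(60, 400, n),   # land area
    rng.uniform(50, 300, n),   # floor area
    rng.integers(1, 4, n),     # 1 to 3 floors
])
# Hypothetical price formula with noise, used only to give the models a signal.
y = 1500 * X[:, 0] + 2500 * X[:, 1] + 40000 * X[:, 2] + rng.normal(0, 20000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Default-style hyperparameters (assumed); k-NN gets feature scaling
# because it relies on distances between samples.
models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

# Fit each model and record its MSE on the held-out test split.
mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse[name] = mean_squared_error(y_test, model.predict(X_test))

# The model with the lowest test MSE is taken as the best one.
best = min(mse, key=mse.get)

# Impurity-based importances from the fitted RF rank the input features.
importances = dict(zip(feature_names, models["RF"].feature_importances_))

print(mse)
print("best:", best)
print(importances)
```

On real data the ranking of the models and of the features depends on preprocessing and tuning, so the output of this sketch says nothing about the results reported in the paper.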
2. Related Work
Winky K.O et al. (2020) used RF, GBDT and SVM to predict house prices in Hong Kong. They concluded that
RF and GBDT produce more accurate price estimates than SVM.
E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)
Artificial Intelligence in the 4th Industrial Revolution