A Dictionary Based Approach for Malay Language Sentiment Lexicon Generation
Azilawati Rozaimee a*, Nazlia Omar b, Sabrina Tiun c and NurSharmini Alexander d
a FIK, UniSZA, 22200, Kampus Besut, Terengganu, Malaysia
b,c CAIT, UKM 43600, Bangi, Selangor, Malaysia
d MAMPU, 63000 Cyberjaya, Malaysia
*Email: azila@unisza.edu.my
Abstract
The sentiment lexicon plays a vital role in a successful sentiment analysis task. The most common way to build one is
manual annotation; however, the manual approach is labor-intensive, relatively slow, and requires considerable effort.
This paper aims to generate a Malay Language sentiment lexicon automatically to overcome the limitations of
human-built sentiment lexicons. We present a Malay Language Sentiment Lexicon generation algorithm based on a
dictionary-based approach. The algorithm uses the WordNet 3.0 and WordNet Bahasa resources, which are mapped to each
other to build the new Malay Language Sentiment Lexicon. After five iterations starting from a pair of seed words, the
algorithm generated a sentiment lexicon of 61605 words, comprising 25541 positive words and 36064 negative words. This
shows that the proposed approach can generate a large sentiment lexicon with reasonable accuracy for formal terms by
utilizing dictionaries such as WordNet 3.0 and WordNet Bahasa.
Keywords: sentiment lexicon; Malay Language; dictionary-based; WordNet 3.0; WordNet Bahasa
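The abstract describes the method only at a high level: start from a pair of seed words, expand them through WordNet 3.0 for a fixed number of iterations, and map the result to Malay through WordNet Bahasa. The sketch below is one possible reading of that loop, not the authors' algorithm. It assumes that synonyms inherit a word's polarity, antonyms receive the opposite polarity, English and Malay entries are aligned by a shared synset identifier, and words that reach both polarities are discarded. The synset table, antonym links, and the helper names `synsets_of`, `expand`, and `to_malay` are all hypothetical.

```python
"""Toy sketch of seed-based sentiment lexicon expansion over a WordNet-like resource.

All data below is a made-up miniature stand-in for WordNet 3.0 and WordNet
Bahasa; a real run would load the full resources instead.
"""

# Hypothetical synset table: id -> (English lemmas, Malay lemmas).
# Assumption: English and Malay entries are aligned by a shared synset id.
SYNSETS = {
    "s1": (["good", "beneficial"], ["baik"]),
    "s2": (["bad", "harmful"], ["buruk"]),
    "s3": (["beneficial", "helpful"], ["berfaedah"]),
    "s4": (["harmful", "damaging"], ["merosakkan"]),
}
# Hypothetical antonym links between synsets (stored in both directions).
ANTONYMS = {"s1": ["s2"], "s2": ["s1"]}


def synsets_of(word):
    """Return the ids of every synset whose English lemmas contain `word`."""
    return [sid for sid, (en, _) in SYNSETS.items() if word in en]


def expand(pos_seeds, neg_seeds, iterations):
    """Grow positive/negative English word sets through synonymy and antonymy."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        new_pos, new_neg = set(pos), set(neg)
        for source, same, opposite in ((pos, new_pos, new_neg), (neg, new_neg, new_pos)):
            for word in source:
                for sid in synsets_of(word):
                    same.update(SYNSETS[sid][0])          # synonyms keep polarity
                    for ant in ANTONYMS.get(sid, []):
                        opposite.update(SYNSETS[ant][0])  # antonyms flip polarity
        # Assumed tie-breaking rule (not from the paper): drop words found in both sets.
        conflicts = new_pos & new_neg
        pos, neg = new_pos - conflicts, new_neg - conflicts
    return pos, neg


def to_malay(words):
    """Map English words to Malay lemmas through the shared synset ids."""
    return {ms for w in words for sid in synsets_of(w) for ms in SYNSETS[sid][1]}


pos_en, neg_en = expand({"good"}, {"bad"}, iterations=5)
print(sorted(to_malay(pos_en)), sorted(to_malay(neg_en)))
# prints ['baik', 'berfaedah'] ['buruk', 'merosakkan']
```

With these toy tables the word sets stop growing after the second iteration, so later iterations change nothing; on the full resources each iteration can reach many more synsets.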
1. Introduction
Over the last few decades, most research on sentiment analysis has been done in English and other widely
spoken languages such as Arabic and Chinese. English in particular is well served by resources and tools for
natural language processing (Alsaffar & Omar, 2015). In contrast, low-resource languages such as Malay need more
studies on sentiment analysis and more resources and tools for subjectivity and sentiment analysis, a need that is
growing with the increasing number of reviews written in those languages (Mate, 2016). One of the major challenges
in sentiment analysis is the lack of resources. The primary obstacle to developing sentiment analysis tools for
Malay is that almost no standard sentiment lexicon has been developed for the language (Nasharuddin et al., 2017).
This paper discusses the proposed method for automatic sentiment lexicon generation for the Malay language. The
remainder of this paper is organized as follows. Section 2 discusses related work on the dictionary-based approach.
Section 3 describes the methodology and dataset used to develop the Malay sentiment lexicon. Section 4 presents the
results and discussion of this study. Finally, the conclusion section discusses the limitations and future work.
2. Related Works
Sentiment analysis is a highly active area of research that involves the computational study of opinions,
evaluations, and reviews about products, services, and policies expressed in written language, as well as
the construction of sentiment corpora and dictionaries. There are two main approaches to sentiment analysis:
(i) the machine learning approach and (ii) the lexicon-based approach. Most of the
E-Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [55]
Artificial Intelligence in the 4th Industrial Revolution