
        A Dictionary Based Approach for Malay Language Sentiment Lexicon Generation


        Azilawati Rozaimee a*, Nazlia Omar b, Sabrina Tiun c and NurSharmini Alexander d
                                 a FIK, UniSZA, 22200, Kampus Besut, Terengganu, Malaysia
                                     b,c CAIT, UKM, 43600, Bangi, Selangor, Malaysia
                                        d MAMPU, 63000 Cyberjaya, Malaysia
                                           *Email: azila@unisza.edu.my

        Abstract

        The sentiment lexicon plays a vital role in ensuring a successful sentiment analysis task. The most common approach to
        building one is manual annotation. However, the manual approach is labor-intensive, relatively slow, and requires much
        effort. This paper aims to automatically generate a Malay language sentiment lexicon to overcome the limitations of
        human-built sentiment lexicons. We present a Malay Language Sentiment Lexicon generation algorithm based on a
        dictionary-based approach. The algorithm uses the WordNet 3.0 and WordNet Bahasa resources, which are then mapped
        to build the new Malay Language Sentiment Lexicon. After the algorithm was run for five iterations from a pair of initial
        words, the generated sentiment lexicon contained a total of 61,605 words, with 25,541 positive words and 36,064 negative
        words. This shows that the proposed approach can generate a substantial number of sentiment lexicon entries with
        reasonable accuracy for formal terms by utilizing dictionaries such as WordNet 3.0 and WordNet Bahasa.

        Keywords: sentiment lexicon; Malay Language; dictionary-based; WordNet 3.0; WordNet Bahasa



        1. Introduction

            Over the last few decades, most research on sentiment analysis has been done in English and other widely
        spoken languages such as Arabic and Chinese. English has many resources and tools available for natural
        language processing (Alsaffar & Omar, 2015). Consequently, the need for more studies on sentiment analysis,
        and for the construction of resources and tools for subjectivity and sentiment analysis in low-resource
        languages such as Malay, is growing due to the increasing number of reviews in these languages (Mate,
        2016). One of the major challenges in sentiment analysis is the lack of resources. The primary problem for the
        development of sentiment analysis tools in Malay is that almost no standard sentiment lexicon has been
        developed (Nasharuddin et al., 2017). This paper discusses the proposed method for automatic sentiment
        lexicon generation for the Malay language. The remainder of this paper is organized as follows. Section 2
        discusses related work on the dictionary-based approach. Section 3 describes the methodology and dataset
        used in developing the Malay sentiment lexicon. Section 4 presents the results and discussion
        of this study. Finally, the limitations and future work are discussed in the conclusion section.
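
        As a rough illustration of the dictionary-based idea summarized in the abstract (expanding a pair of seed
        words over a fixed number of iterations and then mapping the result to Malay), the following minimal sketch
        may help. It is not the authors' algorithm, whose details are given in the methodology section. The toy
        tables SYNONYMS, ANTONYMS and EN_TO_MS are hypothetical stand-ins for WordNet 3.0 and WordNet Bahasa, the
        seed words "good" and "bad" are assumed for illustration, and the rule of dropping words that end up with
        both polarities is likewise an assumption.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-in for WordNet 3.0 relations (hypothetical data, not the real resource).
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "nice"],
    "fine": ["good", "excellent"],
    "bad": ["poor", "awful"],
    "poor": ["bad"],
}
ANTONYMS: Dict[str, List[str]] = {
    "good": ["bad"],
    "bad": ["good"],
    "nice": ["awful"],
}
# Toy stand-in for the English-to-Malay links of WordNet Bahasa (hypothetical data).
EN_TO_MS: Dict[str, List[str]] = {
    "good": ["baik", "bagus"],
    "fine": ["baik"],
    "nice": ["bagus"],
    "excellent": ["cemerlang"],
    "bad": ["buruk"],
    "poor": ["lemah"],
    "awful": ["teruk"],
}


def expand_seeds(pos_seed: str, neg_seed: str, iterations: int) -> Tuple[Set[str], Set[str]]:
    """Grow positive and negative English word sets from one seed pair.

    Synonyms inherit the polarity of the word they come from; antonyms
    receive the opposite polarity. Only newly found words are expanded
    in the next iteration.
    """
    positive, negative = {pos_seed}, {neg_seed}
    pos_frontier, neg_frontier = {pos_seed}, {neg_seed}
    for _ in range(iterations):
        new_pos: Set[str] = set()
        new_neg: Set[str] = set()
        for word in pos_frontier:
            new_pos.update(SYNONYMS.get(word, []))
            new_neg.update(ANTONYMS.get(word, []))
        for word in neg_frontier:
            new_neg.update(SYNONYMS.get(word, []))
            new_pos.update(ANTONYMS.get(word, []))
        # Keep only words not labelled in earlier iterations.
        pos_frontier = new_pos - positive - negative
        neg_frontier = new_neg - positive - negative
        # Assumption: words reached with both polarities in one step are skipped.
        conflict = pos_frontier & neg_frontier
        pos_frontier -= conflict
        neg_frontier -= conflict
        positive |= pos_frontier
        negative |= neg_frontier
    return positive, negative


def to_malay(words: Set[str]) -> Set[str]:
    """Map English words to Malay through the toy translation table."""
    return {ms for word in words for ms in EN_TO_MS.get(word, [])}


def build_lexicon(pos_seed: str, neg_seed: str, iterations: int) -> Dict[str, Set[str]]:
    """Expand the seeds in English, then project both sets to Malay."""
    pos_en, neg_en = expand_seeds(pos_seed, neg_seed, iterations)
    pos_ms, neg_ms = to_malay(pos_en), to_malay(neg_en)
    # Assumption: Malay words that receive both polarities are dropped.
    ambiguous = pos_ms & neg_ms
    return {"positive": pos_ms - ambiguous, "negative": neg_ms - ambiguous}


if __name__ == "__main__":
    lexicon = build_lexicon("good", "bad", iterations=2)
    print(sorted(lexicon["positive"]))
    print(sorted(lexicon["negative"]))
```

        With real resources, the toy tables would be replaced by lookups into WordNet 3.0 and WordNet Bahasa, and the
        number of iterations controls how far the lexicon grows from the seed pair.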

        2. Related Works

            Sentiment analysis is a highly active area of research that involves the computational study of opinions,
        evaluations, and reviews about products, services, and policies expressed in written language, as
        well as the construction of sentiment corpora and dictionaries. There are two main approaches to
        sentiment analysis: (i) the machine learning approach and (ii) the lexicon-based approach. Most of the







        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [55]
        Artificial Intelligence in the 4th Industrial Revolution