
sentiment analysis tools are built upon the machine learning approach, specifically supervised learning. However, this
        approach requires extensive training data for sentiment analysis to succeed.
            An alternative solution for sentiment analysis with little or no training data is the lexicon-
        based approach. The lexicon-based approach relies on a lexicon of words with associated classification weights, and the weights
        can be binary polarities (positive/negative) or numerical polarity scores. Most of the research in
        unsupervised sentiment classification makes use of available lexical resources. One of the advantages of using
        a lexicon approach is that the lexicon can be built from a large corpus and then used in other applications where
        there may not be enough information to perform corpus-based approaches (Labille et al., 2017). This is in line
        with Nasharuddin et al. (2017), who note that the lexicon-based approach does not require storing a large data corpus or
        training, so the whole process is much faster. The dictionary-based approach is one of the lexicon approaches.
        It is a simple technique that bootstraps from a few seed sentiment words using the synonym and
        antonym structure of a dictionary (e.g., WordNet, Wordnet Bahasa). This approach starts with a manually
        collected seed set of positive and negative sentiment words, and then an expansion algorithm is iteratively
        executed to expand this set by searching the dictionary for their synonyms and antonyms and adding them to the
        seed list. After the expansion algorithm has run for a number of iterations, the final seed set is
        known as the sentiment lexicon. Other alternative methods are hybrid and corpus-based approaches.
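
            The seed expansion described above can be illustrated with a minimal sketch. It is not the implementation
        of any cited work: the toy dictionary, the function name expand_seeds, and the rule that a word keeps the first
        polarity it receives are assumptions made only for illustration. Synonyms inherit the polarity of the word that
        produced them, antonyms receive the opposite polarity, and the loop stops when no new words are found or when
        an iteration cap is reached.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy dictionary: word -> (synonyms, antonyms).
# A real system would query WordNet or Wordnet Bahasa instead.
TOY_DICTIONARY: Dict[str, Tuple[Set[str], Set[str]]] = {
    "good": ({"fine", "great"}, {"bad"}),
    "great": ({"good", "excellent"}, {"terrible"}),
    "bad": ({"poor", "terrible"}, {"good"}),
    "poor": ({"bad"}, {"fine"}),
}


def expand_seeds(
    positive: Set[str],
    negative: Set[str],
    dictionary: Dict[str, Tuple[Set[str], Set[str]]],
    max_iterations: int = 5,
) -> Tuple[Set[str], Set[str]]:
    """Grow positive/negative seed sets via synonym and antonym lookups."""
    pos, neg = set(positive), set(negative)
    frontier_pos, frontier_neg = set(pos), set(neg)

    for _ in range(max_iterations):
        new_pos: Set[str] = set()
        new_neg: Set[str] = set()

        # Synonyms keep the polarity of the source word; antonyms flip it.
        for word in frontier_pos:
            synonyms, antonyms = dictionary.get(word, (set(), set()))
            new_pos |= synonyms
            new_neg |= antonyms
        for word in frontier_neg:
            synonyms, antonyms = dictionary.get(word, (set(), set()))
            new_neg |= synonyms
            new_pos |= antonyms

        # Assumption: a word already in the lexicon keeps its first polarity.
        new_pos -= pos | neg
        new_neg -= pos | neg | new_pos

        if not new_pos and not new_neg:
            break  # no new words were found, so expansion has converged

        pos |= new_pos
        neg |= new_neg
        # Only the newly added words are looked up in the next iteration.
        frontier_pos, frontier_neg = new_pos, new_neg

    return pos, neg


if __name__ == "__main__":
    pos_lexicon, neg_lexicon = expand_seeds({"good"}, {"bad"}, TOY_DICTIONARY)
    print(sorted(pos_lexicon))
    print(sorted(neg_lexicon))
```

            The iteration cap bounds the run time on a large dictionary, at the cost of possibly stopping before the
        expansion has converged.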
            Several algorithms have been proposed to automatically generate sentiment lexicons using the dictionary-
        based approach for different languages across the world.  Much work has been carried out for major languages
        such as English, Chinese, and Spanish, and for some low-resource languages such as Amharic and
        Vietnamese (Alemneh et al., 2020; Le et al., 2019). There is also prior work in which sentiment lexicons
        were successfully developed for 136 major languages (Chen & Skiena, 2014). These lexicons were constructed
        by appropriately propagating from seed words, resulting in high polarity agreement with
        published lexicons while achieving acceptable lexical coverage. Another previous work applies a
        dictionary-based algorithm to generate an Arabic sentiment lexicon that assigns sentiment scores to the words
        found in the Arabic WordNet (Mahyoub et al., 2014). That study links the lexicon of AraMorph
        with SentiWordNet and shows that it can outperform state-of-the-art lexicons in terms of accuracy and F1-score.
        In addition, Park & Kim (2016) propose a method to build a thesaurus lexicon for the Korean language.
        Their dictionary-based approach uses three online dictionaries to collect thesauruses based on the seed words,
        and stores only co-occurring words in the thesaurus lexicon to improve its reliability.
        As for the Malay language, algorithms for the dictionary-based approach were developed by
        Alexander & Omar (2017) and Darwich et al. (2016). The shortage of lexical resources that can assist
        sentiment analysis tasks in the Malay language motivates the author to develop an algorithm that can
        automatically generate a standard Malay sentiment lexicon from the available WordNet and Wordnet
        Bahasa dictionaries with less human intervention.

        3. Methodology

            The  methodology  used  in  this  study  was  based  on  the  dictionary-based  sentiment  lexicon  generation
        approach.  Figure  1  shows  the  model  of  Malay  language  sentiment  lexicon  generation.  Sentiment  lexicon
        generation phases start with a manually selected seed set. The seed set contains the list of the most important
        positive and negative words. The seed set is expanded by mapping the words in
        the seed set to their synonyms and antonyms found in WordNet 3.0 and Wordnet Bahasa using a
        bootstrapping technique. The algorithm begins by matching the seed set with the synsetid in Wordnet Bahasa to get
        the offset value. The expansion of the seed set works by matching the obtained offset value to the
        corresponding wordid to find the antonyms and synonyms of the positive and negative adjectives, which are then
        added as the expanded lexicon. This expanded lexicon becomes the seed set for the next iteration. This
        process is repeated until no new words are found. In this work, the algorithm is repeated five times.
        The combination of the initial seed set and the generated expanded seed set is finally called the sentiment lexicon,







        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [56]
        Artificial Intelligence in the 4th Industrial Revolution