sentiment analysis tools are built upon a machine learning approach, namely supervised learning. However, this
approach requires extensive training data for sentiment analysis to be successful.
An alternative for sentiment analysis with little or no training data is the lexicon-based approach. The
lexicon-based approach classifies text using weights associated with the words in a lexicon, and each weight
can be a binary polarity (positive/negative) or a numerical polarity score. Most research in unsupervised
sentiment classification makes use of available lexical resources. One advantage of the lexicon approach is
that the lexicon can be built from a large corpus and then used in other applications where there may not be
enough information to perform corpus-based approaches (Labille et al., 2017). This is in line with Nasharuddin
et al. (2017), who note that the lexicon-based approach requires neither storing a large data corpus nor
training, so the whole process is much faster. The dictionary-based approach is one of the lexicon approaches.
It is a simple technique that bootstraps from a few seed sentiment words using the synonym and antonym
structure of a dictionary (e.g., WordNet, Wordnet Bahasa). The approach starts with a manually collected seed
set of positive and negative sentiment words. An expansion algorithm is then executed iteratively: it searches
the dictionary for the synonyms and antonyms of the words in the set and adds them to the seed list. After the
expansion algorithm has run for a number of iterations, the final seed set is known as the sentiment lexicon.
Other alternative methods are the hybrid and corpus-based approaches.
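The seed expansion described above can be illustrated with a minimal sketch. It is not taken from any of the
cited works: the dictionary TOY_DICTIONARY, its English entries, and the function name expand_seed_set are
hypothetical, and the sketch assumes that a synonym inherits the polarity of the word it was reached from while
an antonym receives the opposite polarity.

```python
# Toy dictionary with hypothetical entries (not taken from WordNet or Wordnet Bahasa).
TOY_DICTIONARY = {
    "good": {"synonyms": ["nice", "fine"], "antonyms": ["bad"]},
    "nice": {"synonyms": ["pleasant"], "antonyms": ["nasty"]},
    "bad": {"synonyms": ["poor"], "antonyms": ["good"]},
    "nasty": {"synonyms": ["awful"], "antonyms": ["nice"]},
}


def expand_seed_set(seeds, dictionary, max_iterations=10):
    """Grow a {word: polarity} map by following synonym and antonym links.

    seeds maps each seed word to +1 (positive) or -1 (negative).
    Assumption: synonyms keep the polarity, antonyms flip it.
    """
    lexicon = dict(seeds)
    frontier = set(seeds)
    for _ in range(max_iterations):
        next_frontier = set()
        for word in sorted(frontier):
            polarity = lexicon[word]
            entry = dictionary.get(word, {})
            for synonym in entry.get("synonyms", []):
                if synonym not in lexicon:
                    lexicon[synonym] = polarity
                    next_frontier.add(synonym)
            for antonym in entry.get("antonyms", []):
                if antonym not in lexicon:
                    lexicon[antonym] = -polarity
                    next_frontier.add(antonym)
        if not next_frontier:
            # No new words were found, so the expansion has converged.
            break
        # Newly added words become the seeds of the next iteration.
        frontier = next_frontier
    return lexicon


lexicon = expand_seed_set({"good": 1, "bad": -1}, TOY_DICTIONARY)
print(sorted(lexicon.items()))
```

Words that are already in the lexicon are never overwritten, so a word keeps the polarity it was first assigned.
A real dictionary may contain conflicting paths to the same word, which this sketch does not attempt to resolve.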
Several algorithms have been proposed to automatically generate sentiment lexicons using the dictionary-based
approach for different languages across the world. Much work has been carried out for major languages such as
English, Chinese, and Spanish, as well as for low-resource languages such as Amharic and Vietnamese (Alemneh et
al., 2020; Le et al., 2019). Sentiment lexicons have also been successfully developed for 136 major languages
(Chen & Skiena, 2014). These lexicons were constructed by appropriately propagating from seed words, resulting
in high polarity agreement with published lexicons while achieving acceptable lexical coverage. Another
previous work applies a dictionary-based algorithm to generate an Arabic sentiment lexicon that assigns
sentiment scores to the words found in the Arabic WordNet (Mahyoub et al., 2014). This study works by linking
the lexicon of AraMorph with SentiWordNet and shows that it can outperform state-of-the-art lexicons in terms
of accuracy and F1-score.
In addition, Park & Kim (2016) propose a method to build a thesaurus lexicon for the Korean language. Their
dictionary-based approach uses three online dictionaries to collect thesauri based on the seed words and stores
only co-occurring words in the thesaurus lexicon to improve its reliability. As for the Malay language,
algorithms for the dictionary-based approach were developed by Alexander & Omar (2017) and Darwich et al.
(2016). The shortage of lexical resources that can assist sentiment analysis tasks in the Malay language
motivates the author to develop an algorithm that can automatically generate a standard Malay sentiment lexicon
from the available WordNet and Wordnet Bahasa dictionaries with less human intervention.
3. Methodology
The methodology used in this study is based on the dictionary-based sentiment lexicon generation approach.
Figure 1 shows the model of Malay language sentiment lexicon generation. Sentiment lexicon generation starts
with a manually selected seed set, which contains a list of the most important positive and negative words.
The seed set is expanded with a bootstrapping technique that maps the words in the seed set to the synonyms and
antonyms found in WordNet 3.0 and Wordnet Bahasa. The algorithm begins by matching the seed set with synsetid
in Wordnet Bahasa to get the offset value. The seed set is then expanded by matching the obtained offset value
to the corresponding wordid to find the antonyms and synonyms of the positive and negative adjectives, which
are added to the expanded lexicon. This expanded lexicon becomes the seed set for the next iteration. The
process is repeated iteratively until no new words are found. In this work, the algorithm is repeated five
times.
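The offset-based matching can be sketched as follows. This is only an illustration under stated assumptions,
not the implementation used in this study: LEMMA_TO_OFFSETS and ANTONYM_OFFSETS are hypothetical in-memory
stand-ins for Wordnet Bahasa and WordNet 3.0, the offsets and synset groupings are invented, and the function
names are illustrative. The sketch assumes that synonyms are lemmas sharing a synset offset and that antonyms
are the lemmas of synsets linked by an antonym relation.

```python
from collections import defaultdict

# Hypothetical Malay lemma -> synset offsets (stand-in for Wordnet Bahasa).
LEMMA_TO_OFFSETS = {
    "baik": ["a-0001"],
    "bagus": ["a-0001"],
    "elok": ["a-0001", "a-0003"],
    "cantik": ["a-0003"],
    "buruk": ["a-0002"],
    "jahat": ["a-0002"],
    "hodoh": ["a-0004"],
}

# Hypothetical antonym links between synset offsets (stand-in for WordNet 3.0).
ANTONYM_OFFSETS = {
    "a-0001": ["a-0002"],
    "a-0002": ["a-0001"],
    "a-0003": ["a-0004"],
    "a-0004": ["a-0003"],
}


def build_offset_index(lemma_to_offsets):
    """Invert the lemma table so each offset lists the lemmas in that synset."""
    index = defaultdict(set)
    for lemma, offsets in lemma_to_offsets.items():
        for offset in offsets:
            index[offset].add(lemma)
    return index


def expand_by_offset(seeds, lemma_to_offsets, antonym_offsets, iterations=5):
    """Expand a {word: polarity} seed set through synset offsets.

    Stops after `iterations` rounds or earlier if a round adds no new word.
    """
    offset_to_lemmas = build_offset_index(lemma_to_offsets)
    lexicon = dict(seeds)
    current = dict(seeds)
    for _ in range(iterations):
        expanded = {}
        for word, polarity in current.items():
            for offset in lemma_to_offsets.get(word, []):
                # Synonyms: other lemmas that share the same synset offset.
                for synonym in offset_to_lemmas[offset]:
                    if synonym not in lexicon and synonym not in expanded:
                        expanded[synonym] = polarity
                # Antonyms: lemmas of synsets linked by an antonym relation.
                for opposite in antonym_offsets.get(offset, []):
                    for antonym in offset_to_lemmas[opposite]:
                        if antonym not in lexicon and antonym not in expanded:
                            expanded[antonym] = -polarity
        if not expanded:
            break
        lexicon.update(expanded)
        # The expanded words become the seed set of the next iteration.
        current = expanded
    return lexicon


seeds = {"baik": 1, "buruk": -1}
print(sorted(expand_by_offset(seeds, LEMMA_TO_OFFSETS, ANTONYM_OFFSETS).items()))
```

The default of five iterations mirrors the number of repetitions reported here. On this toy data the loop
stops earlier because no new words are found.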
The combination of the initial seed set and the generated expanded seed set is finally called the sentiment lexicon,
E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [56]
Artificial Intelligence in the 4th Industrial Revolution