
The following summarizes the limitations of previous literature and suggests several research directions.
        The volume of reviews is overwhelming, and most of them are irrelevant or uninformative for the
        evaluating models. For example, an SMA can have hundreds of thousands or even millions of reviews. The
        Facebook SMA receives more than 10,000 reviews every three days (Xiao et al. 2020). Challenges stem from
        the data's scale, unique format, diverse nature, and high percentage of irrelevant information and spam
        (Häring, Stanik, and Maalej 2021).
           In previous works, methods for information extraction, keyword extraction, feature extraction, and
        similar tasks employed WordNet, a vast, well-organized public lexicon, to avoid the need for a large
        annotated corpus (Orkphol & Yang 2019). Recently, WordNet has been eclipsed on the newer lexical
        similarity benchmarks by the success of word embeddings (Jimenez et al. 2019). Improving WordNet by
        combining it with the word embeddings Word2vec and Word2set has achieved better results than the
        classical WordNet-based approaches and is competitive with neural embeddings. The improved word
        relatedness resulting from that combination also makes development more efficient (Lee et al. 2019).
        Furthermore, extracting reviews related to social requirement terms with a classical WordNet-based
        approach will not produce good results because its word relatedness is weak: it relies on direct
        semantic relations and assumes that the links between concepts represent distances. In addition, such
        links do not cover all possible relations between synsets. In this study, we show how to address the
        weakness of WordNet in representing reviews by combining word embeddings with the WordNet lexicon, so
        that the resulting representation can later be used to extract only those SMA reviews that relate to
        social requirements.
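The weakness described above can be illustrated with a minimal sketch. It assumes a tiny hand-made taxonomy in place of WordNet and invented two-dimensional vectors in place of trained embeddings; the names `TOY_LINKS`, `TOY_VECTORS`, `path_distance`, and `cosine` are hypothetical and do not come from the cited works. The point is only that a link-counting distance is undefined for a word the lexicon does not connect, while an embedding still yields a relatedness score.

```python
from collections import deque
import math

# Hypothetical toy is-a links standing in for WordNet (not real WordNet data).
TOY_LINKS = {
    "message": ["communication"],
    "chat": ["communication"],
    "communication": ["entity"],
    "friend": ["person"],
    "person": ["entity"],
}

# Hypothetical 2-d embeddings, invented for illustration only.
TOY_VECTORS = {
    "chat": (0.9, 0.1),
    "message": (0.95, 0.05),
    "friend": (0.8, 0.3),
    "selfie": (0.7, 0.6),  # present in the embedding, absent from the taxonomy
}


def build_graph(links):
    """Turn child -> parents links into an undirected adjacency map."""
    graph = {}
    for child, parents in links.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def path_distance(graph, source, target):
    """Count edges on the shortest path; return None when no path exists."""
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


GRAPH = build_graph(TOY_LINKS)

# Every link counts as one unit of distance, whatever relation it encodes.
print(path_distance(GRAPH, "chat", "message"))
print(path_distance(GRAPH, "chat", "friend"))
# "selfie" has no link in the toy taxonomy, so the path distance is undefined,
# while the embedding still gives a similarity value.
print(path_distance(GRAPH, "chat", "selfie"))
print(cosine(TOY_VECTORS["chat"], TOY_VECTORS["selfie"]))
```

In this toy setting every edge contributes the same unit of distance, which is the assumption the paragraph criticizes, and the uncovered word can only be scored through the embedding.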

        3. Bag-of-requirement (BOR) Representation

           To handle the weakness of the classic WordNet representation, this work proposes a word representation
        based on extended WordNet and word embedding, named the bag-of-requirement (BOR) representation. The BOR
        addresses the data sparseness and the cut-off threshold of the classic WordNet representation. Given the
        user reviews and the social requirement terms (SRT) as input, the method for building the BOR of SMA
        reviews consists of three main steps, as shown in Figure 1.
          Step 1: Combine word embedding and WordNet to expand WordNet.
          Step 2: Build SRTV by enriching SRT.
          Step 3: Build BOR using vectorized reviews (bag-of-word, unigram) and SRTV.


        [Figure 1: diagram with nodes Word embedding, Synset WordNet, SRT, SRTV, Reviews, Bag-of-word, uni-gram, Bag-Of-Requirements, and KNN]


        Fig. 1. Building BOR for SMA reviews
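The three steps listed above can be sketched in a few lines of Python. This is only one plausible reading of the pipeline, not the authors' implementation: the synonym table `TOY_SYNSETS` stands in for WordNet synsets, `TOY_VECTORS` stands in for trained word embeddings, the similarity threshold `min_similarity` is an arbitrary choice, and the function names `expand_term`, `build_srtv`, and `build_bor` are hypothetical. The sketch also assumes that the SRTV acts as the vocabulary over which unigram counts are taken, which this section does not state explicitly. The KNN stage shown in Figure 1 is left out.

```python
import math
from collections import Counter

# All data below is invented for illustration; none of it comes from the paper.
TOY_SYNSETS = {"share": {"post"}, "friend": {"buddy"}}
TOY_VECTORS = {
    "share": (1.0, 0.0),
    "upload": (0.9, 0.1),
    "friend": (0.0, 1.0),
    "follower": (0.2, 0.9),
    "crash": (-0.7, -0.7),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_term(term, synsets, vectors, min_similarity=0.8):
    """Step 1 (assumed form): lexicon synonyms plus embedding neighbours."""
    expanded = {term} | synsets.get(term, set())
    if term in vectors:
        for other, vec in vectors.items():
            if other != term and cosine(vectors[term], vec) >= min_similarity:
                expanded.add(other)
    return expanded


def build_srtv(srt, synsets, vectors):
    """Step 2 (assumed form): enrich each SRT entry, return a sorted vocabulary."""
    vocabulary = set()
    for term in srt:
        vocabulary |= expand_term(term, synsets, vectors)
    return sorted(vocabulary)


def build_bor(review, srtv):
    """Step 3 (assumed form): unigram counts restricted to the SRTV vocabulary."""
    counts = Counter(review.lower().split())
    return [counts[term] for term in srtv]


SRTV = build_srtv(["share", "friend"], TOY_SYNSETS, TOY_VECTORS)
print(SRTV)
print(build_bor("I share photos with every friend and follower", SRTV))
print(build_bor("the app keeps crashing after the update", SRTV))
```

Under these assumptions, a review that mentions none of the enriched terms maps to an all-zero vector, which makes reviews unrelated to the social requirement easy to separate before any classifier is applied.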

        3.1. Step 1: Combining word embedding and WordNet

           This step combines word embeddings with the WordNet lexical database. Let S_{i,m} be the m-th sense
        associated with the word w_i. Then the path distance between the senses of all the noun pairs and some
        verb pairs can be








        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [189]
        Artificial Intelligence in the 4th Industrial Revolution