Page 202 - The-5th-MCAIT2021-eProceeding
The following summarizes the limitations of previous work and outlines several promising research directions.
The volume of reviews is overwhelming, and most of these reviews are irrelevant or uninformative for the models being
evaluated. For example, a single SMA can accumulate hundreds of thousands or even millions of reviews; the Facebook
SMA receives more than 10,000 reviews every three days (Xiao et al. 2020). The challenges stem from the data's scale,
unique format, and diversity, and from a high proportion of irrelevant information and spam (Häring, Stanik, and
Maalej 2021).
In previous work, information extraction, keyword extraction, feature extraction, and similar methods used
WordNet, a large, well-organized public lexicon, to avoid the need for a large annotated corpus (Orkphol & Yang
2019). Recently, WordNet has been eclipsed by the success of word embeddings on new lexical similarity
benchmarks (Jimenez et al. 2019). Enhancing WordNet by combining it with word embeddings such as Word2vec
and Word2set has achieved better results than classical WordNet-based approaches and is competitive with neural
embeddings; the improved word relatedness resulting from that combination makes it efficient for development
(Lee et al. 2019). Furthermore, classical WordNet-based methods are unlikely to extract reviews related to social
requirement terms well, because their measure of word relatedness is weak: it relies on direct semantic relations
and assumes that the links between concepts represent distances. In addition, such links do not cover all possible
relations between synsets. In this study, we show how to address WordNet's weakness in representing reviews by
combining word embeddings with the WordNet lexicon; the resulting representation can later be used to extract
only the SMA reviews that are related to social requirements.
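To make this weakness concrete, the sketch below contrasts a WordNet-style path score, which treats every IS-A link as one unit of distance, with cosine similarity between embedding vectors. The taxonomy and the three-dimensional vectors are hypothetical toy data, not real WordNet or embedding values, and the function names `neighbours`, `path_similarity`, and `cosine` are illustrative only.

```python
import math
from collections import deque

# Hypothetical toy taxonomy (not real WordNet data): concept -> hypernym.
HYPERNYM = {
    "privacy": "protection",
    "security": "protection",
    "protection": "state",
    "share": "communicate",
    "post": "communicate",
}

# Hypothetical 3-d embeddings, chosen only for illustration.
EMBEDDING = {
    "privacy": [0.9, 0.1, 0.2],
    "share": [0.15, 0.85, 0.2],
}


def neighbours(node):
    """IS-A links treated as undirected edges: the hypernym plus all hyponyms."""
    result = {child for child, parent in HYPERNYM.items() if parent == node}
    if node in HYPERNYM:
        result.add(HYPERNYM[node])
    return result


def path_similarity(a, b):
    """Path score 1 / (1 + shortest path length); 0.0 when no path links a and b."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1.0 + dist)
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


print(path_similarity("privacy", "security"))  # two links apart: 1/3
print(path_similarity("privacy", "share"))     # no link in this toy taxonomy: 0.0
print(cosine(EMBEDDING["privacy"], EMBEDDING["share"]))  # graded, nonzero score
```

In this toy taxonomy, "privacy" and "share" sit in unconnected branches, so the path score collapses to 0.0, while the embeddings still assign the pair a graded, nonzero relatedness.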
3. Bag-of-Requirements (BOR) Representation
To address the weakness of the classic WordNet representation, this work proposes a word representation based
on an extended WordNet and word embeddings, named the bag-of-requirements (BOR) representation. BOR
addresses data sparseness and removes the cut-off threshold of the classic WordNet representation. Given the
user reviews and the social requirement terms (SRT) as input, the BOR of SMA reviews is built in three main
steps, as shown in Figure 1.
Step 1: Combine word embeddings and WordNet to expand WordNet.
Step 2: Build the SRTV by enriching the SRT.
Step 3: Build the BOR from the vectorized reviews (bag-of-words, unigrams) and the SRTV.
[Figure 1: diagram components: reviews, bag-of-words (unigram), SRT, word embedding, WordNet synsets, SRTV, Bag-Of-Requirements, KNN]
Fig. 1. Building BOR for SMA reviews
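The three steps, followed by the KNN stage named in Figure 1, can be sketched end to end as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: the synsets and embeddings are tiny hypothetical toy tables standing in for WordNet and a pretrained model; `expand_term`, `build_srtv`, `build_bor`, and `knn_predict` are hypothetical names; Step 1 is modelled as the union of an SRT's synset lemmas and its top-k embedding neighbours; the SRTV is modelled as a mapping from each SRT to its expanded vocabulary; and each BOR dimension counts the review unigrams that fall in one SRT's vocabulary. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

# Hypothetical toy stand-in for WordNet synset lemmas of each SRT.
TOY_SYNSETS = {
    "privacy": {"privacy", "secrecy", "seclusion"},
    "sharing": {"sharing", "share", "communion"},
}

# Hypothetical toy stand-in for a pretrained word-embedding table.
TOY_EMBEDDINGS = {
    "privacy": [0.9, 0.1, 0.2],
    "secrecy": [0.8, 0.1, 0.1],
    "tracking": [0.85, 0.05, 0.25],
    "sharing": [0.1, 0.9, 0.2],
    "share": [0.15, 0.85, 0.2],
    "post": [0.2, 0.8, 0.3],
    "crash": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def expand_term(term, top_k=2):
    """Step 1 (assumed): synset lemmas plus the top-k embedding neighbours of term."""
    expanded = set(TOY_SYNSETS.get(term, {term}))
    if term in TOY_EMBEDDINGS:
        others = [w for w in TOY_EMBEDDINGS if w != term]
        others.sort(key=lambda w: cosine(TOY_EMBEDDINGS[term], TOY_EMBEDDINGS[w]), reverse=True)
        expanded.update(others[:top_k])
    return expanded


def build_srtv(srt_terms, top_k=2):
    """Step 2 (assumed): map every SRT to its enriched vocabulary."""
    return {term: expand_term(term, top_k) for term in srt_terms}


def build_bor(review, srtv):
    """Step 3 (assumed): per SRT (sorted by name), count review unigrams in its vocabulary."""
    bow = Counter(review.lower().split())  # unigram bag-of-words
    return [sum(bow[w] for w in vocab) for _, vocab in sorted(srtv.items())]


def knn_predict(train, query, k=3):
    """Majority label among the k BOR vectors nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


srtv = build_srtv(["privacy", "sharing"])
labelled = [  # hypothetical example reviews and labels
    ("stop tracking me this app ignores my privacy", "related"),
    ("worried about privacy and secrecy here", "related"),
    ("i love to share and post photos", "related"),
    ("the app keeps crashing after the update", "unrelated"),
    ("great design and fast", "unrelated"),
]
train = [(build_bor(text, srtv), label) for text, label in labelled]
print(knn_predict(train, build_bor("please respect my privacy", srtv)))  # related
print(knn_predict(train, build_bor("it crashes all the time", srtv)))    # unrelated
```

Counting matches per SRT keeps each BOR vector as long as the number of SRTs rather than the size of a raw bag-of-words vocabulary, which is one plausible reading of how BOR mitigates data sparseness.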
3.1. Step 1: Combining word embeddings and WordNet
This step combines word embeddings with the WordNet lexical database. Let $S_{i,m}$ be the $m$-th sense associated
with the word $w_i$. Then the path distance between the senses of all noun pairs and some verb pairs can be
E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [189]
Artificial Intelligence in the 4th Industrial Revolution