
et al., 2011; Mikolov et al., 2013). One drawback of the previous methods is that they operate at the word level,
        which limits how closely morphologically rich words or out-of-vocabulary words can be modelled. WE has
        recently been used to address the vocabulary mismatch problem (Roy et al., 2016; El Mahdaouy et al., 2018;
        Fernández-Reyes et al., 2018). WE are distributed representations of words, commonly extracted from a neural
        network that models the joint distribution of the corpus vocabulary. The embedding models are usually trained
        on a large corpus based on term proximity (Diaz et al., 2016).
           Recently, most research in AQE has relied on WE as a semantic modelling technique (ALMasri et al., 2016;
        Roy et al., 2016). To leverage WE to improve AQE effectiveness, Roy et al. (2016) proposed three AQE methods
        based on the WE technique. These methods exploit the semantic relationships captured by the distributed
        representation of the terms, with the candidate related terms obtained using the K-Nearest Neighbor (K-NN)
        approach. Several other studies have also used WE for AQE (Zamani & Croft, 2016; Zamani & Croft, 2017;
        El Mahdaouy et al., 2018).
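           The K-NN selection of candidate expansion terms described above can be illustrated with a minimal
        sketch. It is not the implementation of Roy et al. (2016): the toy three-dimensional vectors, the function
        names and the use of cosine similarity as the nearness measure are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_expansion_terms(query_terms, embeddings, k=2):
    """Return the k vocabulary terms nearest to any query term (hypothetical helper)."""
    scores = {}
    for q in query_terms:
        if q not in embeddings:
            continue  # query terms without a vector are skipped
        for term, vec in embeddings.items():
            if term in query_terms:
                continue  # never propose the query terms themselves
            sim = cosine(embeddings[q], vec)
            # keep the best similarity of this candidate to any query term
            scores[term] = max(scores.get(term, sim), sim)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Toy embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}

expanded = knn_expansion_terms(["car"], TOY_EMBEDDINGS, k=2)
print(expanded)
```

           In practice the vectors would come from a trained embedding model rather than a hand-written
        dictionary, and the value of k would be tuned on the test collection.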
           El Mahdaouy et al. (2019) proposed incorporating WE similarity into PRF models for AIR. The principal
        idea is to select expansion terms according to both their distribution in the PRF documents and their
        similarity to the original query terms. The study hypothesizes that WE can be used for AIR in the PRF
        framework, because similar words lie close to each other in the vector space. The main aim is to increase
        the weight of terms that are semantically related to the original query terms. Three neural WE models,
        namely Skip-gram, Continuous Bag of Words (CBOW) and GloVe, are investigated in this work. Evaluations are
        carried out with these three models on the standard TREC 2001/2002 Arabic test collection, with the aim of
        understanding how WE can be used in PRF techniques for AIR. Results showed that the proposed PRF extensions
        significantly outperform the PRF baseline models. In addition, they improved on the basic IR model by 22%
        in MAP and by 68% in the robustness index. Moreover, there was no statistically significant performance
        difference among the three WE models (GloVe, CBOW and Skip-gram).
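           The idea of ranking expansion terms by both their distribution in the PRF documents and their
        embedding similarity to the query can be sketched as follows. This is not the weighting scheme of
        El Mahdaouy et al. (2019): the linear interpolation, the parameter lam, the function names and the toy
        data are all assumptions chosen only to make the two signals concrete.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_prf_terms(query_terms, feedback_docs, embeddings, lam=0.5):
    """Rank candidate terms from the feedback documents (hypothetical scoring).

    score = (1 - lam) * relative frequency in the feedback documents
            + lam * mean cosine similarity to the query terms
    """
    counts = Counter(term for doc in feedback_docs for term in doc)
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        if term in query_terms or term not in embeddings:
            continue  # skip query terms and terms without a vector
        distribution = count / total
        sims = [cosine(embeddings[term], embeddings[q])
                for q in query_terms if q in embeddings]
        similarity = sum(sims) / len(sims) if sims else 0.0
        scores[term] = (1 - lam) * distribution + lam * similarity
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "road": [0.3, 0.8, 0.1],
}
FEEDBACK_DOCS = [["car", "automobile", "road"], ["car", "road", "road"]]

with_similarity = score_prf_terms(["car"], FEEDBACK_DOCS, TOY_EMBEDDINGS, lam=0.5)
frequency_only = score_prf_terms(["car"], FEEDBACK_DOCS, TOY_EMBEDDINGS, lam=0.0)
print(with_similarity[0][0], frequency_only[0][0])
```

           With lam set to 0 the ranking depends only on term frequency in the feedback documents, so the
        frequent term "road" comes first; with lam set to 0.5 the semantically closer term "automobile"
        overtakes it, which is the effect the similarity component is meant to produce.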
           Besides, Maryamah et al. (2019) proposed an AQE method based on BabelNet using the WE technique
        Word2Vec. BabelNet is a semantic dictionary that combines knowledge from Wikipedia articles with
        lexicographic knowledge from WordNet (Navigli & Ponzetto, 2012). WordNet is used to acquire the synsets
        based on lexical or semantic relationships between terms, whereas Wikipedia provides the relationships
        between the entities on its pages. The candidate expansion terms are also obtained from WordNet and from
        synonyms. In experiments on 40 queries, the study reported an average accuracy of 90%.
           In addition, Wang et al. (2019) proposed a novel AQE method using the K-NN method, in which they
        utilize local WE and focus on the semantic similarity between words. The cosine similarity measure is used
        to calculate the similarity between two words. Based on the experimental results, they demonstrate that
        the proposed local embedding method significantly outperforms the baseline methods and is a promising
        area for future work in AQE.
           Finally, the conventional AQE approaches and the WE-based approaches can be compared on four points.
        First, both the input and the output of the conventional AQE approaches are terms, while the input of the
        WE-based approaches is terms and the output is vectors. Second, the complexity of the conventional AQE
        approaches depends on the nature of the language used, whereas the WE-based approaches can handle any
        language easily because they rely on vectors rather than on the text. Third, the terms in the conventional
        AQE approaches are described as terms, while the terms in the WE-based approaches are described by
        real-number vectors of a single fixed dimension. Fourth, the conventional AQE approaches use only the main
        dataset corpus, whereas the WE-based approaches use the main dataset corpus in addition to the WE corpus
        created during the training task.

        3.  Conclusion

           The conventional AQE approaches mainly rely on the assumption that each query term can select the best
        candidate terms based on their semantic closeness. The query semantics is analyzed locally, as candidate
        terms are chosen on a one-word-at-a-time basis. However, this assumption fails to capture the semantics
        of the query terms with respect to the whole content of the query sentence.
           WE is the collective name for a set of language modelling and feature learning techniques in NLP,
        where terms or phrases from the vocabulary are described by real-number vectors. It comprises a mathematical






        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [193]
        Artificial Intelligence in the 4th Industrial Revolution