
Nowadays, AQE methods are recommended as an efficient way of addressing the vocabulary mismatch between query and document terms in IR tasks (Vechtomova, 2009; White & Horvitz, 2015). The goal of AQE is to enhance retrieval performance by adding semantically related words to the original query. AQE approaches can be categorized as global or local. Global approaches expand the original query independently of the retrieval results; WordNet is typically the standard external resource used to select new terms semantically related to the original ones (Pal et al., 2014). Local methods, by contrast, rely on Relevance Feedback (RF): the documents retrieved in a first pass are used to select the most promising terms for the initial query. AQE can further be divided into two main groups, Conventional Query Expansion Approaches and Word Embedding-Based Approaches, each of which is explained in detail in the next sections.

        2.1.  Conventional Query Expansion Approaches

   As stated by Cui et al. (2002), researchers have concentrated on AQE approaches to help web users formulate better queries or requests. In AQE, the user's query phrases or words are supplemented by proposing additional query terms. Web search engines such as Google and Yahoo offer query suggestions to their users. Query suggestion is a common search feature that displays a continuously updated list of related queries that users can select from as they type. These suggestions help users reach specific queries that are likely to yield better results (Wang et al., 2009).
   One conventional AQE technique is to find words related to the given query by using a thesaurus to pick synonyms for each query term; WordNet is the most popular such resource (Jiang & Conrath, 1996; Mandala et al., 1998). WordNet is a lexical database that groups words into sets of synonyms called synsets. This technique expands the original query by analyzing expansion features such as lexical, morphological, semantic, and syntactic term relationships. Several studies expand the query using WordNet (Mahgoub et al., 2014; Al-Chalabi et al., 2015; Abbache et al., 2016).
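As an illustration of thesaurus-based expansion, the sketch below collects synonym candidates from WordNet synsets through the NLTK interface (it assumes the WordNet corpus has already been downloaded via nltk.download('wordnet')). The term limit and filtering are assumptions for illustration; the cited studies apply their own selection criteria.

# Hedged sketch: synonym candidates for one query term from WordNet synsets.
from nltk.corpus import wordnet as wn

def wordnet_synonyms(term, max_terms=5):
    """Collect distinct lemma names from the synsets of `term`."""
    synonyms = []
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ").lower()
            if name != term.lower() and name not in synonyms:
                synonyms.append(name)
    return synonyms[:max_terms]

# e.g. wordnet_synonyms("car") may yield candidates such as 'auto' and 'automobile'.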
   Another AQE technique is Relevance Feedback (RF), one of the most effective techniques for expanding users' queries, in which expansion terms are extracted from the top retrieved documents. Pseudo Relevance Feedback (PRF) is the technique most similar to RF (ALMasri et al., 2016; Singh & Sharan, 2017). PRF was initially proposed by Croft & Harper (1979); it assumes that the top retrieved documents are relevant and then selects related terms from these documents to add to the original query. Several studies use PRF to expand users' queries (Atwan et al., 2016; El Mahdaouy et al., 2019).
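A minimal PRF sketch follows: the documents ranked highest in a first TF-IDF retrieval pass are assumed to be relevant, and their strongest terms are harvested as expansion candidates. The use of scikit-learn, the cosine-similarity ranking, and the cut-offs (top_docs, top_terms) are illustrative assumptions, not the exact models of the cited studies.

# Hedged PRF sketch: expansion terms from the assumed-relevant top documents.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def prf_expansion_terms(query, documents, top_docs=3, top_terms=5):
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(documents)
    query_vec = vectorizer.transform([query])

    # First retrieval pass: rank documents by cosine similarity to the query.
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    feedback_ids = np.argsort(scores)[::-1][:top_docs]

    # Aggregate TF-IDF weights over the top documents and rank the vocabulary.
    weights = np.asarray(doc_matrix[feedback_ids].sum(axis=0)).ravel()
    vocab = vectorizer.get_feature_names_out()
    ranked = [vocab[i] for i in np.argsort(weights)[::-1]]
    return [t for t in ranked if t not in query.lower().split()][:top_terms]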
   One of the earliest AQE techniques is stemming. Stemming is the process of reducing inflected words to their morphological root or word stem. It conflates words that share a stem (assuming they have the same meaning) so that they fall under one index term. Stemming can be as simple as removing pluralization suffixes from words or as complicated as approaches that preserve meaning and incorporate dictionaries (Farrar & Hayes, 2019). A few approaches use stemming for AQE (Hammo et al., 2007; Khafajeh et al., 2010; Nwesri & Alyagoubi, 2015). Word co-occurrence is considered one of the main ways of computing semantic relations between words. The hypothesis is that semantically similar words tend to occur in the same contexts (Z, 1968; Lindén & Piitulainen, 2004). Shaalan et al. (2012) used word co-occurrence for AQE.
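The short sketch below combines the two ideas from this paragraph: surface forms are conflated with a Porter stemmer, and candidate expansion terms are scored by how often their stems co-occur with the query term's stem inside a fixed-size window. The window size, whitespace tokenization, and ranking by raw counts are simplifying assumptions rather than the settings of the cited works.

# Hedged sketch: stemming-based conflation plus windowed co-occurrence counts.
from collections import Counter
from nltk.stem import PorterStemmer

def cooccurrence_candidates(query_term, corpus_sentences, window=5, top_k=5):
    stemmer = PorterStemmer()
    query_stem = stemmer.stem(query_term.lower())
    counts = Counter()
    for sentence in corpus_sentences:
        stems = [stemmer.stem(tok.lower()) for tok in sentence.split()]
        for i, stem in enumerate(stems):
            if stem != query_stem:
                continue
            # Count stems appearing within `window` tokens of the query stem.
            for neighbour in stems[max(0, i - window): i + window + 1]:
                if neighbour != query_stem:
                    counts[neighbour] += 1
    return [stem for stem, _ in counts.most_common(top_k)]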

        2.2.  Word Embedding-Based Approaches

   Since the distributional hypothesis was proposed by Harris (1954), large unlabeled text corpora have often been used to build word representations. Recently, low-dimensional representations known as Word Embeddings (WE) have been obtained by minimizing a loss function, usually with the Stochastic Gradient Descent algorithm and often in the form of a neural network (Mikolov et al., 2013; Pennington et al., 2014). These so-called WE have yielded state-of-the-art results in various NLP tasks such as word similarity, analogy, PoS tagging, named-entity disambiguation, and IR tasks (Collobert et al., 2011; Socher





