
Towards on Comparing Conventional Query Expansion
                Approaches and Word Embedding-Based Approaches


                 Yasir Hadi Farhan a,*, Shahrul Azman Mohd Noah b, Masnizah Mohd c
                 a,b,c  Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43000, Malaysia
                                          *Email: yasir.hadi87@yahoo.com

        Abstract

        Automatic Query Expansion (AQE) is a widely used technique for addressing term mismatch between the terms of a
        user's query and the relevant documents in the corpus. Term mismatch occurs when the words a user chooses for a
        query do not match the words used in the relevant documents, which can severely degrade search-based tasks.
        Researchers have proposed several approaches to this problem, such as adding new related terms or synonyms of the
        query terms. These approaches fall into two main groups: conventional query expansion approaches and word
        embedding-based approaches. Conventional approaches include linguistic and ontology-based methods. More recently,
        Word Embedding (WE) has been used to address term mismatch; Word2Vec, for example, is a WE toolkit that maps the
        words of a vocabulary to vectors of real numbers. In this paper, we review and summarize the conventional
        approaches and the word embedding-based approaches to AQE.

        Keywords: Automatic Query Expansion, Information Retrieval, Word Embedding


        1.  Introduction

           Web search is one of the most prominent and valuable services on the Internet. The Web holds a vast number
        of documents, and users rely on search engines to find the information they are looking for. The main purpose
        of an Information Retrieval (IR) system is to retrieve the documents most relevant to the user's query; a system
        that returns more relevant documents is considered better than one that returns less relevant ones. Documents
        are ranked according to how well their terms match the terms of the query (El, 2020). The user usually
        expresses an information need as a query, and the IR system then returns information to the user
        (Baeza-Yates & Ribeiro-Neto, 1999). In interacting with users, IR systems face many challenges, one of which
        is vocabulary mismatch (Carpineto & Romano, 2012; Farhan Yasir et al., 2020).
           Several efforts have been made within the research community to improve the effectiveness of IR systems,
        including the use of relevance feedback and query refinement. Researchers in the field of IR have suggested
        many solutions to the vocabulary mismatch problem, among them AQE. This technique reformulates the original
        query by adding new terms in order to improve the accuracy of the IR system (Abbache et al., 2016). Some of the
        proposed approaches rely on feedback from users, expanding their queries by inserting new related terms into
        the original query or by suggesting appropriate keywords (Raza et al., 2018).

        2.  Automatic Query Expansion

           Automatic Query Expansion (AQE) reformulates the user's query by automatically adding relevant terms to
        the original query in order to improve retrieval performance. The fundamental issue in the retrieval process
        is the vocabulary mismatch between the query terms and the documents. Several AQE approaches, such as
        linguistic and ontology-based approaches, have recently been proposed to address this problem
        (Raza et al., 2019).
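
           As an illustration only, the idea of expanding a query with related terms can be sketched in a few lines
        of Python. The sketch below is not the method of any work cited here. It assumes a tiny hand-made table of
        three-dimensional word vectors (TOY_VECTORS, invented for this example) in place of vectors trained by a
        toolkit such as Word2Vec, and the helper names cosine, expand_query and overlap are hypothetical. Each query
        term is expanded with its nearest neighbour by cosine similarity, and a simple term-overlap count shows how
        the mismatch between query and document shrinks.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. A real system
# would load vectors trained on a corpus (for example with Word2Vec).
TOY_VECTORS = {
    "car":         [0.90, 0.10, 0.00],
    "automobile":  [0.85, 0.15, 0.05],
    "vehicle":     [0.80, 0.20, 0.10],
    "repair":      [0.10, 0.90, 0.00],
    "maintenance": [0.15, 0.85, 0.10],
    "banana":      [0.00, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(query_terms, vectors, top_k=1, min_similarity=0.9):
    """Append to the query the top_k most similar vocabulary terms of each
    original term, keeping only candidates above min_similarity."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms are left unexpanded
        candidates = sorted(
            (
                (cosine(vectors[term], vec), other)
                for other, vec in vectors.items()
                if other not in expanded
            ),
            reverse=True,
        )
        for score, other in candidates[:top_k]:
            if score >= min_similarity:
                expanded.append(other)
    return expanded


def overlap(query_terms, document_terms):
    """Number of distinct terms shared by the query and the document."""
    return len(set(query_terms) & set(document_terms))


if __name__ == "__main__":
    query = ["car", "repair"]
    document = ["automobile", "maintenance"]
    print("overlap before expansion:", overlap(query, document))
    expanded = expand_query(query, TOY_VECTORS)
    print("expanded query:", expanded)
    print("overlap after expansion:", overlap(expanded, document))
```

           In this toy setting the original query shares no terms with the document, whereas the expanded query
        shares two. The values of top_k and min_similarity are arbitrary choices for the example; with real
        embeddings they would have to be tuned, since adding too many loosely related terms can hurt precision.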







        E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021)   [191]
        Artificial Intelligence in the 4th Industrial Revolution