Towards Comparing Conventional Query Expansion
Approaches and Word Embedding-Based Approaches
Yasir Hadi Farhan a,*, Shahrul Azman Mohd Noah b, Masnizah Mohd c
a,b,c Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43000, Malaysia
*Email: yasir.hadi87@yahoo.com
Abstract
Automatic Query Expansion (AQE) is a widely used technique for addressing term mismatch between the
user's query terms and the relevant documents in the corpus. Term mismatch occurs when the terms in a user's
query do not match the terms used in the relevant documents, which can severely degrade search-based tasks.
Researchers have proposed several approaches to this problem, such as adding new related terms to the query or
using synonyms of the query terms. These approaches fall into two main groups: conventional query
expansion approaches and word embedding-based approaches. Conventional approaches include linguistic and
ontology-based methods. More recently, Word Embedding (WE) has been used to address term mismatch.
Word2Vec is a WE toolkit that maps the words in a vocabulary to vectors of real numbers. In this paper, we
review and summarize the conventional approaches and the word embedding-based approaches to AQE.
Keywords: Automatic Query Expansion; Information Retrieval; Word Embedding
1. Introduction
Web search is one of the most prominent and valuable services on the Internet. The Web holds a vast number
of documents, and users rely on search engines to find the information they need among them.
The main purpose of an Information Retrieval (IR) system is to retrieve the documents that are most relevant to
the user's query; a system that returns more relevant documents is more effective than one that returns less
relevant ones. Documents are ranked by matching the terms of the query against the terms of the documents
(El, 2020). The user usually formulates an information need as a query, and the IR system then returns
information to the user (Baeza-Yates & Ribeiro-Neto, 1999). During interactions with users, IR systems face
many challenges, one of which is the vocabulary problem, also called vocabulary mismatch (Carpineto &
Romano, 2012; Farhan Yasir et al., 2020).
The research community has made several efforts to improve the effectiveness of IR systems,
including relevance feedback and query refinement. Researchers in the field of IR have suggested
many solutions to the vocabulary mismatch problem, including AQE. This technique rephrases the original
query by adding new terms to improve the accuracy of the IR system (Abbache et al., 2016). Some of the
proposed approaches rely on user feedback to expand queries, either by inserting new related
terms into the original query or by suggesting appropriate keywords (Raza et al., 2018).
2. Automatic Query Expansion
Automatic Query Expansion (AQE) reformulates the user's query by automatically adding
relevant terms to the original query in order to improve retrieval performance. The fundamental issue in the
retrieval process is the vocabulary mismatch between the query terms and the documents. Recently, several AQE
approaches, such as linguistic and ontology-based approaches, have been proposed to address the vocabulary
mismatch problem (Raza et al., 2019).
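
The idea of expanding a query with related terms can be illustrated with a minimal sketch. It is an illustration only, not a method taken from the works cited above. The thesaurus `TOY_THESAURUS`, the functions `expand_query` and `matches`, the parameter `max_new_terms`, and the sample document are all hypothetical; a linguistic or ontology-based approach would draw its related terms from a real lexical resource or ontology rather than a hand-made dictionary, and a real IR system would rank documents rather than apply a Boolean match.

```python
from typing import Dict, List

# Hypothetical hand-made thesaurus used only for illustration.
TOY_THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "cheap": ["affordable", "inexpensive"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]],
                 max_new_terms: int = 2) -> List[str]:
    """Return the original query terms followed by related thesaurus terms."""
    original = query.lower().split()
    expanded = list(original)
    for term in original:
        # Add at most max_new_terms related terms per original term.
        for candidate in thesaurus.get(term, [])[:max_new_terms]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def matches(query_terms: List[str], document: str) -> bool:
    """Boolean OR match: True if any query term occurs in the document."""
    doc_terms = set(document.lower().split())
    return any(term in doc_terms for term in query_terms)


# The document is relevant to the query but shares no terms with it.
document = "affordable automobile rental in Bangi"
original_terms = "cheap car".split()
expanded_terms = expand_query("cheap car", TOY_THESAURUS)

print(matches(original_terms, document))  # False: vocabulary mismatch
print(matches(expanded_terms, document))  # True: expansion bridges the gap
```

In this toy setting the original query misses a relevant document because the document uses different words for the same concepts, while the expanded query retrieves it. Capping the number of added terms per query term reflects a common concern that adding too many terms may pull the query away from the user's intent.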
E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [191]
Artificial Intelligence in the 4th Industrial Revolution