Nowadays, AQE methods are recommended as an efficient way to address the vocabulary and terminology
mismatch between queries and documents in IR tasks (Vechtomova, 2009; White & Horvitz, 2015). The goal of AQE
is to enhance retrieval performance by adding semantically related words to the original query. AQE approaches
can be categorized as global or local. Global approaches expand the original query independently of the
retrieval results. WordNet is typically the standard external resource for selecting new terms that are
semantically related to the original ones (Pal et al., 2014). Local methods, by contrast, rely on Relevance
Feedback (RF): the documents returned by an initial retrieval are used to select the most promising expansion
terms for the initial query. AQE can also be categorized into two main groups, Conventional Query Expansion
Approaches and Word Embedding-Based Approaches; each group is explained in detail in the following sections.

2.1. Conventional Query Expansion Approaches

As stated by Cui et al. (2002), researchers have concentrated on AQE approaches to help web users formulate
better queries or requests. In AQE, users provide extra input on query phrases or words by proposing
additional query terms. Web search engines, such as Google and Yahoo, offer query suggestions to their users.
Query suggestions are a common search feature that displays a continuously updated list of relevant queries
from which users can select as they type. These suggestions help users find specific queries that are likely
to return better results (Wang et al., 2009).
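The suggestion behaviour described above can be illustrated with a minimal sketch. It assumes that suggestions are drawn from a log of past queries and ranked by frequency; the function `suggest` and the sample `QUERY_LOG` are hypothetical and do not reflect how any particular search engine works.

```python
from collections import Counter

def suggest(prefix, query_log, k=3):
    """Return up to k logged queries that start with prefix, most frequent first."""
    prefix = prefix.lower()
    counts = Counter(q.lower().strip() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for a deterministic order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:k]]

# Hypothetical query log (assumption): real systems use far larger logs and richer signals.
QUERY_LOG = [
    "query expansion", "query expansion wordnet", "query expansion",
    "query suggestion", "word embedding", "query expansion wordnet",
    "query expansion",
]

if __name__ == "__main__":
    # The list is recomputed after each keystroke, mimicking suggestions that update as the user types.
    for typed in ("q", "query e", "w"):
        print(typed, "->", suggest(typed, QUERY_LOG))
```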
One of the conventional AQE techniques is to find words related to the given query by using a thesaurus
to pick synonyms for each query word. WordNet is considered the most popular resource for this purpose
(Jiang & Conrath, 1996; Mandala et al., 1998). WordNet is a lexical database that groups words into sets of
synonyms called synsets. This technique expands the original query by analyzing expansion features such as
lexical, morphological, semantic, and syntactic term relationships. Several studies have expanded queries
using WordNet (Mahgoub et al., 2014; Al-Chalabi et al., 2015; Abbache et al., 2016).
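A minimal sketch of thesaurus-based expansion follows. To stay self-contained it does not query WordNet itself: `TOY_SYNSETS` is a small hand-made stand-in for synsets, and both it and `expand_query` are illustrative assumptions rather than the method of any cited study.

```python
# Toy stand-in for a thesaurus such as WordNet: each set plays the role of a synset.
# The entries are illustrative assumptions, not data taken from WordNet.
TOY_SYNSETS = [
    {"car", "automobile", "motorcar"},
    {"buy", "purchase"},
    {"cheap", "inexpensive"},
]

def expand_query(query, synsets=TOY_SYNSETS):
    """Append the synonyms of each query term, keeping original terms first and avoiding duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synset in synsets:
            if term in synset:
                # Sorting makes the order of added synonyms deterministic.
                for synonym in sorted(synset):
                    if synonym not in expanded:
                        expanded.append(synonym)
    return expanded

if __name__ == "__main__":
    print(expand_query("buy cheap car"))
```

In practice every synonym would not be added blindly, since a word may belong to several synsets with different senses; the sketch ignores sense selection.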
Another AQE technique is Relevance Feedback (RF), one of the most effective techniques for expanding the
user's query, in which expansion terms are extracted from the top retrieved documents. Pseudo Relevance
Feedback (PRF) is the technique most similar to RF (ALMasri et al., 2016; Singh & Sharan, 2017). PRF was
initially proposed by Croft & Harper (1979); it assumes that the top retrieved documents are relevant and
then selects related terms from these documents to add to the original query. Some studies have used PRF to
expand users' queries (Atwan et al., 2016; El Mahdaouy et al., 2019).
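The PRF procedure described above (retrieve, assume the top documents are relevant, add terms from them) can be sketched as follows. The scoring function, stopword list, sample documents, and the choice of raw term frequency for selecting expansion terms are all simplifying assumptions made for illustration; published PRF methods use more elaborate ranking and term-weighting schemes.

```python
from collections import Counter

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "of", "in", "and", "is", "to", "for"}

def tokenize(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]

def score(query_terms, doc_tokens):
    """Toy ranking function: number of document tokens that match a query term."""
    return sum(1 for t in doc_tokens if t in query_terms)

def prf_expand(query, documents, top_docs=2, new_terms=2):
    query_terms = tokenize(query)
    tokenized = [tokenize(d) for d in documents]
    # First-pass retrieval: rank by the toy score (the stable sort keeps input order on ties).
    ranked = sorted(tokenized, key=lambda toks: score(query_terms, toks), reverse=True)
    # Pseudo-relevance assumption: treat the top-ranked documents as relevant.
    feedback = Counter()
    for toks in ranked[:top_docs]:
        feedback.update(t for t in toks if t not in query_terms)
    # Pick the most frequent feedback terms, breaking ties alphabetically.
    candidates = sorted(feedback.items(), key=lambda item: (-item[1], item[0]))
    return query_terms + [term for term, _ in candidates[:new_terms]]

# Hypothetical mini-collection used only to exercise the sketch.
DOCS = [
    "query expansion improves document retrieval",
    "query expansion with wordnet improves retrieval",
    "football results and league tables",
    "stemming reduces words to a stem",
]

if __name__ == "__main__":
    print(prf_expand("query retrieval", DOCS))
```

Because no user judges the top documents, any non-relevant document among them contributes noisy terms, which is the usual caveat of the pseudo-relevance assumption.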
One of the earliest AQE techniques is stemming. Stemming is the process of reducing inflected words
to their morphological root or word stem. It groups words that share one stem (assuming they have the same
meaning) under a single index term. Stemming can be as simple as removing pluralization suffixes from words,
or it can involve more complicated ways of preserving meaning and incorporating dictionaries (Farrar &
Hayes, 2019). A few approaches use stemming for AQE (Hammo et al., 2007; Khafajeh et al., 2010; Nwesri
& Alyagoubi, 2015).
Co-occurrence of words is considered one of the main ways to compute the semantic relations between words.
The hypothesis is that semantically similar words tend to occur in the same contexts (Z, 1968; Lindén &
Piitulainen, 2004). Shaalan et al. (2012) used the co-occurrence of words for AQE.
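Both ideas in the preceding paragraph, conflating words under one stem and counting word co-occurrence, can be illustrated with a short sketch. The suffix rules in `naive_stem` handle only a couple of English plural endings, and the document is used as the co-occurrence context; both are assumptions chosen for brevity and are far cruder than the stemmers and co-occurrence measures used in the cited work.

```python
from collections import Counter, defaultdict

def naive_stem(word):
    """Strip simple English plural suffixes; a crude illustration, not a real stemmer."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def build_index(documents):
    """Map each stem to the ids of the documents containing any word with that stem."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.split():
            index[naive_stem(word)].add(doc_id)
    return index

def cooccurrence_counts(documents):
    """Count how often two distinct stems appear in the same document."""
    counts = defaultdict(Counter)
    for text in documents:
        stems = sorted({naive_stem(w) for w in text.split()})
        for a in stems:
            for b in stems:
                if a != b:
                    counts[a][b] += 1
    return counts

# Hypothetical mini-collection used only to exercise the sketch.
DOCS = ["user queries", "query terms", "user query logs"]

if __name__ == "__main__":
    # "queries" and "query" share one index term after stemming.
    print(sorted(build_index(DOCS)["query"]))
    # Stems that co-occur most often with "query" are candidate expansion terms.
    print(cooccurrence_counts(DOCS)["query"].most_common(1))
```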

2.2. Word Embedding-Based Approaches

Since the distributional hypothesis was proposed by Harris (1954), large unlabeled text corpora have often
been used to build word representations. More recently, low-dimensional representations known as Word
Embeddings (WE) have been obtained by minimizing the loss function of a model, often a neural network, usually
with the Stochastic Gradient Descent algorithm (Mikolov et al., 2013; Pennington et al., 2014). These
so-called WE have yielded state-of-the-art results in various NLP tasks such as word similarity, analogy,
PoS tagging, named-entity disambiguation, and IR tasks (Collobert et al., 2011; Socher

E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [192]
Artificial Intelligence in the 4th Industrial Revolution
