Page 207 - The-5th-MCAIT2021-eProceeding
et al.,2011; Mikolov et al.,2013). One drawback of the previous methods is that they operate at the word level, so
morphologically rich words and out-of-vocabulary words cannot be modelled closely. WE has recently been used
to address the vocabulary mismatch problem (Roy et al.,2016; El Mahdaouy et al.,2018; Fernández-Reyes et
al.,2018). WE are distributed representations of words, commonly extracted from a neural
network that models the joint distribution of the corpus vocabulary. The embedding models are usually trained
on a large corpus based on term proximity (Diaz et al.,2016).
Most recent research in AQE relies on WE as a semantic modelling technique (ALMasri et al.,2016; Roy
et al.,2016). To leverage WE to improve AQE effectiveness, Roy et al. (2016) proposed three AQE methods
based on the WE technique. These methods exploit the semantic relationships captured in the distributed
representations of terms, with candidate related terms obtained using the K-Nearest Neighbor (K-NN) approach.
Several other studies have also used WE for AQE (Zamani & Croft,2016; Zamani & Croft,2017; El Mahdaouy
et al.,2018).
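The K-NN selection of candidate terms described above can be illustrated with a minimal sketch. It assumes cosine similarity as the distance measure and uses toy three-dimensional vectors invented for illustration; the function names (`cosine`, `knn_expansion_terms`) are hypothetical and this is not the implementation of Roy et al. (2016).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_expansion_terms(query_terms, embeddings, k=2):
    """Return, for each query term, its k nearest vocabulary terms (hypothetical helper)."""
    expansions = {}
    for q in query_terms:
        if q not in embeddings:
            continue  # query terms without a vector are skipped
        scored = [
            (cosine(embeddings[q], vec), term)
            for term, vec in embeddings.items()
            if term not in query_terms
        ]
        # Highest similarity first; ties are broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        expansions[q] = [term for _, term in scored[:k]]
    return expansions


# Toy embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
    "fruit": [0.1, 0.3, 0.85],
}

if __name__ == "__main__":
    print(knn_expansion_terms(["car"], TOY_EMBEDDINGS, k=2))
```

In practice the vectors would come from a trained embedding model rather than a hand-written dictionary, and the value of k would be tuned on the test collection.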
El Mahdaouy et al. (2019) proposed incorporating WE similarity into PRF models for AIR. The principal
idea is to select expansion terms from the PRF documents according to their distribution and their similarity to the
original query terms. The study hypothesizes that WE can be used for AIR in the PRF framework, since similar words
are grouped close to each other in the vector space. The main aim is to increase the
weight of terms that are semantically related to the original query terms. Three neural WE models, namely
Skip-gram, Continuous Bag of Words (CBOW) and GloVe, are investigated in this work. Evaluations are carried out
with these three models on the standard TREC 2001/2002 Arabic test collection. The aim of the study is
to understand how WE can be used in PRF techniques for AIR. Results showed that the proposed PRF extensions
significantly outperform the PRF baseline models. In addition, they improved on the baseline IR model by 22%
in MAP and by 68% in the robustness index. There was no statistically significant performance
difference between the three WE models (GloVe, CBOW and Skip-gram).
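The idea of combining a term's distribution in the PRF documents with its embedding similarity to the query can be sketched as follows. This is an illustrative assumption, not the scoring function of El Mahdaouy et al. (2019): the score here is simply the relative frequency of a term in the feedback documents multiplied by its mean cosine similarity to the query terms. The data and the name `prf_expansion` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prf_expansion(query_terms, feedback_docs, embeddings, n_terms=2):
    """Rank feedback terms by frequency weighted with similarity to the query.

    Assumed scoring (for illustration only):
        score(t) = relative_frequency(t) * mean cosine(t, q) over query terms q
    """
    counts = Counter(term for doc in feedback_docs for term in doc)
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        if term in query_terms or term not in embeddings:
            continue
        sims = [
            cosine(embeddings[term], embeddings[q])
            for q in query_terms
            if q in embeddings
        ]
        if not sims:
            continue
        similarity = max(0.0, sum(sims) / len(sims))  # negative similarity adds nothing
        scores[term] = (count / total) * similarity
    return sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]


# Toy data, invented for illustration only.
TOY_EMBEDDINGS = {
    "oil": [0.9, 0.1],
    "petroleum": [0.88, 0.15],
    "price": [0.6, 0.5],
    "weather": [0.05, 0.95],
}
FEEDBACK_DOCS = [
    ["oil", "price", "weather", "weather"],
    ["petroleum", "price", "weather"],
]

if __name__ == "__main__":
    print(prf_expansion(["oil"], FEEDBACK_DOCS, TOY_EMBEDDINGS))
```

In the toy data, "weather" is the most frequent feedback term, yet it is ranked below "price" and "petroleum" because it is far from the query term in the vector space. This is the kind of re-weighting towards semantically related terms that the approach aims for.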
Maryamah et al. (2019) proposed an AQE method based on BabelNet using the WE technique
Word2Vec. BabelNet is a semantic dictionary that combines encyclopedic knowledge from Wikipedia articles with
lexicographic knowledge from WordNet (Navigli & Ponzetto,2012). WordNet is used to acquire the synsets based on lexical
or semantic relationships between terms, whereas Wikipedia provides the relationships between entities on its pages.
Candidate expansion terms are also obtained from WordNet and synonyms. In experiments
on 40 queries, the study reported an average accuracy of 90%.
In addition, Wang et al. (2019) proposed a novel AQE method based on the K-NN method, which uses
local WE and focuses on the semantic similarity between words. The cosine similarity measure is
used to calculate the similarity between two words. Their experimental results show that
the proposed local embedding method significantly outperforms the baseline methods and is a promising
area for future work in AQE.
Finally, the conventional AQE approaches and the WE-based approaches can be compared on four points.
First, the input and the output of the conventional AQE
approaches are terms, while the input of the WE-based approaches is terms and the output is vectors. Second,
the complexity of the conventional AQE approaches depends on the nature of the language used, whereas the
WE-based approaches handle any language easily because they rely on vectors rather than on the text.
Third, terms in the conventional AQE approaches are represented as the terms themselves, while terms in the WE-based
approaches are represented by real-number vectors of a single, fixed dimensionality. Fourth, the conventional AQE
approaches use the main dataset corpus, whereas the WE-based approaches use the main dataset corpus in
addition to the embedding corpus created during training.
3. Conclusion
The conventional AQE approaches mainly rely on the assumption that each query term can select the best
candidate terms based on their semantic closeness. The query semantics are analyzed locally, as candidate terms
are chosen on a one-word-at-a-time basis. However, this assumption fails to capture the semantics
of the query terms with respect to the whole content of the query sentence.
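The limitation of one-word-at-a-time selection can be illustrated with a small sketch. It uses toy two-dimensional vectors purely as a convenient similarity measure, and it assumes that the whole query is represented by the mean of its term vectors; that representation is an assumption made for illustration and is not a method proposed in this paper. All names and data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_term(vector, embeddings, exclude):
    """Return the vocabulary term most similar to the vector, ignoring excluded terms."""
    candidates = [t for t in embeddings if t not in exclude]
    return max(candidates, key=lambda t: cosine(vector, embeddings[t]))


def per_term_neighbours(query_terms, embeddings):
    """Local analysis: each query term picks its own nearest candidate."""
    return {
        q: nearest_term(embeddings[q], embeddings, exclude=query_terms)
        for q in query_terms
    }


def whole_query_neighbour(query_terms, embeddings):
    """Global analysis (assumed): nearest candidate to the mean query vector."""
    vectors = [embeddings[q] for q in query_terms]
    centroid = [sum(dim) / len(vectors) for dim in zip(*vectors)]
    return nearest_term(centroid, embeddings, exclude=query_terms)


# Toy vectors, invented for illustration: axis 0 ~ programming, axis 1 ~ geography.
TOY_EMBEDDINGS = {
    "java": [0.8, 0.5],
    "island": [0.0, 1.0],
    "python": [0.9, 0.4],
    "indonesia": [0.3, 0.9],
}

if __name__ == "__main__":
    query = ["java", "island"]
    print(per_term_neighbours(query, TOY_EMBEDDINGS))
    print(whole_query_neighbour(query, TOY_EMBEDDINGS))
```

Taken alone, "java" selects "python" in this toy space, although the query as a whole is about geography. The candidate chosen from the whole-query representation is "indonesia".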
WE is the collective name for a set of language modelling and feature learning techniques in NLP,
in which terms or phrases from the vocabulary are represented by real-number vectors. It comprises a mathematical
E- Proceedings of The 5th International Multi-Conference on Artificial Intelligence Technology (MCAIT 2021) [193]
Artificial Intelligence in the 4th Industrial Revolution