
Dr. Shimaa Ismail Mohamed Mustafa :: Publications:

Title:
Arabic Semantic-Based Textual Similarity
Authors: Shimaa Ismail, AbdelWahab Alsammak, Tarek Elshishtawy
Year: 2022
Keywords: Arabic Text Similarity, Semantic Similarity, Lexical Similarity, Word Embedding, Permutation Feature, Negation Effect.
Journal: Benha Journal of Applied Sciences (BJAS)
Volume: 7
Issue: 4
Pages: 133-142
Publisher: Benha Journal of Applied Sciences (BJAS)
Local/International: Local
Paper Link:
Full paper: Shimaa Ismail Mohamed Mustafa_Shimaa_Ismail Semantic Similarity.pdf
Supplementary materials: Not available
Abstract:

Textual similarity is one of the most important aspects of information retrieval. This paper proposes several techniques for semantic textual similarity and examines the factors that influence them. Two hybrid approaches for measuring the degree of similarity between two Arabic text snippets are presented. The first approach combines word-based and vector-based similarity methods to construct a semantic word space for each word of the input text. Words are represented in their lemma forms to capture all semantically related words. The semantic word spaces are then used to find the best matching between the words of the input texts, and hence to compute the degree of similarity between the two snippets. The second approach combines semantic and syntactic methods. Its main structure is the basic Levenshtein concept, modified to measure the edit cost at the token level rather than the character level. The semantic word spaces are added to this approach so that semantic features complement the syntactic ones, and further techniques are embedded to overcome problems of the syntactic approach, such as word order. The Pearson correlation coefficient is used to measure the correctness of the two proposed approaches against two benchmark datasets. The experiments achieved correlations of 0.7212 and 0.7589 for the two proposed approaches on two different datasets.
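The abstract describes the two approaches only at a high level. A minimal Python sketch of the two underlying ideas, assuming pre-computed word embeddings and lemmatized tokens (all names and matching rules here are illustrative, not the paper's actual implementation), might look like:

```python
# Hypothetical sketch of the two ideas summarized in the abstract:
# (1) best-match similarity over semantic word spaces built from word
#     embeddings, and (2) a token-level Levenshtein distance whose
#     substitution cost is softened by embedding similarity.
# Embeddings, tokenization, and lemmatization are stand-ins; the paper's
# actual resources and rules are not reproduced here.

import numpy as np
from scipy.stats import pearsonr


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def best_match_similarity(tokens_a, tokens_b, emb):
    """Approach 1 (sketch): align each word of one text with its most
    similar word in the other text and average the scores symmetrically."""
    def directed(src, tgt):
        scores = [max(cosine(emb[s], emb[t]) for t in tgt) for s in src]
        return sum(scores) / len(scores)
    return 0.5 * (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a))


def semantic_levenshtein(tokens_a, tokens_b, emb):
    """Approach 2 (sketch): Levenshtein edit distance computed over tokens,
    with substitution cost reduced when the two tokens are semantically close."""
    n, m = len(tokens_a), len(tokens_b)
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1)
    d[0, :] = np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 1.0 - cosine(emb[tokens_a[i - 1]], emb[tokens_b[j - 1]])
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + sub)    # (semantic) substitution
    # Normalize the edit cost into a similarity score in [0, 1].
    return 1.0 - d[n, m] / max(n, m)


# Evaluation as described in the abstract: Pearson correlation between the
# predicted similarities and gold-standard scores from a benchmark dataset.
# predicted, gold = ..., ...
# r, _ = pearsonr(predicted, gold)
```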
