Dr. Hamada Ali Mohamed Ali Nayel :: Publications:

Title:
Arabic Regional Dialect Identification (ARDI) using Pair of Continuous Bag-of-Words and Data Augmentation
Authors: Ahmed H AbuElAtta; Mahmoud Sobhy; Ahmed A El-Sawy; Hamada Nayel
Year: 2023
Keywords: Author Profiling; NLP; Machine Learning; Arabic NLP
Journal: International Journal of Advanced Computer Science and Applications
Volume: 14
Issue: 11
Pages: 258-264
Publisher: The Science and Information Organization
Local/International: International
Paper Link:
Full paper Hamada Ali Mohamed Ali Nayel_Arabic_Regional_Dialect_Identi.pdf
Supplementary materials Not Available
Abstract:

Author profiling is the process of identifying the characteristics that make up an author’s profile. This paper presents a machine learning-based author profiling model for Arabic users that treats the author’s regional dialect as a key characteristic. Several classification algorithms were implemented: decision tree, k-nearest neighbors (KNN), multilayer perceptron, random forest, and support vector machines. A pair of Continuous Bag-of-Words (CBOW) models was used for word representation. The proposed model was evaluated on a well-known data set, and a data augmentation process was applied to improve the quality of the training data. Support vector machines achieved an F1-score of 50.52%, outperforming the other models.
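
The abstract does not say how the two CBOW models are combined into a single representation. As a minimal sketch only, the example below assumes one plausible reading: each text is represented by averaging its word vectors in each model and concatenating the two averages. The names `mean_vector` and `pair_features`, the toy embedding tables, and the transliterated tokens are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]
EmbeddingTable = Dict[str, Vector]


def mean_vector(tokens: Sequence[str], table: EmbeddingTable, dim: int) -> Vector:
    """Average the vectors of tokens found in `table`; return zeros if none are found."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def pair_features(
    tokens: Sequence[str],
    table_a: EmbeddingTable,
    dim_a: int,
    table_b: EmbeddingTable,
    dim_b: int,
) -> Vector:
    """Concatenate the mean vectors from two embedding tables.

    Concatenation is an assumption made for illustration; the paper may
    combine its pair of CBOW models differently.
    """
    return mean_vector(tokens, table_a, dim_a) + mean_vector(tokens, table_b, dim_b)


# Toy stand-ins for two trained CBOW models (hypothetical values, not from the paper).
cbow_a: EmbeddingTable = {"izzayak": [1.0, 0.0], "shlonak": [0.0, 1.0]}
cbow_b: EmbeddingTable = {"izzayak": [0.5, 0.5, 0.0], "keefak": [0.0, 0.0, 1.0]}

features = pair_features(["izzayak", "keefak"], cbow_a, 2, cbow_b, 3)
print(features)  # [1.0, 0.0, 0.25, 0.25, 0.5]
```

Under this assumption, the resulting fixed-length vector could be passed to any of the classifiers listed in the abstract, such as a support vector machine.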
