
Ass. Lect. Mahmoud Sobhy Ali Hassan :: Publications:

Title:
Arabic Regional Dialect Identification (ARDI) using Pair of Continuous Bag-of-Words and Data Augmentation
Authors: Ahmed H. AbuElAtta, Mahmoud Sobhy, Ahmed A. El-Sawy, Hamada Nayel
Year: 2025
Keywords: Dialect identification; continuous Bag-of-Words; data augmentation; text classification
Journal: International Journal of Advanced Computer Science and Applications
Volume: 14
Issue: Not Available
Pages: Not Available
Publisher: Not Available
Local/International: International
Paper Link: Not Available
Full paper: Mahmoud Sobhy Ali Hassan_Paper_25-Arabic_Regional_Dialect_Identification.pdf
Supplementary materials: Not Available
Abstract:

Author profiling is the task of identifying the characteristics that make up an author’s profile. This paper presents a machine learning-based author profiling model for Arabic users that treats the author’s regional dialect as a crucial characteristic. Five classification algorithms were implemented: decision tree, k-nearest neighbors (KNN), multilayer perceptron, random forest, and support vector machines. A pair of Continuous Bag-of-Words (CBOW) models was used for word representation. The proposed model was evaluated on a well-known dataset, and a data augmentation process was applied to improve the quality of the training data. Support vector machines achieved an F1-score of 50.52%, outperforming the other models.
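The abstract does not say how the two CBOW models are combined into one representation. As a minimal sketch, assuming that each text is represented by the mean of its word vectors under each model and that the two means are concatenated, the feature construction could look like the following. The function names, the toy embedding tables, and their values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Vector = List[float]


def average_vector(tokens: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of the tokens found in `table`.

    Tokens missing from the table are skipped; if no token is found,
    a zero vector of length `dim` is returned.
    """
    total = [0.0] * dim
    hits = 0
    for token in tokens:
        vec = table.get(token)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        hits += 1
    if hits == 0:
        return total
    return [x / hits for x in total]


def pair_features(
    tokens: List[str],
    table_a: Dict[str, Vector],
    table_b: Dict[str, Vector],
    dim_a: int,
    dim_b: int,
) -> Vector:
    """Concatenate the mean vectors from two embedding tables (assumed combination rule)."""
    return average_vector(tokens, table_a, dim_a) + average_vector(tokens, table_b, dim_b)


# Toy stand-ins for two trained CBOW models (hypothetical values).
table_a = {"w1": [1.0, 3.0], "w2": [3.0, 5.0]}
table_b = {"w1": [0.0, 2.0]}  # "w2" is out of vocabulary for the second model

features = pair_features(["w1", "w2"], table_a, table_b, dim_a=2, dim_b=2)
```

A feature vector built this way could then be passed to any of the classifiers listed in the abstract. Concatenation keeps the information from both models separate, at the cost of doubling the feature dimension.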
