Ass. Lect. Mahmoud Sobhy Ali Hassan :: Publications: |
|
| Title: | Arabic Regional Dialect Identification (ARDI) using Pair of Continuous Bag-of-Words and Data Augmentation |
| Authors: | Ahmed H. AbuElAtta, Mahmoud Sobhy, Ahmed A. El-Sawy, Hamada Nayel |
| Year: | 2025 |
| Keywords: | Dialect identification; continuous Bag-of-Words; data augmentation; text classification |
| Journal: | International Journal of Advanced Computer Science and Applications |
| Volume: | 14 |
| Issue: | Not Available |
| Pages: | Not Available |
| Publisher: | Not Available |
| Local/International: | International |
| Paper Link: | Not Available |
| Full paper | Mahmoud Sobhy Ali Hassan_Paper_25-Arabic_Regional_Dialect_Identification.pdf |
| Supplementary materials | Not Available |
| Abstract: |
Author profiling is the process of identifying the characteristics that make up an author’s profile. This paper presents a machine learning-based author profiling model for Arabic users that treats the author’s regional dialect as a crucial characteristic. Several classification algorithms were implemented: decision tree, k-nearest neighbors (KNN), multilayer perceptron, random forest, and support vector machines. A pair of Continuous Bag-of-Words (CBOW) models was used for word representation. A well-known dataset was used to evaluate the proposed model, and a data augmentation process was applied to improve the quality of the training data. Support vector machines achieved an F1-score of 50.52%, outperforming the other models. |
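The abstract reports results as an F1-score but does not say how it is averaged across dialect classes. As an illustration only, the sketch below assumes macro averaging (the unweighted mean of per-class F1 scores), a common choice for multi-class text classification. The function name `macro_f1` and the toy dialect labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (assumption: three dialect classes, six sentences).
y_true = ["EG", "EG", "SA", "SA", "MA", "MA"]
y_pred = ["EG", "SA", "SA", "SA", "MA", "EG"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Macro averaging gives every dialect equal weight regardless of how many examples it has, so a classifier that does well only on the most frequent dialects scores lower than it would under micro or weighted averaging.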














