Dr. Hamada Ali Mohamed Ali Nayel :: Publications: |
Title: | Arabic Regional Dialect Identification (ARDI) using Pair of Continuous Bag-of-Words and Data Augmentation |
Authors: | Ahmed H AbuElAtta; Mahmoud Sobhy; Ahmed A El-Sawy; Hamada Nayel |
Year: | 2023 |
Keywords: | Author Profiling; NLP; Machine Learning; Arabic NLP |
Journal: | International Journal of Advanced Computer Science and Applications |
Volume: | 14 |
Issue: | 11 |
Pages: | 258-264 |
Publisher: | The Science and Information Organization |
Local/International: | International |
Paper Link: | |
Full paper | Hamada Ali Mohamed Ali Nayel_Arabic_Regional_Dialect_Identi.pdf |
Supplementary materials | Not Available |
Abstract: |
Author profiling is the process of identifying the characteristics that make up an author’s profile. This paper presents a machine learning-based author profiling model for Arabic users that treats the author’s regional dialect as a crucial characteristic. Several classification algorithms were implemented: decision trees, k-nearest neighbors (KNN), multilayer perceptrons, random forests, and support vector machines (SVMs). A pair of Continuous Bag-of-Words (CBOW) models was used for word representation. The proposed model was evaluated on a well-known dataset, and a data augmentation process was applied to improve the quality of the training data. SVMs achieved an F1-score of 50.52%, outperforming the other models. |
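The sketch below illustrates the kind of representation step the abstract describes. It assumes that "a pair of CBOW models" means two separately trained embedding tables whose averaged word vectors are concatenated into one feature vector per text. The abstract does not state this, so the function names (`average_embedding`, `pair_cbow_features`, `augment_by_deletion`), the toy vectors, and the random-deletion augmentation are hypothetical stand-ins, not the authors' method. In practice, the tables would come from trained CBOW models (for example, gensim `Word2Vec` with `sg=0`).

```python
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def average_embedding(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column (one column per dimension).
    return [sum(col) / len(known) for col in zip(*known)]


def pair_cbow_features(
    tokens: Sequence[str],
    table_a: Dict[str, Vector],
    table_b: Dict[str, Vector],
    dim_a: int,
    dim_b: int,
) -> Vector:
    """Hypothetical 'pair' representation: concatenate averages from two CBOW tables."""
    return average_embedding(tokens, table_a, dim_a) + average_embedding(tokens, table_b, dim_b)


def augment_by_deletion(
    tokens: Sequence[str], p: float = 0.1, rng: Optional[random.Random] = None
) -> List[str]:
    """Hypothetical augmentation: drop each token with probability p, keeping at least one."""
    if not tokens:
        return []
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(list(tokens))]


if __name__ == "__main__":
    # Toy embedding tables with made-up vectors and placeholder tokens.
    table_a = {"w1": [1.0, 0.0], "w2": [0.0, 1.0]}
    table_b = {"w1": [0.5], "w3": [1.5]}
    text = ["w1", "w2", "w3"]
    print(pair_cbow_features(text, table_a, table_b, 2, 1))  # [0.5, 0.5, 1.0]
    print(augment_by_deletion(text, p=0.3))
```

The resulting fixed-length vectors could then be passed to any of the listed classifiers, for example scikit-learn's `SVC` for the SVM. Averaging keeps the feature size independent of text length, at the cost of discarding word order.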