
Prof. Lamiaa Abdallah Ahmed Elrefaei :: Publications:

Title:
Nadia H. Alsulami, Amani T. Jamal, and Lamiaa Elrefaei, "Deep Learning-Based Approach for Arabic Visual Speech Recognition", Computers, Materials & Continua, vol: 71, No.1, pp. 85-108, 2022. doi:10.32604/cmc.2022.019450
Authors: Nadia H. Alsulami, Amani T. Jamal, and Lamiaa Elrefaei
Year: 2022
Keywords: Not Available
Journal: Computers, Materials & Continua
Volume: 71
Issue: 1
Pages: 85-108
Publisher: TSP
Local/International: International
Paper Link:
Full paper Not Available
Supplementary materials Not Available
Abstract:

Lip-reading technologies are progressing rapidly following the breakthrough of deep learning. Lip reading plays a vital role in many applications, such as human-machine communication and security. In this paper, we propose an effective lip-reading recognition model for Arabic visual speech recognition based on deep learning algorithms. The collected Arabic visual dataset contains 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary aim is to achieve a high-performance model by enhancing the preprocessing phase. First, we extract keyframes from our dataset. Second, we produce Concatenated Frame Images (CFIs) that represent each utterance sequence as a single image. Finally, VGG-19 is employed for visual feature extraction in our proposed model. We examined different numbers of keyframes (10, 15, and 20) to compare two approaches in the proposed model: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves higher accuracy on the test dataset: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition. Therefore, our proposed model is superior to other models based on CFI input.
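The abstract does not say how keyframes are chosen or how they are arranged inside a CFI, so the following is only a minimal sketch under two stated assumptions: keyframes are picked by uniform temporal sampling, and the selected frames are tiled row by row into a grid. The function names `select_keyframes` and `concatenate_frames` are hypothetical and do not come from the paper.

```python
import numpy as np


def select_keyframes(frames: np.ndarray, k: int) -> np.ndarray:
    """Pick k frames at evenly spaced indices.

    Assumption: uniform temporal sampling stands in for the paper's
    (unspecified) keyframe extraction step.
    """
    n = frames.shape[0]
    if k > n:
        raise ValueError(f"cannot select {k} keyframes from {n} frames")
    idx = np.linspace(0, n - 1, num=k).round().astype(int)
    return frames[idx]


def concatenate_frames(keyframes: np.ndarray, cols: int) -> np.ndarray:
    """Tile keyframes row by row into one grid image (hypothetical CFI layout).

    keyframes has shape (k, h, w) or (k, h, w, channels). If k is not a
    multiple of cols, the last grid row is padded with black (zero) tiles.
    """
    k, h, w = keyframes.shape[:3]
    rows = -(-k // cols)  # ceiling division
    pad = rows * cols - k
    if pad:
        blank = np.zeros((pad,) + keyframes.shape[1:], dtype=keyframes.dtype)
        keyframes = np.concatenate([keyframes, blank], axis=0)
    # (rows*cols, h, w, ...) -> (rows, cols, h, w, ...)
    grid = keyframes.reshape((rows, cols) + keyframes.shape[1:])
    # Move each tile's pixel rows next to the grid-row axis: (rows, h, cols, w, ...)
    grid = grid.swapaxes(1, 2)
    return grid.reshape((rows * h, cols * w) + keyframes.shape[3:])


# Usage: a toy 30-frame grayscale clip of 2x2 "frames", reduced to 10 keyframes
# and tiled into a 2x5 grid, giving one 4x10 image.
clip = np.arange(30 * 2 * 2).reshape(30, 2, 2)
cfi = concatenate_frames(select_keyframes(clip, 10), cols=5)
print(cfi.shape)  # (4, 10)
```

With real video, the single CFI image would then be resized and passed to an image CNN such as VGG-19, which is what lets a single-image classifier see the whole utterance at once.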
