The automatic classification and retrieval of images is a challenging task, especially for low-quality images with faded ink, such as historical manuscripts. In this study, we therefore develop a reinforcement learning agent capable of interacting with an environment of historical Arabic manuscript images and retrieving the images most similar to a query image. First, deep visual features are extracted from the images using the pre-trained VGG19 convolutional neural network. Then, the associated deep textual features are extracted using an attentional BiLSTM deep learning model. The two feature vectors are fused through a concatenation merge layer and hashed to reduce the dimensionality of the fused representation, improving image classification and retrieval. The proposed method was tested on a manually collected dataset and recorded promisingly high accuracy, indicating that computer vision can outperform human vision on this task.
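To make the fusion pipeline concrete, the following is a minimal Keras sketch of the visual branch, the attentional BiLSTM textual branch, the concatenation merge, and the hashing projection. The input sizes, vocabulary size, attention formulation, and the 64-unit hash layer are illustrative assumptions rather than the study's exact configuration, and the reinforcement learning retrieval agent itself is omitted.

```python
# A minimal sketch of the described fusion pipeline, under assumed
# layer sizes; not the authors' exact configuration.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19

# Visual branch: deep features from a frozen, pre-trained VGG19.
image_in = layers.Input(shape=(224, 224, 3), name="image")
vgg = VGG19(weights="imagenet", include_top=False, pooling="avg")
vgg.trainable = False
visual_feats = vgg(image_in)                               # (batch, 512)

# Textual branch: BiLSTM over the image's associated text, followed by
# a simple additive attention pooling over the time steps. The sequence
# length (100) and vocabulary size (20000) are placeholder assumptions.
text_in = layers.Input(shape=(100,), name="text_tokens")
emb = layers.Embedding(input_dim=20000, output_dim=128)(text_in)
bilstm = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(emb)
scores = layers.Flatten()(layers.Dense(1, activation="tanh")(bilstm))
attn = layers.Softmax(name="attention_weights")(scores)   # (batch, 100)
textual_feats = layers.Dot(axes=(1, 1))([attn, bilstm])   # (batch, 128)

# Fusion: concatenate both feature vectors, then project to a compact
# sigmoid code whose thresholded bits can serve as the retrieval hash.
fused = layers.Concatenate()([visual_feats, textual_feats])
hash_code = layers.Dense(64, activation="sigmoid", name="hash")(fused)

model = Model(inputs=[image_in, text_in], outputs=hash_code)
model.summary()
```

In a retrieval setting of this kind, the compact hash codes would typically be precomputed for the whole collection so that the agent can compare a query's code against the gallery (e.g., by Hamming distance) far more cheaply than comparing the full fused feature vectors.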