
Dr. Ahmed Hagag :: Publications

Title: On the Co-Selection of Vision Transformer Features and Images for Very High-Resolution Image Scene Classification
Authors: Souleyman Chaib, Dou El Kefel Mansouri, Ibrahim Omara, Ahmed Hagag, Sahraoui Dhelim, Djamel Amar Bensaber
Year: 2022
Keywords: Not Available
Journal: Remote Sensing
Volume: 14
Issue: 22
Pages: 5817
Publisher: MDPI
Local/International: International
Paper Link: Ahmed Hagag_remotesensing-14-05817.pdf (full paper)
Supplementary Materials: Not Available
Abstract:

Recent developments in remote sensing technology allow us to observe the Earth with very high-resolution (VHR) images. VHR image scene classification is a challenging problem in the field of remote sensing. Vision transformer (ViT) models have achieved breakthrough results in image recognition tasks. However, the transformer encoder layers encode different levels of features: the last layer captures the semantic information of an image scene, whereas the earliest layers contain more detailed features but lack that semantic information. In this paper, a new deep framework is proposed for VHR scene understanding that exploits the strengths of ViT features in a simple and effective way. First, pre-trained ViT models are used to extract informative features from the original VHR image scene, with the transformer encoder layers generating the feature descriptors of the input images. Second, the obtained features are merged into a single data set. Third, some extracted ViT features do not describe certain image scenes well, such as agriculture, meadows, and beaches, which can negatively affect the performance of the classification model. To deal with this challenge, we propose a new algorithm for feature and image co-selection. Based on preserving the similarity structure of the whole data set, it eliminates the less important features and images, as well as abnormal ones: the most informative features and the most important images are selected, and irrelevant images that degrade classification accuracy are dropped. The proposed method was tested on three VHR benchmarks. The experimental results demonstrate that it outperforms other state-of-the-art methods.
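To make the co-selection idea concrete, the sketch below is a minimal, self-contained illustration in Python/NumPy. It is not the authors' released code: the matrix X merely stands in for the merged multi-layer ViT descriptors, the Laplacian score is used as one standard similarity-preserving criterion for ranking features, and the image-ranking rule (total neighbourhood affinity on the similarity graph) is an assumed stand-in for the paper's image-selection step. All function and parameter names are illustrative.

# A minimal sketch of similarity-preserving feature/image co-selection.
# NOTE: illustrative only, not the paper's implementation. X stands in
# for merged ViT encoder-layer descriptors (one row per image); the
# Laplacian score and the neighbourhood-affinity rule are assumptions.
import numpy as np

def rbf_knn_graph(X, k=10, sigma=1.0):
    """Symmetric k-NN affinity matrix with an RBF kernel."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # keep only the k strongest affinities per row, then symmetrize
    weakest = np.argsort(W, axis=1)[:, :-k]
    keep = np.ones_like(W, dtype=bool)
    np.put_along_axis(keep, weakest, False, axis=1)
    W = np.where(keep, W, 0.0)
    return np.maximum(W, W.T)

def laplacian_scores(X, W):
    """Laplacian score per feature; lower = better locality preservation."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    ones = np.ones(X.shape[0])
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        # remove the D-weighted mean so constant features score badly
        f = f - (f @ D @ ones) / (ones @ D @ ones)
        denom = f @ D @ f
        scores[r] = (f @ L @ f) / denom if denom > 1e-12 else np.inf
    return scores

def co_select(X, n_feats, n_imgs, k=10):
    """Keep the n_feats most locality-preserving features and the
    n_imgs images with the strongest neighbourhood support."""
    W = rbf_knn_graph(X, k=k)
    feat_rank = np.argsort(laplacian_scores(X, W))[:n_feats]
    # weakly connected images are treated as likely outliers and dropped
    img_rank = np.argsort(-W.sum(axis=1))[:n_imgs]
    return X[np.ix_(img_rank, feat_rank)], feat_rank, img_rank

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 128))   # 60 images, 128-dim merged descriptors
    Xs, feats, imgs = co_select(X, n_feats=64, n_imgs=50)
    print(Xs.shape)                  # (50, 64)

On real data, X would be the row-wise concatenation of per-layer ViT descriptors (one row per image), and k, n_feats, and n_imgs would be tuned per benchmark.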
