Advances in remote sensing technology have enabled the production of very high-resolution (VHR) images, and classifying VHR imagery scenes has become a challenging problem. In this paper, we propose a model for VHR scene classification. First, convolutional neural networks (CNNs) with pre-trained weights are used as deep feature extractors to obtain global and local CNN features from the original VHR images. Second, a spectral residual-based saliency detection algorithm is used to compute a saliency map, and saliency features are then extracted from this map using CNNs. These features make the representation more robust, especially for images that contain salient objects. Third, instead of using the raw deep features, we use feature fusion to form the final representation of the VHR image scenes. Discriminant correlation analysis (DCA) is used to fuse the global and local CNN features with the saliency features; DCA is a more suitable and cost-effective fusion method than traditional fusion techniques. Finally, we propose an enhanced multilayer perceptron to classify the images. Experiments are performed on four widely used datasets: UC-Merced, WHU-RS, Aerial Image, and NWPU-RESISC45. The results confirm that the proposed model performs better than state-of-the-art scene classification models.
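
The spectral residual saliency step can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `box_blur` and `spectral_residual_saliency`, the 3x3 averaging window, and the box filter used in place of the Gaussian smoothing that is more common for this method are all assumptions made for illustration. The sketch subtracts a locally averaged log-amplitude spectrum from the log-amplitude spectrum, then transforms the residual back to the spatial domain with the original phase.

```python
import numpy as np


def box_blur(a, k=3):
    """Mean filter with a k x k window and edge padding (k must be odd)."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    # Sum the k*k shifted copies of the padded array, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)


def spectral_residual_saliency(gray, avg_size=3, smooth_size=5):
    """Return a saliency map scaled to [0, 1] for a 2-D grayscale array.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    gray = np.asarray(gray, dtype=float)
    spectrum = np.fft.fft2(gray)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)  # epsilon avoids log(0)
    phase = np.angle(spectrum)

    # Spectral residual: log amplitude minus its local average.
    residual = log_amplitude - box_blur(log_amplitude, avg_size)

    # Back to the spatial domain, keeping the original phase.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2

    # Box smoothing as a simple stand-in for Gaussian smoothing.
    saliency = box_blur(saliency, smooth_size)

    # Rescale to [0, 1]; a constant map is returned as all zeros.
    saliency = saliency - saliency.min()
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency


if __name__ == "__main__":
    # Synthetic stand-in for a grayscale VHR image patch.
    image = np.random.default_rng(0).random((64, 64))
    saliency_map = spectral_residual_saliency(image)
    print(saliency_map.shape, saliency_map.min(), saliency_map.max())
```

In a pipeline like the one described, a map of this kind would be passed to the pre-trained CNN to obtain the saliency features. How the map is resized or combined with the original image before that step is not specified here.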