Visual Semantic-Based Representation Learning Using Deep CNNs for Scene Recognition
Written by: Gupta, Shikha; Sharma, Krishan; Aroor Dinesh, Dileep; Thenkanidiyoor, Veena
Abstract: In this work, we address the task of scene recognition from image data. A scene is a spatially correlated arrangement of visual semantic contents, also known as concepts, e.g., "chair," "car," and "sky." Representation learning based on visual semantic content is a natural approach, as it mimics the way humans perceive visual information. The semantic multinomial (SMN) representation is one such representation, capturing semantic information through the posterior probabilities of concepts. The core step in obtaining an SMN representation is building concept models, which requires ground-truth (true) concept labels for every concept present in an image. However, manually labeling concepts is not practically feasible given the large number of images in a dataset. To address this issue, we propose an approach for generating pseudo-concepts in the absence of true concept labels. We use pre-trained deep CNN architectures, treating activation maps (filter responses) from convolutional layers as initial cues for the pseudo-concepts. Non-significant activation maps are removed using the proposed filter-specific, threshold-based approach, which eliminates non-prominent concepts from the data. We further propose a grouping mechanism that merges matching pseudo-concepts via subspace modeling of filter responses, yielding a non-redundant representation. Experimental studies show that the SMN representation generated from pseudo-concepts achieves comparable results on standard scene recognition datasets such as MIT-67 and SUN-397, even in the absence of true concept labels.
Language: English
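To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how activation maps from a pre-trained CNN can serve as pseudo-concept cues, be pruned with a per-filter threshold, and be pooled into an SMN-style probability vector. The choice of VGG-16, the mean-activation score, the quantile-based threshold, and all function names here are illustrative assumptions; the paper's subspace-based grouping step is omitted.

```python
# Illustrative sketch only (not the authors' code): extract conv activation
# maps from a pre-trained VGG-16, prune non-significant maps with a
# per-filter threshold, and normalize the survivors into an SMN-like vector.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

def smn_from_image(image_path, layer_index=29, keep_ratio=0.5):
    """Hypothetical SMN-style descriptor from VGG-16 conv activations."""
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        # Run the image through the conv stack up to the chosen layer
        # (index 29 = ReLU after conv5_3); each output channel is an
        # activation map, i.e., an initial pseudo-concept cue.
        feats = model.features[: layer_index + 1](x)   # (1, C, H, W)

    maps = feats.squeeze(0)                            # (C, H, W)
    # Stand-in for the paper's filter-specific threshold: score each map by
    # its mean activation and keep only the strongest fraction of filters.
    scores = maps.mean(dim=(1, 2))                     # (C,)
    threshold = scores.quantile(1.0 - keep_ratio)
    kept = scores >= threshold

    # Normalize the surviving scores into a multinomial over pseudo-concepts.
    # (ReLU outputs are non-negative; clamp is a safety net.)
    kept_scores = scores[kept].clamp(min=0)
    smn = kept_scores / kept_scores.sum()
    return smn, kept

# Example usage with a hypothetical image path:
smn, kept = smn_from_image("scene.jpg")
print(f"{int(kept.sum())} pseudo-concepts kept, SMN sums to {smn.sum().item():.2f}")
```

In the paper's full pipeline, a subspace-modeling step would additionally group activation maps that respond to the same underlying concept before normalization, so the resulting multinomial is over non-redundant pseudo-concepts rather than raw filters.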