Learning enhanced features and inferring twice for fine-grained image classification

Publisher: SPRINGER
Publication Type: Journal Article
Citation: Multimedia Tools and Applications, 2023, 82, (10), pp. 14799-14813
Issue Date: 2023-04-01
Fine-Grained Visual Categorization (FGVC) aims to distinguish between extremely similar subordinate-level categories within the same basic-level category. Existing research has demonstrated the importance of discriminative features in FGVC but has overlooked the contribution that other features make to correct classification; moreover, the extracted features tend to carry more information about obvious regions and less about subtle ones. This paper makes two contributions. First, a novel module, the forcing module, is proposed to make the network extract more diverse features for FGVC. It generates a suppression mask from the class activation maps to suppress the most distinguishable regions, thereby forcing the network to extract secondary distinguishable features as the final features. The forcing module consists of an original branch, which focuses on the primary discriminative regions, and a forcing branch, which focuses on the secondary discriminative regions. Second, to address the severe loss of small-scale distinguishable features after multi-layer down-sampling, the object is cropped and scaled according to the class activation maps of the first prediction and used as the second input. To reduce prediction error, the first and second prediction probabilities are fused into the final prediction. Experimental results indicate that the proposed method not only outperforms the baseline model by a large margin on CUB-200-2011, Stanford-Cars, and FGVC-Aircraft (3.7%, 5.9%, and 3.1%, respectively), but also achieves state-of-the-art performance on FGVC-Aircraft.
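The three operations named in the abstract (masking the strongest class-activation responses, locating the object from the activation map, and fusing two predictions) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact rules, so everything specific here is an assumption made for illustration: the function names, the threshold ratios (0.8 and 0.5), the max-relative thresholding, and the use of a plain average for fusion are all hypothetical and are not taken from the paper. Rescaling the crop to the network input size is omitted.

```python
import numpy as np


def suppression_mask(cam, ratio=0.8):
    """Return a 0/1 mask that zeroes the strongest CAM responses.

    Assumption: a location is "most distinguishable" when its activation
    is at least `ratio` times the map maximum (hypothetical rule).
    """
    return (cam < ratio * cam.max()).astype(cam.dtype)


def cam_bounding_box(cam, ratio=0.5):
    """Return (top, bottom, left, right) around the activated region.

    Assumption: the object covers every location whose activation is at
    least `ratio` times the map maximum; bottom and right are exclusive.
    """
    rows, cols = np.where(cam >= ratio * cam.max())
    return (int(rows.min()), int(rows.max()) + 1,
            int(cols.min()), int(cols.max()) + 1)


def fuse_probabilities(p_first, p_second):
    """Fuse two class-probability vectors by simple averaging (assumed)."""
    return (p_first + p_second) / 2.0


# Toy 4x4 class activation map with a single dominant peak.
cam = np.array([[0.1, 0.2, 0.1, 0.0],
                [0.2, 0.6, 1.0, 0.1],
                [0.1, 0.5, 0.7, 0.1],
                [0.0, 0.1, 0.1, 0.0]])

mask = suppression_mask(cam)      # zero only at the peak
suppressed = cam * mask           # what a forcing-style branch would see
box = cam_bounding_box(cam)       # region to crop for a second pass

# Hypothetical first-pass and second-pass class probabilities.
fused = fuse_probabilities(np.array([0.6, 0.3, 0.1]),
                           np.array([0.2, 0.7, 0.1]))
```

In this toy case the mask removes only the peak location, so the remaining activations are the "secondary" regions; the fused vector shows how a second pass on the cropped object can change the predicted class relative to the first pass alone.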