Category attention transfer for efficient fine-grained visual categorization

Liao, Q; Wang, D; Xu, M

Publisher: ELSEVIER
Publication Type: Journal Article
Citation: Pattern Recognition Letters, 2022, 153, pp. 10-15
Issue Date: 2022-01-01

In Progress

	Filename	Description	Size	Format
	Category attention transfer for efficient fine-grained visual categorization.pdf	Published version	1.04 MB	Adobe PDF

Copyright Clearance Process

This item is being processed and is not currently available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liao, Q
dc.contributor.author	Wang, D
dc.contributor.author	Xu, M https://orcid.org/0000-0001-9581-8849
dc.date.accessioned	2023-03-20T23:41:47Z
dc.date.available	2023-03-20T23:41:47Z
dc.date.issued	2022-01-01
dc.identifier.citation	Pattern Recognition Letters, 2022, 153, pp. 10-15
dc.identifier.issn	0167-8655
dc.identifier.issn	1872-7344
dc.identifier.uri	http://hdl.handle.net/10453/167824
dc.description.abstract	Fine-Grained Visual Categorization (FGVC) aims to distinguish subordinate-level categories with subtle inter-class differences. Although previous research has shown the impressive effectiveness of recurrent multi-attention models and second-order feature encoding, these approaches often require an enormous amount of computation and memory, making them unsuitable for mobile applications. This paper proposes a Category Attention Transfer CNN (CAT-CNN) to address the efficiency issue in FGVC. We transfer part attention knowledge from a very large-scale FGVC network to a small but efficient network to significantly improve its representation ability. Using the proposed CAT-CNN, the accuracy of efficient networks such as ShuffleNet, MobileNet, and EfficientNet can be improved by up to 5.7% on the CUB-200-2011 dataset without increasing computational complexity or memory cost. Our experiments show that the proposed CAT-CNN can be applied to multiple network structures to enhance their performance. With a single efficient network structure and a single inference, the proposed CAT-MobileNet-large-1.0 and CAT-EfficientNet-b0 achieve accuracies of 86.5% and 86.7%, respectively, on the CUB-200-2011 dataset, which is close to or better than the results of state-of-the-art methods that use large-scale networks and multiple inferences, making FGVC feasible on mobile devices.
dc.language	English
dc.publisher	ELSEVIER
dc.relation.ispartof	Pattern Recognition Letters
dc.relation.isbasedon	10.1016/j.patrec.2021.11.015
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.rights	This is an open access article under the CC BY-NC-ND license
dc.subject	0801 Artificial Intelligence and Image Processing, 0906 Electrical and Electronic Engineering, 1702 Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Category attention transfer for efficient fine-grained visual categorization
dc.type	Journal Article
utslib.citation.volume	153
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0906 Electrical and Electronic Engineering
utslib.for	1702 Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	in_progress	*
dc.date.updated	2023-03-20T23:41:45Z
pubs.publication-status	Published
pubs.volume	153

Abstract:

Fine-Grained Visual Categorization (FGVC) aims to distinguish subordinate-level categories with subtle inter-class differences. Although previous research has shown the impressive effectiveness of recurrent multi-attention models and second-order feature encoding, these approaches often require an enormous amount of computation and memory, making them unsuitable for mobile applications. This paper proposes a Category Attention Transfer CNN (CAT-CNN) to address the efficiency issue in FGVC. We transfer part attention knowledge from a very large-scale FGVC network to a small but efficient network to significantly improve its representation ability. Using the proposed CAT-CNN, the accuracy of efficient networks such as ShuffleNet, MobileNet, and EfficientNet can be improved by up to 5.7% on the CUB-200-2011 dataset without increasing computational complexity or memory cost. Our experiments show that the proposed CAT-CNN can be applied to multiple network structures to enhance their performance. With a single efficient network structure and a single inference, the proposed CAT-MobileNet-large-1.0 and CAT-EfficientNet-b0 achieve accuracies of 86.5% and 86.7%, respectively, on the CUB-200-2011 dataset, which is close to or better than the results of state-of-the-art methods that use large-scale networks and multiple inferences, making FGVC feasible on mobile devices.
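
The abstract does not spell out the transfer objective, so the following is only a rough illustration of the general idea of attention transfer between a large and a small network, not the CAT-CNN formulation itself. It assumes a generic activation-based scheme: each network's feature tensor is collapsed into a normalised spatial attention map, and the small network is penalised for deviating from the large network's map. The function names, tensor shapes, and random inputs are all hypothetical, and the sketch assumes both feature maps share the same spatial size.

```python
import numpy as np


def attention_map(features: np.ndarray) -> np.ndarray:
    """Collapse a (C, H, W) feature tensor into a unit-norm spatial attention vector."""
    # Sum of squared activations over the channel axis gives an (H, W) map.
    amap = np.sum(features ** 2, axis=0)
    flat = amap.reshape(-1)
    # L2-normalise so networks with different channel counts are comparable.
    return flat / (np.linalg.norm(flat) + 1e-12)


def attention_transfer_loss(student_feats: np.ndarray, teacher_feats: np.ndarray) -> float:
    """Mean squared difference between student and teacher attention maps."""
    s = attention_map(student_feats)
    t = attention_map(teacher_feats)
    return float(np.mean((s - t) ** 2))


# Hypothetical features: the channel counts differ, the spatial size (7x7) matches.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((512, 7, 7))  # stand-in for large-network features
student = rng.standard_normal((96, 7, 7))   # stand-in for small-network features

loss_random = attention_transfer_loss(student, teacher)
loss_same = attention_transfer_loss(teacher, teacher)
print(loss_random, loss_same)
```

In a training setup of this kind, a term like this would typically be added to the small network's classification loss. The large network is used only during training, which is consistent with the abstract's statement that inference cost does not increase.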

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/167824