K-Reciprocal Harmonious Attention Network for Video-Based Person Re-Identification
- Publisher:
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- Publication Type:
- Journal Article
- Citation:
- IEEE Access, 2019, 7, pp. 22457-22470
- Issue Date:
- 2019-01-01
Filename | Description | Size
---|---|---
PrePrint_WomansSpecialEnemy.pdf | PrePrint | 269.38 kB
This item is new to OPUS and is not currently available.
Video-based person re-identification aims to retrieve video sequences of the same person across a multi-camera system. In this paper, we propose a k-reciprocal harmonious attention network (KHAN) to jointly learn discriminative spatiotemporal features and similarity metrics. In KHAN, the harmonious attention module adaptively calibrates the response at each spatial position and each channel by explicitly inspecting position-wise and channel-wise interactions over feature maps. In addition, the k-reciprocal attention module attends to key features among all frame-level features with a discriminative feature selection algorithm, so that useful temporal information within contextualized key features can be assimilated into a more robust clip-level representation. Compared with commonly used local-context-based approaches, KHAN captures long-range dependencies across spatial regions and visual patterns while incorporating informative context at each time step in a non-parametric manner. Extensive experiments on three public benchmark datasets show that the proposed approach outperforms state-of-the-art methods.
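The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation: the harmonious attention module is reduced to sigmoid channel-wise and position-wise gates, and the k-reciprocal relation is shown as mutual k-nearest-neighbor selection among frame-level features. All function names and gating choices below are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def harmonious_recalibrate(feat):
    """Recalibrate a (C, H, W) feature map with channel- and position-wise gates.

    Simplified stand-in for harmonious attention: the channel gate comes from
    global average pooling, the spatial gate from the channel-wise mean, and
    both are squashed with a sigmoid before rescaling every response.
    """
    # Channel-wise gate: one scalar per channel.
    chan_gate = sigmoid(feat.mean(axis=(1, 2)))          # shape (C,)
    # Position-wise gate: one scalar per spatial location.
    pos_gate = sigmoid(feat.mean(axis=0))                # shape (H, W)
    # Calibrate the response at each channel and each spatial position.
    return feat * chan_gate[:, None, None] * pos_gate[None, :, :]

def k_reciprocal_pairs(frame_feats, k=1):
    """Return frame index pairs that are k-reciprocal (mutual k-NN) neighbors.

    frame_feats: (T, D) array of frame-level features. A pair (i, j) is
    k-reciprocal when i is among j's k nearest frames AND vice versa --
    a non-parametric way to pick mutually supporting key frames.
    """
    d = np.linalg.norm(frame_feats[:, None, :] - frame_feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each frame from its own list
    knn = np.argsort(d, axis=1)[:, :k]   # each frame's k nearest frames
    pairs = set()
    for i in range(frame_feats.shape[0]):
        for j in knn[i]:
            if i in knn[j]:              # reciprocal: i and j select each other
                pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)
```

For example, three frame features where two are close together and one is an outlier yield a single k-reciprocal pair, illustrating how the selection discards the outlier frame in a non-parametric manner.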