H<sup>2</sup>FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection

Xu, Y; Sun, Y; Yang, Z; Miao, J; Yang, Y

Publisher: IEEE COMPUTER SOC
Publication Type: Conference Proceeding
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, 2022-June, pp. 14309-14319
Issue Date: 2022-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Xu, Y
dc.contributor.author	Sun, Y
dc.contributor.author	Yang, Z
dc.contributor.author	Miao, J
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date	2022-06-18
dc.date.accessioned	2023-01-16T05:04:14Z
dc.date.available	2023-01-16T05:04:14Z
dc.date.issued	2022-01-01
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, 2022-June, pp. 14309-14319
dc.identifier.isbn	9781665469463
dc.identifier.issn	1063-6919
dc.identifier.uri	http://hdl.handle.net/10453/165021
dc.description.abstract	Cross-domain weakly supervised object detection (CD-WSOD) aims to adapt a detection model to a novel target domain using easily acquired image-level annotations. How the source and target domains are aligned is critical to CD-WSOD accuracy. Existing methods usually align only some of the detection components. In contrast, this paper holds that all detection components are important and proposes a Holistic and Hierarchical Feature Alignment (H<sup>2</sup>FA) R-CNN. H<sup>2</sup>FA R-CNN enforces two image-level alignments for the backbone features, as well as two instance-level alignments for the RPN and detection head. This coarse-to-fine alignment hierarchy follows the detection pipeline, i.e., it processes the image-level features and then the instance-level features from bottom to top. Importantly, we devise a novel hybrid supervision method for learning the two instance-level alignments. It enables the RPN and detection head to simultaneously receive weak supervision from the target domain and full supervision from the source domain. Combining all these feature alignments, H<sup>2</sup>FA R-CNN effectively mitigates the gap between the source and target domains. Experimental results show that H<sup>2</sup>FA R-CNN significantly improves cross-domain object detection accuracy and sets a new state of the art on popular benchmarks. Code and pre-trained models are available at https://github.com/XuYunqiu/H2FA_R-CNN.
dc.language	en
dc.publisher	IEEE COMPUTER SOC
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.relation.ispartof	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.relation.ispartofseries	IEEE Conference on Computer Vision and Pattern Recognition
dc.relation.isbasedon	10.1109/CVPR52688.2022.01393
dc.rights	info:eu-repo/semantics/openAccess
dc.title	H<sup>2</sup>FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection
dc.type	Conference Proceeding
utslib.citation.volume	2022-June
utslib.location.activity	New Orleans, LA
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
dc.date.updated	2023-01-16T05:04:00Z
pubs.finish-date	2022-06-24
pubs.publication-status	Published
pubs.start-date	2022-06-18
pubs.volume	2022-June

Abstract:

Cross-domain weakly supervised object detection (CD-WSOD) aims to adapt a detection model to a novel target domain using easily acquired image-level annotations. How the source and target domains are aligned is critical to CD-WSOD accuracy. Existing methods usually align only some of the detection components. In contrast, this paper holds that all detection components are important and proposes a Holistic and Hierarchical Feature Alignment (H<sup>2</sup>FA) R-CNN. H<sup>2</sup>FA R-CNN enforces two image-level alignments for the backbone features, as well as two instance-level alignments for the RPN and detection head. This coarse-to-fine alignment hierarchy follows the detection pipeline, i.e., it processes the image-level features and then the instance-level features from bottom to top. Importantly, we devise a novel hybrid supervision method for learning the two instance-level alignments. It enables the RPN and detection head to simultaneously receive weak supervision from the target domain and full supervision from the source domain. Combining all these feature alignments, H<sup>2</sup>FA R-CNN effectively mitigates the gap between the source and target domains. Experimental results show that H<sup>2</sup>FA R-CNN significantly improves cross-domain object detection accuracy and sets a new state of the art on popular benchmarks. Code and pre-trained models are available at https://github.com/XuYunqiu/H2FA_R-CNN.
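
The abstract's idea of hybrid supervision (full supervision from the source domain, weak supervision from the target domain) can be illustrated with a minimal sketch. This record does not describe the paper's actual losses, so everything below is an assumption: the function names, the max-over-instances aggregation, and the `weak_weight` parameter are hypothetical, and the weak term is a generic multiple-instance-learning style loss rather than the authors' method. The authors' implementation is in the repository linked above.

```python
import math

EPS = 1e-12  # guards against log(0)


def instance_ce_loss(probs, labels):
    """Full supervision (source domain, assumed): mean cross-entropy over
    instances that each carry a class label.

    probs:  list of per-instance class-probability lists
    labels: list of ground-truth class indices, one per instance
    """
    return -sum(math.log(p[y] + EPS) for p, y in zip(probs, labels)) / len(labels)


def image_level_bce_loss(probs, image_labels):
    """Weak supervision (target domain, assumed): aggregate instance scores
    into one image-level score per class by taking the max over instances,
    then apply binary cross-entropy against the image-level labels.

    probs:        list of per-instance class-score lists, each score in [0, 1]
    image_labels: list of 0/1 flags, one per class, marking classes present
    """
    total = 0.0
    for c, target in enumerate(image_labels):
        score = max(p[c] for p in probs)          # max aggregation (assumption)
        score = min(max(score, EPS), 1.0 - EPS)   # clamp for numerical safety
        total -= target * math.log(score) + (1 - target) * math.log(1.0 - score)
    return total / len(image_labels)


def hybrid_loss(src_probs, src_labels, tgt_probs, tgt_image_labels, weak_weight=1.0):
    """Sum of the fully supervised source term and the weighted weak target term."""
    full = instance_ce_loss(src_probs, src_labels)
    weak = image_level_bce_loss(tgt_probs, tgt_image_labels)
    return full + weak_weight * weak


# Toy example: two classes, two instances per domain.
src_probs = [[0.9, 0.1], [0.2, 0.8]]
src_labels = [0, 1]                 # instance-level labels (source)
tgt_probs = [[0.7, 0.3], [0.6, 0.4]]
tgt_image_labels = [1, 0]           # image-level labels only (target)
loss = hybrid_loss(src_probs, src_labels, tgt_probs, tgt_image_labels)
```

In this sketch both terms are applied to the same kind of per-instance scores, which is what would let a single RPN or detection head be trained on both domains at once; how the paper actually combines the two signals is not stated in this record.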

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/165021