Discrete Fusion Adversarial Hashing for cross-modal retrieval

Publisher:
Elsevier
Publication Type:
Journal Article
Citation:
Knowledge-Based Systems, 2022, 253
Issue Date:
2022-10-11
Deep cross-modal hashing offers a flexible and efficient way to perform large-scale cross-modal retrieval. Existing deep-hashing-based cross-modal retrieval methods aim to learn a unified hash representation for different modalities under the supervision of pair-wise correlations, and then encode out-of-sample data via modality-specific hashing networks. However, the semantic gap and distribution shift between modalities are not sufficiently considered, so the hash codes of different modalities cannot be unified as expected. Moreover, hashing remains a discrete problem that has not been well solved within deep neural networks. To address these issues, we propose the Discrete Fusion Adversarial Hashing (DFAH) network for cross-modal retrieval. In DFAH, a Modality-Specific Feature Extractor captures image and text features with pair-wise supervision. In particular, a Fusion Learner learns the unified hash code, enhancing the correlation of heterogeneous modalities via an embedding strategy. Meanwhile, a Modality Discriminator, trained adversarially against the Modality-Specific Feature Extractor, adapts to the distribution shift. In addition, we design an efficient discrete optimization strategy that avoids the quantization error introduced by relaxing the binary constraints in the deep neural framework. Finally, experimental results and analysis on several popular datasets show that DFAH outperforms state-of-the-art methods for cross-modal retrieval.
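The retrieval side of any cross-modal hashing method such as DFAH reduces to two steps: binarize continuous embeddings into hash codes, then rank database items by Hamming distance to the query code. The sketch below (plain numpy, all names hypothetical and not from the paper's code) illustrates these two steps with a naive sign-based binarization; the paper's discrete optimization strategy exists precisely to avoid the quantization error this relaxation introduces.

```python
import numpy as np

def binarize(features):
    # Map continuous embeddings to {-1, +1} hash codes via sign().
    # This is the relaxation baseline; DFAH's discrete optimization
    # is designed to avoid the quantization error it incurs.
    return np.where(features >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    # For {-1, +1} codes of length c, Hamming distance equals
    # (c - <q, d>) / 2, so ranking by inner product is equivalent
    # to ranking by Hamming distance.
    scores = db_codes @ query_code
    return np.argsort(-scores)

rng = np.random.default_rng(0)
image_feats = rng.standard_normal((5, 32))            # toy "image" embeddings
text_feat = image_feats[2] + 0.1 * rng.standard_normal(32)  # noisy "text" query

db = binarize(image_feats)           # database hash codes, shape (5, 32)
q = binarize(text_feat)              # query hash code, shape (32,)
ranking = hamming_rank(q, db)        # indices sorted by Hamming distance
print(ranking[0])                    # the semantically matching item ranks first
```

Because a unified hash space makes matching image and text codes nearly identical, the matching database item tops the ranking even though the query comes from the other modality.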