Masked Cross-image Encoding for Few-shot Segmentation

Xu, W; Huang, H; Cheng, M; Yu, L; Wu, Q; Zhang, J

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2023 IEEE International Conference on Multimedia and Expo (ICME), 2023, 2023-July, pp. 744-749
Issue Date: 2023-01-01

Closed Access

Filename	Description	Size	Format
Masked_Cross-image_Encoding_for_Few-shot_Segmentation.pdf	Published version	1.68 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xu, W
dc.contributor.author	Huang, H
dc.contributor.author	Cheng, M
dc.contributor.author	Yu, L https://orcid.org/0000-0001-5260-885X
dc.contributor.author	Wu, Q https://orcid.org/0000-0001-5641-2483
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541
dc.date	2023-07-10
dc.date.accessioned	2024-01-29T04:19:55Z
dc.date.available	2024-01-29T04:19:55Z
dc.date.issued	2023-01-01
dc.identifier.citation	2023 IEEE International Conference on Multimedia and Expo (ICME), 2023, 2023-July, pp. 744-749
dc.identifier.isbn	9781665468916
dc.identifier.issn	1945-7871
dc.identifier.issn	1945-788X
dc.identifier.uri	http://hdl.handle.net/10453/175006
dc.description.abstract	Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images. The key challenge in FSS is to classify the labels of query pixels using class prototypes learned from the few labeled support exemplars. Prior approaches to FSS have typically focused on learning class-wise descriptors independently from support images, thereby ignoring the rich contextual information and mutual dependencies among support-query features. To address this limitation, we propose a joint learning method termed Masked Cross-Image Encoding (MCE), which is designed to capture common visual properties that describe object details and to learn bidirectional inter-image dependencies that enhance feature interaction. MCE is more than a visual representation enrichment module; it also considers cross-image mutual dependencies and implicit guidance. Experiments on FSS benchmarks PASCAL-5i and COCO-20i demonstrate the advanced meta-learning ability of the proposed method.
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2023 IEEE International Conference on Multimedia and Expo (ICME)
dc.relation.ispartof	IEEE International Conference on Multimedia and Expo (ICME)
dc.relation.ispartofseries	IEEE International Conference on Multimedia and Expo
dc.relation.isbasedon	10.1109/icme55011.2023.00133
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Masked Cross-image Encoding for Few-shot Segmentation
dc.type	Conference Proceeding
utslib.citation.volume	2023-July
utslib.location.activity	AUSTRALIA, Brisbane
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
pubs.organisational-group	University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
dc.date.updated	2024-01-29T04:19:53Z
pubs.finish-date	2023-07-14
pubs.publication-status	Published
pubs.start-date	2023-07-10
pubs.volume	2023-July

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/175006