Effective few-shot learning approaches for image semantic segmentation

Xu, Wenbo


Publication Type: Thesis
Issue Date: 2024


This item is open access.

Download thesis (Adobe PDF, 15.2 MB)


Full metadata record

Field	Value	Language
dc.contributor.author	Xu, Wenbo
dc.date.accessioned	2025-04-07T03:19:12Z
dc.date.available	2025-04-07T03:19:12Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/10453/186727
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/186727/1/thesis.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	© 2024 Wenbo Xu
dc.rights	au.edu.uts.lib/cph
dc.title	Effective few-shot learning approaches for image semantic segmentation	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

Semantic image segmentation has attracted significant attention in computer vision owing to its wide range of applications, including visual understanding, medical image analysis, self-driving vehicles, augmented reality, and video surveillance. Although modern deep learning models achieve impressive performance on segmentation tasks, they rely heavily on large amounts of densely labeled training data. In real-world scenarios, however, abundant high-quality labeled data are not always available because of privacy, ethical, and safety concerns. This research aims to reduce the data requirements of segmentation by introducing few-shot learning (FSL), which enables deep learning models to accurately segment unseen classes from only a few labeled images and thereby relieves researchers and engineers of labor-intensive data labeling. The research first addresses few-shot semantic segmentation (FSS), which requires segmenting objects of a novel class in a test image given only a few labeled examples. To address the challenges of prototype bias and sub-optimal feature representation, this research proposes the Masked Cross-image Encoding technique. This method captures shared information and mutual dependencies between the labeled (support) images and the test (query) image, enhancing the visual properties of novel classes for improved prototype-feature matching. Next, we revisit the standard binary matching paradigm used in FSS and find that it is susceptible to false-matching and under-matching, both of which can significantly degrade segmentation performance. To alleviate this issue, we introduce a Multi-Prototype Discrimination scheme that explicitly assigns each pixel-level query feature to a specific class, reducing the class-matching ambiguity present in conventional FSS methods.

Building on the FSS task, we then tackle a more practical and challenging task, Incremental Few-Shot Semantic Segmentation (iFSS), which requires a deep learning model to continuously learn new classes from scarce annotated examples while retaining the knowledge learned from previously encountered classes. We adopt a meta-learning-based approach that simulates the incremental learning evaluation protocol during the base training stage. By aligning the training tasks with this evaluation protocol, the strategy encourages the model to learn how to adapt incrementally to novel classes without forgetting previous ones. Overall, this research contributes insights and methods that improve the effectiveness of few-shot learning for semantic image segmentation.
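The contrast between binary matching and assigning each query pixel to a specific class can be illustrated with a minimal pure-Python sketch of generic prototype-based matching. This is not the thesis's Masked Cross-image Encoding or Multi-Prototype Discrimination method. The function names, the toy 2-D "features", the 0.8 threshold, and the distractor prototype are all hypothetical. Prototypes are formed by masked average pooling of support features, and query pixels are compared with them using cosine similarity.

```python
import math
from typing import List, Sequence

Vector = List[float]


def masked_average_pooling(features: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Average the feature vectors of pixels where mask == 1 to form a class prototype."""
    selected = [f for f, m in zip(features, mask) if m == 1]
    if not selected:
        raise ValueError("mask selects no pixels")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def binary_matching(query: Sequence[Vector], target_proto: Vector, threshold: float) -> List[int]:
    """Binary matching: a pixel is foreground (1) if its similarity to the
    target prototype alone exceeds the threshold; other classes are never consulted."""
    return [1 if cosine(q, target_proto) > threshold else 0 for q in query]


def assign_to_prototypes(query: Sequence[Vector], prototypes: Sequence[Vector]) -> List[int]:
    """Per-pixel class assignment: each pixel gets the index of its most similar prototype."""
    labels = []
    for q in query:
        scores = [cosine(q, p) for p in prototypes]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy 2-D support features: two target pixels followed by two background pixels.
    support = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]]
    target_proto = masked_average_pooling(support, [1, 1, 0, 0])
    background_proto = masked_average_pooling(support, [0, 0, 1, 1])
    distractor_proto = [0.7, 0.7]  # hypothetical prototype of another (non-target) class

    # Query pixels: target-like, distractor-like, background-like.
    query = [[1.0, 0.15], [0.75, 0.65], [0.05, 1.0]]

    print(binary_matching(query, target_proto, threshold=0.8))  # [1, 1, 0]
    # Label indices: 0 = background, 1 = target, 2 = distractor.
    print(assign_to_prototypes(query, [background_proto, target_proto, distractor_proto]))  # [1, 2, 0]
```

In this toy case, thresholding similarity to the target prototype alone labels the distractor-like pixel as the target (a false match). Assigning each pixel to its most similar prototype instead labels it as the distractor class. Real FSS models compare deep feature maps rather than hand-picked 2-D vectors.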

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/186727