Effective few-shot learning approaches for image semantic segmentation

Publication Type:
Thesis
Issue Date:
2024
Semantic image segmentation has gained significant attention in computer vision due to its wide range of applications, including visual understanding, medical image analysis, self-driving vehicles, augmented reality, and video surveillance. Although modern deep learning models have achieved impressive performance on segmentation tasks, they rely heavily on massive amounts of densely labeled training data. However, abundant high-quality labeled data are not always available in real-world scenarios due to privacy or ethical concerns and safety issues. This research aims to reduce the reliance of segmentation tasks on large data volumes by introducing few-shot learning (FSL). FSL enables deep learning models to segment unseen classes accurately from only a few labeled images, relieving researchers and engineers of intensive data labeling work. This research first addresses the problem of few-shot semantic segmentation (FSS), which requires segmenting novel-class objects in a test image given only a few labeled examples. To address the challenges of prototype bias and sub-optimal feature representation, this research proposes the Masked Cross-image Encoding technique. This method captures shared information and mutual dependencies between training data and testing data, enhancing the visual properties of novel classes for improved prototype-feature matching. Then, we re-evaluate the standard binary matching paradigm employed in FSS and identify its association with potential false-matching and under-matching issues, which can significantly degrade segmentation performance. To alleviate these issues, a Multi-Prototype Discrimination scheme is introduced to explicitly assign each pixel-wise query feature to a specific class, reducing the class-matching ambiguity present in conventional FSS methods. Building upon the FSS task, we tackle a more practical and challenging task known as Incremental Few-Shot Semantic Segmentation (iFSS). iFSS requires a deep learning model to continuously learn new classes from scarce annotated examples while retaining the knowledge learned from previously encountered classes. We consider a meta-learning-based approach that simulates the incremental learning evaluation protocol during the base training stage. This training-task alignment strategy encourages the model to learn how to incrementally adapt to novel classes without forgetting previous ones. Overall, the research contributes insights and methodologies that enhance the effectiveness of few-shot learning approaches for semantic image segmentation.
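The abstract mentions prototype-feature matching and the binary matching paradigm without describing them. As a rough illustration of that general idea only, the sketch below uses masked average pooling and cosine similarity, a baseline commonly used in prototype-based FSS. It is not the thesis's Masked Cross-image Encoding or Multi-Prototype Discrimination method. All function names and the toy feature values are hypothetical, and the features are plain nested lists standing in for the output of a feature extractor.

```python
import math

def masked_average_prototype(features, mask):
    """Average the feature vectors at positions where the mask is 1.

    features: H x W grid of feature vectors (nested lists, an assumption
    standing in for a real feature map); mask: H x W grid of 0/1 labels.
    """
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, selected in zip(feat_row, mask_row):
            if selected:
                count += 1
                for k in range(dim):
                    total[k] += vec[k]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def segment_query(query_features, fg_proto, bg_proto):
    """Binary matching: label a pixel 1 if it is closer to the foreground
    prototype than to the background prototype, else 0."""
    return [
        [1 if cosine(vec, fg_proto) > cosine(vec, bg_proto) else 0 for vec in row]
        for row in query_features
    ]

# Toy 2x2 support image with 2-dimensional features (hypothetical values).
support = [[[1.0, 0.0], [0.9, 0.1]],
           [[0.0, 1.0], [0.1, 0.9]]]
support_mask = [[1, 1],
                [0, 0]]
inverse_mask = [[1 - m for m in row] for row in support_mask]

fg_proto = masked_average_prototype(support, support_mask)
bg_proto = masked_average_prototype(support, inverse_mask)

query = [[[0.8, 0.2], [0.2, 0.8]],
         [[0.1, 1.0], [1.0, 0.0]]]
pred = segment_query(query, fg_proto, bg_proto)
print(pred)  # [[1, 0], [0, 1]]
```

Because each pixel is compared only against one foreground and one background prototype, a sketch like this also shows where the matching ambiguity discussed in the abstract can arise: any pixel that resembles neither prototype is still forced into one of the two labels.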