Extracting generic text information from images

Zeng, C

Publication Type: Thesis
Issue Date: 2013

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zeng, C
dc.date.accessioned	2014-04-04T00:43:16Z
dc.date.available	2014-04-04T00:43:16Z
dc.date.issued	2013
dc.identifier.uri	http://hdl.handle.net/10453/25999
dc.description	University of Technology, Sydney. Faculty of Engineering and Information Technology.	en_US
dc.format	Thesis (PhD)	en_US
dc.language.iso	en	en_US
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/25999/2/02whole.pdf
dc.rights	au.edu.uts.lib/ppc
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.subject	Text binarisation.	en
dc.subject	Pattern recognition systems.	en
dc.subject	Image segmentation.	en
dc.subject	Text extraction.	en
dc.title	Extracting generic text information from images	en_US
dc.type	Thesis
utslib.copyright.status	open_access

Abstract:

Text appears everywhere, in natural scenes, web pages and videos, which makes it valuable information for many applications. Extracting this text from images and video frames is the first step towards using it in a specific application, and the task is performed by a text information extraction (TIE) system. A TIE system consists of three components: text detection, text binarisation and text recognition. Depending on the application or project, one or more of these components may be embedded. Although much effort has gone into extracting text from images and videos, the problem is far from solved because of the difficulties that arise in different scenarios. This thesis focuses on text detection and text binarisation.

For text detection in born-digital images, a new scheme for coarse text detection and a texture-based feature for fine text detection are proposed. The coarse detection step is based on the Maximum Gradient Difference (MGD) response of text lines: MGD values are grouped into multiple clusters by a clustering algorithm to create multiple layer images, and text line candidates are then detected in each layer image. An SVM classifier trained on the novel texture-based feature filters out non-text regions. The superiority of the proposed feature is demonstrated by comparing its text/non-text classification capability with that of other features.

A second algorithm is designed for detecting text in natural scene images. Maximally Stable Extremal Regions (MSERs) serve as character candidates and are classified into character and non-character MSERs using geometry-based, stroke-based, HOG-based and colour-based features. Two types of misclassified character MSERs are then recovered by two separate schemes.
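
The coarse born-digital detection step described above (an MGD response clustered into layer images) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes the commonly cited MGD definition (maximum minus minimum of the horizontal gradient inside a 1 x n window around each pixel), uses plain 1-D k-means as a stand-in for the unspecified clustering algorithm, and treats the window half-width and the number of clusters `k` as hypothetical parameters. Images are plain lists of rows so the sketch needs no third-party packages.

```python
from typing import List

# Grayscale image as a list of rows of intensities (an assumption for this sketch).
Image = List[List[float]]


def horizontal_gradient(img: Image) -> Image:
    """Forward difference along each row; the last column is padded with 0."""
    return [[row[j + 1] - row[j] if j + 1 < len(row) else 0 for j in range(len(row))]
            for row in img]


def mgd_map(img: Image, half_width: int = 10) -> Image:
    """MGD = max - min of the horizontal gradient in a 1 x (2*half_width + 1) window.

    Assumed (commonly cited) MGD definition; half_width is a hypothetical
    parameter, not a value taken from the thesis.
    """
    out = []
    for row in horizontal_gradient(img):
        w = len(row)
        out_row = []
        for j in range(w):
            window = row[max(0, j - half_width):min(w, j + half_width + 1)]
            out_row.append(max(window) - min(window))
        out.append(out_row)
    return out


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> List[float]:
    """Plain 1-D k-means, a stand-in for the unspecified clustering algorithm."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * (c + 0.5) / k for c in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda c: abs(v - centres[c]))].append(v)
        # Keep the old centre if a cluster ends up empty.
        new = [sum(g) / len(g) if g else centres[c] for c, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return centres  # stays ascending in 1-D because clusters are contiguous intervals


def mgd_layers(img: Image, k: int = 3, half_width: int = 10) -> List[List[List[int]]]:
    """Split the MGD map into k binary layer images, one per MGD cluster
    (layer 0 holds the lowest-MGD pixels, layer k-1 the highest)."""
    mgd = mgd_map(img, half_width)
    centres = kmeans_1d([v for row in mgd for v in row], k)

    def nearest(v: float) -> int:
        return min(range(k), key=lambda c: abs(v - centres[c]))

    return [[[1 if nearest(v) == c else 0 for v in row] for row in mgd]
            for c in range(k)]
```

A practical version would vectorise these loops (for example with NumPy) and would then search each layer for horizontally aligned high-MGD runs as text line candidates; that search and the SVM verification with the texture-based feature are not shown here.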

A false alarm elimination step is then performed to increase text detection precision, and a bootstrap strategy strengthens the suppression of false positives. Promising recall and precision rates are both achieved.

For text binarisation, the first approach combines a selected colour channel image with a graph-based technique. Before the graph model is constructed, the colour channel is selected whose histogram has the largest distance between its two main peaks, with the peaks estimated by a mean-shift procedure. Normalised cut is then applied to the graph to obtain the binarisation result. To circumvent the drawbacks of the grayscale-based method, a colour-based text binarisation method is also proposed, in which a modified Connected Component (CC)-based validation measure and a new objective segmentation evaluation criterion are applied in sequence. The experimental results show the effectiveness of the proposed text binarisation algorithms.
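
The channel-selection step that precedes the graph construction can be illustrated with a small sketch. It assumes a flat-kernel mean shift run on each channel's 256-bin histogram, a hypothetical bandwidth of 16 grey levels, a hypothetical rule that merges modes closer than half the bandwidth, and "main peaks" defined as the two modes with the largest histogram mass within one bandwidth. Channels are passed as flat lists of 8-bit values under hypothetical names such as `"R"`; none of these choices are taken from the thesis.

```python
from typing import Dict, List, Sequence


def histogram(values: Sequence[int], bins: int = 256) -> List[int]:
    """Histogram of 8-bit channel values."""
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    return hist


def mean_shift_modes(hist: List[int], bandwidth: int = 16, tol: float = 0.5) -> List[float]:
    """Flat-kernel 1-D mean shift started from every non-empty bin.

    Each start point moves to the histogram-weighted mean of the bins within
    +/- bandwidth until it stops moving; converged points closer than
    bandwidth / 2 to an already found mode are treated as duplicates.
    """
    n = len(hist)
    modes: List[float] = []
    for start in range(n):
        if hist[start] == 0:
            continue
        x = float(start)
        for _ in range(100):
            lo, hi = max(0, int(x - bandwidth)), min(n - 1, int(x + bandwidth))
            mass = sum(hist[lo:hi + 1])  # > 0: the window always covers a non-empty bin
            new_x = sum(i * hist[i] for i in range(lo, hi + 1)) / mass
            moved = abs(new_x - x)
            x = new_x
            if moved < tol:
                break
        if all(abs(x - m) >= bandwidth / 2 for m in modes):
            modes.append(x)
    return modes


def peak_distance(values: Sequence[int], bandwidth: int = 16) -> float:
    """Distance between the two strongest histogram modes (0 if unimodal)."""
    hist = histogram(values)
    modes = mean_shift_modes(hist, bandwidth)
    if len(modes) < 2:
        return 0.0

    def strength(m: float) -> int:
        # Hypothetical peak strength: histogram mass within one bandwidth of the mode.
        lo, hi = max(0, int(m - bandwidth)), min(len(hist) - 1, int(m + bandwidth))
        return sum(hist[lo:hi + 1])

    top = sorted(modes, key=strength, reverse=True)[:2]
    return abs(top[0] - top[1])


def select_channel(channels: Dict[str, Sequence[int]], bandwidth: int = 16) -> str:
    """Pick the channel whose two main histogram peaks are farthest apart."""
    return max(channels, key=lambda name: peak_distance(channels[name], bandwidth))
```

The intuition behind this criterion is that widely separated peaks suggest better text/background contrast in that channel. The subsequent Normalised cut on a graph built from the selected channel is not sketched here.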

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/25999