A new unified method for detecting text from marathon runners and sports players in video (PR-D-19-01078R2)

Nag, S; Shivakumara, P; Pal, U; Lu, T; Blumenstein, M

Publisher: ELSEVIER SCI LTD
Publication Type: Journal Article
Citation: Pattern Recognition, 2020, 107
Issue Date: 2020-11-01

Closed Access

Filename	Description	Size	Format
1-s2.0-S003132032030279X-main.pdf	Published version	7.77 MB	Adobe PDF

Full metadata record

Field	Value	Language
dc.contributor.author	Nag, S
dc.contributor.author	Shivakumara, P
dc.contributor.author	Pal, U
dc.contributor.author	Lu, T
dc.contributor.author	Blumenstein, M https://orcid.org/0000-0002-9908-3744
dc.date.accessioned	2020-10-18T19:25:09Z
dc.date.available	2020-10-18T19:25:09Z
dc.date.issued	2020-11-01
dc.identifier.citation	Pattern Recognition, 2020, 107
dc.identifier.issn	0031-3203
dc.identifier.issn	1873-5142
dc.identifier.uri	http://hdl.handle.net/10453/143342
dc.description.abstract	Detecting text located on the torsos of marathon runners and sports players in video is challenging because of poor quality and the adverse effects of flexible or colorful clothing and of varying body structures and actions. This paper presents a new unified method for tackling these challenges. The proposed method fuses the gradient magnitude and direction coherence of text pixels in a new way to detect candidate regions. The candidate regions are used to determine the number of temporal frame clusters obtained by K-means clustering on frame differences, which in turn yields the key frames. The method then explores Bayesian probability for skin portions, using color values at both the pixel and component levels of the temporal frames, which produces fused images containing skin components. Based on this skin information, the method detects faces and torsos by finding structural and spatial coherence between them. We further propose an adaptive pixel-linking deep learning model for detecting text in torso regions. The method is evaluated on our own dataset collected from marathon and sports video and on three standard marathon image datasets, namely RBNR, MMM and R-ID. In addition, it is tested on two standard natural scene text datasets, CTW1500 and MS-COCO text, to show the objectivity of the proposed method. A comparative study with state-of-the-art methods for bib number and text detection on different datasets shows that the proposed method outperforms existing methods.
dc.language	English
dc.publisher	ELSEVIER SCI LTD
dc.relation.ispartof	Pattern Recognition
dc.relation.isbasedon	10.1016/j.patcog.2020.107476
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0806 Information Systems, 0906 Electrical and Electronic Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	A new unified method for detecting text from marathon runners and sports players in video (PR-D-19-01078R2)
dc.type	Journal Article
utslib.citation.volume	107
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0806 Information Systems
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2020-10-18T19:25:02Z
pubs.publication-status	Accepted
pubs.volume	107

Abstract:

Detecting text located on the torsos of marathon runners and sports players in video is challenging because of poor quality and the adverse effects of flexible or colorful clothing and of varying body structures and actions. This paper presents a new unified method for tackling these challenges. The proposed method fuses the gradient magnitude and direction coherence of text pixels in a new way to detect candidate regions. The candidate regions are used to determine the number of temporal frame clusters obtained by K-means clustering on frame differences, which in turn yields the key frames. The method then explores Bayesian probability for skin portions, using color values at both the pixel and component levels of the temporal frames, which produces fused images containing skin components. Based on this skin information, the method detects faces and torsos by finding structural and spatial coherence between them. We further propose an adaptive pixel-linking deep learning model for detecting text in torso regions. The method is evaluated on our own dataset collected from marathon and sports video and on three standard marathon image datasets, namely RBNR, MMM and R-ID. In addition, it is tested on two standard natural scene text datasets, CTW1500 and MS-COCO text, to show the objectivity of the proposed method. A comparative study with state-of-the-art methods for bib number and text detection on different datasets shows that the proposed method outperforms existing methods.
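
The abstract mentions a Bayesian probability for skin computed from color values but gives no formulation. As a rough illustration of the pixel-level idea only, the sketch below applies Bayes' rule to class-conditional color histograms. It is a generic textbook approach and not the authors' method: the function names (`quantize`, `fit_histogram`, `skin_posterior`), the 8-bin quantization, the equal prior, and the toy training pixels are all assumptions made for this example, and the component-level step is not covered.

```python
from collections import Counter


def quantize(rgb, bins=8):
    """Map an (r, g, b) triple with 0-255 channels to a coarse histogram bin."""
    step = 256 // bins
    return tuple(channel // step for channel in rgb)


def fit_histogram(pixels, bins=8):
    """Estimate P(color | class) as relative frequencies of quantized colors."""
    counts = Counter(quantize(p, bins) for p in pixels)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


def skin_posterior(rgb, skin_hist, non_skin_hist, prior_skin=0.5, bins=8):
    """Bayes' rule: P(skin | color) from the two class-conditional histograms."""
    color = quantize(rgb, bins)
    p_skin = skin_hist.get(color, 0.0) * prior_skin
    p_non = non_skin_hist.get(color, 0.0) * (1.0 - prior_skin)
    evidence = p_skin + p_non
    # A color never seen in either class carries no evidence: return the prior.
    return prior_skin if evidence == 0.0 else p_skin / evidence


# Toy, made-up training pixels (hypothetical, not taken from the paper's data).
skin_samples = [(224, 172, 105), (198, 134, 66), (241, 194, 125), (224, 172, 105)]
non_skin_samples = [(20, 60, 200), (30, 200, 40), (250, 250, 250), (224, 172, 105)]

skin_hist = fit_histogram(skin_samples)
non_skin_hist = fit_histogram(non_skin_samples)

for pixel in [(224, 172, 105), (20, 60, 200), (100, 100, 100)]:
    print(pixel, round(skin_posterior(pixel, skin_hist, non_skin_hist), 3))
```

Thresholding this posterior per pixel would give a binary skin mask. How the paper combines pixel-level and component-level evidence into fused images is not stated in the abstract, so it is left out here.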

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/143342