Multi-stage cascaded deconvolution for depth map and surface normal prediction from single image

Padhy, RP; Chang, X; Choudhury, SK; Sa, PK; Bakshi, S

Publisher: ELSEVIER
Publication Type: Journal Article
Citation: Pattern Recognition Letters, 2019, 127, pp. 165-173
Issue Date: 2019-11-01

Closed Access

	Filename	Description	Size	Format
	1-s2.0-S0167865518303015-main.pdf	Published version	2.67 MB	Adobe PDF

Full metadata record

Field	Value	Language
dc.contributor.author	Padhy, RP
dc.contributor.author	Chang, X https://orcid.org/0000-0002-7778-8807
dc.contributor.author	Choudhury, SK
dc.contributor.author	Sa, PK
dc.contributor.author	Bakshi, S
dc.date.accessioned	2022-09-06T02:48:53Z
dc.date.available	2022-09-06T02:48:53Z
dc.date.issued	2019-11-01
dc.identifier.citation	Pattern Recognition Letters, 2019, 127, pp. 165-173
dc.identifier.issn	0167-8655
dc.identifier.issn	1872-7344
dc.identifier.uri	http://hdl.handle.net/10453/161409
dc.description.abstract	Understanding the 3D perspective of a scene is imperative in improving the precision of intelligent autonomous systems. The difficulty is compounded when only one image of the scene is available. In this regard, we propose a fully convolutional deep framework for predicting the depth map and surface normal from a single RGB image in a common architecture. The DenseNet CNN architecture is employed to learn the complex mapping between an input RGB image and its corresponding 3D primitives. We introduce a novel approach of multi-stage cascaded deconvolution, where the output feature maps of one dense block are reused by concatenating them with the feature maps of the corresponding deconvolution block. These combined feature maps are propagated through the deep network in a pre-activated manner to construct the final output. The network is trained separately for estimating depth and surface normal while keeping the architecture the same. The suggested architecture, compared to its counterparts, uses fewer training samples and model parameters. Exhaustive experiments on a benchmark dataset not only reveal the efficacy of the proposed multi-stage scheme over one-way sequential deconvolution but also show that it outperforms the state-of-the-art methods.
dc.language	English
dc.publisher	ELSEVIER
dc.relation.ispartof	Pattern Recognition Letters
dc.relation.isbasedon	10.1016/j.patrec.2018.07.012
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0906 Electrical and Electronic Engineering, 1702 Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Multi-stage cascaded deconvolution for depth map and surface normal prediction from single image
dc.type	Journal Article
utslib.citation.volume	127
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0906 Electrical and Electronic Engineering
utslib.for	1702 Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
dc.date.updated	2022-09-06T02:48:29Z
pubs.publication-status	Published
pubs.volume	127

Abstract:

Understanding the 3D perspective of a scene is imperative in improving the precision of intelligent autonomous systems. The difficulty is compounded when only one image of the scene is available. In this regard, we propose a fully convolutional deep framework for predicting the depth map and surface normal from a single RGB image in a common architecture. The DenseNet CNN architecture is employed to learn the complex mapping between an input RGB image and its corresponding 3D primitives. We introduce a novel approach of multi-stage cascaded deconvolution, where the output feature maps of one dense block are reused by concatenating them with the feature maps of the corresponding deconvolution block. These combined feature maps are propagated through the deep network in a pre-activated manner to construct the final output. The network is trained separately for estimating depth and surface normal while keeping the architecture the same. The suggested architecture, compared to its counterparts, uses fewer training samples and model parameters. Exhaustive experiments on a benchmark dataset not only reveal the efficacy of the proposed multi-stage scheme over one-way sequential deconvolution but also show that it outperforms the state-of-the-art methods.
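
The reuse of dense-block outputs described in the abstract can be illustrated with a small shape-tracing sketch. This is not the authors' network: the function names, the channel counts, the 224x224 input size, and the factor-of-two resampling per stage are all assumptions chosen for illustration, and pre-activation is not modelled. The sketch only shows how concatenating an encoder stage's feature maps with the matching decoder stage changes the channel count at each resolution.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def downsample(shape: Shape, out_channels: int) -> Shape:
    """Hypothetical stand-in for a dense block plus transition: halves H and W."""
    _, h, w = shape
    return (out_channels, h // 2, w // 2)


def upsample(shape: Shape, out_channels: int) -> Shape:
    """Hypothetical stand-in for a stride-2 deconvolution block: doubles H and W."""
    _, h, w = shape
    return (out_channels, h * 2, w * 2)


def concat(a: Shape, b: Shape) -> Shape:
    """Channel-wise concatenation; the spatial sizes must match."""
    if a[1:] != b[1:]:
        raise ValueError(f"spatial mismatch: {a} vs {b}")
    return (a[0] + b[0], a[1], a[2])


def cascaded_shapes(input_shape: Shape, encoder_channels: List[int]) -> List[Shape]:
    """Trace feature-map shapes through a decoder that reuses encoder outputs."""
    # Encoder pass: remember the output of every stage for later reuse.
    skips: List[Shape] = []
    x = input_shape
    for channels in encoder_channels:
        x = downsample(x, channels)
        skips.append(x)

    # Decoder pass: upsample, then concatenate with the encoder output
    # that has the same spatial resolution (the deepest stage has no partner).
    trace: List[Shape] = []
    for skip in reversed(skips[:-1]):
        x = upsample(x, skip[0])
        x = concat(x, skip)
        trace.append(x)
    return trace


if __name__ == "__main__":
    for shape in cascaded_shapes((3, 224, 224), [64, 128, 256]):
        print(shape)
```

With the assumed settings, the two decoder stages produce shapes `(256, 56, 56)` and `(128, 112, 112)`: each concatenation doubles the channel count relative to a plain sequential decoder, which is the extra information the later layers receive.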

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/161409