Searching for a Robust Neural Architecture in Four GPU Hours

Dong, X; Yang, Y

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 2019-June, pp. 1761-1770
Issue Date: 2020-01-09

This item is open access.

The embargo period expires on 9 Jan 2022

Full metadata record

Field	Value	Language
dc.contributor.author	Dong, X https://orcid.org/0000-0001-9272-1590
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date	2019-06-15
dc.date.accessioned	2021-04-15T10:03:38Z
dc.date.available	2021-04-15T10:03:38Z
dc.date.issued	2020-01-09
dc.identifier.citation	2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 2019-June, pp. 1761-1770
dc.identifier.isbn	978-1-7281-3294-5
dc.identifier.issn	1063-6919
dc.identifier.uri	http://hdl.handle.net/10453/148140
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.relation.ispartof	2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
dc.relation.isbasedon	10.1109/cvpr.2019.00186
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.title	Searching for a Robust Neural Architecture in Four GPU Hours
dc.type	Conference Proceeding
utslib.citation.volume	2019-June
utslib.location.activity	Long Beach, CA, USA
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2022-01-09T00:00:00+1000Z
dc.date.updated	2021-04-15T10:03:37Z
pubs.finish-date	2019-06-20
pubs.publication-status	Published
pubs.start-date	2019-06-15
pubs.volume	2019-June

Abstract:

Conventional neural architecture search (NAS) approaches are usually based on reinforcement learning or evolutionary strategy, which take more than 1000 GPU hours to find a good model on CIFAR-10. We propose an efficient NAS approach, which learns the searching approach by gradient descent. Our approach represents the search space as a directed acyclic graph (DAG). This DAG contains thousands of sub-graphs, each of which indicates a kind of neural architecture. To avoid traversing all the possibilities of the sub-graphs, we develop a differentiable sampler over the DAG. This sampler is learnable and optimized by the validation loss after training the sampled architecture. In this way, our approach can be trained in an end-to-end fashion by gradient descent, named Gradient-based search using Differentiable Architecture Sampler (GDAS). In experiments, we can finish one searching procedure in four GPU hours on CIFAR-10, and the discovered model obtains a test error of 2.82% with only 2.5M parameters, which is on par with the state-of-the-art.
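
The sampling idea in the abstract can be illustrated with a small sketch. The abstract does not say how the sampler is made differentiable; the code below assumes a Gumbel-softmax relaxation, which is one common way to do this, and it is not taken from the authors' implementation. The operation names, the function names (`sample_edge`, `sample_architecture`), and the temperature parameter are all hypothetical. The sketch covers only the sampling step: each edge of the DAG holds one learnable logit per candidate operation, and sampling one operation per edge selects one sub-graph. Training the sampled network and updating the logits from the validation loss would require an autodiff framework and are omitted.

```python
import math
import random

# Hypothetical candidate operations on one DAG edge (illustrative names only).
CANDIDATE_OPS = ["skip_connect", "conv_3x3", "conv_5x5", "avg_pool_3x3"]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform sampling."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def sample_edge(logits, temperature, rng):
    """Sample one operation for a single edge.

    Returns (index, one_hot, relaxed): the hard choice, its one-hot
    encoding, and the soft probabilities that an autodiff framework
    could differentiate with respect to the logits (assumed setup).
    """
    noisy = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    relaxed = softmax(noisy)
    index = max(range(len(relaxed)), key=relaxed.__getitem__)
    one_hot = [1.0 if i == index else 0.0 for i in range(len(relaxed))]
    return index, one_hot, relaxed


def sample_architecture(edge_logits, temperature, rng):
    """Pick one operation per edge, which selects one sub-graph of the DAG."""
    return [
        CANDIDATE_OPS[sample_edge(logits, temperature, rng)[0]]
        for logits in edge_logits
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three edges with uniform (untrained) logits; a real search would learn these.
    edge_logits = [[0.0] * len(CANDIDATE_OPS) for _ in range(3)]
    print(sample_architecture(edge_logits, temperature=1.0, rng=rng))
```

Because only the sampled operation on each edge is evaluated, a search step under this scheme touches one sub-graph rather than all of them, which is consistent with the abstract's stated goal of avoiding a traversal of every possibility.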

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/148140