Neural Architecture Search With a Lightweight Transformer for Text-to-Image Synthesis

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Journal Article
Citation:
IEEE Transactions on Network Science and Engineering, 2022, 9, (3), pp. 1567-1576
Issue Date:
2022-01-01
Abstract:
Although the cross-modal text-to-image synthesis task has achieved great success, most recent works in this field build on network architectures proposed by predecessors, such as StackGAN and AttnGAN. As the quality demanded of text-to-image synthesis keeps rising, these older, cascaded architectures with simple convolution operations are no longer adequate. A novel text-to-image synthesis network that incorporates the latest technologies is therefore in urgent need of exploration. To tackle this challenge, we propose a unique architecture for text-to-image synthesis, dubbed T2IGAN, which is searched automatically by neural architecture search (NAS). In addition, given the remarkable capabilities of the transformer in natural language processing and computer vision, a lightweight transformer is included in our search space to efficiently integrate text features and image features. Finally, we demonstrate the effectiveness of the searched T2IGAN by evaluating it on standard text-to-image synthesis datasets. Specifically, we achieve an IS of 5.12 and an FID of 10.48 on CUB-200 Birds, an IS of 4.89 and an FID of 13.55 on Oxford-102 Flowers, and an IS of 31.93 and an FID of 26.45 on COCO. Compared with state-of-the-art works, ours performs better on CUB-200 Birds and Oxford-102 Flowers.
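The abstract does not spell out how the lightweight transformer fuses text and image features, so the sketch below is only an illustrative guess, not the module described in the paper: a minimal cross-attention transformer block (hypothetical class name TextImageCrossAttention, assumed token dimension of 256) in which image feature tokens attend to word embeddings, roughly the kind of candidate operation a NAS search space for text-to-image GANs might contain.

```python
import torch
import torch.nn as nn


class TextImageCrossAttention(nn.Module):
    """Illustrative lightweight transformer block: image tokens attend to text tokens.

    This is an assumption for exposition, not the T2IGAN module from the paper.
    """

    def __init__(self, dim=256, num_heads=4, mlp_ratio=2):
        super().__init__()
        self.norm_img = nn.LayerNorm(dim)
        self.norm_txt = nn.LayerNorm(dim)
        # Cross-attention: queries come from image features, keys/values from text features.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, img_tokens, txt_tokens):
        q = self.norm_img(img_tokens)
        kv = self.norm_txt(txt_tokens)
        attn_out, _ = self.attn(q, kv, kv)
        x = img_tokens + attn_out            # residual connection around cross-attention
        x = x + self.mlp(self.norm_mlp(x))   # position-wise feed-forward with residual
        return x


# Usage sketch: fuse a 16x16 grid of image feature tokens with 18 word embeddings.
img = torch.randn(2, 256, 256)   # (batch, num_image_tokens, dim)
txt = torch.randn(2, 18, 256)    # (batch, num_words, dim)
fused = TextImageCrossAttention()(img, txt)
print(fused.shape)               # torch.Size([2, 256, 256])
```

Keeping the block to a single cross-attention layer plus a small feed-forward network is what makes such a design "lightweight" relative to a full transformer stack, which matters when the block is repeated across candidate architectures during NAS.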