Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

Xia, J; Qu, W; Huang, W; Zhang, J; Wang, X; Xu, M

Publisher: IEEE COMPUTER SOC
Publication Type: Conference Proceeding
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, 2022-June, pp. 4042-4051
Issue Date: 2022-01-01

Closed Access

	Filename	Description	Size	Format
	Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning.pdf	Published version	2.44 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xia, J
dc.contributor.author	Qu, W
dc.contributor.author	Huang, W
dc.contributor.author	Zhang, J
dc.contributor.author	Wang, X
dc.contributor.author	Xu, M https://orcid.org/0000-0001-9581-8849
dc.date	2022-06-18
dc.date.accessioned	2023-04-11T05:43:41Z
dc.date.available	2023-04-11T05:43:41Z
dc.date.issued	2022-01-01
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, 2022-June, pp. 4042-4051
dc.identifier.isbn	9781665469463
dc.identifier.issn	1063-6919
dc.identifier.uri	http://hdl.handle.net/10453/169611
dc.description.abstract	Heatmap regression methods have dominated the face alignment area in recent years, but they ignore the inherent relation between different landmarks. In this paper, we propose a Sparse Local Patch Transformer (SLPT) for learning the inherent relation. The SLPT generates the representation of each single landmark from a local patch and aggregates them by an adaptive inherent relation based on the attention mechanism. The subpixel coordinate of each landmark is predicted independently based on the aggregated feature. Moreover, a coarse-to-fine framework is further introduced to work with the SLPT, which enables the initial landmarks to gradually converge to the target facial landmarks using fine-grained features from dynamically resized local patches. Extensive experiments carried out on three popular benchmarks, including WFLW, 300W and COFW, demonstrate that the proposed method works at the state-of-the-art level with much less computational complexity by learning the inherent relation between facial landmarks. The code is available at the project website: https://github.com/Jiahao-UTS/SLPT-master.
dc.language	en
dc.publisher	IEEE COMPUTER SOC
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.relation.ispartof	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.relation.ispartofseries	IEEE Conference on Computer Vision and Pattern Recognition
dc.relation.isbasedon	10.1109/CVPR52688.2022.00402
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning
dc.type	Conference Proceeding
utslib.citation.volume	2022-June
utslib.location.activity	New Orleans, LA
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
dc.date.updated	2023-04-11T05:43:39Z
pubs.finish-date	2022-06-24
pubs.publication-status	Published
pubs.start-date	2022-06-18
pubs.volume	2022-June

Abstract:

Heatmap regression methods have dominated the face alignment area in recent years, but they ignore the inherent relation between different landmarks. In this paper, we propose a Sparse Local Patch Transformer (SLPT) for learning the inherent relation. The SLPT generates the representation of each single landmark from a local patch and aggregates them by an adaptive inherent relation based on the attention mechanism. The subpixel coordinate of each landmark is predicted independently based on the aggregated feature. Moreover, a coarse-to-fine framework is further introduced to work with the SLPT, which enables the initial landmarks to gradually converge to the target facial landmarks using fine-grained features from dynamically resized local patches. Extensive experiments carried out on three popular benchmarks, including WFLW, 300W and COFW, demonstrate that the proposed method works at the state-of-the-art level with much less computational complexity by learning the inherent relation between facial landmarks. The code is available at the project website: https://github.com/Jiahao-UTS/SLPT-master.
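
The two ideas named in the abstract, attention-based aggregation of per-landmark features and coarse-to-fine refinement with shrinking local patches, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the project repository for that). Every name here (`attention_aggregate`, `coarse_to_fine`, `toy_predictor`), the identity query/key/value projections, the halving patch schedule, and the oracle predictor that stands in for a trained network are assumptions made only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(features):
    """Single-head dot-product self-attention over per-landmark features.

    Assumption: identity query/key/value projections, so each output row
    is a convex combination of the input feature vectors.
    """
    d = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in features]
        w = softmax(scores)
        out.append([sum(w[j] * features[j][c] for j in range(len(features)))
                    for c in range(d)])
    return out


def coarse_to_fine(initial, predict_offset, patch_size, stages=3, shrink=0.5):
    """Refine landmarks over several stages with a shrinking patch.

    predict_offset(landmarks, patch_size) is a hypothetical callable that
    returns one (dx, dy) pair per landmark, expressed as a fraction of the
    patch size in [-0.5, 0.5].
    """
    landmarks = list(initial)
    for _ in range(stages):
        offsets = predict_offset(landmarks, patch_size)
        landmarks = [(x + dx * patch_size, y + dy * patch_size)
                     for (x, y), (dx, dy) in zip(landmarks, offsets)]
        patch_size *= shrink  # assumed schedule: halve the patch each stage
    return landmarks


def toy_predictor(targets):
    """Oracle stand-in for a trained model: step toward the known target,
    clipped so the prediction stays inside the current patch."""
    def predict(landmarks, patch_size):
        offsets = []
        for (x, y), (tx, ty) in zip(landmarks, targets):
            dx = max(-0.5, min(0.5, (tx - x) / patch_size))
            dy = max(-0.5, min(0.5, (ty - y) / patch_size))
            offsets.append((dx, dy))
        return offsets
    return predict


if __name__ == "__main__":
    refined = coarse_to_fine([(0.0, 0.0)], toy_predictor([(10.0, 4.0)]),
                             patch_size=16.0)
    print(refined)
    print(attention_aggregate([[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setting the first stage can move a landmark by at most half of the coarse patch, and later stages correct the remainder at finer resolution, which mirrors the gradual convergence described in the abstract. A real model would replace `toy_predictor` with a network that reads features cropped from each patch.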

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/169611