Self-Adaptive Gradient Quantization for Geo-Distributed Machine Learning over Heterogeneous and Dynamic Networks

Publisher:
IEEE (Institute of Electrical and Electronics Engineers, Inc.)
Publication Type:
Journal Article
Citation:
IEEE Transactions on Cloud Computing, vol. 11, no. 4, pp. 3483–3496, Oct. 2023
Issue Date:
2023-10-01
Geo-Distributed Machine Learning (Geo-DML) enables geographically dispersed data centers (DCs) to collaboratively train large-scale machine learning (ML) models for various applications. While Geo-DML can achieve excellent performance, it also injects massive data traffic into Wide Area Networks (WANs) to exchange gradients during model training. Such heavy traffic not only incurs network congestion and prolongs training, but also causes the straggler problem when DCs operate in heterogeneous network environments. To alleviate these problems, we propose Self-Adaptive Gradient Quantization (SAGQ) for Geo-DML. In SAGQ, each worker DC adopts a quantization method tailored to its heterogeneous and dynamic link bandwidth, reducing communication overhead and balancing communication time across worker DCs. By doing so, SAGQ speeds up the Geo-DML training process without sacrificing ML model performance. Extensive experiments show that, compared with state-of-the-art techniques, SAGQ reduces the wall-clock time needed to train an ML model by 1.13×–21.31×, and improves model accuracy by 0.11%–2.27% over baselines.
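The core idea, allocating quantization precision per worker according to its link bandwidth so that transfer times are balanced, can be sketched as below. This is a minimal illustration only: the proportional bit-allocation rule, the uniform stochastic quantizer, and the names `choose_bits` and `stochastic_quantize` are assumptions made for exposition, not the paper's actual SAGQ algorithm.

```python
# Minimal sketch of bandwidth-adaptive gradient quantization.
# ASSUMPTION: the proportional bit-allocation rule and the uniform
# stochastic quantizer below are illustrative, not the SAGQ method itself.
import numpy as np

def choose_bits(bw_mbps, max_bw_mbps, full_bits=32, min_bits=2):
    """Allocate quantization bits proportional to link bandwidth, so that
    per-worker transfer time (bits / bandwidth) is roughly equalized."""
    bits = int(round(full_bits * bw_mbps / max_bw_mbps))
    return max(min_bits, min(full_bits, bits))

def stochastic_quantize(grad, bits):
    """Unbiased uniform quantization with stochastic rounding; returns the
    dequantized gradient (in practice only the integer codes are sent)."""
    levels = 2 ** bits - 1
    g_min, g_max = grad.min(), grad.max()
    scale = (g_max - g_min) / levels if g_max > g_min else 1.0
    x = (grad - g_min) / scale
    lower = np.floor(x)
    q = lower + (np.random.rand(*grad.shape) < (x - lower))  # round up w.p. frac(x)
    return q * scale + g_min

# Hypothetical example: three worker DCs with heterogeneous WAN bandwidths.
bandwidths = {"dc_a": 1000.0, "dc_b": 250.0, "dc_c": 100.0}  # Mbps, assumed
grad = np.random.randn(1_000_000).astype(np.float32)
max_bw = max(bandwidths.values())
for dc, bw in bandwidths.items():
    b = choose_bits(bw, max_bw)
    g_hat = stochastic_quantize(grad, b)
    print(f"{dc}: {bw:6.0f} Mbps -> {b:2d} bits, "
          f"quantization MSE = {np.mean((g_hat - grad) ** 2):.2e}")
```

Under this rule, a slow link (e.g., the assumed 100 Mbps one) sends far fewer bits per gradient element than a fast link, so the slowest worker no longer dictates each round's communication time, at the cost of coarser (but unbiased) gradients on slow links.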