Time-sensitive Learning for Heterogeneous Federated Edge Intelligence

Xiao, Y; Zhang, X; Li, Y; Shi, G; Krunz, M; Nguyen, DN; Hoang, DT

Time-sensitive Learning for Heterogeneous Federated Edge Intelligence

Xiao, Y Zhang, X Li, Y Shi, G Krunz, M Nguyen, DN Hoang, DT

Permalink

Publisher:: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:: Journal Article
Citation:: IEEE Transactions on Mobile Computing, 2023, PP, (99), pp. 1-18
Issue Date:: 2023-01-01

Embargoed

	Filename	Description	Size
	Time-sensitive Learning for Heterogeneous Federated Edge Intelligence.pdf	Accepted version	7.9 MB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Embargoed
Open Access

This item is currently unavailable due to the publisher's embargo.

The embargo period expires on 16 Jan 2025

Full metadata record

Field	Value	Language
dc.contributor.author	Xiao, Y
dc.contributor.author	Zhang, X
dc.contributor.author	Li, Y
dc.contributor.author	Shi, G
dc.contributor.author	Krunz, M
dc.contributor.author	Nguyen, DN
dc.contributor.author	Hoang, DT
dc.date.accessioned	2023-08-27T08:46:40Z
dc.date.available	2023-08-27T08:46:40Z
dc.date.issued	2023-01-01
dc.identifier.citation	IEEE Transactions on Mobile Computing, 2023, PP, (99), pp. 1-18
dc.identifier.issn	1536-1233
dc.identifier.issn	1558-0660
dc.identifier.uri	http://hdl.handle.net/10453/171831
dc.description.abstract	Real-time machine learning (ML) has recently attracted significant interest due to its potential to support instantaneous learning, adaptation, and decision making in a wide range of application domains, including self-driving vehicles, intelligent transportation, and industry automation. In this paper, we investigate real-time ML in a federated edge intelligence (FEI) system, an edge computing system that implements federated learning (FL) solutions based on data samples collected and uploaded from decentralized data networks, e.g., Internet-of-Things (IoT) and/or wireless sensor networks. FEI systems often exhibit heterogenous communication and computational resource distribution, as well as non-i.i.d. data samples arrived at different edge servers, resulting in long model training time and inefficient resource utilization. Motivated by this fact, we propose a time-sensitive federated learning (TS-FL) framework to minimize the overall run-time for collaboratively training a shared ML model with desirable accuracy. Training acceleration solutions for both TS-FL with synchronous coordination (TS-FL-SC) and asynchronous coordination (TS-FL-ASC) are investigated. To address the straggler effect in TS-FL-SC, we develop an analytical solution to characterize the impact of selecting different subsets of edge servers on the overall model training time. A server dropping-based solution is proposed to allow some slow-performance edge servers to be removed from participating in the model training if their impact on the resulting model accuracy is limited. A joint optimization algorithm is proposed to minimize the overall time consumption of model training by selecting participating edge servers, the local epoch number (the number of model training iterations per coordination), and the data batch size (the number of data samples for each model training iteration). Motivated by the fact that data samples at the slowest edge server may exhibit special characteristics that cannot be removed from model training, we develop an analytical expression to characterize the impact of both staleness effect of asynchronous coordination and straggler effect of FL on the time consumption of TS-FL-ASC. We propose a load forwarding-based solution that allows a slow edge server to offload part of its training samples to trusted edge servers with higher processing capability. We develop a hardware prototype to evaluate the model training time of a heterogeneous FEI system. Experimental results show that our proposed TS-FL-SC and TS-FL-ASC can provide up to 63% and 28% of reduction, in the overall model training time, respectively, compared with traditional FL solutions.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Mobile Computing
dc.relation.isbasedon	10.1109/TMC.2023.3237374
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	0805 Distributed Computing, 0906 Electrical and Electronic Engineering, 1005 Communications Technologies
dc.subject.classification	Networking & Telecommunications
dc.subject.classification	4006 Communications engineering
dc.subject.classification	4604 Cybersecurity and privacy
dc.subject.classification	4606 Distributed computing and systems software
dc.title	Time-sensitive Learning for Heterogeneous Federated Edge Intelligence
dc.type	Journal Article
utslib.citation.volume	PP
utslib.for	0805 Distributed Computing
utslib.for	0906 Electrical and Electronic Engineering
utslib.for	1005 Communications Technologies
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	embargoed	*
utslib.copyright.embargo	2025-01-16T00:00:00+1000Z
dc.date.updated	2023-08-27T08:46:25Z
pubs.issue	99
pubs.publication-status	Published
pubs.volume	PP
utslib.citation.issue	99

Abstract:

Real-time machine learning (ML) has recently attracted significant interest due to its potential to support instantaneous learning, adaptation, and decision making in a wide range of application domains, including self-driving vehicles, intelligent transportation, and industry automation. In this paper, we investigate real-time ML in a federated edge intelligence (FEI) system, an edge computing system that implements federated learning (FL) solutions based on data samples collected and uploaded from decentralized data networks, e.g., Internet-of-Things (IoT) and/or wireless sensor networks. FEI systems often exhibit heterogenous communication and computational resource distribution, as well as non-i.i.d. data samples arrived at different edge servers, resulting in long model training time and inefficient resource utilization. Motivated by this fact, we propose a time-sensitive federated learning (TS-FL) framework to minimize the overall run-time for collaboratively training a shared ML model with desirable accuracy. Training acceleration solutions for both TS-FL with synchronous coordination (TS-FL-SC) and asynchronous coordination (TS-FL-ASC) are investigated. To address the straggler effect in TS-FL-SC, we develop an analytical solution to characterize the impact of selecting different subsets of edge servers on the overall model training time. A server dropping-based solution is proposed to allow some slow-performance edge servers to be removed from participating in the model training if their impact on the resulting model accuracy is limited. A joint optimization algorithm is proposed to minimize the overall time consumption of model training by selecting participating edge servers, the local epoch number (the number of model training iterations per coordination), and the data batch size (the number of data samples for each model training iteration). Motivated by the fact that data samples at the slowest edge server may exhibit special characteristics that cannot be removed from model training, we develop an analytical expression to characterize the impact of both staleness effect of asynchronous coordination and straggler effect of FL on the time consumption of TS-FL-ASC. We propose a load forwarding-based solution that allows a slow edge server to offload part of its training samples to trusted edge servers with higher processing capability. We develop a hardware prototype to evaluate the model training time of a heterogeneous FEI system. Experimental results show that our proposed TS-FL-SC and TS-FL-ASC can provide up to 63% and 28% of reduction, in the overall model training time, respectively, compared with traditional FL solutions.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/171831