Federated Multi-Agent Deep Reinforcement Learning for Resource Allocation of Vehicle-to-Vehicle Communications
- Publisher:
- Institute of Electrical and Electronics Engineers (IEEE)
- Publication Type:
- Journal Article
- Citation:
- IEEE Transactions on Vehicular Technology, 2022, 71, (8), pp. 8810-8824
- Issue Date:
- 2022-08-01
Closed Access
Filename | Description | Size
---|---|---
Federated Multi-Agent Deep Reinforcement Learning for Resource Allocation of Vehicle-to-Vehicle Communications.pdf | Published version | 2.61 MB
This item is closed access and not available.
Dynamic topology, fast-changing channels and the time sensitivity of safety-related services present challenges to the status quo of resource allocation for cellular-underlaying vehicle-to-vehicle (V2V) communications. In this paper, we investigate a novel federated multi-agent deep reinforcement learning (FedMARL) approach for the decentralized joint optimization of channel selection and power control for V2V communication. The approach takes advantage of both deep reinforcement learning (DRL) and federated learning (FL), satisfying the reliability and delay requirements of V2V communication while maximizing the transmit rates of cellular links. Specifically, we carefully construct individual V2V agents, implemented via the dueling double deep Q-network (D3QN), and design the reward function to train the V2V agents collaboratively. As a result, each agent individually optimizes its channel selection and power level based on its local observations, including the instantaneous channel state information (CSI) of the corresponding V2V link, the instantaneous co-channel interference from the cellular link, the previous channel selections of nearby V2V pairs, and the queue backlog at the V2V transmitter. Another important aspect is that we incorporate FL to alleviate the training instability induced by the cooperative multi-agent environment. The local DRL models of different V2V agents are federated periodically, addressing the limitation that each individual agent only partially observes the overall network status, and accelerating the multi-agent training process. Validated via simulations, the proposed FedMARL scheme outperforms the baselines in terms of both the cellular sum-rate and the V2V packet delivery rate.
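The core mechanics the abstract names can be sketched compactly: the dueling aggregation used by D3QN, the double-DQN target (online network selects the action, target network evaluates it), and the periodic federated averaging of the agents' local model parameters. This is a minimal illustrative sketch, not the paper's implementation; the function names, the scalar parameter lists standing in for network weights, and the flattened channel/power action space are all assumptions made for brevity.

```python
N_CHANNELS = 4  # illustrative: number of selectable sub-channels
N_POWER = 3     # illustrative: number of discrete power levels
N_ACTIONS = N_CHANNELS * N_POWER  # joint channel-selection / power-control action

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]

def double_dqn_target(reward, gamma, q_online_next, q_target_next):
    """Double-DQN target: the online net picks the greedy next action,
    the target net supplies its value, reducing overestimation bias."""
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[a_star]

def fed_avg(models):
    """FedAvg round: element-wise average of the V2V agents' parameter
    vectors (here plain lists standing in for network weights)."""
    n = len(models)
    return [sum(m[i] for m in models) / n for i in range(len(models[0]))]
```

In the scheme described above, each agent would train its local D3QN on its own observations between federation rounds, and `fed_avg` would be applied periodically to the agents' weights so that every agent benefits from the experience gathered across the network despite its partial local view.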