Coevolutionary Deep Reinforcement Learning
- Publisher: IEEE
- Publication Type: Conference Proceeding
- Citation: 2020 IEEE Symposium Series on Computational Intelligence (SSCI 2020), 2021, pp. 2600-2607
- Issue Date: 2021-01-05
Closed Access
Filename | Description | Size
---|---|---
Coevolutionary_Deep_Reinforcement_Learning.pdf | Published version | 218.91 kB
This item is closed access and not available.
The ability to learn without instruction is a powerful enabler for learning systems. One mechanism for this, self-play, allows reinforcement learning to develop high-performing policies without large datasets or expert knowledge. Despite these benefits, self-play is known to be less sample efficient and to suffer from unstable learning dynamics. This is in part due to a non-stationary learning problem: an agent's actions influence its opponents and, as a consequence, the training data it receives. In this paper we demonstrate that competitive pressure can be harnessed to improve self-play. We leverage coevolution, an evolution-inspired process in which individuals are compelled to innovate and adapt, to optimise the training of a population of reinforcement learning agents. We demonstrate that our algorithm improves the final performance of a Rainbow DQN trained on the game Connect Four, achieving a 15% higher win percentage than the next leading self-play algorithm. Furthermore, our algorithm exhibits more stable training, with less variation in evaluation performance.
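Since the paper itself is closed access here, the sketch below is not the authors' algorithm; it only illustrates the general technique the abstract names: competitive coevolution over a population of self-play agents, where an agent's fitness is its win rate against the rest of the population and weaker individuals are replaced by variants of stronger ones. The `Agent` class, `play_match`, the `skill` parameter, and all hyperparameters are hypothetical stand-ins; a real system would wrap Rainbow DQN agents playing Connect Four and continue RL training rather than mutating a scalar.

```python
# Minimal sketch of competitive coevolutionary self-play training.
# All names and values here are hypothetical stand-ins, not the paper's method.
import copy
import random

POP_SIZE = 8          # number of concurrently trained agents
GENERATIONS = 20      # coevolutionary selection rounds
MATCHES_PER_PAIR = 4  # games per pairing when estimating fitness

class Agent:
    """Stand-in for an RL agent; a real system would wrap a Rainbow DQN."""
    def __init__(self):
        self.skill = random.gauss(0.0, 1.0)  # placeholder for a learned policy

    def mutate(self):
        """Perturb the policy; real agents would continue RL training instead."""
        child = copy.deepcopy(self)
        child.skill += random.gauss(0.0, 0.3)
        return child

def play_match(a, b):
    """Return 1 if agent a wins, else 0 (a noisy comparison of skill)."""
    return 1 if a.skill + random.gauss(0, 0.5) > b.skill + random.gauss(0, 0.5) else 0

def fitness(agent, population):
    """Win rate against the rest of the population: the coevolutionary signal."""
    wins = games = 0
    for opponent in population:
        if opponent is agent:
            continue
        for _ in range(MATCHES_PER_PAIR):
            wins += play_match(agent, opponent)
            games += 1
    return wins / games

population = [Agent() for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    # Rank by win rate against current opponents (a non-stationary target),
    # keep the stronger half, and refill with perturbed copies of survivors.
    scored = sorted(population, key=lambda a: fitness(a, population), reverse=True)
    survivors = scored[: POP_SIZE // 2]
    population = survivors + [p.mutate() for p in survivors]

print(f"best win rate: {fitness(population[0], population):.2f}")
```

The design point this illustrates is the competitive pressure the abstract describes: because fitness is measured against the evolving population rather than a fixed opponent, each agent's training distribution shifts as its rivals improve.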