Leaving the NavMesh: An Ablative Analysis of Deep Reinforcement Learning for Complex Navigation in 3D Virtual Environments

Publisher:
Springer Nature
Publication Type:
Chapter
Citation:
AI 2023: Advances in Artificial Intelligence, 2024, 14472 LNAI, pp. 286-297
Issue Date:
2024-01-01
File: Leaving the NavMesh.pdf (Submitted version, Adobe PDF, 1.44 MB)
Expanding non-player character (NPC) navigation behavior in video games has the potential to induce novel player experiences. Current industry standards represent traversable world geometry with a Navigation Mesh (NavMesh); however, NavMesh complexity scales poorly with additional navigation abilities (e.g., jumping, wall-running, jet-packs) and increasing world scale. Deep Reinforcement Learning (DRL) allows an NPC agent to learn to navigate environmental obstacles with any navigation ability, without depending on a NavMesh. Despite the promise of DRL navigation, industry adoption remains low due to the expert knowledge required for agent design and the poor training efficiency of DRL algorithms. In this work, we use the off-policy Soft Actor-Critic (SAC) DRL algorithm to investigate how different local observation types and agent scalar information affect agent performance across three topologically distinct environments. We implement a truncated n-step returns method for minibatch sampling that improves early training efficiency by up to 75% by reducing inaccurate off-policy bias. We empirically evaluate environment partial observability with observation stacking, finding that stacks of 4–8 observations render the environments sufficiently Markovian.
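To illustrate the truncated n-step returns idea mentioned in the abstract, the sketch below computes n-step bootstrapped return targets over a sampled trajectory segment, cutting the lookahead short at episode termination so that rewards beyond the boundary do not bias the target. This is a minimal Python/NumPy sketch under assumed data layout (aligned rewards, dones, and bootstrap_values arrays); the function and parameter names are hypothetical and do not come from the paper's implementation.

```python
import numpy as np

def truncated_n_step_targets(rewards, dones, bootstrap_values, gamma=0.99, n=5):
    """Compute n-step return targets for a trajectory segment, truncating the
    lookahead at episode termination (illustrative helper, not the paper's code).

    rewards[t], dones[t], and bootstrap_values[t] are aligned per-step arrays;
    bootstrap_values[t] estimates the (soft) value of the state reached after
    step t, e.g. from a SAC target critic.
    """
    T = len(rewards)
    targets = np.zeros(T, dtype=np.float64)
    for t in range(T):
        g, discount = 0.0, 1.0
        terminated = False
        last = t
        # Accumulate at most n discounted rewards, stopping at episode end.
        for k in range(t, min(t + n, T)):
            g += discount * rewards[k]
            discount *= gamma
            last = k
            if dones[k]:
                terminated = True
                break
        # Bootstrap from the value after the last accumulated step, unless the
        # episode terminated inside the lookahead window.
        if not terminated:
            g += discount * bootstrap_values[last]
        targets[t] = g
    return targets
```

Truncating at termination avoids bootstrapping across episode boundaries; in a SAC-style setup, bootstrap_values would typically come from the target critics' soft value estimates, though the exact formulation used in the paper may differ.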