Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

Wan, Y; Bi, Z; He, Y; Zhang, J; Zhang, H; Sui, Y; Xu, G; Jin, H; Yu, P

Publisher: Association for Computing Machinery (ACM)
Publication Type: Journal Article
Citation: ACM Computing Surveys

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Wan, Y
dc.contributor.author	Bi, Z
dc.contributor.author	He, Y
dc.contributor.author	Zhang, J
dc.contributor.author	Zhang, H
dc.contributor.author	Sui, Y https://orcid.org/0000-0002-9510-6574
dc.contributor.author	Xu, G https://orcid.org/0000-0003-4493-6663
dc.contributor.author	Jin, H
dc.contributor.author	Yu, P
dc.date.accessioned	2024-06-05T02:05:18Z
dc.date.available	2024-06-05T02:05:18Z
dc.identifier.citation	ACM Computing Surveys
dc.identifier.issn	0360-0300
dc.identifier.issn	1557-7341
dc.identifier.uri	http://hdl.handle.net/10453/179406
dc.description.abstract	Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. There is already a thriving research community focusing on code intelligence, with efforts spanning software engineering, machine learning, data mining, natural language processing, and programming languages. In this paper, we conduct a comprehensive literature review on deep learning for code intelligence, from the aspects of code representation learning, deep learning techniques, and application tasks. We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models. In particular, we inspect the existing code intelligence models on the basis of code representation learning, and provide a comprehensive overview to enhance comprehension of the present state of code intelligence. Furthermore, we publicly release the source code and data resources to provide the community with a ready-to-use benchmark, which can facilitate the evaluation and comparison of existing and future code intelligence models (https://xcodemind.github.io). Finally, we point out several challenging and promising directions for future research.
dc.language	en
dc.publisher	Association for Computing Machinery (ACM)
dc.relation.ispartof	ACM Computing Surveys
dc.relation.isbasedon	10.1145/3664597
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	“© ACM 2024. This is the author’s version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published on 18 May 2024, https://doi.org/10.1145/3664597”
dc.subject	08 Information and Computing Sciences
dc.subject.classification	Information Systems
dc.subject.classification	46 Information and computing sciences
dc.title	Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
dc.type	Journal Article
utslib.for	08 Information and Computing Sciences
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
dc.date.updated	2024-06-05T02:05:16Z
pubs.publication-status	Published online

Abstract:

Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. There is already a thriving research community focusing on code intelligence, with efforts spanning software engineering, machine learning, data mining, natural language processing, and programming languages. In this paper, we conduct a comprehensive literature review on deep learning for code intelligence, from the aspects of code representation learning, deep learning techniques, and application tasks. We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models. In particular, we inspect the existing code intelligence models on the basis of code representation learning, and provide a comprehensive overview to enhance comprehension of the present state of code intelligence. Furthermore, we publicly release the source code and data resources to provide the community with a ready-to-use benchmark, which can facilitate the evaluation and comparison of existing and future code intelligence models (https://xcodemind.github.io). Finally, we point out several challenging and promising directions for future research.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/179406