Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback

Bi, Z; Wan, Y; Wang, Z; Zhang, H; Guan, B; Lu, F; Zhang, Z; Sui, Y; Jin, H; Shi, X

Publisher: Association for Computational Linguistics (ACL)
Publication Type: Conference Proceeding
Citation: Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024, pp. 2336-2353
Issue Date: 2024-01-01

Filename	Description	Size	Format
2024.findings-acl.138.pdf	Published version	3.75 MB	Adobe PDF

This item is new to OPUS and is not currently available.

Full metadata record

Field	Value	Language
dc.contributor.author	Bi, Z
dc.contributor.author	Wan, Y
dc.contributor.author	Wang, Z
dc.contributor.author	Zhang, H
dc.contributor.author	Guan, B
dc.contributor.author	Lu, F
dc.contributor.author	Zhang, Z
dc.contributor.author	Sui, Y https://orcid.org/0000-0002-9510-6574
dc.contributor.author	Jin, H
dc.contributor.author	Shi, X
dc.date	2024-08
dc.date.accessioned	2025-05-06T03:27:35Z
dc.date.available	2025-05-06T03:27:35Z
dc.date.issued	2024-01-01
dc.identifier.citation	Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024, pp. 2336-2353
dc.identifier.issn	0736-587X
dc.identifier.uri	http://hdl.handle.net/10453/187198
dc.description.abstract	Large Language Models (LLMs) have shown remarkable progress in automated code generation. Yet, LLM-generated code may contain errors in API usage, class, data structure, or missing project-specific information. As much of this project-specific context cannot fit into the prompts of LLMs, we must find ways to allow the model to explore the project-level code context. We present COCOGEN, a new code generation approach that uses compiler feedback to improve the LLM-generated code. COCOGEN first leverages static analysis to identify mismatches between the generated code and the project's context. It then iteratively aligns and fixes the identified errors using information extracted from the code repository. We integrate COCOGEN with two representative LLMs, i.e., GPT-3.5-Turbo and Code Llama (13B), and apply it to Python code generation. Experimental results show that COCOGEN significantly improves the vanilla LLMs by over 80% in generating code dependent on the project context and consistently outperforms the existing retrieval-based code generation baselines.
dc.language	en
dc.publisher	Association for Computational Linguistics (ACL)
dc.relation.ispartof	Proceedings of the Annual Meeting of the Association for Computational Linguistics
dc.relation.ispartof	Findings of the Association for Computational Linguistics ACL 2024
dc.relation.isbasedon	10.18653/v1/2024.findings-acl.138
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.title	Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback
dc.type	Conference Proceeding
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/UTS Groups
pubs.organisational-group	University of Technology Sydney/UTS Groups/Australian Artificial Intelligence Institute (AAII)
pubs.organisational-group	University of Technology Sydney/UTS Groups/Centre for Cyber Security and Privacy (CCSP)
utslib.copyright.status	recently_added	*
dc.date.updated	2025-05-06T03:27:32Z
pubs.finish-date	2024-08
pubs.publication-status	Published
pubs.start-date	2024-08

Abstract:

Large Language Models (LLMs) have shown remarkable progress in automated code generation. Yet, LLM-generated code may contain errors in API usage, class, data structure, or missing project-specific information. As much of this project-specific context cannot fit into the prompts of LLMs, we must find ways to allow the model to explore the project-level code context. We present COCOGEN, a new code generation approach that uses compiler feedback to improve the LLM-generated code. COCOGEN first leverages static analysis to identify mismatches between the generated code and the project's context. It then iteratively aligns and fixes the identified errors using information extracted from the code repository. We integrate COCOGEN with two representative LLMs, i.e., GPT-3.5-Turbo and Code Llama (13B), and apply it to Python code generation. Experimental results show that COCOGEN significantly improves the vanilla LLMs by over 80% in generating code dependent on the project context and consistently outperforms the existing retrieval-based code generation baselines.
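
The loop described in the abstract (check generated code against the project, retrieve matching project information, regenerate) can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes a hypothetical project index `PROJECT_SYMBOLS` mapping symbol names to signatures, substitutes a simple `ast`-based undefined-name check for the paper's compiler feedback, substitutes `difflib` fuzzy matching for repository retrieval, and uses a stub function `stub_model` in place of a real LLM call. All function names here are illustrative.

```python
import ast
import builtins
import difflib
from typing import Callable, Dict, List, Tuple


def find_mismatches(code: str, project_symbols: Dict[str, str]) -> List[str]:
    """Static check: names the code reads that nothing defines.

    Deliberately scope-insensitive; a real analyzer would be stricter.
    """
    defined = set(dir(builtins)) | set(project_symbols)
    used = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.append(node.id)
    return sorted({name for name in used if name not in defined})


def retrieve_context(names: List[str], project_symbols: Dict[str, str]) -> List[str]:
    """Look up project symbols whose names resemble the unresolved ones."""
    hints = []
    for name in names:
        for match in difflib.get_close_matches(name, list(project_symbols), n=2, cutoff=0.5):
            hints.append(f"{match}: {project_symbols[match]}")
    return hints


def refine(
    generate: Callable[[str], str],
    task: str,
    project_symbols: Dict[str, str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Regenerate until the static check passes or the round budget is spent.

    Returns the last generated code and any mismatches still present.
    """
    prompt = task
    for round_no in range(max_rounds + 1):
        code = generate(prompt)
        mismatches = find_mismatches(code, project_symbols)
        if not mismatches or round_no == max_rounds:
            return code, mismatches
        hints = retrieve_context(mismatches, project_symbols)
        prompt = (
            f"{task}\n\nUndefined names: {', '.join(mismatches)}\n"
            "Relevant project symbols:\n" + "\n".join(hints)
        )
    return code, mismatches


def stub_model(prompt: str) -> str:
    # Stand-in for an LLM: it uses a non-existent helper until the prompt
    # mentions the real project function (hypothetical behaviour).
    call = "load_config" if "load_config" in prompt else "load_cfg"
    return f"def port(path):\n    return {call}(path)['port']\n"


# Hypothetical project index: symbol name -> signature.
PROJECT_SYMBOLS = {
    "load_config": "load_config(path: str) -> dict",
    "Config": "class Config",
}

final_code, remaining = refine(
    stub_model,
    "Write port(path) that reads the port from the project settings file.",
    PROJECT_SYMBOLS,
)
```

In this toy run the first draft calls the undefined `load_cfg`, the check reports it, the closest project symbol is added to the prompt, and the second draft passes the check. Swapping the stub for a real model call and the name check for an actual compiler or static analyzer would bring the sketch closer to the setting the abstract describes.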

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/187198