Interpretable Code Summarization

Kamal, S; Nimmy, SF; Dey, N

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Reliability, 2024, PP, (99), pp. 1-10
Issue Date: 2024

Closed Access

	Filename	Description	Size	Format
	1779816.pdf	Published version	2.63 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Kamal, S
dc.contributor.author	Nimmy, SF
dc.contributor.author	Dey, N
dc.date.accessioned	2025-01-28T09:09:50Z
dc.date.available	2025-01-28T09:09:50Z
dc.date.issued	2024
dc.identifier.citation	IEEE Transactions on Reliability, 2024, PP, (99), pp. 1-10
dc.identifier.issn	0018-9529
dc.identifier.issn	1558-1721
dc.identifier.uri	http://hdl.handle.net/10453/184522
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Reliability
dc.relation.isbasedon	10.1109/tr.2024.3392876
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0803 Computer Software, 0906 Electrical and Electronic Engineering
dc.subject.classification	Operations Research
dc.subject.classification	4010 Engineering practice and education
dc.subject.classification	4612 Software engineering
dc.title	Interpretable Code Summarization
dc.type	Journal Article
utslib.citation.volume	PP
utslib.for	0803 Computer Software
utslib.for	0906 Electrical and Electronic Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	true
dc.date.updated	2025-01-28T09:09:48Z
pubs.issue	99
pubs.publication-status	Published
pubs.volume	PP
utslib.citation.issue	99

Abstract:

Code summarization is the process of generating readable natural-language descriptions from program source code. It has become a popular research topic in software maintenance, code generation, and code recovery. Existing code summarization methods follow an encoder-decoder approach and use various machine learning techniques to generate natural language from source code. Although most of these methods are state of the art, the complex encoding and decoding process that maps code tokens to natural-language words is difficult to understand, so these models are treated as opaque (black boxes). This research proposes explainable AI methods that address the black-box nature of token mapping in the code summarization process. We create an abstract syntax tree (AST) from the tokens of the source code. We then embed the AST into natural-language words using a bilingual statistical probability approach to generate candidate statistical parse trees. We apply a PageRank algorithm to rank the parse trees and, from the best-ranked tree, generate the comment for the corresponding code snippet. To explain our code generation method, we use a Takagi-Sugeno fuzzy approach, layer-wise relevance propagation, and a hidden Markov model. These approaches make our method trustworthy and make the mapping of source code tokens to natural-language words understandable to humans.
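
Illustrative sketch (not the authors' implementation): the abstract outlines a pipeline that builds an AST from source code and then ranks candidate parse trees with PageRank. The Python below shows those two steps in miniature under loud assumptions. It uses Python's standard `ast` module as a stand-in for AST construction, represents each candidate as a plain list of words, and links candidates by Jaccard word overlap, which is a placeholder for the paper's bilingual statistical probability model. All function names (`ast_node_types`, `jaccard`, `pagerank`, `rank_candidates`) are hypothetical.

```python
from __future__ import annotations

import ast


def ast_node_types(source: str) -> list[str]:
    """Parse Python source and return AST node type names in walk order."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def jaccard(a: list[str], b: list[str]) -> float:
    """Set overlap between two word lists (assumed similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def pagerank(weights: list[list[float]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> list[float]:
    """Power-iteration PageRank on a weighted graph given as a square matrix.

    weights[i][j] is the edge weight from node i to node j. Nodes with no
    outgoing weight spread their rank uniformly over all nodes.
    """
    n = len(weights)
    ranks = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        dangling = sum(ranks[i] for i in range(n) if out_sums[i] == 0)
        new_ranks = []
        for j in range(n):
            incoming = sum(
                ranks[i] * weights[i][j] / out_sums[i]
                for i in range(n) if out_sums[i] > 0
            )
            new_ranks.append((1 - damping) / n
                             + damping * (incoming + dangling / n))
        converged = sum(abs(x - y) for x, y in zip(new_ranks, ranks)) < tol
        ranks = new_ranks
        if converged:
            break
    return ranks


def rank_candidates(candidates: list[list[str]]) -> list[float]:
    """Score candidates by PageRank over their pairwise similarity graph."""
    n = len(candidates)
    weights = [
        [0.0 if i == j else jaccard(candidates[i], candidates[j])
         for j in range(n)]
        for i in range(n)
    ]
    return pagerank(weights)


if __name__ == "__main__":
    print(ast_node_types("def add(a, b):\n    return a + b"))
    # Hypothetical candidate summaries for the snippet above.
    candidates = [
        ["add", "two", "numbers"],
        ["return", "sum", "of", "two", "numbers"],
        ["open", "a", "file"],
    ]
    scores = rank_candidates(candidates)
    best = max(range(len(candidates)), key=scores.__getitem__)
    print(" ".join(candidates[best]), scores)
```

In this sketch a candidate that shares more words with the other candidates receives a higher score. The paper's own edge weights, tree representation, and explanation components (Takagi-Sugeno fuzzy approach, layer-wise relevance propagation, hidden Markov model) are not reproduced here.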

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/184522