Distilling Wisdom: A Review on Optimizing Learning From Massive Language Models

Publisher:
IEEE (Institute of Electrical and Electronics Engineers, Inc.)
Publication Type:
Journal Article
Citation:
IEEE Access, vol. 13, pp. 56296–56325, 2025
Issue Date:
2025-01-01
Abstract:
In the era of Large Language Models (LLMs), Knowledge Distillation (KD) enables the transfer of capabilities from proprietary LLMs to open-source models. This survey provides a detailed discussion of the basic principles, algorithms, and implementation methods of knowledge distillation. It explores KD’s impact on LLMs, emphasizing its utility in model compression, performance enhancement, and self-improvement. Through the analysis of practical examples such as DistilBERT, TinyBERT, and MobileBERT, the paper demonstrates how knowledge distillation can markedly enhance the efficiency and applicability of large language models in real-world scenarios. The discussion covers applications of KD across multiple domains, including industrial systems, embedded systems, Natural Language Processing (NLP), multi-modal processing, and vertical domains such as medicine, law, science, finance, and materials science. The survey also outlines current KD methodologies and future research directions, highlighting KD’s role in advancing AI technologies and fostering innovation across different sectors.
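The abstract does not spell out the distillation objective itself, but the core idea it references can be sketched as follows: a student model is trained against a weighted mix of the teacher's temperature-softened output distribution and the ground-truth labels (the soft-label formulation of Hinton et al.). The PyTorch function below is an illustrative sketch only; the function name, temperature T, and mixing weight alpha are assumptions made for exposition and are not taken from the surveyed paper.

    # Minimal sketch of the soft-label knowledge distillation loss:
    # a temperature-scaled KL term between teacher and student logits
    # plus an ordinary cross-entropy term on the gold labels.
    # T and alpha are illustrative choices, not values from the paper.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft targets: KL divergence between temperature-softened distributions.
        soft_student = F.log_softmax(student_logits / T, dim=-1)
        soft_teacher = F.softmax(teacher_logits / T, dim=-1)
        kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

        # Hard targets: standard cross-entropy against the ground-truth labels.
        ce_term = F.cross_entropy(student_logits, labels)

        # Weighted combination of the distillation and supervised signals.
        return alpha * kd_term + (1.0 - alpha) * ce_term

The T * T factor keeps the gradient magnitude of the softened term comparable to the hard-label term as the temperature changes, which is the usual convention in this formulation.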