Predicting viral host codon fitness and path shifting through tree-based learning on codon usage biases and genomic characteristics.
- Publisher: NATURE PORTFOLIO
- Publication Type: Journal Article
- Citation: Sci Rep, 2025, 15, (1), pp. 12251
- Issue Date: 2025-04-10
This item is open access.
Full metadata record
Field | Value | Language |
---|---|---|
dc.contributor.author | Su, S | |
dc.contributor.author | Ni, Z | |
dc.contributor.author | Lan, T | |
dc.contributor.author | Ping, P | |
dc.contributor.author | Tang, J | |
dc.contributor.author | Yu, Z | |
dc.contributor.author | Hutvagner, G | |
dc.contributor.author | Li, J | |
dc.date.accessioned | 2025-06-25T23:10:22Z | |
dc.date.available | 2025-02-20 | |
dc.date.available | 2025-06-25T23:10:22Z | |
dc.date.issued | 2025-04-10 | |
dc.identifier.citation | Sci Rep, 2025, 15, (1), pp. 12251 | |
dc.identifier.issn | 2045-2322 | |
dc.identifier.uri | http://hdl.handle.net/10453/187949 | |
dc.description.abstract | Viral codon fitness (VCF) for a host and VCF shifting have seldom been measured quantitatively, although both could be concepts vital to understanding pathogen epidemiology. This study demonstrates that the relative synonymous codon usage (RSCU) of virus genomes, together with other genomic properties, is predictive of virus host codon fitness through tree-based machine learning. Statistical analysis of the RSCU data matrix also revealed that the wobble position of the virus codons is critically important for distinguishing host codon fitness. As the trained models characterise the host codon fitness of the viruses well, the frequency and other details stored at the leaf nodes of these models can be reliably translated into a human virus codon fitness score (HVCF score) as a readout of the codon fitness of any virus infecting humans. Specifically, we evaluated and compared the HVCF of virus genome sequences from human and other sources, and evaluated the HVCF of SARS-CoV-2 genome sequences from the NCBI Virus database, where we found no obvious shifting trend in host codon fitness towards human-non-infectious. We also developed a bioinformatics tool to simulate codon-based virus fitness shifting using the codon compositions of the viruses, and we found that Tylonycteris bat coronavirus HKU4-related viruses may be closely related to SARS-CoV-2 in terms of human codon fitness. The finding of abundant synonymous mutations in the predicted codon fitness shifting path also provides new insights for evolution research and for virus monitoring in environmental surveillance. | |
dc.format | Electronic | |
dc.language | eng | |
dc.publisher | NATURE PORTFOLIO | |
dc.relation.ispartof | Sci Rep | |
dc.relation.isbasedon | 10.1038/s41598-025-91469-z | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject.mesh | Codon Usage | |
dc.subject.mesh | Humans | |
dc.subject.mesh | Genome, Viral | |
dc.subject.mesh | Machine Learning | |
dc.subject.mesh | SARS-CoV-2 | |
dc.subject.mesh | COVID-19 | |
dc.subject.mesh | Codon | |
dc.subject.mesh | Animals | |
dc.subject.mesh | Genetic Fitness | |
dc.subject.mesh | Computational Biology | |
dc.subject.mesh | Genomics | |
dc.subject.mesh | Host-Pathogen Interactions | |
dc.title | Predicting viral host codon fitness and path shifting through tree-based learning on codon usage biases and genomic characteristics. | |
dc.type | Journal Article | |
utslib.citation.volume | 15 | |
utslib.location.activity | England | |
pubs.organisational-group | University of Technology Sydney | |
pubs.organisational-group | University of Technology Sydney/Faculty of Engineering and Information Technology | |
pubs.organisational-group | University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering | |
pubs.organisational-group | University of Technology Sydney/UTS Groups | |
pubs.organisational-group | University of Technology Sydney/UTS Groups/Centre for Health Technologies (CHT) | |
utslib.copyright.status | open_access | * |
dc.rights.license | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.date.updated | 2025-06-25T23:10:19Z | |
pubs.issue | 1 | |
pubs.publication-status | Published online | |
pubs.volume | 15 | |
utslib.citation.issue | 1 | |
Abstract:
Viral codon fitness (VCF) for a host and VCF shifting have seldom been measured quantitatively, although both could be concepts vital to understanding pathogen epidemiology. This study demonstrates that the relative synonymous codon usage (RSCU) of virus genomes, together with other genomic properties, is predictive of virus host codon fitness through tree-based machine learning. Statistical analysis of the RSCU data matrix also revealed that the wobble position of the virus codons is critically important for distinguishing host codon fitness. As the trained models characterise the host codon fitness of the viruses well, the frequency and other details stored at the leaf nodes of these models can be reliably translated into a human virus codon fitness score (HVCF score) as a readout of the codon fitness of any virus infecting humans. Specifically, we evaluated and compared the HVCF of virus genome sequences from human and other sources, and evaluated the HVCF of SARS-CoV-2 genome sequences from the NCBI Virus database, where we found no obvious shifting trend in host codon fitness towards human-non-infectious. We also developed a bioinformatics tool to simulate codon-based virus fitness shifting using the codon compositions of the viruses, and we found that Tylonycteris bat coronavirus HKU4-related viruses may be closely related to SARS-CoV-2 in terms of human codon fitness. The finding of abundant synonymous mutations in the predicted codon fitness shifting path also provides new insights for evolution research and for virus monitoring in environmental surveillance.
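The abstract names RSCU as the main input feature but does not define it. Under the commonly used definition, the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, assuming the standard genetic code and a single in-frame coding sequence; the names `CODON_TABLE` and `rscu` are illustrative and are not taken from the authors' tool.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU values per codon for one in-frame coding sequence.

    Hypothetical helper for illustration only. Codons with ambiguous
    bases and stop codons are ignored; amino acids that never occur in
    the sequence are omitted because their RSCU is undefined.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # observed / (total / family size)
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    # Four leucine codons: CTG three times, CTA once.
    example = rscu("CTGCTGCTGCTA")
    print(example["CTG"], example["CTA"], example["CTT"])
```

A value above 1 means the codon is used more often than expected under uniform synonymous usage, and a value below 1 means it is used less often. Because synonymous codons mostly differ at the third (wobble) position, a matrix of such values per genome largely reflects wobble-position preferences, which is consistent with the abstract's remark on the importance of that position.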
Please use this identifier to cite or link to this item: http://hdl.handle.net/10453/187949