Discrete wavelet denoising into mfcc for noise suppressive in automatic speech recognition system

Soe Naing, HM; Hidayat, R; Hartanto, R; Miyanaga, Y

Publisher: The Intelligent Networks and Systems Society
Publication Type: Journal Article
Citation: International Journal of Intelligent Engineering and Systems, 2020, 13, (2), pp. 74-82
Issue Date: 2020-01-01

Closed Access

Filename	Description	Size	Format
2020043008.pdf		511.76 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Soe Naing, HM
dc.contributor.author	Hidayat, R
dc.contributor.author	Hartanto, R
dc.contributor.author	Miyanaga, Y https://orcid.org/0000-0002-2795-2234
dc.date.accessioned	2021-04-26T06:21:26Z
dc.date.available	2021-04-26T06:21:26Z
dc.date.issued	2020-01-01
dc.identifier.citation	International Journal of Intelligent Engineering and Systems, 2020, 13, (2), pp. 74-82
dc.identifier.issn	2185-310X
dc.identifier.issn	2185-3118
dc.identifier.uri	http://hdl.handle.net/10453/148385
dc.description.abstract	Automatic Speech Recognition (ASR) is a challenging task, and its most problematic issues are the presence of background noise and substantial variability in speech. Extracting noise-robust features that compensate for speech degradation caused by noise has remained a popular research topic in recent years. This paper presents a framework for a wavelet denoising scheme and analyses different wavelet families and thresholding rules applied within feature extraction to enhance the performance of the ASR system. A Gaussian Mixture Model-based Hidden Markov Model (GMM-HMM) and a Deep Neural Network (DNN)-HMM are used as the speech recognizers. The recognition results show that noise-robust features are obtained when wavelet transform denoising is combined with Mel Frequency Cepstral Coefficient (MFCC) extraction on the Aurora2 database. The best accuracy is achieved by cross-entropy DNN-HMM training using denoising with the Coiflet wavelet and the Rigrsure threshold, which yields 97.54% at 10 dB, 93.13% at 5 dB, 75.63% at 0 dB and 37.29% at -5 dB.
dc.language	en
dc.publisher	The Intelligent Networks and Systems Society
dc.relation.ispartof	International Journal of Intelligent Engineering and Systems
dc.relation.isbasedon	10.22266/ijies2020.0430.08
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Discrete wavelet denoising into mfcc for noise suppressive in automatic speech recognition system
dc.type	Journal Article
utslib.citation.volume	13
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2021-04-26T06:21:25Z
pubs.issue	2
pubs.publication-status	Published
pubs.volume	13
utslib.citation.issue	2

Abstract:

Automatic Speech Recognition (ASR) is a challenging task, and its most problematic issues are the presence of background noise and substantial variability in speech. Extracting noise-robust features that compensate for speech degradation caused by noise has remained a popular research topic in recent years. This paper presents a framework for a wavelet denoising scheme and analyses different wavelet families and thresholding rules applied within feature extraction to enhance the performance of the ASR system. A Gaussian Mixture Model-based Hidden Markov Model (GMM-HMM) and a Deep Neural Network (DNN)-HMM are used as the speech recognizers. The recognition results show that noise-robust features are obtained when wavelet transform denoising is combined with Mel Frequency Cepstral Coefficient (MFCC) extraction on the Aurora2 database. The best accuracy is achieved by cross-entropy DNN-HMM training using denoising with the Coiflet wavelet and the Rigrsure threshold, which yields 97.54% at 10 dB, 93.13% at 5 dB, 75.63% at 0 dB and 37.29% at -5 dB.
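
The denoising step named in the abstract (wavelet decomposition, Rigrsure threshold selection, coefficient shrinkage, reconstruction) can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions that do not come from the abstract: a single-level Haar transform stands in for the Coiflet wavelet to keep the code short, soft thresholding is used, the noise level is estimated from the median absolute detail coefficient, and all function names are hypothetical. MFCC extraction, which would follow on the denoised signal, is not shown.

```python
import math
import statistics


def rigrsure_threshold(coeffs):
    """Pick the threshold minimising Stein's Unbiased Risk Estimate (SURE).

    Assumes the coefficients are already scaled to unit noise variance.
    """
    n = len(coeffs)
    squared = sorted(c * c for c in coeffs)
    best_risk, best_thr = float("inf"), 0.0
    cumulative = 0.0
    for k, s in enumerate(squared):
        cumulative += s
        # SURE risk when the (k+1)-th smallest magnitude is the threshold.
        risk = (n - 2 * (k + 1) + cumulative + (n - 1 - k) * s) / n
        if risk < best_risk:
            best_risk, best_thr = risk, math.sqrt(s)
    return best_thr


def soft_threshold(coeffs, thr):
    """Shrink each coefficient towards zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


_S = 1.0 / math.sqrt(2.0)


def haar_dwt(signal):
    """Single-level orthonormal Haar DWT (even-length input assumed)."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) * _S for a, b in pairs]
    detail = [(a - b) * _S for a, b in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def denoise(signal):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_dwt(signal)
    # Assumed noise estimate: median absolute deviation of the details.
    sigma = statistics.median(abs(d) for d in detail) / 0.6745
    if sigma == 0.0:
        return list(signal)
    thr = sigma * rigrsure_threshold([d / sigma for d in detail])
    return haar_idwt(approx, soft_threshold(detail, thr))
```

In a full front end, `denoise` would be applied to the waveform (typically over several decomposition levels) before framing and MFCC computation. A wavelet library would normally replace the hand-written Haar transform.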

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/148385