A Comparison of LSTM and GRU for Bengali Speech-to-Text Transformation

Publisher:
Springer Nature
Publication Type:
Conference Proceeding
Citation:
Lecture Notes in Networks and Systems, 2023, 700 LNNS, pp. 214-224
Issue Date:
2023-01-01
This paper presents an approach to speech-to-text conversion for the Bengali language. Most existing work in this area targets languages other than Bengali. We first prepared a novel dataset of 56 unique words recorded from 160 individual speakers. We then describe how accuracy was improved for Bengali speech-to-text, starting with both Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) models. During further experiments, the GRU failed to produce stable output, so we moved entirely to the LSTM, which achieved 90% accuracy on an unseen dataset. Voices from several demographic groups, along with background noise, were used to validate the model. In the testing phase, we evaluated a variety of classes differing in length, complexity, noise, and speaker gender. We expect this research to help in developing a real-time Bengali speech-to-text recognition model.
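The abstract does not detail the network architecture, so the sketch below is only an illustration of how such a comparison between GRU and LSTM word classifiers might be set up in Keras. The input shape (MFCC frames), layer sizes, and training setup are assumptions for illustration, not the authors' published configuration; only the 56-word vocabulary and 160 speakers come from the paper.

# Minimal sketch (assumptions noted above) of an LSTM vs. GRU word classifier.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 56      # 56 unique Bengali words, per the paper's dataset
TIME_STEPS = 100      # assumed number of feature frames per utterance
N_MFCC = 13           # assumed number of MFCC coefficients per frame

def build_model(cell="lstm"):
    """Build a simple recurrent classifier; `cell` selects LSTM or GRU."""
    rnn = layers.LSTM if cell == "lstm" else layers.GRU
    model = models.Sequential([
        layers.Input(shape=(TIME_STEPS, N_MFCC)),
        rnn(128, return_sequences=True),
        rnn(64),
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Random placeholder data standing in for recordings from 160 speakers.
    x = np.random.randn(320, TIME_STEPS, N_MFCC).astype("float32")
    y = np.random.randint(0, NUM_CLASSES, size=(320,))
    for cell in ("gru", "lstm"):
        model = build_model(cell)
        model.fit(x, y, epochs=1, batch_size=32, verbose=0)
        print(cell, model.evaluate(x, y, verbose=0))

In practice, the two variants differ only in the recurrent cell, which is what makes such a side-by-side accuracy and stability comparison straightforward.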