Indian Regional Speech Recognition Using OpenAI's Whisper Multilingual Speech Recognition Engine Based on Generative Pre-trained Transformer Architecture
Aditi Bora
Abstract
In this paper, we fine-tuned OpenAI's Whisper for several Indian regional languages, enabling it to produce Hindi, Bengali, Marathi, Tamil, Telugu, and Kannada transcriptions. We used the official Whisper base and small models available through Hugging Face, together with Hugging Face's fine-tuning methodology. Additionally, we used Hindi, Bengali, Marathi, Tamil, Telugu, and Kannada speech data from YouTube and Audio Voice, and collected from the internet around 100 audio clips of speeches by Bollywood actors and other Indian personalities, along with their subtitle files.
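The abstract pairs collected audio clips with their subtitle files. The following is a minimal sketch of how such pairs might be turned into training segments. It assumes the subtitles are in SubRip (.srt) format, which the paper does not state. The code splits a subtitle file into timed transcript segments and converts them to sample offsets at 16 kHz, the sampling rate Whisper's feature extractor expects. It also drops cues longer than Whisper's fixed 30-second input window. The names `parse_srt`, `training_pairs`, and `Segment` are hypothetical illustrations, not part of the author's pipeline or of any library.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One subtitle cue: start/end times in seconds plus its transcript text."""
    start: float
    end: float
    text: str


# Matches SRT timing lines such as "00:01:02,345 --> 00:01:04,000".
_TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)

WHISPER_SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio
MAX_SECONDS = 30.0            # Whisper's encoder consumes 30-second windows


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt_text: str) -> list[Segment]:
    """Split an SRT document into timed segments, skipping malformed cues."""
    segments = []
    # Normalise Windows line endings; cues are separated by blank lines.
    normalised = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalised):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # The timing line usually follows the numeric cue index.
        for i, line in enumerate(lines):
            match = _TIME_RE.search(line)
            if match:
                g = match.groups()
                # Multi-line cue text is joined into a single transcript line.
                text = " ".join(lines[i + 1:])
                if text:
                    segments.append(
                        Segment(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text)
                    )
                break
    return segments


def training_pairs(segments, sample_rate=WHISPER_SAMPLE_RATE):
    """Return (start_sample, end_sample, text) for cues that fit in one window."""
    pairs = []
    for seg in segments:
        if 0 < seg.end - seg.start <= MAX_SECONDS:
            pairs.append(
                (round(seg.start * sample_rate), round(seg.end * sample_rate), seg.text)
            )
    return pairs


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,500\nनमस्ते, आप कैसे हैं?\n\n"
        "2\n00:00:04,000 --> 00:00:06,250\nमैं ठीक हूँ।\n"
    )
    for start, end, text in training_pairs(parse_srt(sample)):
        print(start, end, text)
```

Each resulting (start_sample, end_sample, text) triple could then be used to slice the resampled audio array and supply the target transcript for fine-tuning. Keeping the text-alignment step separate from audio loading keeps it independent of any particular audio library.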
Copyright
Copyright © 2025 Aditi Bora. This is an open access article distributed under the Creative Commons Attribution License.