Paper Contents
Abstract
Speaker recognition is the task of identifying or verifying a person from their voice. Recent advances in machine learning, especially deep learning, have greatly improved the accuracy of these systems. This work introduces the Five Convolutional Blocks-CNN (5C-CNN), a model designed to identify speakers from audio recordings. The model stacks convolutional blocks to capture speaker-specific voice features from spectrograms, which are visual time-frequency representations of sound. In addition, combining several machine learning techniques helps the system handle challenges such as overlapping voices. The approach improves speaker recognition accuracy significantly compared with traditional methods. The goal of this study is an efficient and affordable solution that accurately separates and recognizes voices.
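As an illustration of the spectrogram input mentioned above, the following is a minimal sketch of how a magnitude spectrogram can be computed from raw audio samples. It is not the preprocessing used in this work: the function name `spectrogram`, the frame length, the hop size, the Hann window, and the 1 kHz test tone are all assumptions chosen for the example, and a naive DFT stands in for the FFT that a practical pipeline would use.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per frame.

    Hypothetical helper for illustration; frame_len and hop are assumed values.
    """
    # Hann window tapers each frame to reduce spectral leakage at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Assumed test input: a 1 kHz sine sampled at 8 kHz, 256 samples long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = spectrogram(tone)
# Strongest frequency bin in the first frame, converted to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be treated as a single-channel image. This is the kind of two-dimensional input a convolutional model such as the 5C-CNN would consume.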
Copyright
Copyright © 2024 Roshitha Basuru. This is an open access article distributed under the Creative Commons Attribution License.