Paper Contents
Abstract
Audio deepfakes, generated through advanced speech synthesis and voice conversion techniques, have emerged as a growing threat to information authenticity and security. These artificially created audio clips are often indistinguishable from genuine speech, making them a potential tool for misinformation, fraud, and impersonation. In this work, an efficient audio deepfake detection system is proposed, employing acoustic feature extraction and machine learning classification. Features such as Mel-Frequency Cepstral Coefficients (MFCCs), Chroma, Mel Spectrogram, Zero-Crossing Rate, Spectral Centroid, and Spectral Flatness are extracted from both genuine and manipulated audio samples. The extracted features are used to train a supervised classifier capable of distinguishing between authentic and synthetic voices with high accuracy. Built with Flask as the backend framework, the system allows users to register, log in, and upload an audio file for authenticity analysis. The proposed system demonstrates its effectiveness in identifying deepfake audio and contributes toward developing secure and reliable digital communication systems.
In today's digital world, the rise of artificial intelligence (AI) has made it easier to create fake audio recordings that sound like real people. This technology, known as "deepfake audio," is becoming more common and can be used for harmful purposes such as spreading misinformation, impersonating others, or committing fraud. Because of this, there is a growing need for tools that can detect whether an audio clip is real or fake. In the proposed system, the extracted features are compared against a reference dataset containing both real and fake audio samples. Using a nearest-neighbour approach based on distance calculations, the system identifies the most similar audio file in the dataset and then informs the user whether the audio is likely to be real or fake, along with a percentage score indicating the confidence of the prediction.
This system is designed to be lightweight, fast, and accessible, making deepfake detection easier for everyday users without requiring deep technical knowledge.
Keywords: Deepfake audio, real and fake audio samples, machine learning, Flask, Mel-Frequency Cepstral Coefficients (MFCCs), Chroma, Mel Spectrogram, Zero-Crossing Rate, Spectral Centroid, Spectral Flatness.
Copyright
Copyright © 2025 Toshika Ninawe. This is an open access article distributed under the Creative Commons Attribution License.