Paper Contents
Abstract
As digital communication evolves, spammers are adopting more sophisticated techniques to bypass traditional email filters, including the use of audio attachments to deliver spam content. This project presents a novel approach to detecting spam in email audio attachments, addressing a growing gap in conventional text-based spam detection systems. The system connects to a user's Gmail inbox over the IMAP protocol and scans incoming emails for audio files in formats such as .mp3, .wav, and .m4a. When an audio attachment is found, it is converted to .wav format using an audio processing library such as pydub to ensure compatibility with speech recognition tools. The converted audio is then transcribed into text using Google's Speech Recognition API, and the transcript is scanned for common spam keywords such as "lottery", "money", and "win". If any of these keywords is detected, the email is flagged as spam. The project highlights the need for multimodal email security systems and proposes future enhancements such as integrating machine learning models, supporting multilingual audio, and improving noise filtering for better accuracy. This work demonstrates a practical step toward combating voice-based email spam and strengthening overall email security.
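The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `imap.gmail.com` host default, the use of an app password, the `UNSEEN` search filter, and whole-word keyword matching are all choices made for this example. The transcription step assumes the third-party `pydub` (which needs ffmpeg) and `SpeechRecognition` packages are installed; they are imported lazily so the rest of the sketch runs with the standard library alone.

```python
import email
import imaplib
import os
import re
import tempfile

AUDIO_EXTENSIONS = (".mp3", ".wav", ".m4a")  # formats named in the abstract
SPAM_KEYWORDS = ("lottery", "money", "win")  # example keywords from the abstract


def fetch_unseen_messages(user, app_password, host="imap.gmail.com"):
    """Yield unread INBOX messages over IMAP (needs network and credentials).

    The host, the app-password login, and the UNSEEN filter are assumptions.
    """
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, app_password)
        conn.select("INBOX")
        _, data = conn.search(None, "UNSEEN")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            yield email.message_from_bytes(parts[0][1])


def audio_attachments(msg):
    """Return (filename, raw_bytes) for every attachment with an audio extension."""
    found = []
    for part in msg.walk():
        name = part.get_filename()
        if name and name.lower().endswith(AUDIO_EXTENSIONS):
            found.append((name, part.get_payload(decode=True)))
    return found


def transcribe(audio_bytes, filename):
    """Convert the audio to WAV with pydub, then transcribe it via Google's recognizer."""
    from pydub import AudioSegment   # third-party, requires ffmpeg
    import speech_recognition as sr  # third-party "SpeechRecognition" package

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, os.path.basename(filename))
        with open(src, "wb") as fh:
            fh.write(audio_bytes)
        wav = os.path.join(tmp, "converted.wav")
        AudioSegment.from_file(src).export(wav, format="wav")
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav) as source:
            audio = recognizer.record(source)
        return recognizer.recognize_google(audio)


def spam_keywords_in(text):
    """Return the spam keywords that occur as whole words in the text.

    Whole-word matching is an assumption here; it keeps "winter" from matching "win".
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(k for k in SPAM_KEYWORDS if k in words)


def is_spam(msg, transcriber=transcribe):
    """Flag the message if any audio attachment's transcript contains a keyword."""
    return any(
        spam_keywords_in(transcriber(data, name))
        for name, data in audio_attachments(msg)
    )
```

A caller would loop over `fetch_unseen_messages(...)` and pass each message to `is_spam`. Keeping the transcriber as a parameter lets the keyword logic be exercised offline with a stub, and leaves room to swap in a different speech-to-text backend or a learned classifier, in line with the future enhancements mentioned above.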
Copyright
Copyright © 2025 Anjali Sharma, Ankur Jain. This is an open access article distributed under the Creative Commons Attribution License.