Enhancing Multilingual Communication with Machine Learning-Driven Language Identification
Dr. Ranga Swamy Sirisati
Paper Contents
Abstract
Multilingual language identification is a vital component of seamless communication across linguistically diverse populations. This study presents an efficient language identification system that uses three machine learning algorithms, namely Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Random Forest, to detect and classify languages. The system uses the N-gram technique to transform raw text into numerical feature vectors, facilitating effective pattern recognition. The dataset comprises text samples in Tamil, Hindi, and Marathi, which were preprocessed and partitioned using an 80:20 train-test split. Model performance was evaluated using standard classification metrics: accuracy, precision, recall, and F1-score. Among the models tested, SVM achieved the highest accuracy, 95%, indicating that it generalized best on this dataset. To enhance accessibility, the system incorporates a translation module powered by Google Translator that automatically translates text in the detected language into English. Additional methods such as data augmentation and feature selection were applied to improve model robustness and classification precision. The findings suggest that machine learning-based multilingual identification not only enhances text classification and translation tasks but also holds strong potential for real-world applications such as missing person identification and cross-lingual communication within smart city frameworks. Future work will focus on scaling the dataset to include a broader range of languages and dialects, integrating deep learning techniques for improved contextual language understanding, and optimizing the system for real-time performance in large-scale deployments.
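To make the N-gram and KNN steps concrete, the following is a minimal, standard-library-only sketch of character N-gram counting followed by a cosine-similarity nearest-neighbour vote. It is not the paper's implementation: the function names (`char_ngrams`, `cosine`, `knn_predict`), the N-gram range of 1 to 3, the choice of cosine similarity, and the handful of sample sentences are all assumptions made for illustration. The 80:20 split, the SVM and Random Forest models, and the evaluation metrics are omitted.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train_vectors, text, k=3):
    """Return the majority label among the k most similar training samples."""
    query = char_ngrams(text)
    ranked = sorted(train_vectors,
                    key=lambda item: cosine(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Illustrative sentences only (an assumption); not the paper's dataset.
SAMPLES = [
    ("मेरा नाम राम है", "Hindi"),
    ("आप कैसे हैं", "Hindi"),
    ("यह एक किताब है", "Hindi"),
    ("मुझे पानी चाहिए", "Hindi"),
    ("माझे नाव राम आहे", "Marathi"),
    ("तुम्ही कसे आहात", "Marathi"),
    ("हे एक पुस्तक आहे", "Marathi"),
    ("मला पाणी पाहिजे", "Marathi"),
    ("என் பெயர் ராம்", "Tamil"),
    ("நீங்கள் எப்படி இருக்கிறீர்கள்", "Tamil"),
    ("இது ஒரு புத்தகம்", "Tamil"),
    ("எனக்கு தண்ணீர் வேண்டும்", "Tamil"),
]

# Turn each training sentence into its n-gram count vector once, up front.
train_vectors = [(char_ngrams(text), label) for text, label in SAMPLES]

if __name__ == "__main__":
    for query in ["உங்கள் பெயர் என்ன", "तुम्हारा नाम क्या है", "तुझे नाव काय आहे"]:
        print(query, "->", knn_predict(train_vectors, query))
```

Hindi and Marathi share the Devanagari script, so individual characters separate them poorly. Longer N-grams that capture word endings and function words carry more of the signal, which is why the sketch counts a range of N-gram lengths rather than single characters alone. With so few training sentences the Hindi and Marathi predictions are not reliable; a realistic system would need a much larger corpus.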
Copyright
Copyright © 2025 Dr. Ranga Swamy Sirisati. This is an open access article distributed under the Creative Commons Attribution License.