Abstract
This paper presents a comprehensive evaluation of machine learning models for language identification. We compare traditional and advanced models, including Naive Bayes, Bi-LSTM, CNN, and BERT, using metrics such as accuracy, precision, recall, and F1-score to assess their performance. The study reveals that the BERT model, leveraging its transformer-based architecture and self-attention mechanisms, significantly outperforms the others, achieving an accuracy of 99.92%. BERT's robustness in handling complex linguistic features such as mixed and short text sequences, and its efficacy in processing code-mixed texts and phonetically similar languages, highlight its potential for advanced natural language processing tasks. These findings not only contribute to theoretical advances in machine learning but also offer practical insights for implementing effective language identification systems.
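For readers who want to see how the four reported metrics relate to each other, the sketch below computes them with scikit-learn on a toy multi-class prediction. It is illustrative only, not the authors' evaluation code; the language codes and label arrays are hypothetical placeholders, and macro averaging is one common (assumed) choice for multi-class language identification.

```python
# Minimal sketch of the metric computation described in the abstract.
# Not the paper's original code; labels and predictions are invented
# placeholders for a hypothetical three-language identification task.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical gold labels and model predictions.
y_true = ["en", "kn", "hi", "en", "kn", "hi"]
y_pred = ["en", "kn", "hi", "en", "hi", "hi"]

print("accuracy :", accuracy_score(y_true, y_pred))
# Macro averaging weights every language class equally, so rare
# languages count as much as frequent ones in the final score.
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1-score :", f1_score(y_true, y_pred, average="macro"))
```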
Copyright
Copyright © 2025 Poojashree Chandrashekar. This is an open access article distributed under the Creative Commons Attribution License.