Sentiment Analysis of Social Media Posts: Comparative Evaluation of Naive Bayes and Logistic Regression Classifiers
Insiyah Udaipurwala
Paper Contents
Abstract
The proliferation of digital communication, particularly through social media platforms like YouTube and Instagram, has created massive, real-time repositories of unstructured textual opinions. Analyzing this data via Sentiment Analysis (SA) is crucial for stakeholders across the commercial, political, and public health sectors. This research addresses three-class (Positive, Negative, Neutral) sentiment classification of social media text, which is characterized by linguistic noise and severe class imbalance. The study evaluates the predictive performance and training efficiency of two foundational linear classifiers: Multinomial Naive Bayes (MNB) and Multinomial Logistic Regression (LR). A publicly available, pre-labeled Kaggle dataset of social media comments was preprocessed with a Natural Language Toolkit (NLTK) pipeline comprising cleaning, lemmatization, and stopword removal, and then vectorized using Term Frequency-Inverse Document Frequency (TF-IDF). Evaluation relied on standard metrics, with the Macro F1-Score prioritized so that the underrepresented sentiment classes count equally toward the result. LR achieved the better predictive performance, with an overall Accuracy of 88.00% and a Macro F1-Score of 0.83. MNB lagged in classification quality, yielding an Accuracy of 86.00% and a Macro F1-Score of 0.78. The study attributes this advantage to LR's use of L2 regularization, which limits overfitting in the sparse, high-dimensional feature space created by TF-IDF, and to the fact that LR does not depend on the restrictive conditional independence assumption inherent to MNB. However, MNB trained substantially faster (0.52 seconds compared to LR's 1.87 seconds), establishing a performance-efficiency trade-off that matters for real-time deployment.
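To illustrate why the Macro F1-Score is prioritized over Accuracy under class imbalance, the following is a minimal sketch in plain Python. The toy labels are made up for illustration and are not drawn from the study's dataset, and the helper names (`per_class_f1`, `macro_f1`, `accuracy`) are hypothetical rather than taken from the paper's code.

```python
# Illustrative only: toy labels, not the study's data.
LABELS = ("Positive", "Negative", "Neutral")


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the class is never
    # present and never predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so each class counts equally."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy set: 8 Positive, 1 Negative, 1 Neutral.
y_true = ["Positive"] * 8 + ["Negative", "Neutral"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["Positive"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} macro_f1={mf1:.2f}")  # accuracy=0.80 macro_f1=0.30
```

In this toy case the majority-class predictor reaches 80% Accuracy while its Macro F1-Score is about 0.30, because the two minority classes each contribute an F1 of zero. This is the kind of gap that Accuracy alone would hide.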
Copyright
Copyright © 2025 Insiyah Udaipurwala. This is an open access article distributed under the Creative Commons Attribution License.