Playing Tetris using the Brain-Computer Interface allows you to anticipate the winner.
Ms. Vaishali Bajpai
Paper Contents
Abstract
This paper presents the development and implementation of a system for detecting duplicate question pairs using Machine Learning and Natural Language Processing (NLP) techniques. With the proliferation of forums and Q&A sites, efficient detection of duplicate questions is important for the quality and usability of such platforms. The goal is to build a model that correctly determines whether two input questions are semantically equivalent. Several NLP techniques are applied to preprocess the text data, including tokenization, stemming, and lemmatization, followed by vectorization with methods such as TF-IDF. Beyond basic preprocessing, advanced features are extracted, including n-grams, cosine similarity, and keywords. The feature set is further enriched with similarity ratios for question pairs computed using the Fuzzy Wuzzy library. Models are then built with Logistic Regression, Support Vector Machines, Random Forest, and Gradient Boosting, and compared in detail to select the best one. The evaluation metrics are accuracy, precision, recall, and F1-score. Hyperparameter tuning and cross-validation are used to optimize model performance.
Keywords: Natural Language Processing (NLP), Machine Learning, Fuzzy Wuzzy Library, Text Preprocessing, Feature Engineering, Logistic Regression, Support Vector Machines (SVM), Random Forest, Gradient Boosting
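To make the feature-engineering step concrete, the following is a minimal sketch of pairwise similarity features of the kind the abstract describes: TF-IDF cosine similarity, a string similarity ratio, and token overlap. It is not the paper's implementation. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `pair_features`) and the `background` parameter are hypothetical, the smoothed IDF formula is one common variant chosen for illustration, and Python's standard-library `difflib.SequenceMatcher` is used here as a stand-in for the Fuzzy Wuzzy ratios so that the example has no third-party dependencies.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): terms found in every document
    # still receive a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(q1, q2, background=()):
    """Compute similarity features for one question pair.

    `background` is an optional iterable of other questions that is used
    only to estimate document frequencies for the IDF weights.
    """
    docs = [tokenize(q) for q in (q1, q2, *background)]
    vecs = tfidf_vectors(docs)
    t1, t2 = set(docs[0]), set(docs[1])
    union = t1 | t2
    return {
        "tfidf_cosine": cosine(vecs[0], vecs[1]),
        # Character-level similarity in [0, 1]; stand-in for a fuzzy ratio.
        "char_ratio": SequenceMatcher(None, q1.lower(), q2.lower()).ratio(),
        # Jaccard overlap of the two token sets.
        "token_overlap": len(t1 & t2) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    print(pair_features(
        "How can I learn Python quickly?",
        "What is the fastest way to learn Python?",
    ))
```

A feature dictionary like this would typically be computed for every labeled question pair and passed to a classifier such as the ones listed in the abstract.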
Copyright
Copyright © 2024 Ms. Vaishali Bajpai. This is an open access article distributed under the Creative Commons Attribution License.