Native Language Identification Using Syntactic Features and Machine Learning Models
Dr. Prasad A. Joshi
Paper Contents
Abstract
Native Language Identification (NLI) is the task of determining a writer's or speaker's native language (L1) from their production in a second language (L2). This study investigates syntactic features extracted from publicly available learner corpora and evaluates four classical machine learning models: Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), and Random Forest Classifier (RFC). The models are assessed using accuracy, F1-score, and confusion matrices. Results indicate that syntactic structures, such as part-of-speech (POS) patterns and dependency relations, provide strong predictive cues for NLI, with SVM delivering the highest performance.
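As an illustration only, the two ingredients named in the abstract, POS-pattern features and evaluation by accuracy, F1-score, and confusion matrix, can be sketched in plain Python. This is a minimal sketch and not the paper's implementation: the function names, the bigram window, the toy POS tags, the L1 labels, and the choice of macro-averaged F1 are all assumptions made for the example.

```python
from collections import Counter

def pos_ngrams(tags, n=2):
    """Count contiguous POS-tag n-grams in one tagged sentence (n=2 is an assumption)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true L1 labels, columns are predicted L1 labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def macro_f1(matrix):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(matrix))) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical pre-tagged sentence; a real pipeline would obtain tags from a POS tagger.
features = pos_ngrams(["DET", "NOUN", "VERB", "DET", "NOUN"])

# Hypothetical gold and predicted L1 labels for four learner texts.
labels = ["HIN", "DEU"]
matrix = confusion_matrix(["HIN", "HIN", "DEU", "DEU"],
                          ["HIN", "DEU", "DEU", "DEU"], labels)
print(features)
print(matrix, accuracy(matrix), round(macro_f1(matrix), 4))
```

In a full pipeline, the n-gram counts would be vectorized and passed to each of the four classifiers, and the confusion matrix would show which L1 pairs are most often confused.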
Copyright
Copyright © 2025 Dr. Prasad A. Joshi. This is an open access article distributed under the Creative Commons Attribution License.