Distraction Detection for Vehicle Drivers Using Deep Learning Techniques: A Review of Advanced Deep Learning Techniques
Yash Pansare
Paper Contents
Abstract
Research on driver drowsiness detection has grown substantially in recent years because researchers consider it essential for reducing road accidents. This paper presents an extensive analysis of drowsiness detection methods, focusing on You Only Look Once (YOLO) object detection and its applications in driver monitoring systems. It combines outcomes from various studies to examine present methods and to anticipate how this technology may be applied to broader forms of driver distraction.
Driver distraction and drowsiness are major road safety hazards that call for advanced monitoring systems. By integrating artificial intelligence and machine learning with sensor technologies, modern Driver Monitoring Systems (DMS) track driver performance by evaluating physiological signals, vehicle data, and visual cues. The review investigates two prevailing forms of DMS technology: S-HDx systems, which rely on physiological signals such as heart rate and EEG together with vehicular data, and V-HDx approaches, which use visual cues including eye gaze, head pose, and facial expression. The preference for vision-based systems derives from CNNs and MobileNetV2 models, which can track complex behavioral indications through non-intrusive implementations.
The review also explores zero-shot learning through vision-language models (VLMs), an effective alternative when traditional computer vision would require massive annotated datasets. A CLIP-based framework is described in detail and is shown to detect distracted driving activities across diverse public datasets in a task-independent manner. In addition, the work presents an optimized YOLOv8 model designed specifically for real-time distraction detection. The model reaches 99.4% accuracy by integrating BoTNet for feature extraction, GAM for multi-scale fusion, and training with the EIoU loss, while reducing computational requirements for deployment on resource-constrained devices.
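The EIoU loss named in the abstract can be illustrated with a short sketch. It follows the commonly published EIoU formulation (one minus IoU, plus penalties on center distance, width difference, and height difference, each normalized by the smallest enclosing box). The function name `eiou_loss`, the corner-format boxes `(x1, y1, x2, y2)`, and the `eps` stabilizer are assumptions made for illustration and are not taken from the reviewed model's implementation.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Illustrative EIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    Hypothetical helper: the name, box format, and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and target boxes
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalized by the enclosing box sides
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Unlike plain IoU, the extra terms stay informative when the boxes do not overlap, so a prediction that moves closer to the target still lowers the loss.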
Copyright
Copyright © 2025 Yash Pansare. This is an open access article distributed under the Creative Commons Attribution License.