Data Quality Challenges for Machine Learning Model

Padakanti Mahesh Mahesh

Abstract

With reinforcement learning powered by big data and computing infrastructure, data-centric AI is driving a fundamental shift in the way software is developed. Software engineering must therefore be rethought so that data is treated as a first-class citizen on par with code. One surprising finding is how much of the machine learning process is spent on data preparation. Even the most powerful machine learning algorithms struggle to perform adequately without high-quality data, and as a result advanced data-centric techniques are being used more frequently. Unfortunately, many real-world datasets are small, unclean, or biased. In this paper, we focus on the research community's work on data collection and data quality for deep learning applications. Data collection is essential because modern deep learning algorithms rely more on large-scale data collection than on classification techniques. To enhance data quality, we investigate data validation, cleaning, and integration techniques. Even when the data cannot be completely cleaned, robust model training strategies make it possible to work with imperfect data during model training. Furthermore, bias and fairness are significant themes in modern machine learning applications, even though they have received less attention in conventional data management research. To prevent unfairness, we investigate fairness controls and mitigation strategies that can be applied before, during, and after model training.

Keywords: Artificial Intelligence, Data Cleaning, Data Validation, Robustness Techniques, Machine Learning

Copyright

Paper ID: IJPREMS50900025447

ISSN: 2321-9653

Publisher: ijprems