Abstract
With machine learning powered by big data and computing infrastructure, data-centric AI is driving a fundamental shift in the way software is developed. In this setting, software engineering must be rethought so that data is treated as a first-class citizen on par with code. One striking observation is how much of the machine learning process is spent on data preparation: without high-quality data, even the most powerful machine learning algorithms struggle to perform well. As a result, data-centric AI practices are being adopted more widely. Unfortunately, many real-world datasets are small, dirty, and biased. In this paper, we review the research landscape for data collection and data quality in deep learning applications. Data collection is essential because modern deep learning methods depend less on feature engineering and more on large-scale data. To improve data quality, we investigate data validation, cleaning, and integration techniques. Even if the data cannot be fully cleaned, robust model training techniques allow us to cope with imperfect data during training. Furthermore, although bias and fairness have received less attention in traditional data management research, they are significant themes in modern machine learning applications. To prevent unfair outcomes, we investigate fairness measures and mitigation strategies that can be applied before, during, and after model training.
Keywords: Artificial Intelligence, Data Cleaning, Data Validation, Robustness Techniques, Machine Learning
Copyright
Copyright © 2025 Padakanti Mahesh. This is an open access article distributed under the Creative Commons Attribution License.