Development of a Clean Historical Energy Consumption Dataset for Building Energy Studies
Benard Ongere, Prof Charles Ondieki, Dr Henry Kiragu
Abstract
Reliable energy analysis, simulation, and predictive modeling in buildings depend on high-quality historical energy consumption data. In practice, raw building energy datasets often contain missing values, inconsistencies, noise, and temporal misalignments that limit their direct use in analysis. This study presents a structured approach to collecting and preprocessing historical energy consumption data from a single building, with the aim of establishing a reliable, analysis-ready dataset for energy performance evaluation and future modeling tasks. The data were collected over an extended monitoring period, capturing long-term temporal patterns and variations driven by building operational schedules, occupancy behavior, and routine activities. To improve data reliability and usability, a systematic preprocessing framework was implemented, comprising data cleaning to remove erroneous and duplicate records, treatment of missing and inconsistent values, detection and handling of outliers, normalization of energy consumption values, and time-series alignment to ensure a consistent temporal resolution. These steps were designed to improve data accuracy, consistency, and integrity while preserving the inherent consumption trends. The resulting dataset provides a foundation for subsequent energy analysis, simulation, and forecasting for the selected building. By focusing on a single-building case study, this work demonstrates a practical, transparent, and replicable methodology for preparing building energy consumption data. The approach supports data-driven building energy management and informed decision-making, and can be adapted for similar studies that seek to improve the quality of energy datasets before applying advanced analytical or machine learning methods.
Keywords: Building energy consumption, Historical energy data, Data preprocessing, Single-building case study, Time-series analysis, Energy performance analysis
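The preprocessing stages named in the abstract (cleaning, alignment, outlier handling, missing-value treatment, normalization) can be illustrated with a minimal Python sketch. The abstract does not state which techniques the authors used for each stage, so everything specific here is an assumption made for illustration: the function name `preprocess`, the hourly resolution, the 1.5 × IQR outlier rule, linear interpolation for gaps, and min-max scaling. The sketch also assumes at least two valid readings are present.

```python
from datetime import datetime, timedelta
from statistics import quantiles

HOUR = timedelta(hours=1)  # assumed target resolution (not stated in the source)


def preprocess(records):
    """Illustrative pipeline. `records` is an iterable of (datetime, kWh or None).

    Returns a list of (timestamp, cleaned_kwh, normalized_kwh) on an hourly grid.
    """
    # 1. Cleaning: snap timestamps to the hour, drop duplicates (keep the first
    #    valid reading), and treat negative or absent readings as missing.
    readings = {}
    for ts, kwh in records:
        ts = ts.replace(minute=0, second=0, microsecond=0)
        valid = kwh is not None and kwh >= 0
        if ts not in readings or readings[ts] is None:
            readings[ts] = kwh if valid else None

    # 2. Alignment: build a regular hourly grid from first to last timestamp;
    #    hours without a reading become None.
    start, end = min(readings), max(readings)
    grid = [start + i * HOUR for i in range(int((end - start) / HOUR) + 1)]
    values = [readings.get(ts) for ts in grid]

    # 3. Outliers: flag values outside the 1.5 * IQR fences as missing
    #    (an assumed rule, chosen only for illustration).
    known = [v for v in values if v is not None]
    q1, _, q3 = quantiles(known, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    values = [v if v is not None and low <= v <= high else None for v in values]

    # 4. Missing values: linear interpolation between the nearest valid
    #    neighbours; gaps at either end take the nearest valid value.
    idx = [i for i, v in enumerate(values) if v is not None]
    filled = []
    for i, v in enumerate(values):
        if v is None:
            left = max((j for j in idx if j < i), default=None)
            right = min((j for j in idx if j > i), default=None)
            if left is None:
                v = values[right]
            elif right is None:
                v = values[left]
            else:
                w = (i - left) / (right - left)
                v = values[left] + w * (values[right] - values[left])
        filled.append(v)

    # 5. Normalization: min-max scaling to [0, 1]; a constant series maps to 0.
    lo, hi = min(filled), max(filled)
    span = (hi - lo) or 1.0
    return [(ts, v, (v - lo) / span) for ts, v in zip(grid, filled)]


# Hypothetical sample: one duplicate, one missing hour, and one spike.
t0 = datetime(2024, 1, 1)
sample = [(t0, 10.0), (t0, 10.0), (t0 + HOUR, 12.0), (t0 + 3 * HOUR, 16.0),
          (t0 + 4 * HOUR, 500.0), (t0 + 5 * HOUR, 20.0), (t0 + 6 * HOUR, 22.0)]
result = preprocess(sample)
```

Here outliers are flagged before gaps are filled so that a spike does not distort the interpolated values around it. A different ordering or different rules may suit other buildings or metering setups.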
Copyright
Copyright © 2026 Benard Ongere, Prof Charles Ondieki, Dr Henry Kiragu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.