Keywords
Sensor data validation/imputation; Time series; Predictor Models Ensemble; Meta-Predictor;Intelligent Decision Support Systems
Start Date
7-7-2022 12:00 PM
End Date
7-7-2022 12:20 PM
Abstract
Intelligent Decision Support Systems (IDSS) integrate different Artificial Intelligence (AI) techniques with the aim of taking or supporting human-like decisions. To this end, these techniques are based on the available data from the target process. This implies that invalid or missing data could trigger incorrect decisions and therefore, undesirable situations in the supervised process. This is even more important in environmental systems. In data-driven applications such as IDSS, data quality is a basal problem that should be addressed for the sake of the overall systems’ performance. In this paper, a data validation and imputation methodology for time-series is presented. This methodology is integrated in an IDSS software tool which generates suitable control set-points to control the process. The data validation and imputation approach presented here is based on an ensemble of different prediction models obtained for the sensors involved in the process. A Case-Based Reasoning (CBR) approach is used for data imputation, i.e., similar past situations to the current one can propose new values for the missing ones. The CBR model is complemented with other prediction models such as autoregressive (AR) models or artificial neural network (ANN) models. Then, the different obtained predictions are ensembled to obtain a better prediction performance than the obtained by each individual prediction model separately. Furthermore, the use of a meta-prediction model, trained using the predictions of all individual models as inputs, is proposed and compared with other ensemble methods to validate its performance. Finally, this approach is illustrated in a real Waste Water Treatment Plant (WWTP) case study using a selection of the most important sensors available and considering real faults, showing promising results with improved performance when using the ensemble approach presented here compared against the prediction obtained by each individual model separately
Ensemble model-based method for time series sensors’ datavalidation and imputation applied to a real Waste Water TreatmentPlant
Intelligent Decision Support Systems (IDSS) integrate different Artificial Intelligence (AI) techniques with the aim of taking or supporting human-like decisions. To this end, these techniques are based on the available data from the target process. This implies that invalid or missing data could trigger incorrect decisions and therefore, undesirable situations in the supervised process. This is even more important in environmental systems. In data-driven applications such as IDSS, data quality is a basal problem that should be addressed for the sake of the overall systems’ performance. In this paper, a data validation and imputation methodology for time-series is presented. This methodology is integrated in an IDSS software tool which generates suitable control set-points to control the process. The data validation and imputation approach presented here is based on an ensemble of different prediction models obtained for the sensors involved in the process. A Case-Based Reasoning (CBR) approach is used for data imputation, i.e., similar past situations to the current one can propose new values for the missing ones. The CBR model is complemented with other prediction models such as autoregressive (AR) models or artificial neural network (ANN) models. Then, the different obtained predictions are ensembled to obtain a better prediction performance than the obtained by each individual prediction model separately. Furthermore, the use of a meta-prediction model, trained using the predictions of all individual models as inputs, is proposed and compared with other ensemble methods to validate its performance. Finally, this approach is illustrated in a real Waste Water Treatment Plant (WWTP) case study using a selection of the most important sensors available and considering real faults, showing promising results with improved performance when using the ensemble approach presented here compared against the prediction obtained by each individual model separately
Stream and Session
false