Keywords
model uncertainty, prediction interval, fuzzy clustering
Start Date
1-7-2006 12:00 AM
Abstract
This paper presents a novel method for estimating “total” predictive uncertainty using machine learning techniques. By “total” we mean that all sources of uncertainty are taken into account, including that of the input and observed data and of the model parameters and structure, without attempting to separate the contributions of these different sources. We assume that the model error, i.e. the mismatch between the observed and modelled values, reflects all sources of uncertainty. Fuzzy c-means clustering was employed to partition the input space into different zones, or clusters, on the assumption that all examples belonging to a particular cluster have similar model errors. A prediction interval is constructed for each cluster on the basis of the empirical distribution of the historical model errors associated with the examples of that cluster. The prediction interval for an individual example is then derived from the cluster-based prediction intervals according to the example's membership grade in each cluster. A linear or non-linear regression model is then built on the calibration data to approximate the underlying functional relationship between the input vector and the computed prediction intervals. Finally, this model is applied to estimate the prediction intervals for the verification data. The method was tested on hydrologic datasets using various machine learning techniques. Preliminary results show that the method has certain advantages compared to other methods.
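The steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fuzzifier, the confidence level, how cluster quantiles are computed, or the regression model, so the choices here (fuzzifier m = 2, a 90% interval, membership-weighted empirical quantiles, ordinary least squares) and all function names are assumptions made for illustration only.

```python
import numpy as np


def memberships(X, centers, m=2.0):
    """Fuzzy membership grades of each example in each cluster (rows sum to 1)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero at a cluster centre
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, U) with U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def weighted_quantile(values, weights, q):
    """Empirical quantile of `values` where each value carries a weight."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return v[min(np.searchsorted(cdf, q), len(v) - 1)]


def cluster_intervals(errors, U, alpha=0.10):
    """Per-cluster error bounds (assumption: membership-weighted quantiles)."""
    ks = range(U.shape[1])
    lower = np.array([weighted_quantile(errors, U[:, k], alpha / 2) for k in ks])
    upper = np.array([weighted_quantile(errors, U[:, k], 1 - alpha / 2) for k in ks])
    return lower, upper


def example_intervals(U, lower, upper):
    """Per-example bounds as membership-weighted sums of the cluster bounds."""
    return U @ lower, U @ upper


def fit_linear(X, y):
    """Least-squares linear map (with intercept) from inputs to a bound."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


if __name__ == "__main__":
    # Synthetic stand-in for calibration/verification data (not from the paper).
    rng = np.random.default_rng(42)
    X = rng.random((400, 2))
    errors = rng.normal(0.0, 0.1 + X[:, 0])  # model errors grow with first input
    X_cal, e_cal, X_ver, e_ver = X[:300], errors[:300], X[300:], errors[300:]

    centers, U = fuzzy_c_means(X_cal, n_clusters=3)
    lo_k, up_k = cluster_intervals(e_cal, U, alpha=0.10)
    lo_cal, up_cal = example_intervals(U, lo_k, up_k)

    # Regression models map an input vector to the lower and upper error bounds.
    lo_ver = predict_linear(fit_linear(X_cal, lo_cal), X_ver)
    up_ver = predict_linear(fit_linear(X_cal, up_cal), X_ver)
    coverage = np.mean((e_ver >= lo_ver) & (e_ver <= up_ver))
    print(f"share of verification errors inside the interval: {coverage:.2f}")
```

The bounds produced here are on the model error; adding them to the model output for an example would give a prediction interval for the modelled variable. The linear regressor stands in for whichever linear or non-linear learner is preferred.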
A Novel Method to Estimate the Model Uncertainty Based on the Model Errors