Keywords
IBMWP; surface waters; incomplete datasets; model comparison
Location
Session C1: VI Data Mining for Environmental Sciences Session
Start Date
12-7-2016 9:30 AM
End Date
12-7-2016 9:50 AM
Abstract
This paper compares techniques for handling missing data in a real environmental dataset, specifically one related to a surface water quality index. The techniques were regression imputation (using linear regression, a model tree, and a Bayesian network), multiple imputation by chained equations, and data augmentation by a Bayesian network. The models were evaluated by analyzing the predictive maps and by comparing the density function of the predicted variable with that of the observed one. The experimental results showed that imputation by linear regression and multiple imputation by chained equations preserved the characteristics of the response variable better than the other models.
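As a rough illustration of the simplest technique named above, the following is a minimal sketch of regression imputation by linear regression with a single predictor. It is not the authors' implementation: the data are synthetic, and the names `fit_line`, `regression_impute`, `xs`, and `ys` are hypothetical, chosen only for this example. The final lines compare summary statistics of the observed and imputed response as a crude stand-in for the density comparison described in the abstract.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def regression_impute(xs, ys):
    """Replace None entries in ys with predictions from a line
    fitted on the complete cases only."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [y if y is not None else a + b * x for x, y in zip(xs, ys)]


# Synthetic data (an assumption for illustration, not the paper's dataset):
# a response that depends linearly on one predictor, with about 30% of
# the response values removed at random.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
true_ys = [50 + 8 * x + rng.gauss(0, 5) for x in xs]
ys = [None if rng.random() < 0.3 else y for y in true_ys]

filled = regression_impute(xs, ys)
observed = [y for y in ys if y is not None]

# Compare basic distribution summaries of observed and imputed responses.
print("observed mean/stdev:", statistics.fmean(observed), statistics.stdev(observed))
print("imputed  mean/stdev:", statistics.fmean(filled), statistics.stdev(filled))
```

Single-value regression imputation places every filled value exactly on the fitted line, so it tends to understate the variance of the response. Multiple imputation approaches address this by drawing several plausible values for each missing entry.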
Included in
Civil Engineering Commons, Data Storage Systems Commons, Environmental Engineering Commons, Hydraulic Engineering Commons, Other Civil and Environmental Engineering Commons
An Experimental Comparison of Methods to Handle Missing Values in Environmental Datasets