Keywords

IBMWP; surface waters; incomplete datasets; model comparison

Location

Session C1: VI Data Mining for Environmental Sciences Session

Start Date

12 July 2016, 9:30 AM

End Date

12 July 2016, 9:50 AM

Abstract

This paper compares techniques for handling missing data in a real environmental dataset, specifically one related to a surface water quality index. The techniques were regression imputation by linear regression, by a model tree, and by a Bayesian network; multiple imputation by chained equations; and data augmentation by a Bayesian network. The models were evaluated by analyzing the predictive maps and by comparing the density function of the predicted variable with that of the observed one. In the experiments, imputation by linear regression and multiple imputation by chained equations preserved the characteristics of the response variable better than the other models.
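As a rough illustration of the simplest technique named in the abstract, regression imputation by linear regression, the sketch below fits a line on the complete cases and fills the missing responses with its predictions. It is not the authors' implementation: the data are synthetic, every name (`fit_line`, `regression_impute`, `xs`, `ys`) is hypothetical, and the mean and standard deviation printed at the end are only a crude stand-in for the density comparison the paper describes.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_impute(xs, ys):
    """Fill None entries of ys with predictions from a line fitted on complete cases."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete],
                                [y for _, y in complete])
    return [y if y is not None else slope * x + intercept
            for x, y in zip(xs, ys)]


# Synthetic data (an assumption for illustration, not the paper's dataset):
# a linear response with noise, with roughly 30% of the values removed at random.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
true_ys = [2.0 * x + 1.0 + rng.gauss(0, 1.5) for x in xs]
ys = [None if rng.random() < 0.3 else y for y in true_ys]

imputed = regression_impute(xs, ys)
observed = [y for y in ys if y is not None]

# Crude distribution check: compare summary statistics of the observed
# response with those of the response after imputation.
print("observed mean/sd:", statistics.fmean(observed), statistics.stdev(observed))
print("imputed  mean/sd:", statistics.fmean(imputed), statistics.stdev(imputed))
```

A fuller comparison in the spirit of the abstract would replace the summary statistics with estimated density curves and would add the other techniques, such as multiple imputation by chained equations, alongside this baseline.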

An Experimental Comparison of Methods to Handle Missing Values in Environmental Datasets