Keywords

model applicability, model performance, regression trees, variable selection

Start Date

July 1, 2012, 12:00 AM

Abstract

One of the crucial steps when developing models is the selection of appropriate variables. In this research we assessed the impact of variable selection on model performance and model applicability. Regression trees were built to understand the relationship between the ecological water quality and the physical-chemical and hydromorphological variables. Different model parameterizations and three combinations of explanatory variables were used for developing the trees. Once constructed, the trees were integrated with the water quality model (PEGASE) and used to simulate the future ecological water quality. These simulations were summarized per combination of explanatory variables and compared.

Three key messages summarize our conclusions. First, it was confirmed that different parameterizations alter the statistical reliability of the trees produced. Secondly, it was found that the statistical reliability of the models remained stable when different combinations of explanatory variables were implemented. The determination coefficient (R²) ranged from 0.68 to 0.86; the Kappa statistic (K) ranged from 0.15 to 0.46; and the percentage of Correctly Classified Instances (CCI) ranged from 33 to 59%. Thirdly, when applying the models to an independent dataset consisting of future physical-chemical water quality data, different conclusions may be drawn, depending on the combination of variables used.
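For readers unfamiliar with the two classification measures reported above, the following is a minimal sketch of how CCI and Cohen's Kappa are commonly computed from observed and predicted classes. It is not the authors' code: the function names and the five-class example data are hypothetical, and the abstract does not state how model outputs were converted to classes.

```python
from collections import Counter


def cci(observed, predicted):
    """Percentage of Correctly Classified Instances (0 to 100)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def cohen_kappa(observed, predicted):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of instances with matching classes.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Expected agreement by chance, from the marginal class frequencies.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        # Only one class on both sides: Kappa is undefined; 0.0 is a convention.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical example: water quality classes 1 (bad) to 5 (high).
observed_classes = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
predicted_classes = [1, 2, 2, 2, 3, 4, 4, 5, 5, 5]

print(f"CCI   = {cci(observed_classes, predicted_classes):.1f}%")
print(f"Kappa = {cohen_kappa(observed_classes, predicted_classes):.3f}")
```

Because Kappa subtracts the agreement expected by chance, it can be low even when CCI looks moderate, which is why the two measures are usually reported together.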

Selecting relevant predictors: impact of variable selection on model performance, uncertainty and applicability of models in environmental decision making