Keywords

Input variable selection; Data-driven modelling; Environmental datasets

Location

Session H2: Water Resources Management and Planning - Modeling and Software for Improving Decisions and Engaging Stakeholders

Start Date

17-6-2014 3:40 PM

End Date

17-6-2014 5:20 PM

Abstract

Input variable selection is an essential step in the development of statistical models and is particularly relevant in environmental modelling, where potential model inputs often consist of time-lagged values of each potential input variable. While new methods for identifying important model inputs continue to emerge, each has its own advantages and limitations, and no method is best suited to all datasets and purposes. Nevertheless, rigorous evaluation of new and existing input variable selection methods is largely neglected due to the lack of guidelines or precedent to facilitate consistent and standardised assessment. Such evaluation would allow the effectiveness of these algorithms to be properly identified in various circumstances. In this paper, we propose a new framework for the evaluation of input variable selection methods that takes into account a wide range of dataset properties relevant to real-world environmental data, together with assessment criteria selected to highlight algorithm suitability in different situations of interest. The framework is supported by a repository of datasets to enable standardised and statistically significant testing. It is hoped that this framework will help to promote the appropriate application and comparison of input variable selection algorithms and will eventually provide guidance as to which algorithm is most suitable in a given situation.


A new evaluation framework for input variable selection algorithms used in environmental modelling
