Keywords
Input variable selection; Data-driven modelling; Environmental datasets
Location
Session H2: Water Resources Management and Planning - Modeling and Software for Improving Decisions and Engaging Stakeholders
Start Date
17-6-2014 3:40 PM
End Date
17-6-2014 5:20 PM
Abstract
Input variable selection is an essential step in the development of statistical models and is particularly relevant in environmental modelling, where potential model inputs often consist of time-lagged values of each candidate input variable. While new methods for identifying important model inputs continue to emerge, each has its own advantages and limitations, and no method is best suited to all datasets and purposes. Nevertheless, rigorous evaluation of new and existing input variable selection methods is largely neglected due to the lack of guidelines or precedent to facilitate consistent and standardised assessment. Such evaluation would allow the effectiveness of these algorithms to be properly identified in various circumstances. In this paper, we propose a new framework for the evaluation of input variable selection methods that takes into account a wide range of dataset properties relevant to real-world environmental data, together with assessment criteria selected to highlight algorithm suitability in different situations of interest. The framework is supported by a repository of datasets to enable standardised and statistically significant testing. It is hoped that this framework will help to promote the appropriate application and comparison of input variable selection algorithms and eventually provide guidance as to which algorithm is most suitable in a given situation.
Included in
Civil Engineering Commons, Data Storage Systems Commons, Environmental Engineering Commons, Other Civil and Environmental Engineering Commons
A new evaluation framework for input variable selection algorithms used in environmental modelling