Keywords

variable selection; feature refinement; Random Forests; cellular automata; land use change

Start Date

6 July 2022, 1:40 PM

End Date

6 July 2022, 2:00 PM

Abstract

Cellular Automata (CA) models are commonly used to simulate future land use/cover (LULC) change by treating the landscape as a gridded abstraction of cells, each belonging to an LULC class. These models predict the potential for change at the cell level from statistically modelled relationships between historically observed class-to-class transitions and environmental, socio-economic and neighbourhood predictor variables. Over the last two decades, the diversity and complexity of the techniques used for this transition modelling within CAs have grown considerably. However, regardless of the technique chosen, the variable selection needed to produce accurate yet parsimonious predictive models still receives insufficient attention. This is partly because simulating change in even a small number of LULC classes requires modelling a large number of transitions, and because neighbourhood variables can be realised in many ways (differing in focal extent and decay rate), which together make variable selection time-consuming. This research addresses the issue by presenting a novel two-stage variable selection procedure in the context of regionalized LULC transition modelling for Switzerland, utilizing the Random Forests (RF) algorithm. The first stage is a filter method in which predictors are grouped categorically, ranked using univariate regression models and then thinned on the basis of pairwise correlations. The second stage is a model-embedded approach, Guided Regularized RF, which is used to reduce feature redundancy. We demonstrate that this approach not only yields parsimonious models but is also an efficient method for high-dimensional calibration of neighbourhood predictors. Furthermore, when applied across spatial and temporal scales, the method provides useful insights into non-stationarity that can inform the development of the CA model in which the transition models are incorporated.
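
The filter stage described in the abstract (group, rank, thin) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: each predictor is ranked by the R² of a simple linear regression on the transition outcome (equal to the squared Pearson correlation), whereas the study may use a different univariate model; thinning is applied within each predictor group; and the 0.7 correlation threshold, the function names and the toy data are all hypothetical. The embedded Guided Regularized RF stage is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_stage(predictors, groups, outcome, max_abs_corr=0.7):
    """Rank predictors within each group, then thin correlated ones.

    predictors: dict mapping name -> list of values (one per cell)
    groups: dict mapping name -> group label (assumed categorical grouping)
    outcome: list of 0/1 transition indicators (one per cell)
    max_abs_corr: hypothetical threshold; not taken from the abstract
    """
    # Univariate score: R^2 of a simple linear regression equals r^2.
    scores = {name: pearson(vals, outcome) ** 2 for name, vals in predictors.items()}
    kept = []
    for group in sorted(set(groups.values())):
        members = sorted(
            (name for name in predictors if groups[name] == group),
            key=lambda name: scores[name],
            reverse=True,
        )
        group_kept = []
        for name in members:
            # Keep a predictor only if it is not too strongly correlated
            # with any better-ranked predictor already kept in its group.
            if all(
                abs(pearson(predictors[name], predictors[other])) <= max_abs_corr
                for other in group_kept
            ):
                group_kept.append(name)
        kept.extend(group_kept)
    return kept


# Toy data (invented for illustration): two realisations of a
# neighbourhood variable and two topographic variables over eight cells.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
predictors = {
    "nbh_small": [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9],
    "nbh_large": [0, 0, 0, 1, 1, 1, 1, 1],
    "elevation": [1, 2, 3, 4, 5, 6, 7, 8],
    "slope": [1, 2, 3, 4, 4, 3, 2, 1],
}
groups = {
    "nbh_small": "neighbourhood",
    "nbh_large": "neighbourhood",
    "elevation": "topography",
    "slope": "topography",
}
kept = filter_stage(predictors, groups, outcome)
print(kept)
```

In this toy case the two neighbourhood realisations are strongly correlated with each other, so only the better-ranked one survives thinning. This is the sense in which a filter of this kind can double as a coarse calibration of focal extent and decay rate. Note that the filter drops predictors only for redundancy, not for a low univariate score.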


Combining filter and embedded approaches to improve variable selection in land use change Cellular Automata models using Random Forests.
