Keywords

Machine learning; deforestation; freely available datasets

Location

Session C1: VI Data Mining for Environmental Sciences Session

Start Date

12 July 2016, 11:50 AM

End Date

12 July 2016, 12:10 PM

Abstract

Deforestation remains a major environmental issue and is studied for a range of reasons, such as informing policy decisions, predicting at-risk areas and evaluating interventions. Although a range of machine learning (ML) techniques have a proven record on complex, non-linear problems, many of them are rarely applied to deforestation analysis. We propose that this is partly due to uncertainty in the environmental services field about how these models perform compared with standard statistical methods. There is also little guidance on the situations for which each model is suitable. We compared three ML techniques, namely artificial neural networks (ANNs), Bayesian networks (BNs) and Gaussian processes (GPs), against classical generalised linear models (GLMs) and generalised linear mixed models (GLMMs). Each technique was evaluated using several performance metrics and assessed for its suitability in meeting three core objectives of deforestation studies: predicting location, predicting quantity and identifying predisposing factors. Constraints such as implementation time and difficulty were also considered. The datasets used for model training were restricted to freely available or low-cost datasets so that their potential usefulness could be evaluated. All models provided good general predictions of the location of deforestation. None of the techniques, as implemented with the selected datasets, was suitable for directly predicting the amount of deforestation; however, the GLMMs and BNs were useful for predicting deforestation risk and assessing the relative importance of deforestation predictors. GPs performed well when few deforestation predictors were available, and the ANNs outperformed the GLMMs in some instances. The available resources were found to strongly influence which techniques are suitable for a given study.
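The abstract does not say how location predictions were scored. Purely as an illustration, the sketch below fits the simplest baseline it names, a binomial GLM with a logit link, to synthetic pixel data and scores it on held-out pixels with the area under the ROC curve (AUC), one common metric for binary "deforested / not deforested" location prediction. The predictors (distance to road, slope), the simulated relationship, the gradient-ascent fitting and the choice of AUC are all hypothetical assumptions, not the authors' data, models or metrics. It uses only the Python standard library so it runs anywhere; in practice a statistics package would normally be used to fit the GLM.

```python
import math
import random


def simulate_pixels(n, seed=0):
    """HYPOTHETICAL pixel records: ([distance_to_road_km, slope_deg], deforested 0/1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dist = rng.uniform(0.0, 20.0)
        slope = rng.uniform(0.0, 30.0)
        # Assumed generating process: clearing is likelier near roads and on flat land.
        eta = 2.0 - 0.3 * dist - 0.1 * slope
        p = 1.0 / (1.0 + math.exp(-eta))
        rows.append(([dist, slope], 1 if rng.random() < p else 0))
    return rows


def standardise(rows, means=None, sds=None):
    """Centre and scale each predictor; reuse the training statistics for test data."""
    cols = list(zip(*(x for x, _ in rows)))
    if means is None:
        means = [sum(c) / len(c) for c in cols]
        sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    scaled = [([(v - m) / s for v, m, s in zip(x, means, sds)], y) for x, y in rows]
    return scaled, means, sds


def predict(w, x):
    """Inverse logit of the linear predictor; w[0] is the intercept."""
    eta = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic_glm(rows, lr=0.5, epochs=300):
    """Binomial GLM (logit link) fitted by batch gradient ascent on the log-likelihood."""
    k = len(rows[0][0])
    w = [0.0] * (k + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in rows:
            err = y - predict(w, x)  # score contribution of one pixel
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else (0.5 if p == q else 0.0) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Separate synthetic draws stand in for training and held-out pixels.
train_raw = simulate_pixels(600, seed=1)
test_raw = simulate_pixels(400, seed=2)
train, means, sds = standardise(train_raw)
test, _, _ = standardise(test_raw, means, sds)

weights = fit_logistic_glm(train)
test_auc = auc([predict(weights, x) for x, _ in test], [y for _, y in test])
print(f"coefficients (intercept, road distance, slope): {weights}")
print(f"held-out AUC: {test_auc:.3f}")
```

Because AUC depends only on how the model ranks pixels, it suits the "predicting location" objective but says nothing about predicting the quantity of deforestation, which is consistent with the abstract treating those as separate objectives.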

Using Machine Learning to Make the Most out of Free Data: A Deforestation Case Study
