Presenter/Author Information

A. Mäkinen
A. Kangas
T. Tokola

Keywords

forest planning, data mining, outlier detection

Start Date

1-7-2008 12:00 AM

Description

Decision making in forest planning is based mostly on simulated forest management scenarios. A fundamental tool in creating these scenarios is the forest planning system, which uses a set of models to project the future development of forests and to assess the effects of alternative management tasks, such as timber harvests [Burkhart 2003]. Input data for forest planning are obtained from several sources, such as remote sensing, field measurements and visual assessments. All data collection systems include errors, from both human and technical sources, which eventually affect the quality of forest plans. Some of these errors are next to impossible to detect, but others are outliers and can be separated from the data. Statistical outlier detection methods have been used in data processing, although they do not work well for multi-dimensional data. Data mining offers some interesting possibilities for the outlier detection task. Data mining schemes for outlier detection include, for example, distance-, density- and clustering-based algorithms that have been shown to work with multi-dimensional data. In forest research, data mining methods have hardly been studied at all. In this study we compared three outlier detection schemes for finding outliers in a large forest inventory data set. The tested algorithms were Nested-Loop distance-based outlier detection [Knorr and Ng 1998], Simple-Pruning distance-based outlier detection [Bay and Schwabacher 2003] and Outlier Removal Clustering [Hautamäki et al. 2005]. The data included a total of 5090 field-measured sample plots on 578 forest stands, with a number of stand-level aggregate attributes representing different characteristics of the growing stock, the forest site and the surrounding region. Each of the examined methods has a number of parameters that strongly affect the outlier detection result. The selection of the attributes used in the outlier detection also strongly affected the results. None of the three methods proved superior to the others in finding the outliers. The large natural variation in the forest attribute values made the task of separating the outliers difficult. However, the examined data mining methods showed very promising results in finding outliers.
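
To illustrate the distance-based idea behind the first of the tested algorithms, the following is a minimal sketch of a DB(p, D)-style outlier test computed with a plain nested loop. It is a simplified illustration, not the implementation used in the study: the function name `db_outliers`, the parameter values and the two-attribute sample records are all hypothetical, and the block-wise I/O optimisations of the published Nested-Loop algorithm are omitted.

```python
import math


def db_outliers(points, p, d):
    """Return indices of points that have too few neighbours within distance d.

    A point counts as an outlier when at least a fraction p of the data set
    lies farther than d from it, i.e. when it has at most n * (1 - p)
    neighbours within d. This is a simplified reading of the DB(p, D) idea.
    """
    n = len(points)
    max_neighbors = int(n * (1.0 - p))
    outliers = []
    for i, a in enumerate(points):
        neighbors = 0
        for j, b in enumerate(points):
            if i != j and math.dist(a, b) <= d:
                neighbors += 1
                if neighbors > max_neighbors:
                    break  # enough close neighbours: cannot be an outlier
        if neighbors <= max_neighbors:
            outliers.append(i)
    return outliers


# Hypothetical stand-level records: (mean height in m, basal area in m2/ha).
sample = [(18.0, 22.0), (19.0, 23.5), (17.5, 21.0), (18.5, 22.5), (45.0, 3.0)]
print(db_outliers(sample, p=0.75, d=5.0))
```

In this made-up sample only the last record has no neighbours within the chosen distance, so it is the one flagged. With real inventory attributes measured on different scales, the attributes would presumably need to be standardised before distances are computed, and the result depends strongly on the choice of `p` and `d`, which is consistent with the parameter sensitivity reported above.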

Applying Data Mining Methods for Forest Planning Data Validation
