Presenter/Author Information

Can-Tao Liu
Bao-Gang Hu

Start Date

1-7-2010 12:00 AM

Abstract

Recently, data-driven approaches, including machine-learning (ML) techniques, have played a key role in research on ecological data and models. One of the most important steps in applying an ML technique is the selection of significant model input variables. Among ML methods, artificial neural networks and genetic algorithms are widely used for this purpose; however, entropy-based learning methods have not been widely adopted for selecting significant input variables for ecological models. In this paper, we use Renyi's entropy to estimate mutual information, and then use the mutual information to compute the maximum relevance and minimum redundancy of the input variables in order to select a compact input subset. This work is a case study on the forest cover type dataset obtained from the US Forest Service Region 2 Resource Information System. Because the discrete variables of the dataset are highly redundant, we analyze all of them in detail. First, we characterize the amount of information these features carry, along with their relevance and redundancy. Then, using a feature selection method based on mutual information, we study which features are most important for forest cover type. The results show that the discrete attributes of the dataset contain little effective information and much redundancy: only 17 of the 44 discrete attributes are kept. The proposed method supports better decision-making and measurement by increasing data transparency in ecological informatics. Overall, using information theory as the mathematical infrastructure offers a new view for studying ecological data.
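The selection procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the entropy order `alpha = 2`, the additive mutual-information surrogate H(X) + H(Y) - H(X, Y), the "relevance minus mean redundancy" scoring rule, and all function and variable names (`renyi_entropy`, `renyi_mi`, `mrmr_select`, the toy features).

```python
from collections import Counter
from math import log


def renyi_entropy(symbols, alpha=2.0):
    """Renyi entropy (in nats) of a sequence of discrete symbols, for alpha != 1."""
    if alpha == 1.0:
        raise ValueError("alpha must differ from 1 (that limit is Shannon entropy)")
    n = len(symbols)
    counts = Counter(symbols)
    # H_alpha = log(sum_i p_i^alpha) / (1 - alpha), with plug-in frequencies p_i
    power_sum = sum((c / n) ** alpha for c in counts.values())
    return log(power_sum) / (1.0 - alpha)


def renyi_mi(x, y, alpha=2.0):
    """Assumed MI surrogate: H(X) + H(Y) - H(X, Y).

    With Renyi entropies this quantity is not guaranteed to be non-negative.
    """
    joint = list(zip(x, y))
    return (renyi_entropy(x, alpha) + renyi_entropy(y, alpha)
            - renyi_entropy(joint, alpha))


def mrmr_select(features, target, k, alpha=2.0):
    """Greedy max-relevance, min-redundancy selection of k feature names.

    `features` maps a feature name to its list of discrete values.
    """
    relevance = {name: renyi_mi(col, target, alpha)
                 for name, col in features.items()}
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        # Mean MI between the candidate and the already selected features
        redundancy = sum(renyi_mi(features[name], features[s], alpha)
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a"; the target needs both "a" and "c".
toy_features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_target = [0, 1, 0, 1, 2, 3, 2, 3]
print(mrmr_select(toy_features, toy_target, k=2))  # ['a', 'c']
```

In this toy case the duplicate feature `b` is as relevant as `a` but is skipped in the second round because it is fully redundant with `a`, which mirrors the kind of redundancy the abstract reports among the discrete attributes.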


Renyi's-entropy-based Approach for Selecting the Significant Input Variables for the Ecological data
