Keywords

Data mining; Descriptor variable; Drought; Merit value; Modeling

Start Date

27-6-2018 2:00 PM

End Date

27-6-2018 3:20 PM

Abstract

Attribute filtering is the process of objectively searching for the best subset of attributes among the multiple attribute options available, with the aim of improving environmental modeling. The main goal of this research was to develop an automated data-mining attribute filtering approach for objective-based feature selection in environmental modeling. Four attribute selection algorithms (correlation-based attribute selection, principal component-based attribute selection, relief attribute evaluation, and wrapper subset evaluation) were compared on their performance. For the experimental analysis, data on climate, satellite, biophysical, oceanic and atmospheric interactions were used to model drought-related environmental hazards. Attribute merit values for modeling the target dependent attributes were determined and compared with those of possible alternative attributes for objective-based feature selection. The average merit values for the selected attributes were also ranked; they ranged from 0.5 to 0.9 for the case study conducted. This research complements extensive literature review and common sense in identifying relevant attributes for a given domain; it does not relieve researchers of the need to use their common sense and to check results against established truth or theory in domain-specific research. The methodology developed here helps to reduce the uncertainty in domain experts' attribute selection, which is usually unsystematic and dominated by somewhat arbitrary trials. Future research may evaluate the developed methodology using relevant classification techniques (such as classification and regression trees or random forests) and quantify the actual information gain from the developed approach.
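
The abstract does not state how the merit values were computed. As an illustration only, the sketch below assumes the heuristic commonly used for correlation-based attribute selection, in which a subset of k attributes scores k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean absolute attribute-target correlation and r_ff is the mean absolute attribute-attribute correlation. The function name `cfs_merit` and the use of Pearson correlation are assumptions for this example, not details taken from the study.

```python
import numpy as np


def cfs_merit(X, y):
    """Correlation-based merit of an attribute subset (hypothetical helper).

    X: 2-D array, one column per candidate attribute.
    y: 1-D array holding the target (dependent) attribute.
    Assumes Pearson correlation; the study may have used another measure.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]

    # Mean absolute correlation between each attribute and the target.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)])
    if k == 1:
        # With a single attribute there is no redundancy term.
        return float(r_cf)

    # Mean absolute correlation between distinct attribute pairs
    # (the k diagonal entries, each equal to 1, are excluded).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))

    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))
```

Under this definition a subset scores highly when its attributes correlate strongly with the target but weakly with one another, so adding a redundant or uninformative attribute tends to lower the merit rather than raise it.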

Stream and Session

Stream B: (Big) Data Solutions for Planning, Management, and Operation and Environmental Systems

B3: Sixth Session on Data Mining as a Tool for Environmental Scientists (S-DMTES-2018)

Organizers: Karina Gibert (fiEMSs), Miquel Sànchez-Marrè (fiEMSs), Ioannis Athanasiadis, Geoff Holmes

Data Mining Attribute Filtering Approach for Drought Environmental Hazard Modeling
