Keywords

Data mining; Uncertainty reduction; Environmental data processing; Fuzzy systems; Mutual information; Bayesian inference

Location

Session C1: VI Data Mining for Environmental Sciences Session

Start Date

13-7-2016 8:30 AM

End Date

13-7-2016 8:50 AM

Abstract

When dealing with complex environmental datasets, it is often difficult to establish the strength of the input-output relations among variables. Correlation analysis may yield a preliminary indication, but it is limited to the linear case. Mutual Information (MI) is a more powerful measure that can establish input-output dependence regardless of the nature of the interaction. However, MI is computationally demanding to estimate directly, so a simple method based on fuzzy clustering and Bayes’ rule is presented. After a preliminary conditioning phase, the data are grouped by fuzzy clustering and each sample is approximated by the value of its most relevant centroid. The prior and likelihood probabilities are then estimated in a frequentist manner, by counting the occurrences of the samples with respect to the precomputed clusters. In this way the MI can be computed quickly, yielding the relative importance of the informative content of each input.
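
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fuzzy_cmeans_1d`, `hard_labels`, `mutual_information`), the choice of five clusters, the fuzzifier m = 2, the quantile-based initialisation and the synthetic data are all assumptions made for illustration, and the preliminary conditioning phase is not modelled. Each variable is clustered with a small one-dimensional fuzzy c-means, each sample is replaced by the index of its highest-membership centroid, and the prior P(x) and likelihood P(y|x) are obtained by counting, so that MI = sum_x P(x) sum_y P(y|x) log2(P(y|x) / P(y)).

```python
import numpy as np

def fuzzy_cmeans_1d(x, n_clusters, m=2.0, n_iter=50):
    """Minimal 1-D fuzzy c-means; returns centroids and memberships."""
    # Assumed initialisation: centroids spread over the data quantiles.
    centroids = np.quantile(x, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        # Sample-to-centroid distances, floored to avoid division by zero.
        d = np.maximum(np.abs(x[:, None] - centroids[None, :]), 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)   # memberships, rows sum to 1
        w = u ** m
        centroids = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    return centroids, u

def hard_labels(x, n_clusters):
    """Approximate each sample by its most relevant (highest-membership) centroid."""
    _, u = fuzzy_cmeans_1d(x, n_clusters)
    return u.argmax(axis=1)

def mutual_information(x_labels, y_labels, n_x, n_y):
    """MI in bits from cluster co-occurrence counts, via Bayes' rule."""
    counts = np.zeros((n_x, n_y))
    np.add.at(counts, (x_labels, y_labels), 1)
    row = counts.sum(axis=1, keepdims=True)
    p_x = row[:, 0] / counts.sum()                      # prior P(x)
    p_y_given_x = np.divide(counts, row,                # likelihood P(y|x)
                            out=np.zeros_like(counts), where=row > 0)
    p_y = p_x @ p_y_given_x                             # evidence P(y)
    # Empty cells get ratio 1, so they contribute log2(1) = 0.
    ratio = np.divide(p_y_given_x, p_y[None, :],
                      out=np.ones_like(counts), where=p_y_given_x > 0)
    return float((p_x[:, None] * p_y_given_x * np.log2(ratio)).sum())

# Synthetic example (assumed data): one relevant and one irrelevant input.
rng = np.random.default_rng(0)
n, k = 2000, 5
x_rel = rng.uniform(-1.0, 1.0, n)                  # drives the output nonlinearly
x_irr = rng.uniform(-1.0, 1.0, n)                  # unrelated to the output
y = x_rel ** 2 + 0.05 * rng.normal(size=n)

y_lab = hard_labels(y, k)
mi_rel = mutual_information(hard_labels(x_rel, k), y_lab, k, k)
mi_irr = mutual_information(hard_labels(x_irr, k), y_lab, k, k)
corr_rel = float(np.corrcoef(x_rel, y)[0, 1])

print(f"MI(relevant input)   = {mi_rel:.3f} bits")
print(f"MI(irrelevant input) = {mi_irr:.3f} bits")
print(f"linear correlation of relevant input = {corr_rel:.3f}")
```

In this toy setting the output depends on the relevant input through a symmetric quadratic, so the linear correlation stays close to zero while the cluster-based MI separates the relevant input from the irrelevant one. This is the kind of ranking the abstract refers to as the relative importance of each input.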


ASSESSING INPUT-OUTPUT RELATIONS IN ENVIRONMENTAL DATA BY MEANS OF FUZZY CLUSTERING AND BAYESIAN INFERENCE
