Keywords
Data mining; Uncertainty reduction; Environmental data processing; Fuzzy systems; Mutual Information; Bayesian inference.
Location
Session C1: VI Data Mining for Environmental Sciences Session
Start Date
13-7-2016 8:30 AM
End Date
13-7-2016 8:50 AM
Abstract
When dealing with complex environmental datasets, it is often difficult to establish the strength of the input-output relation among variables. Correlation analysis may yield a preliminary indication, but it captures only linear dependence. Mutual Information (MI) is a more powerful measure that can establish input-output dependence regardless of the nature of the interaction. However, computing MI directly is computationally demanding, so a simple method based on fuzzy clustering and Bayes’ rule is presented. After a preliminary conditioning phase, the data are grouped by fuzzy clustering and each sample is approximated by the value of its most relevant centroid. The prior and likelihood probabilities are then estimated by frequentist methods, by counting the occurrences of each sample with respect to the precomputed clusters. In this way the MI can be computed quickly, yielding the relative importance of the information content of each input.
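The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation: the function names (`fuzzy_cmeans_1d`, `mutual_information`), the choice of three clusters, the fuzzifier m = 2, and the synthetic data are all assumptions made for illustration. Each variable is clustered with a plain fuzzy c-means, every sample is assigned to its highest-membership centroid, and the prior P(x), the likelihood P(y|x) and the MI are obtained by counting.

```python
import numpy as np

def fuzzy_cmeans_1d(x, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means on a 1-D array (illustrative, not the paper's code).

    Returns the centroids and, for each sample, the index of the centroid
    with the highest membership (which is the nearest centroid).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    centroids = rng.choice(x, size=n_clusters, replace=False)
    for _ in range(n_iter):
        dist = np.abs(x[:, None] - centroids[None, :]) + 1e-12
        u = dist ** (-2.0 / (m - 1.0))        # unnormalised memberships
        u /= u.sum(axis=1, keepdims=True)     # memberships sum to 1 per sample
        w = u ** m
        centroids = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    labels = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, labels

def mutual_information(labels_x, labels_y, n_x, n_y):
    """MI in bits between two label sequences, estimated by counting."""
    counts = np.zeros((n_x, n_y))
    np.add.at(counts, (labels_x, labels_y), 1.0)   # co-occurrence counts
    row_totals = counts.sum(axis=1, keepdims=True)
    prior = row_totals[:, 0] / counts.sum()        # P(x)
    likelihood = counts / np.maximum(row_totals, 1.0)  # P(y | x)
    evidence = prior @ likelihood                  # P(y)
    joint = prior[:, None] * likelihood            # P(x, y)
    ratio = likelihood / np.where(evidence > 0, evidence, 1.0)
    nz = joint > 0
    return float((joint[nz] * np.log2(ratio[nz])).sum())

# Synthetic example (assumed data): y depends on x1 nonlinearly, not on x2.
rng = np.random.default_rng(42)
x1 = rng.uniform(-1.0, 1.0, 2000)
x2 = rng.uniform(-1.0, 1.0, 2000)
y = x1 ** 2

k = 3
_, lab_x1 = fuzzy_cmeans_1d(x1, k)
_, lab_x2 = fuzzy_cmeans_1d(x2, k)
_, lab_y = fuzzy_cmeans_1d(y, k)

mi_signal = mutual_information(lab_x1, lab_y, k, k)
mi_noise = mutual_information(lab_x2, lab_y, k, k)
print(f"MI(x1; y) = {mi_signal:.3f} bits, MI(x2; y) = {mi_noise:.3f} bits")
```

In this synthetic case the linear correlation between x1 and y is close to zero, yet the counted MI should rank x1 well above the unrelated input x2, which is the kind of input ranking the abstract describes.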
Included in
Civil Engineering Commons, Data Storage Systems Commons, Environmental Engineering Commons, Hydraulic Engineering Commons, Other Civil and Environmental Engineering Commons
ASSESSING INPUT-OUTPUT RELATIONS IN ENVIRONMENTAL DATA BY MEANS OF FUZZY CLUSTERING AND BAYESIAN INFERENCE