Keywords

Interactive CART, Machine Learning, Human-in-the-Loop, Data Mining

Start Date

17-9-2020 2:00 PM

End Date

17-9-2020 2:20 PM

Abstract

Many applications in the environmental sciences require learning from data. Machine Learning (ML) methods are considered efficient tools for this purpose because they can identify patterns and structure in large datasets automatically. Fully automatic ML techniques, such as Classification and Regression Trees (CART), have been developed and used in numerous environmental modelling applications. However, they require large amounts of training data, which are often lacking in environmental applications. Moreover, they lack transparency, as they do not explain how (or why) they arrived at a particular decision tree. Furthermore, they rely mainly on statistical methods and performance metrics (i.e. fit to data) to extract information and patterns from data. While performance matters in many applications, in environmental applications it is not necessarily the only objective the modeller wants to achieve: the interpretability and explanatory power of the resulting decision trees are also important. At the other extreme, fully manual approaches to building a decision tree integrate expert domain knowledge, but they can be tedious and time-consuming. In this project, we try to bridge this gap. We combine the strengths of the two approaches, integrating expert domain knowledge with ML strategies (CART) by putting humans in the loop. We propose an approach in which users can interact with the automatic tree-building algorithm (e.g. by selecting branches to prune or by changing the features or thresholds in the nodes), either in an ex-post mode or at certain iterations of the algorithm. Users can thus explore decision trees that may be less accurate but have more explanatory power. Our interactive approach may also prove useful where data are scarce and/or noisy and the expert’s knowledge can guide the algorithm towards a satisfactory decision tree.
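The two interaction modes mentioned in the abstract (changing a split during tree building, and pruning a branch ex post) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a small Gini-based tree builder written in plain Python, and the function names (`build`, `prune`), the `overrides` path convention ("root", "root/L", ...), and the rainfall/temperature data are all hypothetical choices made for illustration only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Exhaustive CART-style search for the (feature, threshold) with lowest weighted Gini."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold: midpoint of neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, overrides=None, path="root", max_depth=3):
    """Grow a tree; `overrides` (hypothetical) maps a node path to an expert-chosen split."""
    overrides = overrides or {}
    leaf = {"leaf": Counter(labels).most_common(1)[0][0]}
    if max_depth == 0 or len(set(labels)) == 1:
        return leaf
    # Human in the loop: an expert-supplied split takes precedence over the search.
    split = overrides.get(path) or best_split(rows, labels)
    if split is None:
        return leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    if not left or not right:
        return leaf
    return {"feature": f, "threshold": t, "majority": leaf["leaf"],
            "left": build(*zip(*left), overrides, path + "/L", max_depth - 1),
            "right": build(*zip(*right), overrides, path + "/R", max_depth - 1)}

def prune(tree, path):
    """Ex-post edit: collapse the subtree at `path` into a majority-class leaf."""
    if "leaf" in tree:
        return tree
    steps = path.split("/")[1:]
    if not steps:
        return {"leaf": tree["majority"]}
    side = "left" if steps[0] == "L" else "right"
    pruned = dict(tree)
    pruned[side] = prune(tree[side], "/".join(["root"] + steps[1:]))
    return pruned

def predict(tree, row):
    """Follow the splits from the root down to a leaf and return its class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted class matches the given label."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Made-up data: (rainfall in mm, temperature in deg C) -> soil state.
rows = [(5, 20), (12, 18), (30, 15), (45, 22), (60, 10), (80, 12)]
labels = ["dry", "dry", "dry", "wet", "wet", "wet"]

auto = build(rows, labels)                                  # fully automatic tree
guided = build(rows, labels, overrides={"root": (0, 50.0)})  # expert fixes the root split
simple = prune(guided, "root/L")                            # expert prunes the left branch
```

In this toy case the automatic tree splits rainfall at 37.5 mm, while the guided tree keeps the expert's 50 mm threshold at the root and lets the algorithm refine the left branch. Pruning that branch yields a one-split tree that misclassifies one of the six samples, which is the kind of accuracy-for-simplicity trade-off the abstract describes.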

Integrating Human Knowledge and Data Mining through interactive Classification and Regression Trees (CART)
