Abstract
Streams in small watersheds often exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. These fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in environmental data sets, as in many others, individual data points carry varying amounts of noise. Some points are collected under relatively clear, well-defined conditions, while others are subject to known or unknown confounding factors that may reduce their validity. Such points may or may not remain useful for training, depending on how much uncertainty they contain. We submit that when the clarity, or 'confidence', associated with individual data points varies, as it notably does in environmental data, an approach that takes this confidence into account during the training phase is beneficial. We propose a methodological framework for assigning confidence to individual data records and augmenting training with that information. We then apply this methodology to two separate data sets: a simulated data set and a real-world environmental science data set focused on streamflow diel signals. The simulated data set provides a full understanding of the nature of the data involved, and the environmental science data set provides a real-world case study of applying this methodology to noisy data. The results of both studies indicate that using confidence in training increases performance and assists in the data mining process.
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Gustafson, Nathaniel Lee, "A Confidence-Prioritization Approach to Data Processing in Noisy Data Sets and Resulting Estimation Models for Predicting Streamflow Diel Signals in the Pacific Northwest" (2012). Theses and Dissertations. 3294.
https://scholarsarchive.byu.edu/etd/3294
Date Submitted
2012-08-09
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd5610
Keywords
machine learning, data mining, data, data processing, pre-processing, confidence, prioritization, environmental science, hydrology, diel, diel fluctuation, diel signal, streamflow, hydrogeology, watershed
Language
English