Abstract
Automatic machine learning (AutoML) systems have been shown to perform better when they use metamodels trained offline. Existing offline metalearning approaches treat ML models as black boxes. However, modern ML models often compose multiple ML algorithms into ML pipelines. We extend previous metalearning work on estimating the performance and ranking of ML models by exploiting metadata about which ML algorithms are used in a given pipeline. We propose a dynamically assembled neural network with the potential to model pipelines of arbitrary directed acyclic graph (DAG) structure. We compare our proposed metamodel against baselines that exploit varying amounts of pipeline metadata, including metamodels used in existing AutoML systems. We observe that metamodels that fully exploit pipeline metadata are better estimators of pipeline performance. We also find that ranking pipelines based on dataset metafeature similarity outperforms ranking based on performance estimates.
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Schoenfeld, Brandon J., "Metalearning by Exploiting Granular Machine Learning Pipeline Metadata" (2020). Theses and Dissertations. 8730.
https://scholarsarchive.byu.edu/etd/8730
Date Submitted
2020-12-08
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd11474
Keywords
metalearning, AutoML, ML pipeline, metamodel, meta-dataset, metafeatures
Language
English