Abstract

Dataset distillation is a nascent technique that promises to accelerate neural network training. In particular, distillation produces a highly compressed dataset that can train a neural network in as little as a single optimization step. However, distillation has not been adopted in real-world use cases due to weaknesses such as poor approximation of the original task, unclear interpretability, high cost of the initial distillation, and poorly elaborated use cases. Here, we present five papers that advance dataset distillation towards real-world adoption. The papers focus on: distilling reinforcement learning environments, interpreting distillation-produced datasets, improving distillation performance using ensembling, examining distillation's generalization to novel architectures, and using distillation alongside other techniques for neural architecture search for reinforcement learning. In the first, we expand distillation to compress reinforcement learning environments into synthetic datasets. We demonstrate this technique by distilling Atari and MuJoCo environments into single-batch synthetic datasets, and report that training on the synthetic dataset approximates training on the original environment, yielding similar performance in several environments. In the second, we examine the interpretability of single-batch distilled datasets by comparing the loss landscapes created by distillation versus standard training. We demonstrate that distillation often works by mimicking the position of minima on the original dataset's loss surface. In the third, we examine ensembling distillation-trained models. We demonstrate that the ensembles significantly outperform their best-performing constituent models, and provide evidence that the distilled dataset yields a reasonable approximation of the posterior distribution on the original dataset. In the fourth, we examine how distilled datasets generalize to various architectures. We define metrics that capture aspects of generalization, propose regularization methods integrated with distillation-production algorithms, and demonstrate that the regularized distillations achieve improved scores on the proposed metrics. In the fifth, we demonstrate neural architecture search on reinforcement learning environments using online RL, offline RL, and distillation. We demonstrate that architecture searches using any of the three training methods can yield architectures that significantly outperform the baseline of the architecture space.

Degree

PhD

College and Department

Computational, Mathematical, and Physical Sciences; Computer Science

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2026-04-15

Document Type

Dissertation

Keywords

dataset distillation, reinforcement learning, meta-learning, interpretability, ensembling, generalization, neural architecture search

Language

English
