Abstract
Recent advances in self-supervised learning have made it possible to reuse information-rich models, pre-trained on massive amounts of data, for other downstream tasks. However, the pre-training process can differ drastically from the fine-tuning process, which can lead to inefficient learning. We address this disconnect in training dynamics by structuring the learning process like an open system in thermodynamics. Open systems can achieve a steady state by converting low-entropy inputs into high-entropy outputs. We modify the model and the learning process to mimic this behavior, attending more to elements of the input sequence that exhibit greater changes in entropy. We call this architecture the Open System Neural Network (OSNN). We demonstrate the efficacy of the OSNN on multiple classification datasets with a variety of encoder-only Transformers. We find that the OSNN outperforms nearly all model-specific baselines and achieves a new state-of-the-art result on two classification datasets.
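To make the entropy-guided attention idea concrete, the following is a minimal sketch of one way such a mechanism could look: token-level entropy is computed from softmax-normalized hidden states before and after a block, and attention scores are biased toward key positions whose entropy changed most. The function names, shapes, and additive weighting scheme here are illustrative assumptions, not the OSNN implementation described in the thesis.

```python
# Illustrative sketch only: bias attention toward tokens whose entropy
# changes most between layers. All specifics are assumptions for exposition.
import torch
import torch.nn.functional as F

def token_entropy(hidden: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of each token's softmax-normalized hidden vector.
    hidden: (batch, seq_len, d_model) -> (batch, seq_len)."""
    probs = F.softmax(hidden, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

def entropy_weighted_attention(query, key, value, h_prev, h_curr):
    """Scaled dot-product attention with scores biased toward key positions
    whose entropy changed most between the previous and current hidden states.
    query/key/value: (batch, seq_len, d_model)."""
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5        # (batch, q_len, k_len)
    delta_h = (token_entropy(h_curr) - token_entropy(h_prev)).abs()  # (batch, k_len)
    scores = scores + delta_h.unsqueeze(1)                      # additive per-key bias
    attn = F.softmax(scores, dim=-1)
    return attn @ value

# Toy usage with random tensors
if __name__ == "__main__":
    b, n, d = 2, 8, 16
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    h_prev, h_curr = torch.randn(b, n, d), torch.randn(b, n, d)
    out = entropy_weighted_attention(q, k, v, h_prev, h_curr)
    print(out.shape)  # torch.Size([2, 8, 16])
```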
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Hatch, Bradley, "Open System Neural Networks" (2024). Theses and Dissertations. 10234.
https://scholarsarchive.byu.edu/etd/10234
Date Submitted
2024-01-12
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd13072
Keywords
deep learning, Transformers, neural networks, thermodynamics, open systems, training dynamics, entropy
Language
English