Abstract

Recent advances in self-supervised learning have made it possible to reuse information-rich models that have been generally pre-trained on massive amounts of data for other downstream tasks. However, the pre-training process can differ drastically from the fine-tuning process, which can lead to inefficient learning. We address this disconnect in training dynamics by structuring the learning process like an open system in thermodynamics. Open systems can achieve a steady state when low-entropy inputs are converted to high-entropy outputs. We modify the model and the learning process to mimic this behavior, attending more to elements of the input sequence that exhibit greater changes in entropy. We call this architecture the Open System Neural Network (OSNN). We show the efficacy of the OSNN on multiple classification datasets with a variety of encoder-only Transformers. We find that the OSNN outperforms nearly all model-specific baselines and achieves a new state-of-the-art result on two classification datasets.
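
The abstract describes attending more strongly to input positions with larger changes in entropy. The following is a minimal, hypothetical sketch of one way such an entropy-change bias could be added to attention scores; the class name `EntropyWeightedAttention`, the two-logit interface, and the additive bias are illustrative assumptions, not the thesis's actual OSNN implementation.

```python
# Hypothetical sketch: bias attention toward tokens whose distributional
# entropy changes the most between two intermediate representations.
import torch
import torch.nn as nn
import torch.nn.functional as F


def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy per token; logits: (batch, seq, vocab) -> (batch, seq)."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)


class EntropyWeightedAttention(nn.Module):
    """Scaled dot-product attention whose scores are biased by the per-token
    entropy change between an "input" and an "output" distribution."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, logits_in, logits_out):
        # Positive entries mark key positions whose entropy increased the most.
        delta_h = token_entropy(logits_out) - token_entropy(logits_in)  # (batch, seq)
        scores = (self.q(x) @ self.k(x).transpose(-2, -1)) * self.scale
        # Add the entropy change as a bias over key positions before softmax.
        scores = scores + delta_h.unsqueeze(1)
        attn = F.softmax(scores, dim=-1)
        return attn @ self.v(x)


# Usage with random tensors (batch=2, seq=8, d_model=16, vocab=100).
x = torch.randn(2, 8, 16)
logits_in, logits_out = torch.randn(2, 8, 100), torch.randn(2, 8, 100)
out = EntropyWeightedAttention(16)(x, logits_in, logits_out)
print(out.shape)  # torch.Size([2, 8, 16])
```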

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2024-01-12

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd13072

Keywords

deep learning, Transformers, neural networks, thermodynamics, open systems, training dynamics, entropy

Language

English
