Neural networks can perform an incredible array of complex tasks, but successfully training a network is difficult because it requires us to minimize a function about which we know very little. In practice, developing a good model requires both intuition and a lot of guess-and-check. In this dissertation, we study a type of fully-connected neural network that improves on standard rectifier networks while retaining their useful properties. We then examine this type of network and its loss function from a probabilistic perspective. This analysis leads to a new rule for parameter initialization and a new method for predicting effective learning rates for gradient descent. Experiments confirm that the theory behind these developments translates well into practice.
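The abstract does not name the architecture, but the keyword list below points to CReLU (concatenated ReLU) networks. As a point of reference only, here is a minimal NumPy sketch of the CReLU activation, which keeps both sign components of a pre-activation by stacking ReLU(x) with ReLU(-x); this is an illustration, not the implementation studied in the dissertation.

import numpy as np

def crelu(x):
    # Concatenated ReLU: stack ReLU(x) with ReLU(-x) along the feature
    # axis, so no sign information is discarded and the output width is
    # twice the input width.
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)], axis=-1)

# A pre-activation vector of width 3 yields an activation of width 6.
x = np.array([1.5, -0.2, 0.0])
print(crelu(x))  # [1.5 0.  0.  0.  0.2 0. ]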
College and Department
Physical and Mathematical Sciences; Mathematics
BYU ScholarsArchive Citation
Hettinger, Christopher James, "Hyperparameters for Dense Neural Networks" (2019). Theses and Dissertations. 7531.
Keywords
neural networks, backpropagation, gradient descent, CReLU