Abstract
Time delays are an inherent part of real-world systems. Beyond simply slowing a system down, time delays often destabilize otherwise stable systems and, perhaps more unexpectedly, can stabilize unstable ones. Here we propose the Stochastic Time-Delayed Adaptation, a method for improving optimization on certain high-dimensional surfaces that wraps a known optimizer, such as the Adam optimizer, and introduces a variety of time delays. We begin by exploring time delays in gradient-based optimization methods and their effect on the optimizer's convergence properties. These optimizers include standard gradient descent and the more recent Adam optimizer, the latter of which is widely used to train neural networks in deep learning. To describe the effect of time delays on these methods, we use the theory of intrinsic stability. It has been shown that a system possessing intrinsic stability (a stronger form of global stability) maintains its stability when subject to any time delays, e.g., constant, periodic, or stochastic. In feasible cases, we find conditions under which the optimization method adapted with time delays is intrinsically stable and therefore converges to the system's minimal value. Finally, we examine the optimizer's performance using common metrics: the number of steps the algorithm takes to converge and the final loss value relative to the global minimum of the loss function. We test these outcomes using various adaptations of the Adam optimizer on several common test optimization functions designed to be difficult for vanilla optimization methods. We show that the Stochastic Time-Delayed Adaptation can greatly improve an optimizer's ability to find the global minimum of a complex loss function.
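To make the idea of a stochastic time delay concrete, the sketch below applies a randomly delayed gradient to plain gradient descent: at each step the gradient is evaluated at a stale iterate chosen a random number of steps in the past. This is a minimal illustration of the general time-delay mechanism, not the thesis's exact Stochastic Time-Delayed Adaptation (which wraps optimizers such as Adam); the function name delayed_gradient_descent and the parameter max_delay are hypothetical choices for this example.

```python
import numpy as np

def delayed_gradient_descent(grad, x0, lr=0.01, max_delay=5, steps=500, seed=0):
    """Gradient descent where each step may use a stale (time-delayed) iterate.

    At step k the gradient is evaluated at x_{k - tau_k}, where tau_k is drawn
    uniformly from {0, ..., max_delay} (capped by how much history exists).
    """
    rng = np.random.default_rng(seed)
    history = [np.asarray(x0, dtype=float)]
    for k in range(steps):
        tau = rng.integers(0, min(max_delay, k) + 1)  # random delay, bounded by history length
        x_delayed = history[-1 - tau]                 # stale iterate x_{k - tau}
        x_next = history[-1] - lr * grad(x_delayed)   # update the current iterate with the delayed gradient
        history.append(x_next)
    return history[-1]

# Usage on a simple quadratic f(x) = ||x||^2, whose minimum is at the origin.
if __name__ == "__main__":
    grad = lambda x: 2.0 * x
    x_min = delayed_gradient_descent(grad, x0=np.array([3.0, -4.0]))
    print(x_min)  # should be close to [0, 0]
```

On a convex quadratic like this, the delayed iterates still converge for a small enough learning rate; the interesting behavior the thesis studies is how such delays interact with stability and with rugged, high-dimensional loss surfaces.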
Degree
MS
College and Department
Physical and Mathematical Sciences; Mathematics
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Manner, Eric Benson, "Introducing Stochastic Time Delays in Gradient Optimization as a Method for Complex Loss Surface Navigation in High-Dimensional Settings" (2023). Theses and Dissertations. 10370.
https://scholarsarchive.byu.edu/etd/10370
Date Submitted
2023-04-24
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd13208
Keywords
optimization, time delays, adam optimizer, gradient descent, intrinsic stability
Language
English