Abstract
Time delays are an inherent part of real-world systems. Beyond simply slowing a system down, time delays often destabilize otherwise stable systems and, perhaps more unexpectedly, can stabilize unstable ones. Here we propose the Stochastic Time-Delayed Adaptation, a method for improving optimization on certain high-dimensional surfaces that wraps a known optimizer, such as the Adam optimizer, and introduces a variety of time delays. We begin by exploring time delays in gradient-based optimization methods and their effect on the optimizer's convergence properties. These optimizers include standard gradient descent and the more recent Adam optimizer, the latter of which is widely used to train neural networks in deep learning. To describe the effect of time delays on these methods, we use the theory of intrinsic stability. It has been shown that a system possessing intrinsic stability (a stronger form of global stability) maintains its stability when subject to any time delays, e.g., constant, periodic, or stochastic. In feasible cases, we find conditions under which the optimization method adapted with time delays is intrinsically stable and therefore converges to the system's minimal value. Finally, we examine the optimizer's performance using common metrics: the number of steps the algorithm takes to converge and the final loss value relative to the global minimum of the loss function. We test these outcomes using various adaptations of the Adam optimizer on several common test optimization functions designed to be difficult for vanilla optimization methods. We show that the Stochastic Time-Delayed Adaptation can greatly improve an optimizer's ability to find the global minimum of a complex loss function.
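To make the idea of a stochastic time delay concrete, the sketch below applies a randomly delayed gradient to plain gradient descent: at each step the gradient is evaluated at a stale iterate chosen a random number of steps in the past. This is a minimal illustration of the general time-delay mechanism, not the thesis's exact Stochastic Time-Delayed Adaptation (which wraps optimizers such as Adam); the function name delayed_gradient_descent and the parameter max_delay are hypothetical choices for this example.

```python
import numpy as np

def delayed_gradient_descent(grad, x0, lr=0.01, max_delay=5, steps=500, seed=0):
    """Gradient descent where each step may use a stale (time-delayed) iterate.

    At step k the gradient is evaluated at x_{k - tau_k}, where tau_k is drawn
    uniformly from {0, ..., max_delay} (capped by how much history exists).
    """
    rng = np.random.default_rng(seed)
    history = [np.asarray(x0, dtype=float)]
    for k in range(steps):
        tau = rng.integers(0, min(max_delay, k) + 1)  # random delay, bounded by history length
        x_delayed = history[-1 - tau]                 # stale iterate x_{k - tau}
        x_next = history[-1] - lr * grad(x_delayed)   # update the current iterate with the delayed gradient
        history.append(x_next)
    return history[-1]

# Usage on a simple quadratic f(x) = ||x||^2, whose minimum is at the origin.
if __name__ == "__main__":
    grad = lambda x: 2.0 * x
    x_min = delayed_gradient_descent(grad, x0=np.array([3.0, -4.0]))
    print(x_min)  # should be close to [0, 0]
```

On a convex quadratic like this, the delayed iterates still converge for a small enough learning rate; the interesting behavior the thesis studies is how such delays interact with stability and with rugged, high-dimensional loss surfaces.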
Degree
MS
College and Department
Physical and Mathematical Sciences; Mathematics
Rights
https://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Manner, Eric Benson, "Introducing Stochastic Time Delays in Gradient Optimization as a Method for Complex Loss Surface Navigation in High-Dimensional Settings" (2023). Theses and Dissertations. 10370.
https://scholarsarchive.byu.edu/etd/10370
Date Submitted
2023-04-24
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd13208
Keywords
optimization, time delays, adam optimizer, gradient descent, intrinsic stability
Language
English