Keywords
approximate dynamic programming, reinforcement learning, stochastic optimization, machine learning, curse of dimensionality, multipurpose reservoir operations
Start Date
26-6-2018 3:40 PM
End Date
26-6-2018 5:20 PM
Abstract
Dynamic programming (DP) is considered the ideal optimization method for solving multipurpose reservoir system operational problems because it realistically addresses their complex nonlinear, dynamic, and stochastic characteristics. The principal drawback of DP is the so-called “curse of dimensionality” that has plagued the method since its inception by Richard Bellman in the 1950s. Dimensionality issues arise from the need to discretize the state-action space and the random variates, which leads to an explosion in computational and memory requirements as state-space dimensionality increases. DP also requires the development of spatial-temporal stochastic hydrologic models for reservoir system operations, which may be difficult to construct under complex climatic and meteorological conditions. A deep reinforcement learning algorithm is applied to solving DP problems for reservoir system operations that effectively overcomes dimensionality issues without requiring model simplifications or sacrificing any of the unique advantages of DP. The algorithm uses an iterative learning process that accounts for delayed rewards without requiring an explicit probabilistic model of the hydrologic processes. It is executed in a model-free stochastic environment in which it implicitly learns the underlying stochastic behavior of the system, developing dynamic, optimal feedback operating policies. Dimensionality issues are addressed through the use of accurate function approximators for the state-value and policy functions based on deep neural networks. The deep reinforcement learning algorithm is applied to develop optimal reservoir operating strategies in the Upper Russian River basin of Northern California in the presence of multiple noncommensurate objectives, including flood control, domestic and agricultural water supply, and environmental flow requirements.
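The abstract describes a model-free learner with deep neural approximators for the state-value and policy functions, i.e., an actor-critic scheme. The sketch below (PyTorch) illustrates how such a loop might look for a toy single-reservoir problem; the environment dynamics, penalty weights, network sizes, and all names are illustrative assumptions, not details of the study's implementation.

import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyReservoir:
    """Hypothetical environment: state = (storage, last inflow), action = release."""
    CAP, DEMAND, ENV_FLOW = 100.0, 10.0, 3.0  # assumed capacity and targets

    def reset(self):
        self.s, self.q = 50.0, 10.0          # storage, inflow
        return self._obs()

    def _obs(self):
        return torch.tensor([self.s / self.CAP, self.q / 20.0])

    def step(self, release):
        # Random inflow stands in for the unmodeled hydrology the agent
        # must learn implicitly (no explicit probabilistic model).
        self.q = max(0.0, 10.0 + 5.0 * torch.randn(1).item())
        s_next = self.s + self.q - release
        spill = max(0.0, s_next - self.CAP)   # flood proxy
        self.s = min(self.CAP, max(0.0, s_next))
        # Penalize flooding, water-supply deficit, and env-flow deficit.
        r = -(spill + max(0.0, self.DEMAND - release)
              + max(0.0, self.ENV_FLOW - release))
        return self._obs(), r

# Deep-network function approximators for the policy (actor) and
# state-value function (critic), replacing DP's discretized tables.
actor = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
critic = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=1e-3)

env, gamma = ToyReservoir(), 0.99
log_std = torch.tensor(0.0)                   # fixed exploration noise for simplicity
obs = env.reset()
for _ in range(20000):
    mean = actor(obs)
    dist = torch.distributions.Normal(mean, log_std.exp())
    a = dist.sample()
    next_obs, r = env.step(torch.clamp(a, 0.0, 20.0).item())
    # One-step TD advantage replaces exact value iteration, so no
    # discretization of states, actions, or random variates is needed.
    with torch.no_grad():
        target = r + gamma * critic(next_obs)
    value = critic(obs)
    advantage = target - value
    loss = -dist.log_prob(a) * advantage.detach() + advantage.pow(2)
    opt.zero_grad()
    loss.mean().backward()
    opt.step()
    obs = next_obs

The trained actor then serves as a feedback operating policy: given the current storage and inflow observation, it maps directly to a release decision at each time step.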
Deep Reinforcement Learning for Optimal Operation of Multipurpose Reservoir Systems
Stream and Session
A3: Simulation, Optimization, and Metamodelling: Tradeoffs of Speed, Resource Utilization, and Accuracy