Keywords

approximate dynamic programming, reinforcement learning, stochastic optimization, machine learning, curse of dimensionality, multipurpose reservoir operations

Start Date

26-6-2018 3:40 PM

End Date

26-6-2018 5:20 PM

Abstract

Dynamic programming (DP) is considered the ideal optimization method for solving multipurpose reservoir system operational problems because it realistically addresses their complex nonlinear, dynamic, and stochastic characteristics. Its principal drawback is the so-called “curse of dimensionality,” which has plagued the method since its introduction by Richard Bellman in the 1950s. Dimensionality issues arise from the need to discretize the state-action space and the random variates, which leads to an explosion in computational and memory requirements as state-space dimensionality increases. DP also requires the development of spatial-temporal stochastic hydrologic models for reservoir system operations, which may be difficult under complex climatic and meteorological conditions. A deep reinforcement learning algorithm is applied to DP problems for reservoir system operations; it effectively overcomes dimensionality issues without requiring model simplifications or sacrificing any of the unique advantages of DP. The algorithm uses an iterative learning process that considers delayed rewards without requiring an explicit probabilistic model of the hydrologic processes. It is executed in a model-free stochastic environment in which it implicitly learns the underlying stochastic behavior of the system and develops dynamic, optimal feedback operating policies. Dimensionality issues are addressed by using deep neural networks as accurate function approximators for the state-value and policy functions. The deep reinforcement learning algorithm is applied to developing optimal reservoir operational strategies in the Upper Russian River basin of Northern California in the presence of multiple noncommensurate objectives, including flood control, domestic and agricultural water supply, and environmental flow requirements.
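
The dimensionality argument in the abstract can be made concrete with a rough count. The sketch below is illustrative only and is not taken from the presented work: the function name `dp_table_size` and the discretization levels (20 storage levels, 10 release levels, and 5 inflow classes per reservoir) are assumptions chosen for the example. It counts how many state-action-inflow combinations a fully discretized DP sweep would evaluate at each time step as reservoirs are added.

```python
def dp_table_size(n_reservoirs, storage_levels, release_levels, inflow_levels):
    """Return (number of states, evaluations per stage) for a discretized DP.

    All discretization levels are hypothetical, per-reservoir values.
    """
    # Each reservoir multiplies the grid along every discretized dimension.
    states = storage_levels ** n_reservoirs
    actions = release_levels ** n_reservoirs
    inflows = inflow_levels ** n_reservoirs
    # One Bellman backup visits every state, action, and inflow combination.
    return states, states * actions * inflows


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        states, evaluations = dp_table_size(n, 20, 10, 5)
        print(f"{n} reservoir(s): {states:,} states, "
              f"{evaluations:,} evaluations per stage")
```

Under these assumed levels the work per stage grows by a factor of 1,000 with each added reservoir, which is the kind of growth that motivates replacing the lookup table with a neural-network function approximator.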

Stream and Session

A3: Simulation, Optimization, and Metamodelling: Tradeoffs of Speed, Resource Utilization, and Accuracy


Deep Reinforcement Learning for Optimal Operation of Multipurpose Reservoir Systems

Stream A: Advanced Methods and Approaches in Environmental Computing
