Keywords

Dask; Distributed Computing; Water Quality Modeling; Computational Graphs

Start Date

5 July 2022, 12:00 PM

End Date

8 July 2022, 9:59 AM

Abstract

The Dask Python package provides an efficient way to design and execute distributed applications across a collection of computing resources. Our water quality simulations consist of individual stream simulations within a stream network. By expressing them as a Dask computational graph, we can execute water quality simulations that are large in both time and space in a distributed fashion. These simulations can run locally, on a network, or in the cloud, and they scale dynamically with the resources provided to the Dask cluster. The simulations are run at the NHDPlus V2.1 catchment spatial scale, use automated data provisioning, and are accessed through a dynamic web application. The computational graph is a directed acyclic graph that mirrors the stream network itself: each stream segment is a node, and each directed edge represents a data dependency. Data provisioning and simulation pre-processing functions are added as additional nodes, connected as needed to each stream segment simulation node. The function at each step is atomic, and its results are stored in a database, from which they are retrieved by the dependent tasks that follow in the cascade. This design lets us avoid many of the common technical issues encountered when executing large simulations.
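The graph construction described above can be sketched with `dask.delayed`. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-segment network, the `LOCAL_LOADS` values, the additive "outflow" model, and the helper names (`provision_inputs`, `simulate_segment`, `build_graph`, `run_network`) are all hypothetical, and a local SQLite file stands in for the results database. Each segment gets a provisioning node and a simulation node; passing the upstream simulation nodes as arguments creates the directed edges, and each simulation task persists its result and reads upstream results back from the database.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from graphlib import TopologicalSorter  # Python 3.9+

import dask
from dask import delayed

# Hypothetical toy network: each segment maps to the segments draining into it.
UPSTREAM = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}

# Hypothetical local loads returned by the provisioning step.
LOCAL_LOADS = {"A": 1.0, "B": 2.0, "C": 0.5, "D": 0.25}

# Assumed stand-in for the results database (a local SQLite file).
DB_PATH = os.path.join(tempfile.mkdtemp(), "wq_results.db")


def init_db():
    with closing(sqlite3.connect(DB_PATH)) as conn:
        with conn:  # commit the table definition
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results (segment TEXT PRIMARY KEY, outflow REAL)"
            )


def read_outflow(segment):
    """Fetch a finished segment's result from the database."""
    with closing(sqlite3.connect(DB_PATH)) as conn:
        row = conn.execute(
            "SELECT outflow FROM results WHERE segment = ?", (segment,)
        ).fetchone()
    return row[0]


def provision_inputs(segment):
    """Hypothetical data-provisioning node for one segment."""
    return LOCAL_LOADS[segment]


def simulate_segment(segment, local_load, *upstream_done):
    """Toy segment model: outflow = local load + outflow of upstream segments.

    Upstream tasks pass only their segment IDs; the values are read back from
    the database, and this task's result is written before it returns."""
    inflow = sum(read_outflow(up) for up in upstream_done)
    outflow = local_load + inflow
    with closing(sqlite3.connect(DB_PATH)) as conn:
        with conn:  # commit before this task reports completion
            conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?)", (segment, outflow)
            )
    return segment


def build_graph(upstream):
    """Create one provisioning node and one simulation node per segment.

    Passing upstream simulation nodes as arguments gives Dask the directed
    edges, so the task graph mirrors the stream network."""
    tasks = {}
    for seg in TopologicalSorter(upstream).static_order():
        load = delayed(provision_inputs)(seg, dask_key_name=f"provision-{seg}")
        tasks[seg] = delayed(simulate_segment)(
            seg,
            load,
            *(tasks[up] for up in upstream[seg]),
            dask_key_name=f"simulate-{seg}",
        )
    return tasks


def run_network(upstream, outlet):
    """Compute every task the outlet depends on, then return its outflow."""
    init_db()
    tasks = build_graph(upstream)
    dask.compute(tasks[outlet])  # default threaded scheduler may run A and B concurrently
    return read_outflow(outlet)


print(run_network(UPSTREAM, "D"))  # 3.75
```

Because `dask.compute` only runs the tasks the requested node depends on, the same pattern can compute a single sub-basin. If `dask.distributed` is installed, creating a `Client` before calling `dask.compute` sends the same graph to that cluster instead; the workers would then need a shared database server rather than a local SQLite file.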

Water Quality Simulations with Dask
