Stream C

Crop Modelling at Scale using Cloud and HPC Infrastructures

Rob Knapen, Wageningen University and Research
Allard de Wit, Wageningen University and Research
Eliya Buyukkaya, Wageningen University and Research
Petros Petrou, UBITech

Keywords

WOFOST; Apache Spark; Kubernetes; High Performance Computing

Start Date

July 5, 2022, 3:00 PM

End Date

July 5, 2022, 3:20 PM

Abstract

The well-known World Food Studies simulation model (WOFOST) is a process model for the quantitative analysis of the growth and production of annual field crops. Over time the model has been implemented by several organizations in a range of programming languages, including R (Rwofost), Python (the Python Crop Simulation Environment, PCSE), C#, and the original Fortran version. The model is most often run on single computers or laptops, which limits the number of simulations that can be performed within a reasonable timeframe. A recent development is a Java implementation of WOFOST, based on a small modelling framework called the Wageningen Integrated Systems Simulator (WISS). One advantage of this implementation is its interoperability with current off-the-shelf Big Data processing tools, in particular Apache Spark. As a data engineering platform, Spark provides a generalized programming model across hardware configurations ranging from single computers to high-performance computing clusters and Kubernetes cloud installations. In the Cybele EU Horizon 2020 research project we studied possible approaches for such a WOFOST-Spark solution. We implemented a prototype and measured its performance and scalability on two different hardware configurations. The results demonstrated that an adequate overall execution time can be achieved, and that the prototype scaled sufficiently as the number of computers was increased. For example, using a MongoDB database for storage, the prototype ran 5 million crop simulations in 821 minutes on a single computer and in 88 minutes on 16 computers. Other noteworthy findings are that a traditional, fully normalized relational database can become a performance bottleneck for distributed processing, and that the large amounts of data produced require matching data analysis tools and careful consideration of what to store and what to discard.
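To put the reported timings in perspective, the following minimal Python sketch derives speedup, parallel efficiency, and throughput from the figures quoted in the abstract. The function and variable names are illustrative and not part of the prototype. The calculation assumes the single-computer and 16-computer runs are directly comparable, which the abstract does not confirm.

```python
# Figures quoted in the abstract (prototype with MongoDB storage).
SIMULATIONS = 5_000_000
MINUTES_SINGLE = 821    # run on 1 computer
MINUTES_CLUSTER = 88    # run on 16 computers
NODES = 16


def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-computer time to multi-computer time."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, nodes: int) -> float:
    """Speedup divided by the number of computers (1.0 would be ideal)."""
    return speedup(t_single, t_parallel) / nodes


def throughput_per_second(simulations: int, minutes: float) -> float:
    """Average number of simulations completed per second."""
    return simulations / (minutes * 60)


# Assumption: both runs used comparable machines; this is not stated
# in the abstract, so the results are indicative only.
s = speedup(MINUTES_SINGLE, MINUTES_CLUSTER)
e = efficiency(MINUTES_SINGLE, MINUTES_CLUSTER, NODES)
throughput_single = throughput_per_second(SIMULATIONS, MINUTES_SINGLE)
throughput_cluster = throughput_per_second(SIMULATIONS, MINUTES_CLUSTER)

print(f"speedup: {s:.2f}x")
print(f"parallel efficiency: {e:.1%}")
print(f"throughput, 1 computer: {throughput_single:.0f} simulations/s")
print(f"throughput, 16 computers: {throughput_cluster:.0f} simulations/s")
```

Under these assumptions the speedup is roughly 9.3 on 16 computers, a parallel efficiency of about 58 percent. Because the abstract mentions two different hardware configurations, the numbers should be read as an indication of scaling behaviour rather than a controlled benchmark.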

