Keywords

WOFOST; Apache Spark; Kubernetes; High Performance Computing

Start Date

July 5, 2022, 3:00 PM

End Date

July 5, 2022, 3:20 PM

Abstract

The well-known World Food Studies simulation model (WOFOST) is a process model for the quantitative analysis of the growth and production of annual field crops. Over time, several organizations have implemented this model in a diverse range of programming languages, such as R (Rwofost), Python (the Python Crop Simulation Environment, PCSE), C#, and the original Fortran version. The model is most frequently deployed on single computers or laptops, which limits the number of simulations that can be performed within a reasonable timeframe. A recent development is a Java implementation of WOFOST, based on a small modelling framework called the Wageningen Integrated Systems Simulator (WISS). One advantage of this implementation is its interoperability with current off-the-shelf Big Data processing tools, in particular Apache Spark. Spark, as a data engineering platform, provides a generalized programming model across hardware configurations ranging from single computers to High Performance Computing (HPC) clusters and Kubernetes-based cloud installations. In the Cybele EU Horizon 2020 research project, we studied possible approaches for such a WOFOST-Spark solution. We implemented a prototype and measured its performance and scalability on two different hardware configurations. The results demonstrated that an adequate overall execution time can be achieved, and that the prototype scaled sufficiently as the number of computers increased. For example, the prototype ran 5 million crop simulations in 821 minutes on a single computer and in 88 minutes on 16 computers, using a MongoDB database for storage. Other noteworthy findings are that a traditional, fully normalized relational database can become a performance bottleneck for distributed processing, and that the large amounts of data produced require matching data analysis tools and careful consideration of what to store and what to discard.
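As a rough illustration of what the reported timings imply, the standard parallel speedup and efficiency can be computed from the two figures in the abstract. This back-of-the-envelope calculation is not reported in the original work, and it assumes that the single computer and each of the 16 computers are comparable machines; if the two runs used different hardware configurations, the ratio mixes hardware effects with scaling effects.

```latex
% T_1: time on 1 computer, T_16: time on 16 computers (from the abstract)
S_{16} = \frac{T_1}{T_{16}} = \frac{821\ \text{min}}{88\ \text{min}} \approx 9.33
\qquad
E_{16} = \frac{S_{16}}{16} \approx 0.58

% Throughput for N = 5 \times 10^6 simulations
\frac{5 \times 10^6}{821\ \text{min}} \approx 6.1 \times 10^3\ \text{simulations/min}
\qquad
\frac{5 \times 10^6}{88\ \text{min}} \approx 5.7 \times 10^4\ \text{simulations/min}
```

Under these assumptions, the 16-computer run was roughly 9 times faster than the single-computer run, compared with an ideal linear speedup of 16 (about 51 minutes), which corresponds to a parallel efficiency of roughly 58%.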


Crop Modelling at Scale using Cloud and HPC Infrastructures