Paper/Poster/Presentation Title

Machine learning surrogates for crop models

Keywords

crop model; machine learning; crop yield; domain knowledge; simulation

Start Date

16-9-2020 3:00 PM

End Date

16-9-2020 3:20 PM

Abstract

Machine learning surrogate models are common in the environmental sciences but less widely used in agricultural modelling. In this paper we explore how to develop a machine learning surrogate for crop growth models. We performed a case study using the TIPSTAR potato crop growth simulation model. Simulations were performed for 22 years of Dutch weather data, 4 weather stations, 16 common Dutch soil types and 4 sowing dates, totalling 5568 distinct simulations. Our aim was to predict, 60 days after sowing, the maximum seasonal tuber weight, using only input data available up to that moment. We aggregated the simulated data and extracted features. The extracted features were counts and averages of the weather variables over intervals of 7, 15, 30 and 60 days. A feature selection procedure then reduced these to 14 features. Finally, we applied regularized linear regression (RLR) and random forest regression (RFR) to estimate the tuber weight. RLR had a Mean Absolute Percentage Error (MAPE) of 11.7 on the training set and 11.8 on the test set, with an R² of 0.27 and 0.26, respectively. RFR had a MAPE of 5.6 on the training set and 6.5 on the test set, with an R² of 0.81 and 0.74, respectively.
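The feature extraction and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the threshold used for the counts, and the choice to measure each interval from the sowing date are all assumptions made here for illustration; the abstract only states that counts and averages were taken over intervals of 7, 15, 30 and 60 days.

```python
from statistics import mean

# Interval lengths in days, as listed in the abstract.
WINDOWS = (7, 15, 30, 60)


def window_features(daily_values, threshold, windows=WINDOWS):
    """Average and exceedance count of one weather variable per interval.

    Assumption: each interval covers the first n days after sowing, and
    "count" means the number of days above a hypothetical threshold.
    """
    features = {}
    for n in windows:
        span = daily_values[:n]
        features[f"mean_{n}d"] = mean(span)
        features[f"count_above_{n}d"] = sum(1 for v in span if v > threshold)
    return features


def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic daily temperatures for 60 days (illustrative, not study data).
    temperatures = [10.0 + (day % 5) for day in range(60)]
    print(window_features(temperatures, threshold=12.0))

    # Synthetic tuber weights and predictions (illustrative, not study data).
    observed = [100.0, 200.0]
    estimated = [110.0, 180.0]
    print(mape(observed, estimated), r_squared(observed, estimated))
```

In this sketch each weather variable yields two features per interval, so one variable contributes eight candidate features before any selection step. The MAPE function assumes no observed value is zero.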
