Keywords
crop model; machine learning; crop yield; domain knowledge; simulation
Start Date
16-9-2020 3:00 PM
End Date
16-9-2020 3:20 PM
Abstract
Machine learning surrogate models are common in the environmental sciences but are less often used for agricultural models. In this paper we explore how to develop a machine learning surrogate for crop growth models. We performed a case study using the TIPSTAR potato crop growth simulation model. Simulations were performed for 22 years of Dutch weather data, 4 weather stations, 16 common Dutch soil types and 4 sowing dates, totalling 5568 distinct simulations. Our aim was to predict, 60 days after sowing, the maximum seasonal tuber weight, using only input data available up to that moment. We aggregated the simulated data and extracted features. The extracted features were counts and averages of the weather variables over intervals of 7, 15, 30 and 60 days. A feature selection procedure then reduced these to 14 features. Finally, we applied regularized linear regression (RLR) and random forest regression (RFR) to estimate the tuber weight. RLR had a Mean Absolute Percentage Error (MAPE) of 11.7 on the training set and 11.8 on the test set, with an R² of 0.27 and 0.26 respectively. RFR had a MAPE of 5.6 on the training set and 6.5 on the test set, with an R² of 0.81 and 0.74 respectively.
Machine learning surrogates for crop models
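As a rough illustration of the feature extraction and the error measures named in the abstract, the following Python sketch computes averages and counts over 7, 15, 30 and 60 day intervals, together with MAPE and R². It is not the authors' code. The function names, the threshold-based definition of a "count" feature, and the choice to measure each interval from the sowing date are all assumptions made for the example.

```python
from statistics import mean

WINDOWS = (7, 15, 30, 60)  # interval lengths in days, as listed in the abstract


def window_features(daily_values, name, threshold):
    """Averages and counts of one weather variable per interval.

    Assumption: each interval starts at the sowing date, and a "count"
    is the number of days at or above a hypothetical threshold.
    """
    features = {}
    for days in WINDOWS:
        window = daily_values[:days]
        features[f"{name}_mean_{days}d"] = mean(window)
        features[f"{name}_count_{days}d"] = sum(v >= threshold for v in window)
    return features


def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up daily temperatures for the first 60 days after sowing.
daily_temp = [10.0] * 30 + [20.0] * 30
feats = window_features(daily_temp, "temp", threshold=15.0)
```

Each simulation would yield one such feature dictionary per weather variable. A feature selection step and a regression model, such as the RLR or RFR mentioned in the abstract, would then be fitted to these features with the simulated maximum tuber weight as the target.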