Paper/Poster/Presentation Title

Machine Learning Baseline for Crop Yield Prediction

Keywords

Crop Yield Prediction, Machine Learning, Modularity, Reusability, Large-scale Crop Yield

Start Date

17-9-2020 10:20 AM

End Date

17-9-2020 10:40 AM

Abstract

Many studies have applied machine learning to crop yield prediction, but they focus on obtaining optimal results for specific case studies. The required data may not be available for other places, or the methods may not be transferable to other crops and locations. Large-scale crop yield forecasting systems, such as the MARS Crop Yield Forecasting System (MCYFS) of the European Commission’s Joint Research Centre, have historical data and other resources to build and assess crop yield prediction models for different crops and countries. However, these systems do not use machine learning, and there is no baseline that shows how machine learning would work or perform with their data. We present a machine learning baseline for crop yield prediction using MCYFS data. The baseline is a workflow and methodological guide emphasizing correctness, modularity and reusability. The methodology focuses on designing explainable features and applying machine learning without information leakage. Features were designed using weather observations, crop growth model outputs and soil data. The workflow can be used to run repeatable experiments and obtain reproducible results with small configuration changes. The results can be a starting point for further optimizations. In our case studies, we predicted early-season and end-of-season crop yield at the regional level for ten crop and country combinations, with and without using the yield trend from previous years. We compared the performance with a “null” method that predicted either a linear yield trend or the average of the training set. The predictions were also aggregated to the national level and compared with past MCYFS forecasts. The baseline can be improved by adding new data sources, designing more predictive features and evaluating different machine learning algorithms. The baseline will motivate the use of machine learning in large-scale crop yield forecasting.
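
The “null” method described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_linear_trend`, `null_prediction`), the `use_trend` flag and the toy yield values are all hypothetical, chosen only to show the idea. The sketch fits a least-squares line to the training years only, so no information from the predicted year leaks into the fit, and falls back to the training-set average when no trend is used.

```python
from statistics import mean


def fit_linear_trend(years, yields):
    """Least-squares fit of yield against year.

    Returns (slope, year_mean, yield_mean). Hypothetical helper,
    not part of any MCYFS tooling.
    """
    year_mean = mean(years)
    yield_mean = mean(yields)
    sxx = sum((y - year_mean) ** 2 for y in years)
    if sxx == 0:
        # A single distinct training year gives no trend; use a flat line.
        return 0.0, year_mean, yield_mean
    sxy = sum((y - year_mean) * (v - yield_mean) for y, v in zip(years, yields))
    return sxy / sxx, year_mean, yield_mean


def null_prediction(train_years, train_yields, target_year, use_trend=True):
    """Predict yield for target_year using training data only.

    use_trend=True  -> extrapolate the linear yield trend
    use_trend=False -> return the average of the training set
    """
    if not use_trend:
        return mean(train_yields)
    slope, year_mean, yield_mean = fit_linear_trend(train_years, train_yields)
    return yield_mean + slope * (target_year - year_mean)


# Toy values for illustration only (not real yield data).
train_years = [2010, 2011, 2012, 2013]
train_yields = [5.0, 5.2, 5.4, 5.6]

trend_forecast = null_prediction(train_years, train_yields, 2014, use_trend=True)
average_forecast = null_prediction(train_years, train_yields, 2014, use_trend=False)
```

With these toy values the trend variant extrapolates the steady 0.2 per-year increase, while the average variant ignores the year entirely. A machine learning model would need to beat whichever variant is used as the reference for the same region and test year.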
