Keywords

Big Data, Data Analytics, Feature Selection

Start Date

25-6-2018 10:40 AM

End Date

25-6-2018 12:00 PM

Abstract

The effects of climate change have been observed for decades, now that we have access to multiple methods of Earth Observation (EO) using in situ, airborne and spaceborne sensing. The EO Big Data generated from these sources is of paramount importance for scientists seeking to understand the effects of climate change and the specific natural (and anthropogenic) processes it engenders, which are likely to trigger changes in the behaviour of species on Earth. In the EO4wildlife project (http://www.copernicus.eu/projects/eo4wildlife), we have access to Copernicus and Argos EO Big Data for investigating habitat changes for a variety of marine species. The challenge is to forecast habitats by identifying the causal relationships between animal presence and Metocean environmental fronts. This is achieved by processing animal presence data, which are relatively small and sparse, and correlating them with environmental datasets, which are large and dense in feature space. This poses big data challenges in terms of resource optimisation, data mining and feature selection. Overcoming these challenges improves the performance of the forecasting models. The availability of big geospatial information, satellite data and in situ observations enabled us to experiment with the scalability of our distributed data storage technologies and analytics services in the cloud. Specifically, we deployed a cluster infrastructure via Spark for resilient distribution of processing over multiple nodes. Our big data processing performance is validated in testbed experiments under three types of selected habitat forecasting workflows. These will be described in detail in the completed version of this paper.
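
To make the feature selection step more concrete, the following is a minimal sketch of one common approach: ranking environmental features by the strength of their correlation with animal presence and keeping the top few. It is not the EO4wildlife implementation, which the abstract does not describe. The feature names (`sst_gradient`, `chlorophyll`, `bathymetry`), the toy values and the helper functions are all hypothetical, and a correlation filter only measures association, so it is at most a first step towards the causal relationships mentioned above.

```python
import math

# Hypothetical toy data: 1 = animal observed at a location, 0 = not observed.
PRESENCE = [0, 0, 1, 1, 0, 1, 1, 0]

# Hypothetical environmental features sampled at the same locations.
FEATURES = {
    "sst_gradient": [0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.9, 0.2],
    "chlorophyll": [0.5, 0.4, 0.6, 0.5, 0.5, 0.4, 0.6, 0.5],
    "bathymetry": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(presence, features):
    """Sort features by absolute correlation with presence, strongest first."""
    scores = {name: abs(pearson(values, presence)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(presence, features, k):
    """Return the names of the k features most correlated with presence."""
    return [name for name, _ in rank_features(presence, features)[:k]]


if __name__ == "__main__":
    for name, score in rank_features(PRESENCE, FEATURES):
        print(f"{name}: {score:.3f}")
```

Each feature is scored independently of the others, so in principle the same scoring could be distributed across the nodes of a cluster when the environmental datasets are too large for one machine.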

Stream and Session

Stream A: Advanced Methods and Approaches in Environmental Computing

A1: Towards More Interoperable, Reusable and Scalable Environmental Software



Scalable Big Data Platform, Mining and Analytics Services for Optimized Forecast of Animals Habitats
