Keywords

Virtual labs; environmental informatics; DataLabs; open science; eScience; virtual research environments

Start Date

5 July 2022, 12:00 PM

End Date

8 July 2022, 9:59 AM

Abstract

Research in Environmental Data Science is transdisciplinary: scientists and practitioners from different disciplines create data-driven solutions to environmental challenges, often combining large volumes of data with analytical methods. DataLabs is a cloud-based virtual research platform, developed continuously by the UK Centre for Ecology & Hydrology (UKCEH) using an agile approach. It advocates open science by providing the infrastructure and software tools that collaborating scientists need to explore big data, develop and share new methods, and communicate their results to stakeholders and decision-makers. The architecture of DataLabs follows the principles of service-oriented architecture. This allows the most appropriate technology to be selected for each component, and each component's functions are exposed to other systems as services over HTTP. Internally, each component follows a modular architecture to ensure separation of concerns and separated presentation. DataLabs is hosted on JASMIN, a UK-based data analysis facility for data-intensive environmental science, which gives seamless access to HPC and data storage resources. DataLabs' main components are coherently integrated and built on existing technologies. They comprise analytical tools, analytics execution engines, narrative computing tools, publishing tools, distributed computing services, and data management. The analytical tools include Jupyter notebooks for data analysis, RStudio in a browser, and Zeppelin notebooks for working with big data. The publishing tools include NBViewer (for publishing Jupyter notebooks), RShiny (for building interactive web applications from R code), and Nginx (for publishing static web applications). The distributed computing services include Dask, a flexible parallel computing library for analytic computing, and Spark, a multi-language engine for scalable execution of data analytics.
Within the Data Science of the Natural Environment (DSNE) project, a transdisciplinary team of environmental scientists, statisticians, computer scientists and social scientists is using DataLabs to collaborate. The team is developing a broad range of statistical and data science techniques, combined with environmental models, to support better-informed decision-making on environmental grand challenges. DataLabs makes it easy to integrate data, models, and methods. It also allows novel data science methods to be incorporated rapidly so that they can be explored across different environmental problems.
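The abstract describes a service-oriented design in which each component exposes its functions to other systems as services over HTTP. The sketch below illustrates that idea under stated assumptions. One component wraps a function as a JSON endpoint, and a second component calls it knowing only the URL and the JSON contract. The sketch uses only the Python standard library. The `summarise` function, the `/summarise` endpoint and the `{"values": [...]}` payload are hypothetical illustrations. They are not DataLabs' actual interfaces.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def summarise(values):
    """Hypothetical analysis function that one component offers as a service."""
    n = len(values)
    return {
        "count": n,
        "mean": sum(values) / n if n else None,
        "min": min(values) if n else None,
        "max": max(values) if n else None,
    }


class SummariseHandler(BaseHTTPRequestHandler):
    """Exposes summarise() as a JSON-over-HTTP endpoint at POST /summarise (hypothetical)."""

    def do_POST(self):
        # Read the request body first so the connection is drained before replying.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        if self.path != "/summarise":
            self.send_error(404, "Unknown endpoint")
            return
        try:
            result = summarise(json.loads(raw or b"{}")["values"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, 'Expected a JSON body like {"values": [1, 2, 3]}')
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def start_service(port=0):
    """Start the service in a background thread; port=0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), SummariseHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(server, values):
    """A client component: it knows only the service URL and the JSON contract."""
    host, port = server.server_address
    request = Request(
        f"http://{host}:{port}/summarise",
        data=json.dumps({"values": values}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_service()
    try:
        print(call_service(server, [2.0, 4.0, 9.0]))
        # {'count': 3, 'mean': 5.0, 'min': 2.0, 'max': 9.0}
    finally:
        server.shutdown()
        server.server_close()
```

The caller depends only on the URL and the JSON payload. The service behind it could therefore be reimplemented with a different technology without changing its callers, which reflects the freedom the abstract describes to choose the most appropriate technology for each component.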

Design and Development of DataLabs for Environment Data Science