Keywords

Data grids, reproducible research, federation

Location

Session A1: Leveraging Cyberinfrastructure to Advance Scientific Productivity and Reproducibility in the Water Sciences

Start Date

16-6-2014 10:40 AM

End Date

16-6-2014 12:00 PM

Abstract

The DataNet Federation Consortium (DFC) is an NSF-funded project that provides cyberinfrastructure for federating data management systems into a collaboration environment. Researchers can build a shared collection, apply analysis workflows to it, and manage the analysis results. A shared collection may span storage resources at multiple institutions and multiple types of data management systems. Analysis workflows may stage files to remote compute platforms and may also run in place at the remote data storage location. Workflows can be captured as shareable objects, with automatic capture of their input files, input parameters, and output files. A key feature is support for interoperability mechanisms that access data in remote repositories through the appropriate external protocol. These mechanisms make it possible to automate the acquisition of data from external resources, transform the data sets into required formats, and save the results within the collaboration environment. They can be controlled by policies that automate data acquisition steps, validate collection properties such as integrity, and execute administrative tasks such as data migration. Researchers can share a workflow, modify its input parameters, re-execute it, and share the output results, which enables reproducible data-driven research. The DFC builds upon the integrated Rule-Oriented Data System (iRODS), open source middleware available from http://irods.org. The DFC web site is http://datafed.org.
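The capture, share, and re-execute cycle described in the abstract can be illustrated with a minimal Python sketch. It does not use the iRODS or DFC interfaces. The names WorkflowRecord, run_workflow, verify_inputs, rerun, and scale_flows, the use of SHA-256 checksums, and the flow.csv example are all hypothetical. They show only how recording inputs, parameters, and outputs makes a workflow re-executable with modified parameters, and how a policy-style integrity check might run before re-execution.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict

Files = Dict[str, bytes]
Params = Dict[str, float]
Step = Callable[[Files, Params], Files]


def checksum(data: bytes) -> str:
    """SHA-256 digest, used here (by assumption) to record and validate integrity."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class WorkflowRecord:
    """Hypothetical shareable workflow object: parameters plus input/output checksums."""
    name: str
    parameters: Params
    inputs: Dict[str, str] = field(default_factory=dict)   # file name -> checksum
    outputs: Dict[str, str] = field(default_factory=dict)  # file name -> checksum

    def to_json(self) -> str:
        # Serialize the record so it can be shared with collaborators.
        return json.dumps(asdict(self), sort_keys=True)


def run_workflow(name: str, step: Step, files: Files, parameters: Params):
    """Run one analysis step and automatically capture inputs, parameters, outputs."""
    record = WorkflowRecord(name=name, parameters=dict(parameters))
    for fname, data in files.items():
        record.inputs[fname] = checksum(data)
    results = step(files, parameters)
    for fname, data in results.items():
        record.outputs[fname] = checksum(data)
    return record, results


def verify_inputs(record: WorkflowRecord, files: Files) -> bool:
    """Policy-style check: every recorded input is present and unchanged."""
    return all(f in files and checksum(files[f]) == c for f, c in record.inputs.items())


def rerun(record: WorkflowRecord, step: Step, files: Files, **overrides: float):
    """Re-execute a shared workflow, optionally overriding some input parameters."""
    if not verify_inputs(record, files):
        raise ValueError("input integrity check failed")
    params = {**record.parameters, **overrides}
    return run_workflow(record.name, step, files, params)


def scale_flows(files: Files, params: Params) -> Files:
    """Hypothetical analysis step: multiply comma-separated flow values by a factor."""
    values = [float(v) for v in files["flow.csv"].decode().split(",")]
    scaled = ",".join(f"{v * params['factor']:g}" for v in values)
    return {"flow_scaled.csv": scaled.encode()}


if __name__ == "__main__":
    data = {"flow.csv": b"1.5,2.0,3.25"}
    record, out = run_workflow("scale-flow", scale_flows, data, {"factor": 2.0})
    print(out["flow_scaled.csv"])             # b'3,4,6.5'
    record2, out2 = rerun(record, scale_flows, data, factor=10.0)
    print(out2["flow_scaled.csv"])            # b'15,20,32.5'
    print(record2.to_json())
```

Keeping the original record unchanged and producing a new record on re-execution means that both runs, with their parameters and output checksums, can be shared and compared side by side.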


Reproducible Research within the DataNet Federation Consortium
