Challenges with Maintaining Legacy Software to Achieve Reproducible Computational Analyses: An Example for Hydrologic Modeling Data Processing Pipelines

Keywords

Reproducibility; legacy software; hydrologic modeling; Docker containers; metadata

Location

Session A2: Interoperability, Reusability, and Integrated Systems

Start Date

11-7-2016 9:50 AM

End Date

11-7-2016 10:10 AM

Abstract

In hydrology, like many other scientific disciplines with large computational demands, scientists have created a significant and growing collection of software tools for data manipulation, analysis, and simulation. While core computational model software is likely to be well maintained by the groups that develop it, other software, such as data pre- and post-processing tools that are used less often but are still critical to scientists, may receive less attention. These codes become “legacy” software, meaning simply that they are out of date by modern standards. A challenge facing the scientific community is how to maintain this legacy software so that it achieves reproducible results now and in the future, with minimal investment of resources. This talk will present an example of this problem in hydrology: the pre-processing tools used to create a Variable Infiltration Capacity (VIC) model simulation. The data processing pipeline for creating the input files for VIC is complex, requiring code written over the years by various student researchers and sometimes requiring out-of-date compilers (e.g., for FORTRAN 77) to build portions of the code. We are confident that reliance on legacy software is not unique to VIC, but is a wider problem common to other hydrologic models and to scientific modeling in general. Through prior work, we have automated a VIC data processing pipeline, but moving these pipelines to new machines remains a significant challenge, due in large part to the need to install legacy software dependencies. This work takes the following steps to address these challenges. The first step is to create containers using Docker so that legacy software can be executed more easily across machines. This is done using the NSF-funded GeoDataspace and DataNet Federation Consortium (DFC) projects to create and execute the Docker container as a Web application. The second step is to capture metadata for the large number of processing tools within the VIC data processing pipeline so that the provenance of the software can be more easily tracked in the future. This is done using metadata frameworks created through the NSF-funded HydroShare and OntoSoft projects. This methodology could serve as a general approach for making data processing pipelines more transparent and reproducible.
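
To illustrate the two steps described above, the following is a minimal Python sketch of running one legacy pre-processing step inside a Docker container and writing a small JSON provenance record for it. The container image name, the tool name (regrid_forcings), and the file paths are hypothetical placeholders; this is not the GeoDataspace/DFC Web application nor the HydroShare or OntoSoft metadata frameworks, only an outline of the pattern under those assumptions.

"""
Hypothetical sketch: execute one legacy VIC pre-processing step in a Docker
container and record basic provenance metadata. Requires a local Docker
installation; image and tool names below are illustrative placeholders.
"""
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file, for provenance tracking."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_legacy_step(image, command, data_dir):
    """Run a legacy tool inside a container, mounting the data directory at /data."""
    return subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{Path(data_dir).resolve()}:/data",
         image] + command,
        capture_output=True, text=True, check=True,
    )


def record_metadata(image, command, inputs, outputs, metadata_file):
    """Write a minimal JSON provenance record for one pipeline step."""
    record = {
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "container_image": image,
        "command": command,
        "inputs": {str(p): sha256(p) for p in inputs},
        "outputs": {str(p): sha256(p) for p in outputs},
    }
    with open(metadata_file, "w") as f:
        json.dump(record, f, indent=2)


if __name__ == "__main__":
    # Placeholder image and tool; substitute the actual containerized
    # pre-processing tools for a real VIC pipeline.
    image = "example/vic-preprocessing:f77"
    command = ["regrid_forcings", "/data/forcings.txt", "/data/forcings_regridded.txt"]
    run_legacy_step(image, command, data_dir="./data")
    record_metadata(
        image, command,
        inputs=[Path("./data/forcings.txt")],
        outputs=[Path("./data/forcings_regridded.txt")],
        metadata_file="step_provenance.json",
    )

Invoking the container through the Docker command line keeps the legacy binaries and their FORTRAN 77 toolchain frozen inside the image, while the JSON record captures enough information (image, command, and input/output checksums) to trace how each file in the pipeline was produced.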
