Schedule

2020
Tuesday, September 15th
9:20 AM

Approaching computational infrastructures from a distributed object-oriented angle

Michael Berg-Mohnicke, ZALF e.V., Germany

9:20 AM - 9:40 AM

In the field of agricultural and environmental research, the ubiquitous availability of large quantities of computing power (e.g. HPC clusters) enables scientists to create more complex models and systems and to apply them to increasingly large regions, countries or even continents. In this context new problems arise. The actual scientific application of models depends increasingly on custom-tailored scripts and software created by dedicated computer science personnel or similarly skilled scientists. Unfortunately, this situation does not reflect the available resources in many scientific institutions. We will present an approach we currently pursue at ZALF which tries to enable scientific users to handle a larger set of use cases themselves. Our goal is to create something one could call a capability-secure distributed object-oriented infrastructure layer based on a set of Cap'n Proto schemas. Implementing these schemas enables us to describe the different stages of computational infrastructure use – from data and computational resource acquisition to model application and result processing to visualization. From a high-level view this layer will act like a distributed object-oriented program. The computer science staff will be able to incrementally extend it to fulfil new needs and, owing to its capability-secure heritage, to allow secure sharing of resources (called capabilities) amongst its users. With the creation of visual interfaces on top of the infrastructure layer, less tech-savvy scientists will also be able to solve their problems. Thus, our goal will be reached once we can put some of the available computational power and resources back into scientists' hands.
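
As an illustration only (not the ZALF implementation), the sketch below shows in plain Python the capability-style idea the abstract describes: a user never receives global access to the infrastructure, only object references (capabilities) scoped to the resources that were explicitly shared. In a real deployment these references would be remote Cap'n Proto capabilities defined by the schemas; here all class and method names are hypothetical and the objects stay local.

    # Minimal, hypothetical sketch of capability-style sharing.
    # A capability is just an object reference; holding it is the permission.

    class DatasetCapability:
        """Read-only access to one dataset; nothing else is reachable from it."""
        def __init__(self, rows):
            self._rows = rows

        def read(self):
            return list(self._rows)


    class ModelRunCapability:
        """Permission to run one model against datasets the holder already has."""
        def run(self, dataset_cap):
            data = dataset_cap.read()
            return {"n_inputs": len(data), "result": sum(data) / max(len(data), 1)}


    class InfrastructureGateway:
        """Hands out narrowly scoped capabilities instead of global accounts."""
        def __init__(self):
            self._datasets = {"soil_grid": DatasetCapability([1.0, 2.0, 3.0])}

        def share_dataset(self, name):
            return self._datasets[name]          # granting = returning the reference

        def share_model(self):
            return ModelRunCapability()


    gateway = InfrastructureGateway()
    soil = gateway.share_dataset("soil_grid")    # a scientist receives only this
    model = gateway.share_model()
    print(model.run(soil))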

9:40 AM

Integration First: Rethinking inter- and trans-disciplinary collaborations in modelling social-ecological systems

Gary Polhill, The James Hutton Institute, United Kingdom

9:40 AM - 10:00 AM

This paper accompanies the workshop on “New Tools or New Research Culture? Towards an Integration First approach to modelling social-environmental systems,” and details the outcomes of a small-scale workshop at The James Hutton Institute on “Modular, Integrated Agent-Based Social-Ecological Modelling.” The latter workshop was predicated on the long-held desirability of modularity and reuse when building models that bring together data and knowledge from multiple disciplines. Much of the progress in this area has focused on developments in technology, with the state of the art focusing on the use of semantic integration software to facilitate automation. However, a key finding from the workshop’s discussions was that no amount of technology can compensate for inappropriately designed and managed projects, or for collaborators who are unwilling or (for institutional reasons) unable to step outside their home disciplines. Hence the concept of ‘Integration First’ – projects relying on data, knowledge and model integration to achieve the intended impacts need to be built around integration as the central priority, rather than as something that can be done in the last six months to tie together various threads studied using established disciplinary methods. With an understanding of what Integration First might mean in terms of project design, management, implementation and modelling, the question of the software needed to support successful integration can then be more meaningfully asked.

12:40 PM

Thirty years of spatio-temporal modelling with PCRaster

Oliver Schmitz, Faculty of Geosciences, Utrecht University, the Netherlands

12:40 PM - 1:00 PM

The open-source PCRaster environmental modelling platform (http://www.pcraster.eu) is a long-established toolbox for the construction of earth science simulation models, providing tailored model building blocks with main applications in hydrology, ecology, and environmental health. We present the recent developments of the PCRaster package. The distribution supports the conda package manager for a simplified installation on Linux, macOS and Windows, the usage of PCRaster in Jupyter Notebooks, the "multicore" module providing multithreaded field-based operations, significantly reduced build complexity, and a code base refactored to recent C and C++ standards. While PCRaster originally started as a raster-based toolbox, support for agent-based modelling is nowadays required due to the increasing scientific demand for constructing and analysing multi-disciplinary models, e.g. calculating personal human exposure to environmental variables by modelling human activity patterns and environmental variables. A major research focus is therefore on LUE, a new conceptual and physical data model capable of storing fields and agents. The conceptual data model is a generalisation of field-based and agent-based data models; it is implemented as a physical data model using HDF5 and C++, and provides a Python API to expose functionality. Our LUE data model is part of a new modelling language, allowing for operations accepting both fields and agents as arguments, and therefore resembling and extending the map algebra approach to modelling. A major software development focus is on providing model building blocks usable in a high-performance computing context. We currently work on a framework with parallel and distributed model building blocks using the HPX C++ library, allowing environmental algorithms to scale from single nodes to entire compute clusters. In our presentation we give an overview of the 30-year development history of PCRaster, the major challenges in synchronising scientific and technical progress, and our approaches thereto.
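
For readers unfamiliar with the platform, a minimal sketch of the field-based map algebra style is given below. It assumes the classic pcraster Python API (readmap, slope, lddcreate, accuflux, report); input file names and the lddcreate threshold values are illustrative only.

    # Minimal PCRaster-style map algebra sketch (hypothetical file names).
    from pcraster import readmap, slope, lddcreate, accuflux, report

    dem = readmap("dem.map")                      # elevation field (raster)
    gradient = slope(dem)                         # local slope per cell
    ldd = lddcreate(dem, 1e31, 1e31, 1e31, 1e31)  # local drain direction network
    discharge = accuflux(ldd, 1)                  # accumulate unit rainfall downstream
    report(discharge, "discharge.map")            # write the resulting field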

3:40 PM

Early experience in ultra-scale E3SM land model development on SUMMIT

Dali Wang, Environmental Science Division, Oak Ridge National Laboratory, USA

3:40 PM - 4:00 PM

The Energy Exascale Earth System Model (E3SM) is a computationally advanced coupled climate-energy model investigating the challenges posed by the interactions of weather-climate scale variability with energy and related sectors. E3SM contains a community land model for understanding how natural and human changes in terrestrial land surfaces will affect the climate. The E3SM Land Model (ELM) consists of submodels related to land biogeophysics, the hydrologic cycle, biogeochemistry, human activities, and ecosystem dynamics. In this paper, we present our early experience in redesigning ELM for a pre-exascale computer, SUMMIT, at Oak Ridge National Laboratory in the USA. Considering the complexity of the ELM software system and the technical readiness of several cutting-edge computing technologies, we start our software engineering effort with single-site ELM simulations within a functional unit testing platform. This effort provides a good understanding of data structure refactoring, data movement, and code porting between heterogeneous hardware, such as GPU/CPU and disk/non-volatile memory. We investigate new OpenACC features to expedite the data movement and code porting on a single SUMMIT node. Then we explore new ways to generate synthesized forcing datasets to test parallel ultra-scale ELM simulation over North America. Our early experiments show that the new OpenACC features (i.e., deepcopy and the subroutine directive) from PGI Fortran are robust for creating dedicated data regions containing complex data structures. Also, a single NVIDIA V100 GPU can comfortably handle up to 1900 site simulations. Therefore, we can use around 1500 SUMMIT nodes to undertake a continental-scale development of driving datasets and offline land simulations at an ultra-scale (1km x 1km) resolution over North America.

5:20 PM

Application Coupling Interfaces: A Novel Approach to the Semantic Interoperability Challenge in Integrated Systems Modeling

Kenneth "Mark" Bryden, Iowa State University

5:20 PM - 5:40 PM

Many real systems are modeled as networks, where the elements of the system are nodes, and the interactions between the elements are edges. A microservice architecture, in which each microservice is an element of the system, can be utilized to represent these network-based systems models. The challenge is ensuring that the data shared between microservices (i.e., models) accurately represent the interactions between the elements of the modeled system. That is, the interacting microservices must each attribute the same meaning to the data being exchanged. This consistency of meaning can be achieved in two ways. Most commonly in systems modeling, there is an a priori fixed global ontology established by the modeling community in which consistency of meaning is maintained by a set of standards/agreements, i.e., data-model based semantic interoperability. While effective, data-model based semantic interoperability requires that new models comprehend and comply with the existing rule sets, limiting the extensibility of the system. Alternately, an ontology-based semantic interoperability approach can be taken, in which case the needed shared vocabulary and meaning are dynamically derived. This approach overcomes the challenges of a fixed global ontology but poses its own challenges. A number of researchers have proposed that metadata can provide the link needed to derive meaning. However, the implementation of metadata to achieve derived meaning is a complex and multilayered process. In this paper we propose extending the application programming interface (API) paradigm to create application coupling interfaces (ACIs) that support the development of self-identifying models based on a local ontology rather than a global ontology. Within a prescribed environment described in this paper, these ACIs dynamically identify the local ontology and communications protocols and maintain consistency of meaning. This approach is demonstrated on a model of a village energy system representing the interactions between people, energy, and the environment in a West African village.
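
The sketch below illustrates, in hypothetical Python rather than the authors' ACI specification, the core idea of a self-identifying model: each microservice publishes a small local-ontology descriptor of what it emits, and the coupling layer checks that the producer's and the consumer's terms, units, and meanings actually match before data are exchanged. All field names and values are placeholders.

    # Hypothetical sketch of an "application coupling interface" descriptor.

    def describe_output():
        # The producing microservice declares the meaning of its output locally.
        return {
            "term": "household_energy_demand",
            "units": "kWh/day",
            "meaning": "daily electrical demand of one household agent",
        }

    def describe_input():
        # The consuming microservice declares what it expects to receive.
        return {"term": "household_energy_demand", "units": "kWh/day"}

    def compatible(producer, consumer):
        # Consistency of meaning: the coupling layer refuses mismatched links.
        return all(producer.get(key) == value for key, value in consumer.items())

    if compatible(describe_output(), describe_input()):
        print("link established: shared vocabulary agreed dynamically")
    else:
        print("link refused: semantic mismatch")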

6:00 PM

Studying the effects of microtopography on surface flow across spatial scales with the SERGHEI model

Daniel Caviedes-Voullième, BTU Cottbus-Senftenberg, Germany

6:00 PM - 6:20 PM

Microtopography has been recognised to have complex impacts on runoff generation and catchment connectivity. When studying small-scale flows, microtopography evidently dominates the water flow paths and the inundation of the surface. However, as the spatial scale of observations increases, the effect of microtopography is mixed and diffused, and is difficult to isolate in catchment hydrological signatures. Numerical simulations that study microtopography must explicitly resolve the scales of microtopographic features, which are orders of magnitude smaller than the typical domains of interest, such as hillslopes and catchments. The high computational cost that usually constrains such simulations can be addressed by means of High-Performance Computing. In this work, we present the model framework SERGHEI (Simulator of Environment, Rainfall, Geomorphology, Hydraulics and Ecology), a high-performance parallelised model based on the Kokkos programming framework. The Kokkos framework enables SERGHEI to run efficiently on heterogeneous systems and multiple graphics processing units (GPUs). Hence, SERGHEI is suitable for very large computational studies. Using SERGHEI, we carry out several numerical experiments of rainfall-runoff in idealised catchments with different shapes of microtopography. Here, we discretise the catchment at a very high spatial resolution which explicitly and completely resolves microtopography. In this contribution, we first present the implementation strategy of SERGHEI which enables us to run these numerical experiments in feasible time (minutes per simulation). Results of these experiments are compared both domain-wide and in a spatially distributed and multiscale manner, by assessing hydrological signatures and hydrodynamic distributions in subdomains of increasing size. Our findings suggest the existence of some threshold spatial scale at which the effects of microtopography may become averaged, and therefore robustly parametrisable into subgrid modelling approaches or simple ponding models. We present some possible strategies that would make use of such a parametrisation.
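
As an illustration of the multiscale comparison described above (not SERGHEI code, which is built on Kokkos), the hypothetical Python sketch below computes one simple signature, the ponded fraction, in square subdomains of increasing size. The shrinking spread across windows as the window size grows is the kind of evidence used to look for a threshold scale at which microtopographic effects average out; the synthetic depth field stands in for a model output raster.

    import numpy as np

    def ponded_fraction_by_scale(depth, sizes, threshold=1e-3):
        """Mean and spread of the wet-cell fraction over non-overlapping windows."""
        results = {}
        for s in sizes:
            fractions = []
            for i in range(0, depth.shape[0] - s + 1, s):
                for j in range(0, depth.shape[1] - s + 1, s):
                    window = depth[i:i + s, j:j + s]
                    fractions.append(np.mean(window > threshold))
            results[s] = (np.mean(fractions), np.std(fractions))
        return results

    # Synthetic water-depth field standing in for a simulated output raster.
    rng = np.random.default_rng(0)
    depth = rng.random((512, 512)) * 0.01
    for size, (mean_f, spread) in ponded_fraction_by_scale(depth, [8, 32, 128]).items():
        print(f"window {size:4d}: mean ponded fraction {mean_f:.3f}, spread {spread:.3f}")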

Wednesday, September 16th
10:40 AM

MAELIA-OWM: agro-environmental and socio-economic modelling and assessment tool for territorial management of organic resources

Renaud Misslin, INRAE, France

10:40 AM - 11:00 AM

The use of organic wastes (OW) as fertilizers has various positive effects on ecosystem services such as soil fertility, climate regulation and soil biodiversity. OW use can also have negative effects such as increased nitrogen leaching and heavy metal accumulation. Moreover, OW can affect different aspects of a farming system (workload, yields, fertilizing costs). Optimizing OW management at the local level requires an approach that considers their characteristics (e.g. organic matter stability, fertilizing value), climate, soil and cropping system heterogeneities, as well as the multiple feedback relationships that link the system components. OW territorial management could benefit from an Integrated Assessment and Modelling (IAM) tool allowing stakeholders to consider biophysical and socio-economic processes from field to territorial level. To reach this objective, we adapted the IAM MAELIA platform developed for modelling and simulating social-agro-ecosystems at local/regional level. MAELIA-OWM (Organic Wastes Management) provides solutions for assessing ecosystem services as well as economic and social impacts of scenarios regarding territorial OWM, agricultural activities, agro-environmental policies and climate change. MAELIA is based on a set of validated models suitable for the simulation of various biophysical contexts. MAELIA-OWM is applied to the Versailles Plain, France (240 km²). This territory is characterized by a high availability but low usage of urban OW. MAELIA-OWM requires multiple sets of spatial data describing the territorial settings (e.g. climate grid, soil map, Land Parcel Identification System) and the agricultural practices of the study area. Different prospective scenarios (greater use of available OW, cover cropping) were compared to a baseline scenario (little use of OW, current practices) through a set of agro-environmental and socio-economic criteria (GHG emissions, carbon storage, nitrogen leaching, gross margins and workload). Current developments of the model are dedicated to the implementation of an OW-chain model that will consider organic waste production, transformation and transport.

11:40 AM

A simple approach to simulate regional grassland dynamics with a process-based crop model

Hella Ellen Ahrends, INRES, University of Bonn, Germany

11:40 AM - 12:00 PM

Simulating grassland production is crucial for simulating spatio-temporal dynamics of biotic and abiotic processes at the landscape scale and for quantifying water and nutrient consumption in agricultural production. Most process-based crop models have been developed to simulate growth of major food crops; thus, the integration of other vegetation types, such as grassland, is challenging. During the last decades several grassland models have been successfully developed. However, modeling approaches and model results differ; thus, integrating them into existing crop modeling frameworks may introduce non-systematic errors and restricts the comparability of simulated yields with those of other crops. Here we present a simple approach combining an existing crop model (LINTUL5) with more detailed model routines within the modelling framework SIMPLACE for simulating grassland yields for North Rhine-Westphalia (Germany). The model solution was validated using biomass data reported between 2000 and 2008 at the state scale. The model solution is based on a light use efficiency approach to simulate vegetation growth and was coupled to dynamic soil water and nutrient models (SLIM, SoilCN) within the SIMPLACE modeling framework. Simple modifications of existing parameters and modeling routines and the integration of an additional routine for simulating grassland cuts and vegetation regrowth allowed for a realistic assessment of spatio-temporal biomass dynamics and annual grass yields. At the daily or seasonal scale, however, the use of simple rules to trigger cutting events restricts a reasonable simulation of the effects of cutting events on above- and belowground vegetation dynamics.
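
A minimal, hypothetical sketch of the two ingredients discussed above, light-use-efficiency growth and a simple rule-based cutting trigger, is shown below. Parameter values, the canopy interception rule, and the cutting rule are illustrative placeholders, not those of the LINTUL5/SIMPLACE solution.

    # Hypothetical light-use-efficiency grassland growth with rule-based cutting.
    LUE = 2.8              # g dry matter per MJ intercepted PAR (illustrative)
    CUT_THRESHOLD = 250.0  # g/m2 standing biomass that triggers a cut
    RESIDUAL = 60.0        # g/m2 left standing after each cut

    def simulate(par_series, stress_series):
        biomass, yields = RESIDUAL, []
        for par, stress in zip(par_series, stress_series):
            fpar = min(1.0, biomass / 300.0)        # crude canopy light interception
            biomass += LUE * fpar * par * stress    # daily growth increment
            if biomass >= CUT_THRESHOLD:            # simple cutting rule
                yields.append(biomass - RESIDUAL)
                biomass = RESIDUAL
        return yields

    # One synthetic growing season: constant radiation, mild water/nutrient stress.
    harvests = simulate([8.0] * 200, [0.8] * 200)
    print(len(harvests), "cuts, annual yield", round(sum(harvests), 1), "g/m2")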

12:00 PM

Supporting policy development by determining a fishery’s Safe Operating Space (SOS)

Gideon Gal, Kinneret Limnological Lab., Israel

12:00 PM - 12:20 PM

Large changes to the Lake Kinneret ecosystem have occurred since the mid-1990s, including a dramatic crash of the commercial fisheries in 2008, notably of the most commercially valuable species, Sarotherodon galilaeus. Aside from its high market value, S. galilaeus plays an important role in the food web in maintaining water quality. In an attempt to sustain the S. galilaeus population, compensatory lake management strategies have been implemented, including the stocking of millions of S. galilaeus fingerlings every year. Large lake level fluctuations have, however, inhibited efficient fisheries management due to the impact of factors such as lake shore vegetation and substrate on S. galilaeus reproductive success and on additional ecosystem services. Given the lake level fluctuations and their impact on the ecosystem, questions have been raised regarding the correct policy required to sustain a viable commercial fishery while accounting for variations in lake level and environmental conditions. Ecopath with Ecosim food-web models of the lake were developed, tested and implemented as tools for testing management strategies. We expanded on these models and developed an Ecospace model of the lake, taking advantage of the recently implemented temporal-spatial framework to simulate the varying environmental conditions and multiple anthropogenic stressors acting on the food web and especially on S. galilaeus. Based on scenario testing, we used the models to evaluate and define a management Safe Operating Space (SOS) for S. galilaeus in the lake by varying multiple stressors over a wide range of possible levels. The outcomes of the multiple scenarios highlight the range of anthropogenic actions that will ensure a sustainable fishery.
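
The scenario logic behind a Safe Operating Space can be illustrated with a hedged sketch (plain Python below, not Ecopath/Ecosim/Ecospace itself): stressor levels are varied over a grid, a food-web response is evaluated for each combination, and the SOS is the set of combinations keeping the target stock above a sustainability threshold. The response function, stressor ranges, and threshold are all hypothetical stand-ins.

    import itertools

    def fish_biomass(lake_level_drop, fishing_effort, stocking_millions):
        # Stand-in response surface; an Ecospace scenario run would go here instead.
        return 100.0 - 30.0 * lake_level_drop - 25.0 * fishing_effort + 5.0 * stocking_millions

    SUSTAINABLE = 60.0   # hypothetical minimum viable S. galilaeus biomass

    lake_level_drops = [0.0, 0.5, 1.0, 1.5]      # m below baseline
    fishing_efforts = [0.5, 1.0, 1.5]            # relative to current effort
    stocking_levels = [0, 1, 2, 3]               # millions of fingerlings per year

    safe_operating_space = [
        combo for combo in itertools.product(lake_level_drops, fishing_efforts, stocking_levels)
        if fish_biomass(*combo) >= SUSTAINABLE
    ]
    print(len(safe_operating_space), "of",
          len(lake_level_drops) * len(fishing_efforts) * len(stocking_levels),
          "stressor combinations lie inside the SOS")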

12:20 PM

Spanning the Pareto Frontier of Environmental Problems

Giorgio Guariso, DEIB, Politecnico di Milano, Italy

12:20 PM - 12:40 PM

Today, many complex multi-objective environmental problems are dealt with using genetic algorithms (GAs). They apply the development and adaptation mechanisms of a natural population to a "numerical" population of solutions to optimize a fitness function. Such mechanisms, namely selection, mutation, and cross-over, are all based on a form of random search. In this respect, GAs have to solve a multi-objective problem themselves, since they must find a compromise between the breadth of the search (to avoid being trapped in a local minimum) and its depth (to avoid a too rough approximation of the optimal solution). To deal with this dilemma, most algorithms use "elitism", which allows preserving some of the current best solutions in the successive generations. If the initial population is randomly selected, as in many GA packages, the elite may concentrate in a limited part of the Pareto frontier, where the objectives are continuous and monotonic with respect to the decision variables. This is not always the case in environmental problems, so this setting prevents a complete spanning of all the alternatives. A complete view of the frontier is possible if one first solves the single-objective problems that correspond to the extremes of the Pareto boundary, and then uses such solutions as elite members of the initial population. The paper compares this approach with other more conventional initializations, fixing the same number of function evaluations. For this purpose, we use some classical test sets, the solution of which is known analytically, with two, three, and four objectives. Then we show the results of the proposed algorithm in the optimization of the releases of a real multi-reservoir system, again contrasting its performance with that of available packages. Using this example, we also briefly discuss issues related to the classical performance measures in multi-objective optimization.
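
The initialization idea is easy to state concretely. The hedged sketch below (illustrative only, not the authors' code) first finds the two single-objective optima of a simple bi-objective test problem by brute force and then injects them as elite members of an otherwise random initial population, so that the elite spans the extremes of the Pareto boundary from generation zero.

    import random

    def objectives(x):
        # Simple bi-objective test problem on [0, 1]: f1 = x^2, f2 = (x - 1)^2.
        return x * x, (x - 1.0) ** 2

    def single_objective_optimum(which, samples=10001):
        # Brute-force stand-in for the single-objective solver used for the extremes.
        grid = [i / (samples - 1) for i in range(samples)]
        return min(grid, key=lambda x: objectives(x)[which])

    def initial_population(size=20, seed=0):
        random.seed(seed)
        population = [random.random() for _ in range(size)]
        # Elite seeding: replace two members with the extremes of the Pareto boundary.
        population[0] = single_objective_optimum(0)   # minimizes f1  -> x close to 0
        population[1] = single_objective_optimum(1)   # minimizes f2  -> x close to 1
        return population

    pop = initial_population()
    print("elite extremes:", pop[0], pop[1])
    print("some random members:", [round(x, 2) for x in pop[2:6]])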

1:20 PM

Fireline Propagation: How well does the FARSITE model compare with observations from aerial images?

Ana Patricia Fernandes, CESAM, Portugal

1:20 PM - 1:40 PM

The study of fire progression requires a detailed characterization of several components, such as the location of the ignition, the slope of the terrain, fuel properties and weather conditions. Usually, fire progression is estimated using numerical models, but as the input variables in wildfires, and even in experimental fires, are highly uncertain, the estimation of the fireline progression is also uncertain. The main objective of this work was to study fireline progression using two different approaches. The first approach is based on a modelling tool – FARSITE – which estimates fireline progression using topography, fuel characteristics and weather information. In the second approach, we use image processing techniques to extract data about the fireline progression from aerial images captured by an Unmanned Aerial Vehicle (UAV) equipped with an RGB camera. Both approaches were tested during a prescribed fire in the north of Portugal and the results were processed and evaluated using a geographical information system. The results show that FARSITE overestimates the burnt area in comparison with the data extracted from the aerial images. On the other hand, the segmentation of the fire in the aerial images was challenging, since the fireline was not visible in several frames due to heavy smoke, together with the camera angles and flight limitations. This first attempt to compare both approaches also allowed us to identify issues and improvements to be considered in the design of future experimental campaigns. These include guidelines to improve the spatiotemporal representation of the fireline propagation in both approaches, as well as to establish the first steps towards a new methodology to measure fireline propagation during prescribed burns.
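
A hedged sketch of the image-processing side is given below: a simple colour-threshold segmentation of an RGB frame into fire and non-fire pixels, and a burnt-area estimate that could then be compared with the FARSITE perimeter. Threshold values, the ground sampling distance, and the synthetic frame are hypothetical; the actual campaign used a more involved pipeline and georeferencing.

    import numpy as np

    def segment_fireline(rgb, red_min=180, dominance=1.3):
        """Very simple colour threshold: flame pixels are bright and red-dominant."""
        r = rgb[:, :, 0].astype(float)
        g = rgb[:, :, 1].astype(float) + 1.0
        return (r > red_min) & (r / g > dominance)

    def burnt_area_m2(mask, ground_sampling_distance=0.05):
        """Convert a pixel mask to area, given metres per pixel (hypothetical GSD)."""
        return mask.sum() * ground_sampling_distance ** 2

    # Synthetic 100x100 RGB frame standing in for one UAV image.
    frame = np.zeros((100, 100, 3), dtype=np.uint8)
    frame[40:60, 20:80] = (220, 120, 40)          # a fake flame front
    mask = segment_fireline(frame)
    print("fire pixels:", int(mask.sum()), " estimated area:", burnt_area_m2(mask), "m2")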

1:40 PM

What are the impacts of post-fire ash mobilization by wind erosion: physical and numerical modelling

Micael Rebelo, CESAM, Dep. Environment and Planning, University of Aveiro, Portugal

1:40 PM - 2:00 PM

Wildfires are gradually becoming a more recurrent phenomenon worldwide, impacting the local environment and, ultimately, human health due to the production of smoke and ash during the combustion process. Although still largely unstudied, it is known that wind plays a major role in post-fire ash erosion and transportation. In this context, the main objective of this work is to develop a high-resolution CFD model capable of assessing the influence of wind on ash mobilization. To accomplish this goal, a modelling setup composed of OpenFOAM and LIGGGHTS (LAMMPS Improved for General Granular and Granular Heat Transfer Simulations), coupled for air flow and particle simulation respectively, was used. Air flow was simulated using the K-Epsilon RNG turbulence model, and LIGGGHTS particles are modeled as soft spheres. To better simulate field ash, a comparative analysis between single-sphere and multi-sphere particles was performed. Modelled results were compared with measured data from experiments conducted in a wind tunnel to evaluate the model performance. The experiments aimed to assess the ash mass loss during a three-minute time interval, using pine ash samples collected from the Loriga wildfire of 24 August 2018. Both experimental and modeling results show a similar behavior, whereby most of the ash is eroded during the first seconds of the trial (around 20%). During the remaining time, if there are no high air velocity fluctuations, the ash quantity remains stationary. This similarity between the two approaches is better verified for multi-spherical particles and for spherical particles with rolling friction, which are more representative of the real ash shape. The application of CFD tools is particularly relevant to assess the effectiveness of mitigation measures, which may be difficult to quantify with wind tunnel experiments.

4:20 PM

Lifecycle Management for Models and Data Deployed as Integrated Services Supporting Farm Sustainability and Conservation Assessments

Jack Carlson, Colorado State University

4:20 PM - 4:40 PM

The Water Erosion Prediction Project (WEPP) and Wind Erosion Prediction System (WEPS) models simulate the effects of cropping systems and practices on water and wind soil erosion, sediment delivery, soil organic matter trend, tillage intensity, air particulate matter, and farm operation energy consumption. The two models, developed by the USDA Agricultural Research Service (ARS) and deployed as model web services by the Colorado State University (CSU) Object Modeling System Laboratory (OMSL), support multiple user communities operating from common soil, climate, and cropping system domain databases through supporting data services. User communities include Field-to-Market (FtM) - The Alliance for Sustainable Agriculture, their qualified data management partners (QDMPs), USDA Natural Resources Conservation Service (NRCS) county and state offices, as well as collaborating research and development project teams at CSU, ARS, and other organizations. CSU OMSL maintains and hosts the model and data services on the Cloud Services Integration Platform (CSIP), working with participating teams to maintain daily operational efficiency. CSIP currently supports several million WEPP/WEPS-related service requests annually, including those for FtM crop year sustainability assessments on 1.2 million acres annually. User segments increasingly reliant on these services require a commitment to effective lifecycle management. In this presentation we summarize steps taken to ensure effective long-term support for the combined efforts of all invested: distributed development, DevOps, container-based deployments, data standards and stewardship, publish/subscribe, system activity monitoring, archiving, knowledge transfer, and continuity of operations planning.

Model approach for assessing and forecasting bathing water quality along the Flemish coast

Bart Verheyen, IMDC nv, Belgium

4:20 PM - 4:40 PM

In Belgium, the Flemish Agency for Care and Health and the Flanders Environment Agency are jointly responsible for monitoring the bathing water quality at beaches along the Flemish coast. Intense rainfall events can cause overloaded sewage systems to overflow into harbours and the sea. These polluted discharges can potentially impact the beach water quality along the Flemish coast. As part of the water quality monitoring tools, both agencies are preparing to implement an operational forecasting and warning system to inform bathers about beach water quality along the coastline. This system has been set up to combine data and forecasts from different sources (meteorological forecasts, tidal predictions, operational data of overflow pumps, …), perform a water quality forecast, assess water quality criteria and provide warnings. The development of the operational forecasting system is supported by data analysis, monitoring campaigns and hydrodynamic and water quality modelling experience from previous studies for specific coastal towns. As part of the tool, a performant hydrodynamic and water quality numerical model has been set up, calibrated and incorporated in the system. The aim of the final operational forecasting system is to automatically run a forecast triggered by storm discharge events, to assess the impact on the water quality and, if the water quality falls below the standard, to inform bathers within a few hours not to swim. This paper will present the setup of the operational forecasting tool.
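
The decision logic of such a warning chain can be sketched in a few lines. The example below is hypothetical: the trigger threshold, the stand-in concentration formula, and the E. coli limit are placeholders, whereas the real system combines several forecast sources with a full hydrodynamic and water quality model.

    # Hypothetical sketch of the trigger -> forecast -> assess -> warn chain.
    OVERFLOW_TRIGGER_MM = 20.0        # rainfall that may cause sewer overflows
    E_COLI_STANDARD = 1000            # cfu/100 ml, placeholder bathing-water limit

    def forecast_e_coli(rainfall_mm, overflow_m3):
        # Stand-in for the hydrodynamic/water-quality model run.
        return 50 + 12 * rainfall_mm + 0.8 * overflow_m3

    def assess_beach(rainfall_mm, overflow_m3):
        if rainfall_mm < OVERFLOW_TRIGGER_MM:
            return "no forecast triggered"
        concentration = forecast_e_coli(rainfall_mm, overflow_m3)
        if concentration > E_COLI_STANDARD:
            return f"warning: {concentration:.0f} cfu/100 ml, advise against bathing"
        return f"ok: {concentration:.0f} cfu/100 ml, within the standard"

    print(assess_beach(rainfall_mm=5.0, overflow_m3=0.0))
    print(assess_beach(rainfall_mm=35.0, overflow_m3=800.0))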

Thursday, September 17th
12:40 PM

Climate Change Impacts on Streamflow in Two Highly Abstracted English Catchments Under a High Emissions Scenario - Implications for Environmental Flow Protection

Cordula Wittekind, School of Geography, University of Leeds, United Kingdom

12:40 PM - 1:00 PM

The spatio-temporal availability of water for environmental flows can be severely affected by climate change. Robust water management approaches are urgently required to alleviate pressure on abstracted freshwater ecosystems. Modelling of the impacts of climate change on streamflow and water availability at the catchment scale has become indispensable in informing decision makers. We use the 12 new Hadley Centre regional climate model projections with a resolution of 12 km for a high emissions scenario: Representative Concentration Pathway (RCP) 8.5. With the completely revised Soil and Water Assessment Tool (SWAT+) we simulate natural daily flows until the 2080s in two English catchments. While both catchments are similar in that they are affected by high rates of freshwater abstraction, they differ in their natural hydrological regimes and geographies. One is a wet coastal catchment with steep slopes while the other is a dry lowland catchment. SWAT+ performed well in simulating natural flows during the validation period (from 2009 to 2017) in both catchments. Under RCP 8.5, results show pronounced effects of climate change on future water availability, which we use as a basis to assess alternative environmental flow scenarios and abstraction licensing regimes that could be implemented to alleviate these effects. This study highlights the need for the translation of regional climate change information into locally explicit climate impacts for informed decision making in water management. It furthermore emphasises the urgency of developing and implementing environmental flow protection policies that incorporate climate change while being applicable to a wide range of hydrological regimes.

1:40 PM

Building an agent-based model of Land Use and Land Cover Changes to simulate changes between two land cover maps: lessons from an initialization step optimization

Romain Mejean, UMR 5602 GEODE CNRS, Université Toulouse 2 Jean Jaurès

1:40 PM - 2:00 PM

Spatial agent-based models (ABMs) are increasingly used to study land use and land cover changes (LUCC). This process-based approach allows formalizing the interactions between society and environment through the modelling of human decision-making processes regarding land use and its impact on regional land cover. Data-driven ABMs, which have emerged over the past decade, imply the integration of data to make models more realistic, in a descriptive approach (KIDS). Most of the data are integrated at the initialization step, where the agents of the simulation are created and their attribute values (including location) are initialized. We argue that this step is crucial, particularly because of the well-known phenomena of path dependence and dependence on initial conditions, but also for obtaining a more comprehensive simulation model, i.e. more realistic initial conditions of agents and landscape, a necessity for testing the model against known empirical data through calibration. Thus, we introduce a LUCC-ABM of deforestation dynamics in the Northern Ecuadorian Amazon for which initialization has been optimized. Our model workflow can be described as follows: first, we generate a spatially and socially structured population with a synthetic population generation library (GENSTAR) on the basis of census, cadastral and land cover data. Then, we generate an agricultural landscape by specifying a patchwork of farming activities in each cadastral parcel assigned to these agents according to their livelihood strategy, on the basis of field survey data in a sparse remote sensing data context. Lastly, we calibrate parameters related to agent decision-making processes in order to reproduce LUCC patterns due to human activities over eight years in the Northern Ecuadorian Amazon as they appeared on land cover maps.
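
To make the initialization workflow concrete, the hypothetical sketch below mirrors its three steps in plain Python (the actual model uses the GENSTAR library within a dedicated ABM platform): draw a synthetic household population from census marginals, attach households to cadastral parcels, and fill each parcel with land-use patches according to the household's livelihood strategy. All strategies, shares, and sizes are invented for illustration.

    import random
    random.seed(1)

    # Step 1: synthetic population from (hypothetical) census marginals.
    STRATEGIES = {"cash_crop": 0.45, "subsistence": 0.35, "livestock": 0.20}
    households = [
        {"id": i, "strategy": random.choices(list(STRATEGIES), weights=STRATEGIES.values())[0]}
        for i in range(100)
    ]

    # Step 2: assign each household to a cadastral parcel (one-to-one here).
    parcels = [{"id": i, "area_ha": random.uniform(10, 50)} for i in range(100)]
    for hh, parcel in zip(households, parcels):
        hh["parcel"] = parcel

    # Step 3: land-use patchwork per parcel, conditioned on livelihood strategy.
    SHARES = {
        "cash_crop":   {"coffee": 0.4, "pasture": 0.1, "forest": 0.5},
        "subsistence": {"food_crops": 0.3, "pasture": 0.1, "forest": 0.6},
        "livestock":   {"pasture": 0.5, "food_crops": 0.1, "forest": 0.4},
    }
    for hh in households:
        area = hh["parcel"]["area_ha"]
        hh["land_use"] = {use: round(share * area, 1) for use, share in SHARES[hh["strategy"]].items()}

    print(households[0])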

Towards automated provenance and other meta-data production for social simulation agent-based models

Douglas Salt, The James Hutton Institute

1:40 PM - 2:00 PM

Reproducibility is a known problem in social simulation. We have developed a framework for the automated collection of provenance and other metadata for agent-based modelling in social simulation experiments. The collection of this metadata will allow any experiment, and most importantly the results of any experiment using agent-based modelling, to be reproduced by other researchers. In addition to provenance, the framework and tools allow the gathering of other metadata, which may help to elaborate or explain the intentions of the original researchers, for example annotations to source documentation such as original papers, unprocessed data sets, statistical methods, and visualisations. The semi-automation refers to the fact that much of the initial specification of the origin of the data has to be set programmatically, but subsequently all metadata are produced automatically. The intention of the authors is to automate as much of metadata production and recording as is possible. This is a cumulative process, and the metadata from previous experiments and studies may be used as the basis for the production of metadata for subsequent experiments. We introduce a distinction between fine-grained and coarse-grained metadata. Fine-grained provenance will allow the tracking of particular results and metrics. Coarse-grained metadata shows the configuration of an experiment, and is necessary for the reproduction of the experimental setup by other researchers. The recording of provenance and other metadata thus has two important dimensions of distinction: first, fine- versus coarse-grained metadata; second, provenance versus workflow. Coarse-grained metadata describes how particular files come (or came) into being, or were (or could be) used to bring other files into being. This allows the initial setup of the experiments to be specified, and is most important for the purposes of reproducibility. Fine-grained metadata describes specific values recorded in social simulation outputs. To make the distinction concrete, suppose a simulation produces a CSV file. The data within the CSV file are covered by fine-grained metadata, whilst the fact that the simulation produces the CSV file is coarse-grained. For the fine-grained data, the provenance metadata describes what actually happens (for example, run W of simulation X produced output file Y), whilst workflow metadata describes what could happen (e.g., simulation X produces an output file of type Z). That is, if a researcher is particularly interested in a certain metric or result, then this data can be used to establish precisely how that metric or result was arrived at. Such a schema produces a large number of tables, so to illustrate what data might be easily visualisable, the following sub-graphs would be available from the collection of such data: ANALYSIS: graphs of the part of the fine-grained metadata pertaining only to analysis and visualisation; EXTERNAL: graphs of the links to external ontologies; FINEGRAIN: graphs relating to fine-grained metadata; FOLKSONOMY: graphs that allow user-specified tagging; PROJECT: graphs relating to metadata about projects; PROV: graphs capturing provenance metadata; SERVICES: graphs pertaining to service provision and the matching of requirements against specifications; WORKFLOW: graphs relating to workflow.
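
The CSV example in the abstract translates directly into a small sketch. The record structures below are hypothetical (the framework itself uses a richer schema and standard provenance vocabularies), but they show the distinction between coarse-grained records about runs and files, workflow records about what could happen, and fine-grained records about individual values.

    # Hypothetical provenance records for one simulation run producing a CSV file.

    workflow = {            # coarse-grained, "what could happen"
        "simulation": "X",
        "produces": {"type": "CSV", "columns": ["tick", "agent_id", "wealth"]},
    }

    provenance = {          # coarse-grained, "what actually happened"
        "run": "W",
        "simulation": "X",
        "output_file": "results/run_W.csv",
        "parameters": {"seed": 42, "n_agents": 500},
    }

    fine_grained = [        # fine-grained: where a specific reported value came from
        {
            "value": 13.7,
            "metric": "mean_wealth",
            "derived_from": "results/run_W.csv",
            "rows_used": "tick == 100",
            "method": "arithmetic mean of the wealth column",
        },
    ]

    def trace(metric, records):
        """Answer 'how was this metric arrived at?' from the fine-grained records."""
        return [r for r in records if r["metric"] == metric]

    print(trace("mean_wealth", fine_grained)[0]["derived_from"])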

4:20 PM

New Concepts and Software for Expressing and Running Large Heterogeneous System Models

Derek Karssenberg, Faculty of Geosciences, Utrecht University, the Netherlands

4:20 PM - 4:40 PM

While a tremendous number of software tools exist for data storage and manipulation, the construction, execution, and maintenance of simulation models is still a daunting undertaking. Key challenges are the representation of continuous fields as well as discrete objects (agents), the big data sets that need to be fed to models or assimilated, the increasing sophistication of process representations requiring larger computations, the need to use all compute units within and across compute nodes, and the requirement of reproducibility of models. We argue that existing model construction paradigms and frameworks do not sufficiently address these challenges, mainly because most concepts and tools stem from a time when many of these challenges did not (yet) exist. We therefore propose the design and implementation of a new, open-source model building framework that relies on fundamentally new approaches both in the programming language used to express models and in the computational back-end. To enable coupled field-agent modelling, we introduce a concept that combines fields and agents in a single conceptual data model. Model builders then program models using map-algebra-like statements in a procedural programming language, avoiding the need for the object orientation widely used in agent-based modelling frameworks. Map algebra operations are provided that manipulate the values of phenomena (e.g. temperature, colour, pressure) and operations that manipulate the location or shape of phenomena. Models programmed by modellers are executed by a computational back-end which relies on the new LUE physical data model that mirrors the conceptual data model by enabling the storage of both objects and continuous fields. A new computational engine addresses the need to execute any model on all compute units by using an Asynchronous Many-Task runtime environment. We show the current status of our work and invite the modelling community to join us in its development.
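
The flavour of the proposed field-and-agent map algebra can be hinted at with a hedged sketch (plain Python with NumPy below, not the LUE language itself): the same kind of statement is applied once to a continuous field and once to a set of discrete agents, which is what the combined conceptual data model is meant to allow. Variable names and values are illustrative; in the actual framework such statements would be executed asynchronously by the HPX-based back-end.

    import numpy as np

    def where(condition, if_true, if_false):
        """One map-algebra-style operation applied to fields and agents alike."""
        return np.where(condition, if_true, if_false)

    # A continuous field: air temperature on a raster.
    temperature_field = np.array([[14.0, 21.0], [19.5, 25.0]])
    heat_stress_field = where(temperature_field > 20.0, 1, 0)

    # Discrete agents: temperature sampled at each agent's location.
    agent_temperature = np.array([18.0, 22.5, 26.0])
    heat_stress_agents = where(agent_temperature > 20.0, 1, 0)

    print(heat_stress_field)    # per-cell result
    print(heat_stress_agents)   # per-agent result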