Presenter/Author Information

Douglas Salt, The James Hutton Institute

Keywords

Metadata, provenance, agent-based modelling, social simulation

Start Date

17-9-2020 1:40 PM

End Date

17-9-2020 2:00 PM

Abstract

Reproducibility is a known problem in social simulation. We have developed a framework for the automated collection of provenance and other metadata for agent-based social simulation experiments. Collecting this metadata will allow any agent-based modelling experiment, and most importantly its results, to be reproduced by other researchers. In addition to provenance, the framework and tools allow the gathering of other metadata that may help to elaborate or explain the intentions of the original researchers, for example annotations to source documentation such as original papers, unprocessed data sets, statistical methods, and visualisations. The framework is semi-automated: much of the initial specification of the origin of the data has to be set programmatically, but thereafter all metadata is produced automatically. Our intention is to automate as much of the production and recording of metadata as possible. This is a cumulative process: the metadata from previous experiments and studies may be used as the basis for producing the metadata of subsequent experiments. We introduce a distinction between fine-grained and coarse-grained metadata. Fine-grained provenance allows particular results and metrics to be tracked. Coarse-grained metadata shows the configuration of an experiment and is necessary for other researchers to reproduce the experimental setup. In all, the recording of provenance and other metadata has two important dimensions of distinction: the first is fine- versus coarse-grained metadata; the second is provenance versus workflow. Coarse-grained metadata describes how particular files come (or came) into being, or were (or could be) used to bring other files into being. This allows the initial setup of an experiment to be specified, and is the most important aspect for the purposes of reproducibility.
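To illustrate why coarse-grained metadata matters for reproducing an experimental setup, here is a minimal Python sketch; it is not the framework's actual implementation. It assumes coarse-grained provenance can be reduced to records of activities that used some files and generated others. The `Activity` class, the `inputs_needed` and `primary_inputs` functions, and all file names are hypothetical. Tracing back from an output yields every file it was derived from, and the files that no recorded activity generated are the primary inputs another researcher would need to obtain.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical coarse-grained provenance record: an activity (e.g. a
# simulation run) that used some files and generated others.
@dataclass(frozen=True)
class Activity:
    name: str
    used: tuple[str, ...]
    generated: tuple[str, ...]


def inputs_needed(target: str, activities: list[Activity]) -> set[str]:
    """Return every file that `target` was (transitively) derived from."""
    # Index each generated file by the activity that produced it.
    producer = {f: act for act in activities for f in act.generated}
    needed: set[str] = set()
    stack = [target]
    while stack:
        act = producer.get(stack.pop())
        if act is None:
            continue  # not generated by any recorded activity
        for src in act.used:
            if src not in needed:
                needed.add(src)
                stack.append(src)
    return needed


def primary_inputs(target: str, activities: list[Activity]) -> set[str]:
    """Ancestors of `target` that no recorded activity generated."""
    generated = {f for act in activities for f in act.generated}
    return {f for f in inputs_needed(target, activities) if f not in generated}


# Hypothetical experiment history (all names are illustrative only).
history = [
    Activity("prepare-data", used=("census.csv",), generated=("agents.csv",)),
    Activity("run-1", used=("model.nlogo", "params.json", "agents.csv"),
             generated=("output-1.csv",)),
    Activity("plot", used=("output-1.csv",), generated=("figure-1.png",)),
]

print(sorted(inputs_needed("figure-1.png", history)))
# ['agents.csv', 'census.csv', 'model.nlogo', 'output-1.csv', 'params.json']
print(sorted(primary_inputs("figure-1.png", history)))
# ['census.csv', 'model.nlogo', 'params.json']
```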
Fine-grained metadata describes specific values recorded in social simulation outputs. To make the distinction concrete, suppose a simulation produces a CSV file: the data within the CSV file are covered by fine-grained metadata, whilst the fact that the simulation produces the CSV file is coarse-grained. If a researcher is particularly interested in a certain metric or result, the fine-grained metadata can therefore be used to establish precisely how that metric or result was arrived at. Along the second dimension, provenance metadata describes what actually happened (for example, run W of simulation X produced output file Y), whilst workflow metadata describes what could happen (for example, simulation X produces an output file of type Z). The resulting schema contains a large number of tables, so to illustrate what data might easily be visualised, the collected metadata can be viewed as the following sub-graphs:
ANALYSIS: graphs of the part of the fine-grained metadata pertaining only to analysis and visualisation;
EXTERNAL: graphs of the links to external ontologies;
FINEGRAIN: graphs relating to fine-grained metadata;
FOLKSONOMY: graphs that allow user-specified tagging;
PROJECT: graphs relating to metadata about projects;
PROV: graphs capturing provenance metadata;
SERVICES: graphs pertaining to service provision and the matching of requirements against specifications;
WORKFLOW: graphs relating to workflow.
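The two dimensions, grain and provenance versus workflow, can be illustrated with a similarly hypothetical Python sketch. It assumes that workflow metadata is a set of rules of the form "simulation X produces an output of type Z", that coarse-grained provenance records that run W of simulation X produced file Y, and that fine-grained provenance ties an individual value to a row and column of that file. None of `WorkflowRule`, `RunRecord`, `ValueRecord`, `conforms` or `explain`, nor the model name and values, come from the framework; they are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


# Workflow metadata (hypothetical): what *could* happen.
@dataclass(frozen=True)
class WorkflowRule:
    simulation: str
    output_type: str


# Coarse-grained provenance (hypothetical): what *did* happen.
@dataclass(frozen=True)
class RunRecord:
    run_id: str
    simulation: str
    output_file: str
    output_type: str


# Fine-grained provenance (hypothetical): one value in an output file.
@dataclass(frozen=True)
class ValueRecord:
    output_file: str
    row: int
    column: str
    value: float


def conforms(run: RunRecord, rules: list[WorkflowRule]) -> bool:
    """True if what actually happened is something the workflow allows."""
    return WorkflowRule(run.simulation, run.output_type) in rules


def explain(value: ValueRecord, runs: list[RunRecord]) -> str:
    """Trace a single value back to the run and simulation that produced it."""
    for run in runs:
        if run.output_file == value.output_file:
            return (f"{value.column}={value.value} (row {value.row} of "
                    f"{value.output_file}) was produced by run {run.run_id} "
                    f"of {run.simulation}")
    raise LookupError(f"no provenance recorded for {value.output_file}")


# Illustrative records only; not real data.
rules = [WorkflowRule("land-use-abm", "csv")]
runs = [RunRecord("W1", "land-use-abm", "out-W1.csv", "csv")]
v = ValueRecord("out-W1.csv", row=12, column="mean_income", value=1523.5)

print(conforms(runs[0], rules))  # True
print(explain(v, runs))
# mean_income=1523.5 (row 12 of out-W1.csv) was produced by run W1 of land-use-abm
```

In this sketch, `explain` plays the role the abstract assigns to fine-grained provenance, letting a single metric be traced to the run that produced it, while `conforms` keeps what happened (provenance) separate from what was allowed to happen (workflow).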

Towards automated provenance and other meta-data production for social simulation agent-based models