Keywords
model and data integration, interoperability, standardized variable names, ontology generation
Start Date
26-6-2018 10:40 AM
End Date
26-6-2018 12:00 PM
Abstract
Every data set and computer model has its own internal vocabulary (i.e., names or labels) for referring to its input and/or output variables. It is therefore difficult, even for an expert, to know whether a variable stored or computed in a given digital resource is equivalent to one needed by another resource. Experts can typically figure this out through a process that may involve examining the equations that are used, drawing on familiarity with domain jargon, reading documentation (e.g., source code, manuals and papers) or talking to the developer of the resource. However, this process is time-consuming, frustrating and inefficient. The only way to automate this semantic mediation task is with an accurate, one-time mapping of these internal names to variable names in a standardized vocabulary that machines can use (i.e., access via function calls in a program). This task of mapping internal variable names to standardized names is known as "semantic annotation". Once annotation is complete, "semantic alignment" can be performed automatically every time that resource is selected for use in a workflow, allowing variables to be correctly passed between coupled resources.
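As a rough illustration of the distinction between annotation and alignment, the following minimal Python sketch treats each resource's semantic annotation as a dictionary from internal names to standardized names, and performs alignment by matching on the standardized names. The resource names, the internal variable names, the function `align`, and the standardized names shown are all hypothetical, written only in the general style of a standardized vocabulary; they are not taken from the ontology or from any MINT software.

```python
from typing import Dict, List, Tuple

# Hypothetical one-time semantic annotations:
# internal variable name -> standardized variable name (illustrative only).
MODEL_A_OUTPUTS: Dict[str, str] = {
    "P": "atmosphere_water__precipitation_leq-volume_flux",
    "T_air": "atmosphere_bottom_air__temperature",
}
MODEL_B_INPUTS: Dict[str, str] = {
    "precip_rate": "atmosphere_water__precipitation_leq-volume_flux",
    "soil_moist": "soil_water__volume_fraction",
}


def align(
    provider_outputs: Dict[str, str],
    consumer_inputs: Dict[str, str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Match a provider's outputs to a consumer's inputs by standardized name.

    Returns (matches, unmatched), where each match is
    (provider_internal_name, consumer_internal_name, standardized_name)
    and unmatched lists consumer inputs that no provider output satisfies.
    """
    # Invert the provider annotation: standardized name -> internal name.
    by_standard = {std: internal for internal, std in provider_outputs.items()}

    matches: List[Tuple[str, str, str]] = []
    unmatched: List[str] = []
    for internal, std in consumer_inputs.items():
        if std in by_standard:
            matches.append((by_standard[std], internal, std))
        else:
            unmatched.append(internal)
    return matches, unmatched


matches, unmatched = align(MODEL_A_OUTPUTS, MODEL_B_INPUTS)
for src, dst, std in matches:
    print(f"{src} -> {dst} (via {std})")
print("unmatched inputs:", unmatched)
```

In this sketch the annotation dictionaries are written once per resource, while `align` can be rerun each time two resources are coupled in a workflow. Exact string matching is a simplifying assumption; a real system would likely also need to reconcile units and closely related concepts.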
We will describe efforts to semi-automatically generate standardized variable names for different domains by building on the foundational and rule-based principles of the Geoscience Standard Names ontology (geoscienceontology.org). Our initial focus will be on measurement concepts in the realms of agriculture, social science, economics, transportation networks and demographics. This work is funded by a project called MINT (Model INTegration) that is part of the World Modelers program.
Principle-based, Semi-automatic Ontology Generation to Support Cross-Domain Interoperability of Data Sets and Models
Stream and Session
A4: Model Integration Frameworks: A Discussion of Typologies, Standards, Languages, and Platforms