Keywords

LIAISE, Semantic Web, Integrated Modelling, Natural Language Processing, Text Classification

Location

Session A6: Semantics, Metadata and Ontologies of Natural Systems

Start Date

19-6-2014 10:40 AM

End Date

19-6-2014 12:20 PM

Abstract

The World Wide Web and related technologies play an increasing role in Integrated Environmental Modelling (IEM). Model integration software frameworks are increasingly web-enabled. Web technologies and standards are used to access and run simulation models remotely (the Web of models) and are being considered as a means of enabling interoperability across model integration frameworks. In addition, a growing number of local and global initiatives provide open access to environmental data (the Web of data) that can potentially serve as input for scientific models. Both efforts, and their integration, require descriptive information about the scientific models and datasets that is of sufficient quality and semantically aligned.

The EU FP7 project LIAISE (Linking Impact Assessment Instruments to Sustainability Expertise) collects and provides meta-descriptions of good practices, experts, scientific models and guidelines in its shared Impact Assessment toolbox, the LIAISE KIT. An international community of experts has been formed to provide and maintain these meta-descriptions. However, this knowledge base is currently targeted primarily at a human audience and is accessible only through a website. To ensure future usage, initial steps are being taken in two small LIAISE sub-projects to publish the content as part of the Semantic Web and to examine the use of ontologies and Natural Language Processing for metadata extraction. The metadata extraction work has as its research goal the (semi-)automated creation of relevant metadata from existing unstructured text sources such as scientific publications and websites. This paper describes preliminary results of both sub-projects. Although the results show promise, further research is still required, including a proof-of-concept implementation.

Metadata extraction using semantic and Natural Language Processing techniques
