contrast:

Introduction

This section loosely describes a mapping from UML models to RDF models in order to describe compliant RDF data.

The UML model includes Associations, Enumerations, atomic and complex Datatypes and Classes with Generalizations and Properties. For RDF, Classes and complex Datatypes are captured as Shapes, Properties are captured as TripleConstraints, and atomic Datatypes and Enumerations are expressed as NodeConstraints. In principle, OCL encodings of co-occurrence constraints can be expressed as TripleExpressions, but this must be tested on a corpus of UML that has such OCL.
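As a rough illustration of the mapping described above, the sketch below turns a small, hand-written UML class description into a ShExJ-style Shape. The input object layout, the `umlClassToShape` function, the `ATOMIC` lookup table, the `Person` example class and the `http://example.org/` namespace are all assumptions made for illustration; none of them come from the DDI tooling.

```javascript
// Hypothetical sketch: map one UML Class to a ShExJ-style Shape.
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Assumed table of atomic Datatypes; a real model declares its own.
const ATOMIC = new Map([
  ['String', XSD + 'string'],
  ['Integer', XSD + 'integer'],
]);

function umlClassToShape(umlClass, ns) {
  // Each UML Property becomes a TripleConstraint.
  const constraints = umlClass.properties.map((p) => ({
    type: 'TripleConstraint',
    predicate: ns + p.name,
    // Atomic Datatypes become NodeConstraints; any other type is
    // treated as a reference to another Shape.
    valueExpr: ATOMIC.has(p.type)
      ? { type: 'NodeConstraint', datatype: ATOMIC.get(p.type) }
      : ns + p.type,
    min: p.lower,
    max: p.upper, // -1 stands for "unbounded" in ShExJ
  }));

  const shape = { type: 'Shape', id: ns + umlClass.name };
  if (constraints.length === 1) {
    // EachOf needs at least two members, so a lone constraint stands alone.
    shape.expression = constraints[0];
  } else if (constraints.length > 1) {
    shape.expression = { type: 'EachOf', expressions: constraints };
  }
  return shape;
}

// Invented example class, not part of DDI.
const personShape = umlClassToShape({
  name: 'Person',
  properties: [
    { name: 'name', type: 'String', lower: 1, upper: 1 },
    { name: 'employer', type: 'Organization', lower: 0, upper: 1 },
  ],
}, 'http://example.org/');

console.log(JSON.stringify(personShape, null, 2));
```

Enumerations, Generalizations and OCL-derived constraints are deliberately left out of this sketch.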

Shapes describe data structures, while OWL classes are typically interpreted as describing real-world objects. For example, a real-world Person object will always have two biological parents, which are in turn entities of type Person. A data structure describing a person will deal with this infinite series (and with likely missing data) by making the biological parents "optional". OWL Classes can also model data structures: OWL has been used to model this duality since before shapes languages existed, and DDI exports a data structure-oriented ontology.

Creating the RDF model requires a mapping of Property names to RDF predicates. OMG’s ODM offers a conservative approach to this, which constructs an RDF predicate from the name of a containing Class, a Property name, and the value type of that Property. For instance, in BRIDG, a BiologicEntityPart has an anatomicSiteLaterality whose value is a code. The ODM representation of this as an RDF predicate is bridg:BiologicEntityPart.anatomicSiteLateralityCode. Greater understanding of the domain of the UML model may reveal that all properties with the same name can be given the same identifier in RDF. For instance, DDI’s member property appears in many Classes with different value types, but the RDF identifier ddi:member captures them all.
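The two naming strategies can be sketched as follows. This is a minimal illustration inferred from the single BRIDG example in the text. The function names `odmPredicate` and `sharedPredicate` are hypothetical, and the capitalization rule for the value type is an assumption rather than a statement of the full ODM rules.

```javascript
// Hypothetical helper: ODM-style predicate built from class name,
// property name and value type.
function odmPredicate(prefix, className, propertyName, valueType) {
  // Assumed rule: capitalize the value type so it reads as a suffix
  // (code -> Code), matching the BRIDG example.
  const suffix = valueType.charAt(0).toUpperCase() + valueType.slice(1);
  return `${prefix}:${className}.${propertyName}${suffix}`;
}

// Hypothetical helper: one shared predicate per property name,
// regardless of containing class or value type.
function sharedPredicate(prefix, propertyName) {
  return `${prefix}:${propertyName}`;
}

const conservative = odmPredicate(
  'bridg', 'BiologicEntityPart', 'anatomicSiteLaterality', 'code');
const shared = sharedPredicate('ddi', 'member');

console.log(conservative); // bridg:BiologicEntityPart.anatomicSiteLateralityCode
console.log(shared);       // ddi:member
```

The conservative form never collides but yields many predicates; the shared form depends on domain knowledge that same-named properties mean the same thing.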

Enumerations contain a list of one or more constants. While these could be expressed in RDF instances as either literals or IRIs, the strong preference in RDF is to leverage IRIs and web architecture to provide unambiguous identifiers.
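A minimal sketch of the IRI-based option: each constant of an Enumeration is minted as an IRI and the set is expressed as a ShExJ-style NodeConstraint value set. The `enumerationToNodeConstraint` function, the `Laterality` enumeration, its constants and the `namespace + name + '#' + constant` minting scheme are all invented for illustration.

```javascript
// Hypothetical sketch: express a UML Enumeration as a NodeConstraint
// whose value set is a list of IRIs, one per constant.
function enumerationToNodeConstraint(enumeration, ns) {
  return {
    type: 'NodeConstraint',
    // Assumed minting scheme: <namespace><EnumerationName>#<constant>.
    values: enumeration.constants.map(
      (c) => ns + enumeration.name + '#' + encodeURIComponent(c)),
  };
}

// Invented enumeration, not taken from DDI or BRIDG.
const laterality = enumerationToNodeConstraint(
  { name: 'Laterality', constants: ['left', 'right', 'bilateral'] },
  'http://example.org/');

console.log(JSON.stringify(laterality, null, 2));
```

Instance data would then refer to `http://example.org/Laterality#left` rather than the literal "left", so the identifier can be dereferenced and cannot be confused with a same-named constant from another enumeration.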

The following examples are taken from a subset of the DDI model forming a constellation around ConceptSystem.

Parsing XMI

Recursively walk the XMI, capturing the UML model:

ElementImport can stand for a class, association or property. No PackageImport.

This produces a model object which is supplemented by:

RDF Model

For the corresponding RDF model:

OWL Representation

For the OWL representation:

Status

Right now, the UML parser emits something halfway between a UML model and an RDF model. All of the markup for the OWL and ShEx is mixed in with the main program. Fixing this involves segregating the following modules and should take about 10 days of focused work. The product should be a set of stand-alone tools that can be trivially executed on this and future versions of DDI XMI.

(goal) Architecture

Once completed, this pipeline will produce all of the documents. The libraries should be independent of DDI and ideally profit from wider distribution and tool contribution. The DDI-specific module should be small and easy to maintain and clearly enumerate the DDI-specific transformations.

Apart from the UML transformations, these can all be released as NPM packages. The DDI-specific UML transformations will import the NPM libraries. Note that the NPM conventions include a package.json file, which captures version information for all imported libraries. If some UML/RDF community extends the NPM libraries in ways that are not backward-compatible, the UML transforms script will continue to work as is; it will simply require a little work if we want to update it to use new libraries with new features.

Issues

  1. Canonical representation of datatypes is still uncertain.
  2. Nesting throws away annotations on nested classes.