ShExMap

This document describes using ShEx to build intuitive bridges between conventional clinical data representations and domain-specific ontologies that will be useful for knowledge and rule capture.

Generic Observations

Most clinical data exchange denormalizes structured observations like blood pressure, APGAR, full blood count panel, etc., into a constellation of observations. This encodes the semantics of the observation structure in conventions of terminology codes, e.g. 75367002 |Blood pressure|, 271649006 |Systolic blood pressure| and 271650006 |Diastolic blood pressure|. Linkage between these observations is captured by over-general predicates in the information model, e.g. fhir:related or rim:COMP (has component).

RIM reuses Acts and ActRelationships to capture these structures, providing a richer vocabulary of relationships in ActRelationship type codes. The body site or device for a blood pressure measurement is attached to the blood pressure observation by type codes. Diagnostic processes like evidence and causality are captured in the type codes EVID (provides evidence for) and CAUS (is etiology for). FHIR has specialized relationships to capture some structural relationships like body site or device.

Even if we imposed a more complex information model for structured observations or treatment processes, there would always be some data for which no model was defined. The semweb story "just invent some stuff and maybe it will get popular" suits neither the skills of the information wranglers nor conventional paper-oriented clinical and legal processes.

Domain Models

Why do we need domain models?

Domain models will likely be more principled in their design and will definitely be more intuitive than the physical models involving constellations of observations. A Domain Analysis Model (DAM) is an example of a domain model tailored towards capturing the aspects required for analysis, e.g. aspects pertinent to workflows or business processes. A class and relationship hierarchy such as the one implemented in ActRelationship type codes will be more simply expressed in an RDF ontology and will leverage existing RDF tooling for model and example verification.

Available DAMs

Projects like CIMI engage clinicians in the development of intuitive domain models, e.g. blood pressure as a structure with measurements of systolic and diastolic pressure, and possibly other details like posture or device. Here are a few CIMI models:

DAM as Ontology

Ideally, developing small DAMs would include identifying all of the intersections between them, producing a more comprehensive DAM: an ontology of clinical artifacts. For example, the body mass used in an estimated glomerular filtration rate is the same as the body mass used in a body mass index. A useful clinical ontology in RDF would capture that by reusing the same identifier wherever the same concept appears.

Mappings

Why do we need mappings?

Definitions for physical representations: presuming the DAM has intuitive and useful semantics, mapping a physical model to a logical model provides clear guidelines for implementors who need to work with the corresponding physical models. These mappings take the place of implementation guidelines or instance diagrams.
Duplication detection: if two domain models map to the same physical model, either the models are incomplete or they duplicate each other. This is tempered by the fact that any models that share components will necessarily map to the same physical model.
Data/query transformation: mapping between a physical model and a logical model gives users of the logical model, for instance authors of clinical decision support rules, access to working clinical data. Mappings between a logical model and two or more physical models enable mappings between those physical models.

Mapping mechanism

Since the models are described as ShEx schemas, the mappings between them are captured as shared "variables" in a %map:{ %} semantic action extension. These variables are identified by full URLs, which makes disambiguation trivial and lets standard prefix conventions (e.g. bp:) provide easy lexical categorization.
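A minimal sketch of how shared variables act as the glue: collect the %map variables from a source and a target schema, expand them to full IRIs, and intersect them. The `http://example.org/bp#` namespace for `bp:` is a hypothetical placeholder (the page does not say what `bp:` expands to), and the regular expression only recognizes simple variable annotations, not expressions such as `cast(...)`.

```python
import re

# Hypothetical prefix table: the real namespace behind bp: is not given here.
PREFIXES = {"bp": "http://example.org/bp#"}

# Matches a single prefixed name inside %map:{ ... %}, e.g. "%map:{ bp:sysVal %}".
MAP_VAR = re.compile(r"%map:\{\s*(\w+):(\w+)\s*%\}")


def map_variables(schema_text: str) -> set[str]:
    """Return the full IRIs of every simple %map variable in a schema text."""
    return {PREFIXES[prefix] + local for prefix, local in MAP_VAR.findall(schema_text)}


# Fragments of the BP FHIR and BP units DAM schemas.
source = """
<sysBP> { fhir:valueQuantity {
    fhir:value xsd:float %map:{ bp:sysVal %},
    fhir:units xsd:string %map:{ bp:sysUnits %} } }
"""
target = """
<BPunitsDAM> { :systolic {
    :value xsd:float %map:{ bp:sysVal %},
    :units xsd:string %map:{ bp:sysUnits %} } }
"""

# Variables present in both schemas are what connects source data to target shapes.
shared = map_variables(source) & map_variables(target)
print(sorted(shared))
```

Because the variables are compared as full IRIs, two schemas that use different prefixes for the same namespace would still be linked correctly.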

Blood pressure example

Blood pressure measurements provide an intuitive example of measurements with associations. Each measurement includes (at least) a systolic and diastolic reading. Transforming multiple such measurements requires preserving the associations (see section below); a bag of systolic and a bag of diastolic readings is not sufficient.
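To make the point about bags concrete, here is a small sketch with two hypothetical measurements. Pooling readings into a bag of systolic and a bag of diastolic values cannot distinguish the original pairs from a recombination of them, whereas one binding set per measurement keeps each pairing. The variable names reuse the `bp:` names from the schemas below; the values are made up for illustration.

```python
from collections import Counter

# Two hypothetical measurements, each a (systolic, diastolic) pair in mmHg.
measurements = [(120.0, 80.0), (135.0, 85.0)]
# The same readings recombined into different pairs.
recombined = [(120.0, 85.0), (135.0, 80.0)]


def bags(pairs):
    """Pool readings by type: a bag of systolic and a bag of diastolic values."""
    return Counter(s for s, _ in pairs), Counter(d for _, d in pairs)


def binding_sets(pairs):
    """Keep one binding set per measurement, preserving each pairing."""
    return [{"bp:sysVal": s, "bp:diaVal": d} for s, d in pairs]


print(bags(measurements) == bags(recombined))  # True: the pairing is lost
print(binding_sets(measurements) == binding_sets(recombined))  # False
```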

To see ShExMap perform this (and more complex) mappings:

Click the try it link (maybe open it in another window).
On the left you will see an empty schema input, and on the right an empty data input.
Click the button in the manifest below the schema.
This will populate the schema and make some data buttons available to the right of the manifest.
Select the data button.
This will populate the data and the QueryMap.
Click the button under the data selector. This will validate the input data against the input schema.
Scroll all the way to the right with the horizontal scroll bar just above the validate button.
This will reveal three new text areas: bindings from validation, static bindings, and target schema.
Click the button.
This will use the bindings from validation to materialize the target schema.
The resulting triples will replace the results at the bottom of the page.

BP DAM

BP units DAM

<BPunitsDAM> {
    :systolic {
        :value xsd:float %map:{ bp:sysVal %},
        :units xsd:string %map:{ bp:sysUnits %}
    },
    :diastolic {
        :value xsd:float %map:{ bp:diaVal %},
        :units xsd:string %map:{ bp:diaUnits %}
    }
}

BP normalized DAM

<BPnormalizeDAM> {
    a (:CanonicalBloodPressure),
    :systolicBPmmHg xsd:float %map:{ cast(bp:sysVal, bp:sysUnits, "mmHg") %},
    :diastolicBPmmHg xsd:float %map:{ cast(bp:diaVal, bp:diaUnits, "mmHg") %}
}
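The normalized DAM relies on a `cast(value, units, target)` function that the page names but does not define. The sketch below is one hypothetical implementation for pressure units, converting through pascals; the supported unit strings are assumptions.

```python
# Hypothetical implementation of cast(value, units, target) for pressures.
# Each unit maps to its size in pascals; 1 mmHg is conventionally 133.322387415 Pa.
PASCALS_PER_UNIT = {
    "mmHg": 133.322387415,
    "kPa": 1000.0,
    "Pa": 1.0,
}


def cast(value: float, units: str, target: str) -> float:
    """Convert a pressure reading from `units` into `target` units."""
    try:
        pascals = value * PASCALS_PER_UNIT[units]
        return pascals / PASCALS_PER_UNIT[target]
    except KeyError as err:
        raise ValueError(f"unsupported pressure unit: {err.args[0]}") from None


print(round(cast(16.0, "kPa", "mmHg"), 1))  # roughly 120 mmHg
```

Converting through a single base unit keeps the table linear in the number of units instead of needing a factor for every pair.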

BP FHIR try it

<BPfhir> {
    a (fhir:Observation)?,
    fhir:coding { fhir:code (sct:Blood_Pressure) },
    fhir:related { fhir:type ("has-component"), fhir:target @<sysBP> },
    fhir:related { fhir:type ("has-component"), fhir:target @<diaBP> }
}
<sysBP> {
    a (fhir:Observation)?,
    fhir:coding { fhir:code (sct:Systolic_Blood_Pressure) },
    fhir:valueQuantity {
        a (fhir:Quantity)?,
        fhir:value xsd:float %map:{ bp:sysVal %},
        fhir:units xsd:string %map:{ bp:sysUnits %}
    }
}
<diaBP> {
    a (fhir:Observation)?,
    fhir:coding { fhir:code (sct:Diastolic_Blood_Pressure) },
    fhir:valueQuantity {
        a (fhir:Quantity)?,
        fhir:value xsd:float %map:{ bp:diaVal %},
        fhir:units xsd:string %map:{ bp:diaUnits %}
    }
}
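Validating an instance against <BPfhir> yields the bindings for bp:sysVal, bp:sysUnits, bp:diaVal and bp:diaUnits. The sketch below imitates that step on a hypothetical, simplified instance encoded as nested Python dicts; a real ShEx validator works on the RDF graph and produces these bindings itself, so `bind` and `COMPONENT_VARS` are illustrative names only.

```python
# Hypothetical, simplified <BPfhir> instance as nested dicts (not real FHIR RDF).
observation = {
    "coding": {"code": "sct:Blood_Pressure"},
    "related": [
        {"type": "has-component", "target": {
            "coding": {"code": "sct:Systolic_Blood_Pressure"},
            "valueQuantity": {"value": 120.0, "units": "mmHg"}}},
        {"type": "has-component", "target": {
            "coding": {"code": "sct:Diastolic_Blood_Pressure"},
            "valueQuantity": {"value": 80.0, "units": "mmHg"}}},
    ],
}

# The %map variables each component shape binds, keyed by its required code.
COMPONENT_VARS = {
    "sct:Systolic_Blood_Pressure": ("bp:sysVal", "bp:sysUnits"),
    "sct:Diastolic_Blood_Pressure": ("bp:diaVal", "bp:diaUnits"),
}


def bind(obs: dict) -> dict:
    """Collect the bindings that matching obs against <BPfhir> would produce."""
    if obs["coding"]["code"] != "sct:Blood_Pressure":
        raise ValueError("not a blood pressure observation")
    bindings = {}
    for rel in obs["related"]:
        target = rel["target"]
        code = target["coding"]["code"]
        if rel["type"] != "has-component" or code not in COMPONENT_VARS:
            continue
        value_var, units_var = COMPONENT_VARS[code]
        bindings[value_var] = target["valueQuantity"]["value"]
        bindings[units_var] = target["valueQuantity"]["units"]
    # <BPfhir> requires both components, so every variable must end up bound.
    expected = {var for pair in COMPONENT_VARS.values() for var in pair}
    missing = expected - set(bindings)
    if missing:
        raise ValueError(f"unbound variables: {sorted(missing)}")
    return bindings


print(bind(observation))
```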

Mechanism

This has been experimentally validated but not rigorously modelled.

Validate the instance document against the source schema (say BP FHIR).
This produces variable bindings.
Invoke generate with the start shape of the target schema (say BP DAM) and a fresh bnode.
Within generate, the shape is S and the current subject is B. For each property P in S:
1. if the value of P is a reference to shape R, create a fresh bnode Bchild, assert (B, P, Bchild), and invoke generate with the shape R and the subject Bchild.
2. if the value of P is a leaf node (scalar value) with a variable bound to V, assert (B, P, V).
The resulting graph of assertions should validate against the target schema.
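The generate steps above can be sketched in Python as follows. The schema encoding is an assumption made for illustration: each shape is a list of `(predicate, kind, arg)` entries, where kind `"ref"` names another shape and kind `"var"` names a %map variable; the inline shapes of <BPunitsDAM> are given the hypothetical labels `sys` and `dia`. Leaves whose variables are unbound are skipped, so the output only validates if validation bound every required variable.

```python
import itertools

# Hypothetical encoding of the BP units DAM as (predicate, kind, arg) entries.
TARGET = {
    "BPunitsDAM": [(":systolic", "ref", "sys"), (":diastolic", "ref", "dia")],
    "sys": [(":value", "var", "bp:sysVal"), (":units", "var", "bp:sysUnits")],
    "dia": [(":value", "var", "bp:diaVal"), (":units", "var", "bp:diaUnits")],
}

_bnode_ids = itertools.count()


def fresh_bnode() -> str:
    """Return a new blank node label."""
    return f"_:b{next(_bnode_ids)}"


def generate(schema, shape, subject, bindings, triples):
    """Materialize `shape` for `subject` from variable bindings, appending triples."""
    for predicate, kind, arg in schema[shape]:
        if kind == "ref":
            # Step 1: reference to shape R -> fresh bnode, link it, recurse into R.
            child = fresh_bnode()
            triples.append((subject, predicate, child))
            generate(schema, arg, child, bindings, triples)
        elif arg in bindings:
            # Step 2: leaf whose variable is bound -> assert the bound value.
            triples.append((subject, predicate, bindings[arg]))
    return triples


# Bindings as validation against BP FHIR might produce them.
bindings = {"bp:sysVal": 120.0, "bp:sysUnits": "mmHg",
            "bp:diaVal": 80.0, "bp:diaUnits": "mmHg"}
for triple in generate(TARGET, "BPunitsDAM", fresh_bnode(), bindings, []):
    print(triple)
```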

Preserving associations

Realistic mapping use cases involve a mixture of unary and n-ary properties. For instance, a patient record may contain a sequence of vitals like blood pressure. Taking FHIR as the target schema again, mapping a single source record to multiple target records requires repeating some properties in each instantiation of the repeated structure. The example below shows sample instance data in both textual and tree representations. The trees are essentially the validation result format defined in ShExJ Validation Results.

This can be viewed as mapping between two trees:

patient record :p1234567 ⇨ :Patient-1234567
blood pressure reading 1 ⇨ :Patient-1234567.Obs1
blood pressure reading 2 ⇨ :Patient-1234567.Obs2

Materialization of :Patient-1234567 entailed mapping the unary properties from :p1234567. This could be accomplished by ignoring n-ary properties like BloodPressureReading, or by passing some form of externally supplied cut rule to the generate function.

Materialization of :Patient-1234567.Obs1 and :Patient-1234567.Obs2 entailed instantiating the PatientID property repeatedly, once per observation. Possible rules for this mapping:

Instantiate a shape n times where there are n unique trees providing bindings, i.e. reading 1 and reading 2.
Repeat variables bound exactly once (or n times?) in cousins.
If there's a mismatch in the cardinalities of required attributes, reject.
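The candidate rules above can be sketched under simplifying assumptions: the validation-result tree is reduced to the bindings made once at the record level plus one binding set per reading, the variable names (`pt:patientID`, `bp:sysVal`, `bp:diaVal`) are hypothetical, and "mismatch in cardinalities" is interpreted narrowly as a reading that lacks a required variable. The open question about variables bound n times is not addressed.

```python
# Hypothetical reduction of the :p1234567 validation-result tree to bindings.
record_bindings = {"pt:patientID": "1234567"}   # bound once, at the record level
reading_bindings = [                            # one binding set per reading subtree
    {"bp:sysVal": 120.0, "bp:diaVal": 80.0},
    {"bp:sysVal": 135.0, "bp:diaVal": 85.0},
]

# Variables the (hypothetical) target observation shape requires.
REQUIRED = ("pt:patientID", "bp:sysVal", "bp:diaVal")


def instantiate(record, readings, required):
    """One instance per reading subtree; once-bound record variables are repeated."""
    instances = []
    for reading in readings:               # rule 1: n subtrees -> n instances
        merged = {**record, **reading}     # rule 2: repeat once-bound variables
        missing = [v for v in required if v not in merged]
        if missing:                        # rule 3: required variable absent -> reject
            raise ValueError(f"reading lacks required variables {missing}")
        instances.append(merged)
    return instances


for i, obs in enumerate(instantiate(record_bindings, reading_bindings, REQUIRED), 1):
    print(f":Patient-{obs['pt:patientID']}.Obs{i}", obs)
```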

Issues

glue between heads and bodies
In principle, shared variables are enough to associate rule heads with rule bodies. We need to test this with compositions in which a DAM uses variables bound in multiple physical models (e.g. a body mass in a BMI) and multiple DAMs use the same variables (e.g. BMI and EGFR).
n-ary predicates
Mapping directed, labeled graphs to familiar n-ary predicates for use in e.g. Quick (CQL).
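As an illustration of the n-ary predicate issue, the sketch below folds the triples of a <BPnormalizeDAM> instance into one flat row per blood pressure node, the kind of record a CQL-style data model expects. The `BloodPressure` row type and `to_nary` function are hypothetical; grouping by subject into a dict assumes each predicate is single-valued, so repeated predicates would overwrite one another.

```python
from collections import defaultdict
from typing import NamedTuple


class BloodPressure(NamedTuple):
    """Hypothetical n-ary predicate: blood_pressure(subject, systolic, diastolic)."""
    subject: str
    systolic_mmHg: float
    diastolic_mmHg: float


def to_nary(triples):
    """Group triples by subject and fold each :CanonicalBloodPressure into a row."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o  # assumes single-valued predicates
    return [
        BloodPressure(s, props[":systolicBPmmHg"], props[":diastolicBPmmHg"])
        for s, props in by_subject.items()
        if props.get("a") == ":CanonicalBloodPressure"
    ]


graph = [
    ("_:bp1", "a", ":CanonicalBloodPressure"),
    ("_:bp1", ":systolicBPmmHg", 120.0),
    ("_:bp1", ":diastolicBPmmHg", 80.0),
]
print(to_nary(graph))
```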

Ideas from Claude Nanjo and Mohammad Hekmatnejad:
Executed in the RDF Pipeline project by Glenna Mayo, David Booth and Mike Carifio:
Implemented by: Eric Prud'hommeaux