This document describes using ShEx for intuitive bridges between conventional clinical data representations and domain-specific ontologies that will be useful for knowledge/rule capture.
Most clinical data exchange denormalizes structured observations like blood pressure, APGAR, full blood count panel, etc., into a constellation of observations.
This encodes the semantics of the observation structure in conventions of terminology codes, e.g. 75367002 |Blood pressure|, 271649006 |Systolic blood pressure| and 271650006 |Diastolic blood pressure|.
Linkage between these is captured by some over-general predicates in the information model, e.g. fhir:related or rim:COMP (has component).
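To make the "constellation" concrete, here is a minimal Python sketch. It assumes a simplified, FHIR-like dictionary layout (the keys `code`, `related`, `type`, `target`, `value` and `units` are illustrative, not the actual FHIR JSON format) and shows that the generic `has-component` link says nothing about which reading is which: only the SNOMED CT code distinguishes systolic from diastolic.

```python
# SNOMED CT codes from the text above.
SCT_BP = "75367002"          # Blood pressure
SCT_SYSTOLIC = "271649006"   # Systolic blood pressure
SCT_DIASTOLIC = "271650006"  # Diastolic blood pressure

# Hypothetical, simplified FHIR-like panel: the readings are separate
# observations linked to the panel by the generic "has-component" relation.
panel = {
    "code": SCT_BP,
    "related": [
        {"type": "has-component",
         "target": {"code": SCT_SYSTOLIC, "value": 120.0, "units": "mmHg"}},
        {"type": "has-component",
         "target": {"code": SCT_DIASTOLIC, "value": 80.0, "units": "mmHg"}},
    ],
}

def component_by_code(obs: dict, code: str) -> dict:
    """Return the component observation carrying the given terminology code.

    The link type is the same for every component, so the consumer has to
    rely on the terminology code to recover the observation's structure.
    """
    for rel in obs.get("related", []):
        target = rel["target"]
        if rel["type"] == "has-component" and target["code"] == code:
            return target
    raise KeyError(code)

systolic = component_by_code(panel, SCT_SYSTOLIC)
diastolic = component_by_code(panel, SCT_DIASTOLIC)
print(systolic["value"], diastolic["value"])  # 120.0 80.0
```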
The RIM reuses Act and ActRelationship to capture these structures, providing a richer vocabulary of relationships in ActRelationship type codes. The body site or device for a blood pressure measurement is attached to the blood pressure observation by type codes like …. Diagnostic processes like evidence and causality are captured in type codes such as EVID (provides evidence for) and CAUS (is etiology for). FHIR has specialized relationships to capture some structural relationships like body site or device.
Even if we imposed a more complex information model for structured observations or treatment processes, there would always be some stuff for which there was no defined model. The semweb story "just invent some stuff and maybe it will get popular" isn't well suited to either the skills of the information wranglers or conventional paper-oriented clinical and legal processes.
Domain models will likely be more principled in their design and will definitely be more intuitive than the physical models involving constellations of observations. A DAM (Domain Analysis Model) is an example of a domain model tailored towards capturing the aspects required for analysis, e.g. aspects pertinent to workflows or business processes. A class and relationship hierarchy such as that implemented in ActRelationship type codes will be more simply expressed in an RDF ontology and can leverage existing RDF tooling for model and example verification.
Projects like CIMI engage clinicians in the development of intuitive domain models, e.g. a blood pressure as a structure with a measurement of a systolic and a diastolic pressure, and maybe some other stuff like posture or device. Here are a few CIMI models:
Ideally, development of little DAMs would include identifying all of the intersections between them, creating a more comprehensive DAM: an ontology of clinical artifacts. For example, the body mass used in an estimated glomerular filtration rate is the same as the body mass used in a body mass index. A useful clinical ontology in RDF would capture that by reusing the same identifiers wherever the same concept appears.
As the models are described as ShEx schemas, the mappings between them are captured as shared "variables" in a %map:{ %} extension. These variables are given full URLs, which enables trivial disambiguation and lets them leverage standard prefix conventions for easier lexical categorization.
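A minimal sketch of why full URLs help, assuming hypothetical namespaces under `http://example.org/` (the actual namespace behind the `bp:` prefix is not given here). Bindings are stored under the expanded IRI, so a schema that declares a different prefix label for the same namespace still finds the value, while equal local names in different namespaces never collide.

```python
# Hypothetical namespaces, for illustration only.
SOURCE_PREFIXES = {"bp": "http://example.org/bp#", "lab": "http://example.org/lab#"}
DAM_PREFIXES = {"b": "http://example.org/bp#"}  # different label, same namespace

def expand(curie: str, prefixes: dict[str, str]) -> str:
    """Expand a prefixed name such as 'bp:sysVal' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + local

# Variables bound while reading source data are keyed by their full IRI ...
bindings = {expand("bp:sysVal", SOURCE_PREFIXES): 120.0}

# ... so a schema using another prefix label for the same namespace finds the value,
print(bindings[expand("b:sysVal", DAM_PREFIXES)])  # 120.0
# while equal local names in different namespaces stay distinct.
print(expand("bp:value", SOURCE_PREFIXES) == expand("lab:value", SOURCE_PREFIXES))  # False
```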
Blood pressure measurements provide an intuitive example of measurements with associations. Each measurement includes (at least) a systolic and diastolic reading. Transforming multiple such measurements requires preserving the associations (see section below); a bag of systolic and a bag of diastolic readings is not sufficient.
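The point about associations can be checked with a small Python sketch using two hypothetical readings. Keeping one binding set per measurement preserves the systolic/diastolic pairing; flattening into two independent bags (multisets) leaves every cross pairing equally plausible.

```python
from collections import Counter

# Two hypothetical measurements, each a (systolic, diastolic) pair.
readings = [(120.0, 85.0), (140.0, 78.0)]

# One binding set per measurement keeps the values that belong together.
per_measurement = [{"bp:sysVal": s, "bp:diaVal": d} for s, d in readings]

# Two independent bags lose that association ...
systolic_bag = Counter(s for s, _ in readings)
diastolic_bag = Counter(d for _, d in readings)

# ... so any systolic can be paired with any diastolic, including wrong pairs.
candidates = [(s, d) for s in systolic_bag for d in diastolic_bag]
print(len(candidates))  # 4 candidate pairs, only 2 of which were measured
```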
To see ShExMap perform this mapping (and more complex ones), consider the following schemas:
<BPunitsDAM> {
  :systolic {
    :value xsd:float %map:{ bp:sysVal %},
    :units xsd:string %map:{ bp:sysUnits %}
  },
  :diastolic {
    :value xsd:float %map:{ bp:diaVal %},
    :units xsd:string %map:{ bp:diaUnits %}
  }
}
<BPnormalizeDAM> {
  a (:CanonicalBloodPressure),
  :systolicBPmmHg xsd:float %map:{ cast(bp:sysVal, bp:sysUnits, "mmHg") %},
  :diastolicBPmmHg xsd:float %map:{ cast(bp:diaVal, bp:diaUnits, "mmHg") %}
}
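The schema does not define `cast`; the following is a hypothetical Python implementation, assuming the argument order value, source units, target units, and supporting only mmHg and kPa sources (using 1 mmHg = 133.322387415 Pa). It shows how the bindings from <BPunitsDAM> could feed the normalized properties.

```python
# Hypothetical conversion table: factor that turns a value in the key's unit into mmHg.
MMHG_PER_UNIT = {
    "mmHg": 1.0,
    "kPa": 1000.0 / 133.322387415,  # 1 mmHg = 133.322387415 Pa
}

def cast(value: float, units: str, target: str) -> float:
    """Convert a pressure value to the target unit (only 'mmHg' is supported here)."""
    if target != "mmHg":
        raise ValueError(f"unsupported target unit: {target}")
    try:
        return value * MMHG_PER_UNIT[units]
    except KeyError:
        raise ValueError(f"unsupported source unit: {units}") from None

# Hypothetical bindings, as they might be extracted via <BPunitsDAM>.
bindings = {"bp:sysVal": 16.0, "bp:sysUnits": "kPa",
            "bp:diaVal": 80.0, "bp:diaUnits": "mmHg"}
canonical = {
    ":systolicBPmmHg": cast(bindings["bp:sysVal"], bindings["bp:sysUnits"], "mmHg"),
    ":diastolicBPmmHg": cast(bindings["bp:diaVal"], bindings["bp:diaUnits"], "mmHg"),
}
print(round(canonical[":systolicBPmmHg"], 1), canonical[":diastolicBPmmHg"])  # 120.0 80.0
```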
<BPfhir> {
  a (fhir:Observation)?,
  fhir:coding { fhir:code (sct:Blood_Pressure) },
  fhir:related { fhir:type ("has-component"), fhir:target @<sysBP> },
  fhir:related { fhir:type ("has-component"), fhir:target @<diaBP> }
}
<sysBP> {
  a (fhir:Observation)?,
  fhir:coding { fhir:code (sct:Systolic_Blood_Pressure) },
  fhir:valueQuantity {
    a (fhir:Quantity)?,
    fhir:value xsd:float %map:{ bp:sysVal %},
    fhir:units xsd:string %map:{ bp:sysUnits %}
  },
}
<diaBP> {
  a (fhir:Observation)?,
  fhir:coding { fhir:code (sct:Diastolic_Blood_Pressure) },
  fhir:valueQuantity {
    a (fhir:Quantity)?,
    fhir:value xsd:float %map:{ bp:diaVal %},
    fhir:units xsd:string %map:{ bp:diaUnits %}
  },
}
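On the source side, validating data against a shape like <sysBP> is what binds the %map variables. A minimal Python sketch of that extraction, assuming a hypothetical encoding of shapes as lists of (predicate, kind, payload) tuples (not ShExJ) and data as nested dicts: "const" values must match, "var" values are bound, and "shape" values are matched recursively.

```python
# Hypothetical, simplified encoding of the <sysBP> shape.
SYS_BP = [
    ("fhir:coding", "shape", [("fhir:code", "const", "sct:Systolic_Blood_Pressure")]),
    ("fhir:valueQuantity", "shape", [
        ("fhir:value", "var", "bp:sysVal"),
        ("fhir:units", "var", "bp:sysUnits"),
    ]),
]

def extract(node: dict, shape: list, bindings: dict) -> bool:
    """Match node against shape, recording %map bindings; False on mismatch.

    Note: bindings may be partially filled if the match fails part way.
    """
    for pred, kind, payload in shape:
        if pred not in node:
            return False
        value = node[pred]
        if kind == "const" and value != payload:
            return False
        if kind == "var":
            bindings[payload] = value
        if kind == "shape" and not extract(value, payload, bindings):
            return False
    return True

observation = {
    "fhir:coding": {"fhir:code": "sct:Systolic_Blood_Pressure"},
    "fhir:valueQuantity": {"fhir:value": 120.0, "fhir:units": "mmHg"},
}
bindings = {}
ok = extract(observation, SYS_BP, bindings)
print(ok, bindings)
```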
This has been experimentally validated but not rigorously modelled.
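1. Invoke generate with the start shape of the target schema (say BP DAM) and a fresh bnode S.
2. For each triple constraint with predicate P in the current shape, where the current subject is B:
   a. if its value expression is a shape R, create a fresh bnode Bchild, assert (B, P, Bchild), and invoke generate with the shape R and the subject Bchild;
   b. if it carries a %map variable bound to a value V, assert (B, P, V).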
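A minimal Python sketch of the generate steps above, under stated assumptions: shapes use a hypothetical (predicate, kind, payload) tuple encoding rather than ShExJ, bindings is a dict from variable name to value, and a third kind, "const", is added to emit fixed values such as `a (:CanonicalBloodPressure)`, which the two steps do not cover.

```python
import itertools

_ids = itertools.count(1)

def fresh_bnode() -> str:
    """Mint a new blank node label (_:b1, _:b2, ...)."""
    return f"_:b{next(_ids)}"

# Hypothetical encoding of <BPunitsDAM>.
BP_DAM = [
    (":systolic", "shape", [(":value", "var", "bp:sysVal"), (":units", "var", "bp:sysUnits")]),
    (":diastolic", "shape", [(":value", "var", "bp:diaVal"), (":units", "var", "bp:diaUnits")]),
]

def generate(shape: list, subject: str, bindings: dict, triples: list) -> list:
    """Emit triples for subject according to shape, recursing into nested shapes."""
    for pred, kind, payload in shape:
        if kind == "shape":
            child = fresh_bnode()                    # fresh bnode Bchild
            triples.append((subject, pred, child))   # assert (B, P, Bchild)
            generate(payload, child, bindings, triples)
        elif kind == "var":
            triples.append((subject, pred, bindings[payload]))  # assert (B, P, V)
        else:  # "const": an assumed extension for fixed values
            triples.append((subject, pred, payload))
    return triples

bindings = {"bp:sysVal": 120.0, "bp:sysUnits": "mmHg",
            "bp:diaVal": 80.0, "bp:diaUnits": "mmHg"}
S = fresh_bnode()
triples = generate(BP_DAM, S, bindings, [])
for s, p, o in triples:
    print(s, p, o)
```

Passing the bindings explicitly keeps generate independent of how they were obtained, e.g. from validating FHIR data against <BPfhir>.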
Realistic mapping use cases involve a mixture of unary and n-ary properties. For instance, a patient record may contain a sequence of vitals like blood pressure. Taking FHIR again as the target schema, mapping a single source record to multiple target records requires repeating some properties (e.g. the patient ID) in each instance of the repeated structure. The example below shows sample instance data in both textual and tree representations. The trees are essentially the validation result format defined in ShExJ Validation Results.
This can be viewed as mapping between two trees:
Materialization of :Patient-1234567 entailed mapping the unary properties from :p1234567. This could be accomplished by ignoring n-ary properties like BloodPressureReading, or with some form of externally supplied cut rule passed to the generate function.
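Both options can be sketched in Python with a cut rule passed as a function. The record layout and values below are hypothetical; the first rule names the n-ary property explicitly, the second skips any multi-valued property, and both yield the same unary projection here.

```python
from typing import Callable

# A cut rule decides, per property, whether it should be skipped.
CutRule = Callable[[str, object], bool]

def materialize_unary(source: dict, cut: CutRule) -> dict:
    """Copy a source record's properties, dropping those the cut rule rejects."""
    return {pred: value for pred, value in source.items() if not cut(pred, value)}

# Hypothetical source record: two unary properties and one repeated (n-ary) one.
p1234567 = {
    "patientID": "1234567",
    "birthDate": "1970-01-01",
    "BloodPressureReading": [
        {"sysVal": 120.0, "diaVal": 80.0},
        {"sysVal": 118.0, "diaVal": 76.0},
    ],
}

def cut_by_name(pred: str, value: object) -> bool:
    """Externally supplied rule: skip the named n-ary property."""
    return pred == "BloodPressureReading"

def cut_n_ary(pred: str, value: object) -> bool:
    """Structural rule: skip any property with multiple values."""
    return isinstance(value, list)

print(materialize_unary(p1234567, cut_by_name))
# {'patientID': '1234567', 'birthDate': '1970-01-01'}
```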
Materialization of :Patient-1234567.Obs1 and :Patient-1234567.Obs2 entailed repeatedly instantiating the PatientID property. Possible rules for this mapping:
1. instantiate it n times, where there are n unique trees providing bindings, i.e. reading 1 and reading 2;
2. … (n times?) in cousins.
In principle, shared variables are enough to associate rule heads with rule bodies. We need to test this with compositions in which a DAM uses variables bound in multiple physical models (e.g. a body mass in a BMI) and multiple DAMs use the same variables (e.g. BMI and eGFR).
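A minimal Python sketch of the first candidate rule, assuming each blood pressure reading arrives as its own binding tree (the reading values are hypothetical). One target observation is created per tree, and the unary PatientID binding is instantiated again in each, following the :Patient-1234567.ObsN naming used above.

```python
def materialize_observations(patient_id: str, readings: list[dict]) -> list[dict]:
    """Create one target observation per binding tree, repeating the patient ID."""
    return [
        # Each binding tree (one reading) yields one observation;
        # the unary PatientID binding is instantiated again every time.
        {"id": f":Patient-{patient_id}.Obs{i}", "PatientID": patient_id, **reading}
        for i, reading in enumerate(readings, start=1)
    ]

readings = [  # hypothetical bindings for reading 1 and reading 2
    {"sysVal": 120.0, "diaVal": 80.0},
    {"sysVal": 118.0, "diaVal": 76.0},
]
for obs in materialize_observations("1234567", readings):
    print(obs)
```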
Mapping directed, labeled graphs to familiar n-ary predicates for use in e.g. Quick (CQL).