Data Normalization Task Force

 


  • Trans*mation

    The ability to change the structure / shape / representation of data.
    • Transcription : Changing the serialization format of data, e.g. between XML and JSON, or XML and SQL (see the sketch at the end of this section)

    • Transrepresentation : Changing the representation of data between two semantically equivalent, but structurally different forms.

      • Paraphrasing: The source and target forms are expressed using the same language. E.g. transforming a pre-coordinated form into a post-coordinated one

      • Translation : The source and target forms are expressed using different languages. E.g. converting FHIR to RIM, or a source system data schema to FHIR

    • Mapping / Transformation : Any change in structure that also involves a change in semantics

      • Needs to quantify the impact/nature of the change in semantics (e.g. "broader than", "approximate match", etc.)

    • Normalizing Trans*mation : A Trans*mation process that targets a canonical representation of data, e.g. using standard terminologies and schemas

      • Clinical Semantization : A Normalizing Trans*mation process that outputs semantic clinical data models (i.e. schemas enhanced with a semantically rich canonical interpretation)
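
A minimal sketch of Transcription, assuming a flat record with illustrative fields (id, birthDate, gender): the same content is serialized once as JSON and once as XML, so only the serialization format changes while the structure and semantics stay the same.

```python
# Transcription sketch: re-serialize the same record as JSON and as XML.
# The record and its field names are illustrative, not a specific schema.
import json
import xml.etree.ElementTree as ET

record = {"id": "pat-001", "birthDate": "1970-03-15", "gender": "female"}

# JSON serialization
as_json = json.dumps(record)

# XML serialization of the same content
root = ET.Element("patient")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)  # {"id": "pat-001", "birthDate": "1970-03-15", "gender": "female"}
print(as_xml)   # <patient><id>pat-001</id><birthDate>1970-03-15</birthDate><gender>female</gender></patient>
```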

         

  • Validation

    • Conformance checking:
      The ability to check the consistency of a piece of data with respect to some criteria, usually defined in terms of integrity constraints.
      The outcome is a set, ideally empty, of violated criteria (or, dually, the set of fulfilled criteria). A sketch of syntactic and semantic checks, together with trimming, follows this section.

      • Syntactic validation : check whether the data is structurally correct, without involving the semantics of the data.
        E.g. check that a patient’s date of birth is expressed using a given date format, such as DD-MM-YYYY. 

      • Semantic validation : check whether the data is semantically consistent with respect to the reference domain it is modelling.
        E.g. disallow dates of birth which would make a patient’s age inconsistent with biological laws.
         

    • Verification: check whether a candidate "source" and a candidate "target" actually map to each other with respect to a given Trans*mation 

    • Trimming:
      The ability to remove (“trim”) any data element that does not satisfy a conformance check. The result is the maximal subset of the original data structure that satisfies the integrity constraints.
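
A minimal sketch of conformance checking and trimming, assuming a flat dict record and two illustrative criteria (the criterion and field names are hypothetical): a syntactic check on the date-of-birth format and a semantic check on its biological plausibility; trimming then drops any field tied to a violated criterion.

```python
# Conformance checking and trimming sketch over a flat dict record.
from datetime import datetime

def check_dob_format(record):
    """Syntactic: birthDate must parse as DD-MM-YYYY."""
    try:
        datetime.strptime(record.get("birthDate", ""), "%d-%m-%Y")
        return True
    except ValueError:
        return False

def check_dob_plausible(record):
    """Semantic: the implied age must be biologically plausible (0..130 years)."""
    try:
        dob = datetime.strptime(record.get("birthDate", ""), "%d-%m-%Y")
    except ValueError:
        return False
    age = (datetime.now() - dob).days / 365.25
    return 0 <= age <= 130

CRITERIA = {"dob-format": check_dob_format, "dob-plausible": check_dob_plausible}

def conformance_check(record):
    """Return the (ideally empty) set of violated criteria."""
    return {name for name, check in CRITERIA.items() if not check(record)}

def trim(record, field_criteria):
    """Keep the maximal subset of fields whose associated criteria are all satisfied."""
    violated = conformance_check(record)
    return {k: v for k, v in record.items()
            if not (field_criteria.get(k, set()) & violated)}

record = {"id": "pat-001", "birthDate": "15-03-1870"}  # format OK, implied age implausible
print(conformance_check(record))                                     # {'dob-plausible'}
print(trim(record, {"birthDate": {"dob-format", "dob-plausible"}}))  # {'id': 'pat-001'}
```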
       
  • Enrichment
    The ability to create ("materialize") new data that is implied by, but not explicitly asserted in, a database. Can be considered a special kind of inference.
     

    • Classification : the ability to recognize a piece of data as (representing) an instance of some kind, or as being a member of some class, by virtue of its properties and relations. E.g. recognizing an Observation as a HgbObservation (a sketch covering classification and correlation follows this section)

      • Qualification: Classification based on the role an entity plays in relation to another entity (e.g. post-op day 1 fluids, pre-op Hgb)

        • Contextualization : Classification based on a well-defined, multi-variate set of roles/relationships

    • Completion : the ability to infer the values of properties and attributes when not explicitly asserted. E.g. inferring the gender of an ob-gyn patient.

      • Feature extraction : Using NLP/ML techniques to identify concepts and make them explicit, e.g. from an unstructured piece of data.

    • Correlation / “Linkage” : the ability to infer the existence of a relationship of some kind between (the entities represented by) two data elements, usually by virtue of their properties, or because of the existence of a third entity which acts as a relator.
      E.g. inferring the “pre-op-ness” of an Observation with respect to a Surgery.

       

    • Production/Assertion : the ability to materialize (the representation of) entities that are not explicitly part of the data.   
      E.g. inferring the existence of a bleed in a post-surgical patient
      E.g. estimating a patient’s CHADS2 score at a given point in time, and giving it an interpretation.

      • State Identification: qualitative (exposed, at-risk, elevated-risk, suspected, confirmed, treated, resolved, etc.) vs quantitative (such as tied to a risk and/or severity score)

      • Measurements: process and (intermediate) outcome measures, often based on the aggregation of data

      • Calculations : Productions based on quantitative mathematical formulas

        • Scoring: Calculations whose output can be interpreted as a value on a scale, usually with an associated interpretation (e.g. risk scores; a CHADS2 scoring sketch follows this section)
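
A sketch of classification and correlation over flat dict records; the field names and the use of LOINC code 718-7 (hemoglobin) are illustrative assumptions, not a fixed schema.

```python
# Classification and correlation/qualification sketch.
from datetime import datetime

HGB_CODES = {"718-7"}  # LOINC code for hemoglobin in blood (illustrative value set)

def classify(observation):
    """Classification: recognize an Observation as a HgbObservation by its code."""
    return "HgbObservation" if observation["code"] in HGB_CODES else "Observation"

def is_pre_op(observation, surgery):
    """Correlation: the surgery acts as the relator; the observation is
    qualified as 'pre-op' when it was taken before the surgery started."""
    obs_time = datetime.fromisoformat(observation["effective"])
    surgery_start = datetime.fromisoformat(surgery["start"])
    return obs_time < surgery_start

obs = {"code": "718-7", "value": 11.2, "effective": "2023-05-01T08:00:00"}
surgery = {"type": "appendectomy", "start": "2023-05-01T13:30:00"}

print(classify(obs))            # HgbObservation
print(is_pre_op(obs, surgery))  # True -> can be qualified as a pre-op Hgb
```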

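A sketch of a Scoring production: the CHADS2 score sums one point each for congestive heart failure, hypertension, age 75 or older, and diabetes, plus two points for a prior stroke/TIA; the risk-band labels attached here are one common reading, not a clinical recommendation.

```python
# CHADS2 scoring sketch: a calculation whose output is a value on a scale,
# paired with a simple qualitative interpretation.
def chads2(chf, hypertension, age, diabetes, prior_stroke_or_tia):
    score = (int(chf)                         # congestive heart failure: 1 point
             + int(hypertension)              # hypertension: 1 point
             + int(age >= 75)                 # age 75 or older: 1 point
             + int(diabetes)                  # diabetes mellitus: 1 point
             + 2 * int(prior_stroke_or_tia))  # prior stroke/TIA: 2 points
    if score == 0:
        band = "low"
    elif score == 1:
        band = "intermediate"
    else:
        band = "high"
    return {"score": score, "interpretation": band}

print(chads2(chf=False, hypertension=True, age=78, diabetes=True,
             prior_stroke_or_tia=False))
# {'score': 3, 'interpretation': 'high'}
```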
 

  • Auditing: Tracing the source of a piece of data, original or inferred

    • Provenance : who/when/where the data was gathered, asserted or inferred

    • Pedigree : correlating inferred data to the evidence on which it is based (a sketch covering provenance and pedigree follows below)
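
A minimal sketch, assuming a FHIR R4 Provenance resource as the carrier (the resource ids and agent are illustrative): recorded and agent capture provenance (who/when), while the entity references capture pedigree (the evidence the inferred resource was derived from).

```python
# Illustrative Provenance payload for an inferred Condition (Python dict,
# shaped like a FHIR R4 Provenance resource; ids are made up).
provenance = {
    "resourceType": "Provenance",
    # the inferred data element being traced
    "target": [{"reference": "Condition/suspected-bleed-1"}],
    # provenance: when and by whom the data was asserted/inferred
    "recorded": "2023-05-02T09:15:00Z",
    "agent": [{"who": {"display": "post-op surveillance rule engine"}}],
    # pedigree: the evidence the inference is based on
    "entity": [
        {"role": "derivation", "what": {"reference": "Observation/hgb-drop-1"}},
        {"role": "derivation", "what": {"reference": "Procedure/appendectomy-1"}},
    ],
}
```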

  • (Security) Labeling/Tagging: Adding annotations (metadata) for the purpose of enforcing consent and other policies (see the sketch after this list)
  • Managing Imperfection : handling any type of (un)certainty, vagueness/fuzziness, imprecision, confidence, strength of evidence, and belief associated with a piece of data.
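
A sketch of security labeling, assuming FHIR-style resource metadata: a confidentiality Coding (the HL7 v3 Confidentiality code "R", restricted) is appended to meta.security so downstream policy engines can enforce consent; the resource content is illustrative.

```python
# Security labeling sketch: attach a confidentiality label to a resource's metadata.
def label_restricted(resource):
    meta = resource.setdefault("meta", {})
    meta.setdefault("security", []).append({
        # HL7 v3 Confidentiality code system (canonical URL as used in FHIR R4)
        "system": "http://terminology.hl7.org/CodeSystem/v3-Confidentiality",
        "code": "R",
        "display": "restricted",
    })
    return resource

obs = {"resourceType": "Observation", "id": "hiv-test-1"}
print(label_restricted(obs)["meta"]["security"][0]["code"])  # R
```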

 

Services and Content for Data Enhancement

| Capability | Specs / APIs | Knowledge | Implementations |
|---|---|---|---|
| Trans*mation / Normalization | | | |
| Transcription | | | FHIR De/Serializers |
| Paraphrasing | OMG MDMI / QVT; FHIR "Mapping Language" + fhir:StructureMap + fhir:ConceptMap | | RDF: ShEx + Graph Production; XML: XSLT |
| Translation | OMG MDMI / QVT | | |
| Transformation | OMG MDMI / QVT | | |
| Validation | | | |
| Structural Conformance | $validate | fhir:StructureDefinition | |
| Semantic Conformance | | | |
| Trimming | | | |
| Enrichment | | | |
| Classification | Ontology Languages | | |
| Completion | Production (Rule) Languages; Predictive Modeling (e.g. PMML); Complex Event Processing; (Indirect) Expression Languages: HL7 CQL | fhir:DecisionSupportRule; (Indirect) Terms / Ontologies / Valuesets | |
| Linkage | | | |
| Inference | | | |
| Auditing | fhir:Audit + fhir:Provenance | | |
| Uncertainty | | | |