Data Normalization Task Force

 


  • Trans*mation

    The ability to change the structure / shape / representation of data.
    • Transcription : Changing the serialization format of data, e.g. between XML and JSON, or XML and SQL (see the sketch at the end of this section)

    • Transrepresentation : Changing the representation of data between two semantically equivalent, but structurally different forms.

      • Paraphrasing: The source and target forms are expressed using the same language. E.g. transforming a pre-coordinated form into a post-coordinated one

      • Translation : The source and target forms are expressed using different languages. E.g. converting FHIR to RIM, or a source system data schema to FHIR

    • Mapping / Transformation : Any change in structure that also involves a change in semantics

      • Needs to quantify the impact/nature of the change in semantics (e.g. "broader than", "approximate match", etc.)

    • Normalizing Trans*mation : A Trans*mation process that targets a canonical representation of data, e.g. using standard terminologies and schemas

      • Clinical Semantization : A Normalizing Trans*mation process that outputs semantic clinical data models (i.e. schemas enhanced with a semantically rich canonical interpretation)
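
A minimal sketch of Transcription, assuming a flat record with illustrative fields (id, birthDate, gender): the same content is serialized once as JSON and once as XML, so only the serialization format changes while the structure and semantics stay the same.

```python
# Transcription sketch: re-serialize the same record as JSON and as XML.
# The record and its field names are illustrative, not a specific schema.
import json
import xml.etree.ElementTree as ET

record = {"id": "pat-001", "birthDate": "1970-03-15", "gender": "female"}

# JSON serialization
as_json = json.dumps(record)

# XML serialization of the same content
root = ET.Element("patient")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)  # {"id": "pat-001", "birthDate": "1970-03-15", "gender": "female"}
print(as_xml)   # <patient><id>pat-001</id><birthDate>1970-03-15</birthDate><gender>female</gender></patient>
```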

         

  • Validation

    • Conformance checking:
      The ability to check the consistency of a piece of data with respect to some criteria, usually defined in terms of integrity constraints.
      The outcome is a set, ideally empty, of violated criteria (or, dually, the set of fulfilled criteria). A sketch of syntactic and semantic checks, together with trimming, follows this section.

      • Syntactic validation : check whether the data is structurally correct, without involving the semantics of the data.
        E.g. check that a patient’s date of birth is expressed using a given date format, such as DD-MM-YYYY. 

      • Semantic validation : check whether the data is semantically consistent with respect to the reference domain it is modelling.
        E.g. disallow dates of birth which would make a patient’s age inconsistent with biological laws.
         

    • Verification: check whether a candidate "source" and a candidate "target" actually map to each other with respect to a given Trans*mation 

    • Trimming:
      The ability to remove (“trim”) any data element that does not satisfy a conformance check. The result is the maximal subset of the original data structure that satisfies the integrity constraints.
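
A minimal sketch of conformance checking and trimming, assuming a flat dict record and two illustrative criteria (the criterion and field names are hypothetical): a syntactic check on the date-of-birth format and a semantic check on its biological plausibility; trimming then drops any field tied to a violated criterion.

```python
# Conformance checking and trimming sketch over a flat dict record.
from datetime import datetime

def check_dob_format(record):
    """Syntactic: birthDate must parse as DD-MM-YYYY."""
    try:
        datetime.strptime(record.get("birthDate", ""), "%d-%m-%Y")
        return True
    except ValueError:
        return False

def check_dob_plausible(record):
    """Semantic: the implied age must be biologically plausible (0..130 years)."""
    try:
        dob = datetime.strptime(record.get("birthDate", ""), "%d-%m-%Y")
    except ValueError:
        return False
    age = (datetime.now() - dob).days / 365.25
    return 0 <= age <= 130

CRITERIA = {"dob-format": check_dob_format, "dob-plausible": check_dob_plausible}

def conformance_check(record):
    """Return the (ideally empty) set of violated criteria."""
    return {name for name, check in CRITERIA.items() if not check(record)}

def trim(record, field_criteria):
    """Keep the maximal subset of fields whose associated criteria are all satisfied."""
    violated = conformance_check(record)
    return {k: v for k, v in record.items()
            if not (field_criteria.get(k, set()) & violated)}

record = {"id": "pat-001", "birthDate": "15-03-1870"}  # format OK, implied age implausible
print(conformance_check(record))                                     # {'dob-plausible'}
print(trim(record, {"birthDate": {"dob-format", "dob-plausible"}}))  # {'id': 'pat-001'}
```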
       
  • Enrichment
    The ability to create ("materialize") new data that is implied by, but not explicitly asserted in, a database. Can be considered a special kind of inference.
     

    • Classification : the ability to recognize a piece of data as (representing) an instance of some kind, or as being a member of some class, by virtue of its properties and relations. E.g. recognizing an Observation as a HgbObservation (a sketch covering classification and correlation follows this section)

      • Qualification: Classification based on the role an entity plays in relation to another entity (e.g. post-op day 1 fluids, pre-op Hgb)

        • Contextualization : Classification based on a well-defined, multi-variate set of roles/relationships

    • Completion : the ability to infer the values of properties and attributes when not explicitly asserted. E.g. inferring the gender of an ob-gyn patient.

      • Feature extraction : Using NLP/ML techniques to identify concepts and make them explicit, e.g. from an unstructured piece of data.

    • Correlation / “Linkage” : the ability to infer the existence of a relationship of some kind between (the entities represented by) two data elements, usually by virtue of their properties, or because of the existence of a third entity which acts as a relator.
      E.g. inferring the “pre-op-ness” of an Observation with respect to a Surgery.

       

    • Production/Assertion : the ability to materialize (the representation of) entities that are not explicitly part of the data.   
      E.g. inferring the existence of a bleed in a post-surgical patient
      E.g. estimating a patient’s CHADS2 score at a given point in time, and giving it an interpretation.

      • State Identification: qualitative (exposed, at-risk, elevated-risk, suspected, confirmed, treated, resolved, etc.) vs quantitative (such as tied to a risk and/or severity score)

      • Measurements: process and (intermediate) outcome measures, often based on the aggregation of data

      • Calculations : Productions based on quantitative mathematical formulas

        • Scoring: Calculations whose output can be interpreted as a value on a scale, usually with an associated interpretation (e.g. risk scores; a CHADS2 scoring sketch follows this section)
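
A sketch of classification and correlation over flat dict records; the field names and the use of LOINC code 718-7 (hemoglobin) are illustrative assumptions, not a fixed schema.

```python
# Classification and correlation/qualification sketch.
from datetime import datetime

HGB_CODES = {"718-7"}  # LOINC code for hemoglobin in blood (illustrative value set)

def classify(observation):
    """Classification: recognize an Observation as a HgbObservation by its code."""
    return "HgbObservation" if observation["code"] in HGB_CODES else "Observation"

def is_pre_op(observation, surgery):
    """Correlation: the surgery acts as the relator; the observation is
    qualified as 'pre-op' when it was taken before the surgery started."""
    obs_time = datetime.fromisoformat(observation["effective"])
    surgery_start = datetime.fromisoformat(surgery["start"])
    return obs_time < surgery_start

obs = {"code": "718-7", "value": 11.2, "effective": "2023-05-01T08:00:00"}
surgery = {"type": "appendectomy", "start": "2023-05-01T13:30:00"}

print(classify(obs))            # HgbObservation
print(is_pre_op(obs, surgery))  # True -> can be qualified as a pre-op Hgb
```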

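A sketch of a Scoring production: the CHADS2 score sums one point each for congestive heart failure, hypertension, age 75 or older, and diabetes, plus two points for a prior stroke/TIA; the risk-band labels attached here are one common reading, not a clinical recommendation.

```python
# CHADS2 scoring sketch: a calculation whose output is a value on a scale,
# paired with a simple qualitative interpretation.
def chads2(chf, hypertension, age, diabetes, prior_stroke_or_tia):
    score = (int(chf)                         # congestive heart failure: 1 point
             + int(hypertension)              # hypertension: 1 point
             + int(age >= 75)                 # age 75 or older: 1 point
             + int(diabetes)                  # diabetes mellitus: 1 point
             + 2 * int(prior_stroke_or_tia))  # prior stroke/TIA: 2 points
    if score == 0:
        band = "low"
    elif score == 1:
        band = "intermediate"
    else:
        band = "high"
    return {"score": score, "interpretation": band}

print(chads2(chf=False, hypertension=True, age=78, diabetes=True,
             prior_stroke_or_tia=False))
# {'score': 3, 'interpretation': 'high'}
```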
 

  • Auditing: Tracing the source of a piece of data, original or inferred

    • Provenance : who/when/where the data was gathered, asserted or inferred

    • Pedigree : correlating inferred data to the evidence on which it is based (a sketch covering provenance and pedigree follows below)
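
A minimal sketch, assuming a FHIR R4 Provenance resource as the carrier (the resource ids and agent are illustrative): recorded and agent capture provenance (who/when), while the entity references capture pedigree (the evidence the inferred resource was derived from).

```python
# Illustrative Provenance payload for an inferred Condition (Python dict,
# shaped like a FHIR R4 Provenance resource; ids are made up).
provenance = {
    "resourceType": "Provenance",
    # the inferred data element being traced
    "target": [{"reference": "Condition/suspected-bleed-1"}],
    # provenance: when and by whom the data was asserted/inferred
    "recorded": "2023-05-02T09:15:00Z",
    "agent": [{"who": {"display": "post-op surveillance rule engine"}}],
    # pedigree: the evidence the inference is based on
    "entity": [
        {"role": "derivation", "what": {"reference": "Observation/hgb-drop-1"}},
        {"role": "derivation", "what": {"reference": "Procedure/appendectomy-1"}},
    ],
}
```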

  • (Security) Labeling/Tagging: Adding annotations (metadata) for the purpose of enforcing consent and other policies (see the sketch after this list)
  • Managing Imperfection : handling any type of (un)certainty, vagueness/fuzziness, imprecision, confidence, strength of evidence, and belief associated with a piece of data.
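
A sketch of security labeling, assuming FHIR-style resource metadata: a confidentiality Coding (the HL7 v3 Confidentiality code "R", restricted) is appended to meta.security so downstream policy engines can enforce consent; the resource content is illustrative.

```python
# Security labeling sketch: attach a confidentiality label to a resource's metadata.
def label_restricted(resource):
    meta = resource.setdefault("meta", {})
    meta.setdefault("security", []).append({
        # HL7 v3 Confidentiality code system (canonical URL as used in FHIR R4)
        "system": "http://terminology.hl7.org/CodeSystem/v3-Confidentiality",
        "code": "R",
        "display": "restricted",
    })
    return resource

obs = {"resourceType": "Observation", "id": "hiv-test-1"}
print(label_restricted(obs)["meta"]["security"][0]["code"])  # R
```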

 

Services and Content for Data Enhancement

| Capability | Specs / APIs | Knowledge | Implementations |
|---|---|---|---|
| Trans*mation / Normalization | | | |
| Transcription | | | FHIR De/Serializers |
| Paraphrasing | OMG MDMI / QVT; FHIR "Mapping Language" + fhir:StructureMap + fhir:ConceptMap | | RDF: ShEx + Graph Production; XML: XSLT |
| Translation | OMG MDMI / QVT | | |
| Transformation | OMG MDMI / QVT | | |
| Validation | | | |
| Structural Conformance | $validate | fhir:StructureDefinition | |
| Semantic Conformance | | | |
| Trimming | | | |
| Enrichment | | | |
| Classification | Ontology Languages | | |
| Completion | Production (Rule) Languages; Predictive Modeling (e.g. PMML); Complex Event Processing; (Indirect) Expression Languages: HL7 CQL | fhir:DecisionSupportRule; (Indirect) Terms / Ontologies / Valuesets | |
| Linkage | | | |
| Inference | | | |
| Auditing | fhir:Audit + fhir:Provenance | | |
| Uncertainty | | | |