Fun or No Fun - how serious is it?
Large language models have been causing quite a stir in recent months. So inevitably I was playing with ChatGPT too over the last weekend, to probe whether it would be complementary to some BERT-based "traditional" AI chatbots I was knocking up, or whether it would simply sweep them away.
A thought came to mind while playing. Going slightly theoretical or philosophical, interoperability standards such as HL7 and FHIR are in the end kinds of "languages", right? HL7 has its own grammar, rules, vocabulary and even dialects - every system speaks in its own tone. That's why, when one machine talks to another, they need a translator (e.g. a DTL transformation) to achieve mutual understanding.
So looking around, everything seems to be a language. Coding is language: Python, JavaScript and COS are all languages. HL7, FHIR and even XML or JSON are all languages too, except that they are more structured than natural languages, so shouldn't they be easier for GPT to pick up?
So can't we start by simply reusing GPT's pre-trained code model to emulate a DTL? No fine-tuning yet. Below are the initial results:
Test
Mode: Complete
Model: code-davinci-002
Temperature: 0
Max Length: 1989
Others: Default
Stop Sequences: ###
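For anyone who wants to reproduce the setup outside the Playground, here is a minimal sketch of how the settings above and the prompt layout used in the tests could be assembled into one request payload. The helper names (`build_prompt`, `build_request`) are hypothetical, and mapping the Playground labels onto `model`, `temperature`, `max_tokens` and `stop` is an assumption about a completion-style API. Nothing is sent over the network here.

```python
# Playground settings from the list above. The key names are assumed to match
# a completion-style API request; adjust them to the client you actually use.
SETTINGS = {
    "model": "code-davinci-002",
    "temperature": 0,
    "max_tokens": 1989,  # "Max Length" in the Playground
    "stop": ["###"],     # "Stop Sequences"
}


def build_prompt(source_label: str, target_label: str, message: str) -> str:
    """Lay out a prompt as in the tests: a task line, the labelled source
    message, then the target label for the model to continue from."""
    return (
        f"##### Translate this function from {source_label} to {target_label}\n"
        f"### {source_label}\n"
        f"{message.strip()}\n"
        f"### {target_label}\n"
    )


def build_request(source_label: str, target_label: str, message: str) -> dict:
    """Combine the fixed settings with a prompt into one payload (hypothetical helper)."""
    return {**SETTINGS, "prompt": build_prompt(source_label, target_label, message)}


if __name__ == "__main__":
    request = build_request("HL7", "XML", "EVN|A01|200708181123||")
    print(request["prompt"])
```

With temperature 0 the same prompt should give much the same completion on every run, which is what makes the comparisons below meaningful.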
1. Turn HL7 ADT A01 to A05?
Input:
##### Translate this function from HL7 V2.8 ADT_A01 to HL7 V2.4 ADT_A05
### HL7 V2.8 ADT_A01
MSH|^~\&|ADT1|GOOD HEALTH HOSPITAL|GHH LAB, INC.|GOOD HEALTH HOSPITAL|198808181126|SECURITY|ADT^A01^ADT_A01|MSG00001|P|2.8||
EVN|A01|200708181123||
PID|1||PATID1234^5^M11^ADT1^MR^GOOD HEALTH HOSPITAL~123456789^^^USSSA^SS||EVERYMAN^ADAM^A^III||19610615|M||C|2222 HOME STREET^^GREENSBORO^NC^27401-1020|GL|(555) 555-2004|(555)555-2004||S||PATID12345001^2^M10^ADT1^AN^A|444333333|987654^NC|
NK1|1|NUCLEAR^NELDA^W|SPO^SPOUSE||||NK^NEXT OF KIN
PV1|1|I|2000^2012^01||||004777^ATTEND^AARON^A|||SUR||||ADM|A0|
### HL7 V2.4 ADT_A05
Output:
MSH|^~\&|ADT1|GOOD HEALTH HOSPITAL|GHH LAB, INC.|GOOD HEALTH HOSPITAL|198808181126|SECURITY|ADT^A05^ADT_A05|MSG00001|P|2.4||
EVN|A05|200708181123||
PID|1||PATID1234^5^M11^ADT1^MR^GOOD HEALTH HOSPITAL~123456789^^^USSSA^SS||EVERYMAN^ADAM^A^III||19610615|M||C|2222 HOME STREET^^GREENSBORO^NC^27401-1020|GL|(555) 555-2004|(555)555-2004||S||PATID12345001^2^M10^ADT1^AN^A|444333333|987654^NC|
NK1|1|NUCLEAR^NELDA^W|SPO^SPOUSE||||NK^NEXT OF KIN
PV1|1|I|2000^2012^01||||004777^ATTEND^AARON^A|||SUR||||ADM|A0|
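Not exactly right, but not that bad: it swapped the event type (A01 to A05) and the version (2.8 to 2.4) and copied everything else unchanged. It picks up the vocabulary naturally, although it does not yet understand the doc type without learning it.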
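One way to see exactly which fields the model touched is a field-by-field diff of the input and the output. The sketch below is a naive comparison that assumes both messages have the same segments in the same order; the helper names are hypothetical, and the PID segment is left out only because it is identical in both messages.

```python
# Input and output of test 1 (PID omitted: it is identical in both messages).
BEFORE = r"""
MSH|^~\&|ADT1|GOOD HEALTH HOSPITAL|GHH LAB, INC.|GOOD HEALTH HOSPITAL|198808181126|SECURITY|ADT^A01^ADT_A01|MSG00001|P|2.8||
EVN|A01|200708181123||
NK1|1|NUCLEAR^NELDA^W|SPO^SPOUSE||||NK^NEXT OF KIN
PV1|1|I|2000^2012^01||||004777^ATTEND^AARON^A|||SUR||||ADM|A0|
"""
AFTER = r"""
MSH|^~\&|ADT1|GOOD HEALTH HOSPITAL|GHH LAB, INC.|GOOD HEALTH HOSPITAL|198808181126|SECURITY|ADT^A05^ADT_A05|MSG00001|P|2.4||
EVN|A05|200708181123||
NK1|1|NUCLEAR^NELDA^W|SPO^SPOUSE||||NK^NEXT OF KIN
PV1|1|I|2000^2012^01||||004777^ATTEND^AARON^A|||SUR||||ADM|A0|
"""


def fields(segment: str) -> dict:
    """Map field names such as 'MSH-9' to their raw values for one segment."""
    parts = segment.split("|")
    name = parts[0]
    # In MSH the field separator itself counts as MSH-1, so numbering shifts by one.
    offset = 1 if name == "MSH" else 0
    return {f"{name}-{i + offset}": value for i, value in enumerate(parts) if i > 0}


def diff_messages(before: str, after: str) -> list:
    """Return (field, old, new) for every field that differs, segment by segment."""
    changes = []
    for seg_before, seg_after in zip(before.strip().splitlines(), after.strip().splitlines()):
        old, new = fields(seg_before), fields(seg_after)
        for key, old_value in old.items():
            if old_value != new.get(key):
                changes.append((key, old_value, new.get(key)))
    return changes


if __name__ == "__main__":
    for field_name, old_value, new_value in diff_messages(BEFORE, AFTER):
        print(f"{field_name}: {old_value!r} -> {new_value!r}")
```

On these two messages the diff reports only MSH-9, MSH-12 and EVN-1. Whether the copied segments are all valid for the target message structure is a separate question that this diff does not answer.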
2. OK, how about HL7 pipe & hat to XML?
Input:
##### Translate this function from HL7 to XML
### HL7
MSH|^~\&|ADT1|GOOD HEALTH HOSPITAL|GHH LAB, INC.|GOOD HEALTH HOSPITAL|198808181126|SECURITY|ADT^A01^ADT_A01|MSG00001|P|2.8||
EVN|A01|200708181123||
PID|1||PATID1234^5^M11^ADT1^MR^GOOD HEALTH HOSPITAL~123456789^^^USSSA^SS||EVERYMAN^ADAM^A^III||19610615|M||C|2222 HOME STREET^^GREENSBORO^NC^27401-1020|GL|(555) 555-2004|(555)555-2004||S||PATID12345001^2^M10^ADT1^AN^A|444333333|987654^NC|
NK1|1|NUCLEAR^NELDA^W|SPO^SPOUSE||||NK^NEXT OF KIN
PV1|1|I|2000^2012^01||||004777^ATTEND^AARON^A|||SUR||||ADM|A0|
### XML
Output:
<?xml version="1.0" encoding="UTF-8"?>
<hl7:ADT_A01 xmlns:hl7="urn:hl7-org:v2xml">
<hl7:MSH>
<hl7:MSH.1>|</hl7:MSH.1>
<hl7:MSH.2>^~\&</hl7:MSH.2>
<hl7:MSH.3>
<hl7:HD.1>ADT1</hl7:HD.1>
</hl7:MSH.3>
<hl7:MSH.4>
<hl7:HD.1>GOOD HEALTH HOSPITAL</hl7:HD.1>
</hl7:MSH.4>
<hl7:MSH.5>
<hl7:HD.1>GHH LAB, INC.</hl7:HD.1>
</hl7:MSH.5>
<hl7:MSH.6>
<hl7:HD.1>GOOD HEALTH HOSPITAL</hl7:HD.1>
</hl7:MSH.6>
<hl7:MSH.7>
<hl7:TS.1>198808181126</hl7:TS.1>
</hl7:MSH.7>
<hl7:MSH.9>
<hl7:MSG.1>ADT</hl7:MSG.1>
<hl7:MSG.2>A01</hl7:MSG.2>
<hl7:MSG.3>ADT_A01</hl7:MSG.3>
</hl7:MSH.9>
<hl7:MSH.10>MSG00001</hl7:MSH.10>
<hl7:MSH.11>
<hl7:PT.1>P</hl7:PT.1>
</hl7:MSH.11>
<hl7:MSH.12>
<hl7:VID.1>2.8</hl7:VID.1>
</hl7:MSH.12>
</hl7:MSH>
<hl7:EVN>
<hl7:EVN.1>
<hl7:ID.1>A01</hl7:ID.1>
</hl7:EVN.1>
<hl7:EVN.2>
<hl7:TS.1>200708181123</hl7:TS.1>
</hl7:EVN.2>
</hl7:EVN>
<hl7:PID>
<hl7:PID.1>1</hl7:PID.1>
<hl7:PID.3>
<hl7:CX.1>PATID1234</hl7:CX.1>
<hl7:CX.4>
<hl7:HD.1>GOOD HEALTH HOSPITAL</hl7:HD.1>
</hl7:CX.4>
<hl7:CX.5>
<hl7:HD.1>ADT1</hl7:HD.1>
</hl7:CX.5>
<hl7:CX.6>MR</hl7:CX.6>
</hl7:PID.3>
<hl7:PID.5>
<hl7:XPN.1>EVERYMAN</hl7:XPN.1>
<hl7:XPN.2>ADAM</hl7:XPN.2>
<hl7:XPN.3>A</hl7:XPN.3>
<hl7:XPN.4>III</hl7:XPN.4>
</hl7:PID.5>
<hl7:PID.7>
<hl7:TS.1>19610615</hl7:TS.1>
</hl7:PID.7>
<hl7:PID.8>M</hl7:PID.8>
<hl7:PID.10>
<hl7:CE.1>C</hl7:CE.1>
</hl7:PID.10>
<hl7:PID.11>
<hl7:XAD.1>2222 HOME STREET</hl7:XAD.1>
<hl7:XAD.3>GREENSBORO</hl7:XAD.3>
<hl7:XAD.4>NC</hl7:XAD.4>
<hl7:XAD.5>27401-1020</hl7:XAD.5>
</hl7:PID.11>
<hl7:PID.13>
<hl7:XTN.1>(555) 555-2004</hl7:XTN.1>
</hl7:PID.13>
<hl7:PID.14>
<hl7:XTN.1>(555)555-2004</hl7:XTN.1>
</hl7:PID.14>
... ...
Now it gets a bit serious. Apparently it has somehow been prepared for this already? It seems that: 1) it has no problem parsing the HL7 message (which was not easy for a machine - in the past it would need power tools such as ISC's virtual document schemas, well defined); 2) it has no problem picking up the data types.
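To put the two observations in perspective, here is a minimal sketch of a purely positional "pipe & hat" splitter. It is a deliberate simplification: it ignores escape sequences and subcomponents (`&`), and the function names are hypothetical. It can recover positions such as PID.5, but it has no way of knowing that PID.5 is an XPN or that MSH.3 is an HD. Those names normally come from a schema, and in the output above the model supplied them on its own.

```python
def parse_field(value: str) -> list:
    """Split one field into repetitions (~), each a list of components (^)."""
    return [repetition.split("^") for repetition in value.split("~")]


def parse_message(text: str) -> dict:
    """Parse a message into {segment name: [ {field position: parsed value} ]}."""
    parsed = {}
    for line in text.strip().splitlines():
        parts = line.split("|")
        name = parts[0]
        # The field separator itself is MSH.1, so MSH positions shift by one.
        offset = 1 if name == "MSH" else 0
        segment = {}
        for i, raw in enumerate(parts[1:], start=1):
            if raw == "":
                continue  # skip empty fields
            key = f"{name}.{i + offset}"
            # MSH.2 holds the encoding characters and must not be split.
            segment[key] = raw if key == "MSH.2" else parse_field(raw)
        parsed.setdefault(name, []).append(segment)
    return parsed


SAMPLE = r"""
MSH|^~\&|ADT1|GOOD HEALTH HOSPITAL|GHH LAB, INC.|GOOD HEALTH HOSPITAL|198808181126|SECURITY|ADT^A01^ADT_A01|MSG00001|P|2.8||
PID|1||PATID1234^5^M11^ADT1^MR^GOOD HEALTH HOSPITAL~123456789^^^USSSA^SS||EVERYMAN^ADAM^A^III||19610615|M||C|2222 HOME STREET^^GREENSBORO^NC^27401-1020|GL|(555) 555-2004|(555)555-2004||S||PATID12345001^2^M10^ADT1^AN^A|444333333|987654^NC|
"""

if __name__ == "__main__":
    message = parse_message(SAMPLE)
    print(message["PID"][0]["PID.5"])
    print(message["PID"][0]["PID.3"])
```

Comparing the positional result for PID.3 (two repetitions separated by `~`) with the model's version is a quick way to spot where the generated structure drifts from the source message.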
3. HL7 pipe & hat to JSON?
Output:
{
"MSH": {
"MSH.1": "|",
"MSH.2": "^~\\&",
"MSH.3": {
"HD.1": "ADT1"
},
"MSH.4": {
"HD.1": "GOOD HEALTH HOSPITAL"
},
"MSH.5": {
"HD.1": "GHH LAB, INC."
},
"MSH.6": {
"HD.1": "GOOD HEALTH HOSPITAL"
},
"MSH.7": {
"TS.1": "198808181126"
},
"MSH.9": {
"MSG.1": "ADT",
"MSG.2": "A01",
"MSG.3": "ADT_A01"
},
"MSH.10": "MSG00001",
"MSH.11": {
"PT.1": "P"
},
"MSH.12": {
"VID.1": "2.8"
}
},
"EVN": {
"EVN.1": {
"ID.1": "A01"
},
"EVN.2": {
"TS.1": "200708181123"
}
},
"PID": {
"PID.1": "1",
"PID.3": [
{
"CX.1": "PATID1234",
"CX.2": "5",
"CX.3": "M11",
"CX.4": "ADT1",
"CX.5": "MR",
"CX.6": "GOOD HEALTH HOSPITAL"
},
{
"CX.1": "123456789"
},
{
"CX.4": "USSSA",
"CX.5": "SS"
}
],
"PID.5": [
{
"XPN.1": [
{
"FN.1": "EVERYMAN"
},
{
"ST.1": "ADAM"
},
{
"ST.2": "A"
},
{
"ST.3": "III"
}
]
}
],
"PID.7": {
"TS.1": "19610615"
},
"PID.8": {
"IS.1": "M"
},
"PID.10": {
"CE.1": "C"
},
"PID.11": [
{
"XAD.1": [
{
"SAD.1": "2222 HOME STREET"
},
{
"SAD.3": "GREENSBORO"
},
{
"SAD.4": "NC"
},
{
"SAD.5": "27401-1020"
}
]
},
{
"CE.1": "GL"
}
],
"PID.13": [
{
"XTN.1": "(555) 555-2004"
},
{
"XTN.1": "(555)555-2004"
}
],
"PID.16": {
"IS.1": "S"
},
"PID.19": [
{
"CX.1": "PATID12345001",
"CX.2": "2",
"CX.3": "M10",
"CX.4": "ADT1",
"CX.5": "AN",
"CX.6": "A"
},
{
"CX.1": "444333333"
},
{
"CX.1": "987654",
"CX.5": "NC"
}
]
},
"NK1": [
{
"NK1.1": "1",
"NK1.2": [
{
"XPN.1": [
{
"FN.1": "NUCLEAR"
},
{
"ST.1": "NELDA"
},
{
"ST.2": "W"
}
]
},
{
"CE.1": "SPO",
"CE.2": "SPOUSE"
}
]
}
],
"PV1": {
"PV1.1": "1",
"PV1.2": {
"IS.1": "I"
},
"PV1.3": [
{
"PL.1": "2000"
},
{
"PL.2": "2012"
},
{
"PL.3": "01"
}
],
"PV1.7": [
{
"XCN.1": "004777"
},
{
"XCN.2": [
{
"FN.1": "ATTEND"
},
{
"ST.1": "AARON"
},
{
"ST.2": "A"
}
]
}
],
"PV1.10": {
"IS.1": "SUR"
},
"PV1.19": {
"IS.1": "ADM"
},
"PV1.44": {
"IS.1": "A0"
}
}
}
Again, the parsing seems fine, and that has significant implications: if the parsing is done, then I can practically turn all sorts of HL7 dialects from various systems into parsed JSON, and run any translator in e.g. Python without an engine between them?
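As a rough illustration of "run any translator in Python", the sketch below takes a small excerpt of the parsed JSON above (PID.5, PID.7 and PID.8, in the shape the model produced) and maps it to a Patient-like dictionary. The function name, the gender lookup table and the choice to put every name part after the family name into `given` are assumptions for illustration only, not a validated HL7-to-FHIR mapping.

```python
import json
from datetime import datetime

# Excerpt of the model's JSON output from test 3, structure kept as generated.
PARSED = json.loads("""
{
  "PID": {
    "PID.5": [
      {"XPN.1": [{"FN.1": "EVERYMAN"}, {"ST.1": "ADAM"}, {"ST.2": "A"}, {"ST.3": "III"}]}
    ],
    "PID.7": {"TS.1": "19610615"},
    "PID.8": {"IS.1": "M"}
  }
}
""")

# Illustrative lookup only; a real mapping would cover the full code table.
GENDER = {"M": "male", "F": "female"}


def to_patient(parsed: dict) -> dict:
    """Translate the parsed PID excerpt into a Patient-like dictionary."""
    pid = parsed["PID"]
    # Each name part is a single-key dict, e.g. {"FN.1": "EVERYMAN"}.
    name_parts = [next(iter(part.values())) for part in pid["PID.5"][0]["XPN.1"]]
    family, *given = name_parts
    birth_date = datetime.strptime(pid["PID.7"]["TS.1"], "%Y%m%d").date().isoformat()
    return {
        "resourceType": "Patient",
        "name": [{"family": family, "given": given}],
        "gender": GENDER.get(pid["PID.8"]["IS.1"], "unknown"),
        "birthDate": birth_date,
    }


if __name__ == "__main__":
    print(json.dumps(to_patient(PARSED), indent=2))
```

The translator itself is deterministic once the JSON exists. The uncertain step is whether the model's parse is faithful to the original message, so the parse would still need checking against the source.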
4. HL7 to FHIR
Output:
{
"resourceType": "Bundle",
"type": "message",
"entry": [
{
"resource": {
"resourceType": "MessageHeader",
"eventCoding": {
"system": "http://hl7.org/fhir/message-events",
"code": "ADT_A01"
},
"destination": [
{
"endpoint": "http://localhost:8080/fhir/baseDstu3"
}
],
"source": {
"name": "ADT1",
"software": "GOOD HEALTH HOSPITAL",
"endpoint": "http://localhost:8080/fhir/baseDstu3"
},
"timestamp": "2018-08-18T11:26:00+00:00",
"data": [
{
"reference": "#evn"
},
{
"reference": "#pid"
},
{
"reference": "#nk1"
},
{
"reference": "#pv1"
}
]
},
"request": {
"method": "POST",
"url": "Patient"
}
},
{
"resource": {
"resourceType": "Patient",
"id": "evn",
"identifier": [
{
"system": "http://hl7.org/fhir/sid/us-ssn",
"value": "123456789"
}
],
"name": [
{
"family": "EVERYMAN",
"given": [
"ADAM",
"A",
"III"
]
}
],
"gender": "male",
"birthDate": "1961-06-15",
"address": [
{
"line": [
"2222 HOME STREET"
],
"city": "GREENSBORO",
"state": "NC",
"postalCode": "27401-1020"
}
],
"maritalStatus": {
"coding": [
{
"system": "http://hl7.org/fhir/v3/MaritalStatus",
"code": "M"
}
]
},
"telecom": [
{
"system": "phone",
"value": "(555) 555-2004"
},
{
"system": "phone",
"value": "(555)555-2004"
}
],
"contact": [
{
"relationship": [
{
"coding": [
{
"system": "http://hl7.org/fhir/v2/0131",
"code": "NK"
}
]
}
],
"name": {
"family": "NUCLEAR",
"given": [
"NEL ... ... ... ...
This is slightly impressive now, thinking of how much effort I put into FHIR quite a few years ago. But I am not surprised any more, am I?
Any thoughts? I may need to take a breath and re-think this kind of "language model", so that I can stretch my imagination a bit further. :)
Next "thing"
1. OK, it can play the role of a DTL. But could it generate a DTL in Python, in ISC COS or in any language of my choice - I mean the real code? I can't see a reason why not.
2. Certainty: traditional coding gives us certainty and accuracy - if a transformation works, it keeps working with 100% accuracy and certainty for all the cases that have been tested. Data science approaches sometimes don't have this certainty. How do we inject engineering certainty and medical quality into this "black box"?
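One possible direction for the certainty question, offered only as a sketch and not something tested in this post, is to treat the generated output as untrusted and gate it with deterministic checks against the source message. The example below compares three values from the FHIR output of test 4 with the HL7 input. The helper names are hypothetical, and comparing the marital status letters directly is a simplification that assumes both code systems use the same letters.

```python
from datetime import datetime

# Source segments from the HL7 input used in the tests.
MSH = r"MSH|^~\&|ADT1|GOOD HEALTH HOSPITAL|GHH LAB, INC.|GOOD HEALTH HOSPITAL|198808181126|SECURITY|ADT^A01^ADT_A01|MSG00001|P|2.8||"
PID = "PID|1||PATID1234^5^M11^ADT1^MR^GOOD HEALTH HOSPITAL~123456789^^^USSSA^SS||EVERYMAN^ADAM^A^III||19610615|M||C|2222 HOME STREET^^GREENSBORO^NC^27401-1020|GL|(555) 555-2004|(555)555-2004||S||PATID12345001^2^M10^ADT1^AN^A|444333333|987654^NC|"

# Values copied from the (truncated) FHIR output of test 4.
FHIR_TIMESTAMP = "2018-08-18T11:26:00+00:00"
FHIR_BIRTH_DATE = "1961-06-15"
FHIR_MARITAL_CODE = "M"


def field(segment: str, n: int) -> str:
    """Return field n of a segment; MSH counts its separator as MSH-1."""
    parts = segment.split("|")
    return parts[n - 1] if parts[0] == "MSH" else parts[n]


def run_checks() -> list:
    """Compare generated values with the source and return a list of failures."""
    failures = []

    source_ts = datetime.strptime(field(MSH, 7), "%Y%m%d%H%M")
    fhir_ts = datetime.fromisoformat(FHIR_TIMESTAMP).replace(tzinfo=None)
    if source_ts != fhir_ts:
        failures.append(f"timestamp: MSH-7 is {source_ts.isoformat()}, FHIR has {fhir_ts.isoformat()}")

    source_birth = datetime.strptime(field(PID, 7), "%Y%m%d").date().isoformat()
    if source_birth != FHIR_BIRTH_DATE:
        failures.append(f"birthDate: PID-7 is {source_birth}, FHIR has {FHIR_BIRTH_DATE}")

    # Simplification: assumes the same letter codes on both sides.
    if field(PID, 16) != FHIR_MARITAL_CODE:
        failures.append(f"maritalStatus: PID-16 is {field(PID, 16)}, FHIR has {FHIR_MARITAL_CODE}")

    return failures


if __name__ == "__main__":
    for failure in run_checks():
        print("FAIL", failure)
```

On the output of test 4 the birth date check passes, while the timestamp check fails (MSH-7 says 1988, the generated MessageHeader says 2018) and so does the marital status check (PID-16 is S, the generated code is M). Checks of this kind do not make the model deterministic, but they can stop a plausible-looking wrong value from passing silently.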