Article
· March 28, 2023 · about a 10-minute read

Perceived gaps in GPT-assisted COS development automation

Keywords: ChatGPT, COS, Lookup Table, IRIS, AI

 

Purpose


Here is another quick note before we move on to the GPT-4 assisted automation journey. Below are some of the "little" helps ChatGPT has already been offering, here and there, in daily work.

I also look at the perceived gaps, risks and traps of LLM-assisted automation, in case you are exploring this path too. I'd love to hear about anyone's use cases and experiences on this front.

 

Lookup tables


One of the simplest tasks could be Lookup tables. We frequently have to turn tedious CSV or Excel code tables into Lookup tables in XML by hand. It is particularly painful when blanks and the like are randomly scattered through the values across thousands of records, and you have to tidy them up several times before go-live deadlines.

Can ChatGPT help a bit? Here is a quick test.

Prompt:

You are a program to automatically convert a CSV file into a XML lookup table file.

The input is a comma delimited CSV file with 3 data columns :
Blood Science ,XCOM,CANCELLED/REJECTED SAMPLE
Blood Science ,ADX ,ADD ON TEST COMMENT
Blood Transfusion,DARA,ALERT FOR DARA/CAMELLIA
Blood Transfusion, BT1 ,REJECT SAMPLE UNLABELLED

The output should be a XML file in the form of:
<?xml version="1.0"?>
<lookupTable>
     <entry table="HIE.ICE.FilterOBR4" key="XCOM">Blood Science</entry>
     <entry table="HIE.ICE.FilterOBR4" key="ADX ">Blood Science</entry>
     <entry table="HIE.ICE.FilterOBR4" key="DARA">Blood Transfusion</entry>
     <entry table="HIE.ICE.FilterOBR4" key="BT1">Blood Transfusion</entry>
</lookupTable>
Note: the sequence of the XML attribute in each root should be kept as they are


ChatGPT:

Here's a Python script that reads in the CSV file, processes it, and outputs the resulting XML file:

import csv

def convert_csv_to_xml(input_file, output_file):
    """
    Convert a CSV file to an XML lookup table file.

    Parameters:
    input_file (str): Path to the input CSV file.
    output_file (str): Path to the output XML file.

    Returns:
    int: 0 if the conversion was successful, 1 if there was an error.
    """
    try:
        # Create a dictionary to hold the lookup table data
        lookup_table = {}

        # Read in the CSV file and process the data
        with open(input_file, "r") as csvfile:
            csvreader = csv.reader(csvfile)
            for row in csvreader:
                category = row[0].strip()
                key = row[1].strip()
                value = row[2].strip()
                lookup_table[key] = category

        # Generate the XML file
        with open(output_file, "w") as xmlfile:
            xmlfile.write('<?xml version="1.0"?>\n')
            xmlfile.write('<lookupTable>\n')
            for key in lookup_table:
                xmlfile.write(f'<entry table="HIE.ICE.FilterOBR4" key="{key}">{lookup_table[key]}</entry>\n')
            xmlfile.write('</lookupTable>\n')

        return 0
    except Exception as e:
        print("Error:", e)
        return 1

This script first reads in the CSV file using the csv module and creates a dictionary to hold the lookup table data. It then processes each row of the CSV file and adds the relevant data to the dictionary.

Finally, the script generates the XML file by iterating through the dictionary and writing the appropriate XML tags and attributes to the output file. Note that the f string syntax is used to dynamically insert the key and category data into the XML tags.


I ran it briefly in a Jupyter notebook, and it checked out. It also stripped out the blanks, which would otherwise have meant a lot of manual tidying. Because it produced a tool rather than the converted data, it can handle lengthy CSVs with messy values without being limited by the token size.

In this specific case, I'd say it increased productivity by 20+ times over repeated manual handling of many messy code tables.
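One thing the generated script does not do is escape XML special characters, so a code-table value containing & or < would produce malformed XML. Below is a minimal sketch of a hardened variant, assuming the same three-column layout and table name as in the prompt. The function name rows_to_lookup_xml, the in-memory string interface and the sample rows are illustrative assumptions, not part of the original script.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def rows_to_lookup_xml(csv_text, table="HIE.ICE.FilterOBR4"):
    """Build lookup-table XML from CSV text (category, key, description)."""
    lookup = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        category, key = row[0].strip(), row[1].strip()
        if key:
            lookup[key] = category  # as in the original, the last value wins

    lines = ['<?xml version="1.0"?>', "<lookupTable>"]
    for key, category in lookup.items():
        # quoteattr() adds the surrounding quotes and escapes the attribute;
        # escape() handles &, < and > in the element text.
        lines.append(
            f"     <entry table={quoteattr(table)} key={quoteattr(key)}>"
            f"{escape(category)}</entry>"
        )
    lines.append("</lookupTable>")
    return "\n".join(lines) + "\n"


# Hypothetical sample rows; the second one contains a character needing escape.
sample = "Blood Science ,XCOM,CANCELLED/REJECTED SAMPLE\nHaem & Coag, BT1 ,REJECT\n"
print(rows_to_lookup_xml(sample))
```

Working on a string instead of file paths is only a convenience for trying it in a notebook; wrapping it with open() calls gives the same file-to-file behaviour as the script above.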


Now that it had saved me a bit of time, I wondered whether it could also generate a COS sample, to build a Util class for future service BAUs etc.


Prompt:  

Could you convert the above Python function to intersystems COS classmethod, by using correct COS specific function please

ChatGPT:

Certainly! Here's the converted code as an InterSystems ObjectScript class method:

ClassMethod ConvertCSVToXML(inputFile As %String, outputFile As %String) As %Integer
{
    try {
        // Create a dictionary to hold the lookup table data
        set lookupTable = {}

        // Read in the CSV file and process the data
        set csvReader = ##class(%File.CharacterStream).%New()
        do csvReader.LinkToFile(inputFile)
        while (csvReader.AtEnd = 0) {
            set row = csvReader.ReadLine()
            set category = $ZSTRIP($PIECE(row, ",", 1), "B")
            set key = $ZSTRIP($PIECE(row, ",", 2), "B")
            set value = $ZSTRIP($PIECE(row, ",", 3), "B")
            set lookupTable(key) = category
        }
        do csvReader.%Close()

        // Generate the XML file
        set xmlWriter = ##class(%File.CharacterStream).%New()
        do xmlWriter.LinkToFile(outputFile)
        do xmlWriter.WriteLine("<?xml version=""1.0""?>")
        do xmlWriter.WriteLine("<lookupTable>")
        set keys = lookupTable.%GetKeys()
        while ('keys.%AtEnd) {
            set key = keys.%GetNext()
            set category = lookupTable(key)
            do xmlWriter.WriteLine("<entry table=""HIE.ICE.FilterOBR4"" key="""_key_""">"_category_"</entry>")
        }
        do xmlWriter.WriteLine("</lookupTable>")
        do xmlWriter.%Close()

        quit 0
    }
    catch (ex) {
        Write "Error: "_ex.DisplayString(), !
        quit 1
    }
}

In InterSystems ObjectScript, we can use the %File.CharacterStream class to read and write files. We use the LinkToFile() method to open the input and output files, and then use the ReadLine() and WriteLine() methods to read and write data to the files.

To strip leading and trailing spaces from each field, we use the $ZSTRIP() function with the "B" flag.

We also use the %GetKeys() method of the lookupTable object to iterate over the keys in the dictionary.

Finally, the method returns a status code of 0 if the conversion was successful, or 1 if there was an error.


It didn't give an accurate tool that checked out straight away, as the Python one did. %File.CharacterStream should be %Stream.FileCharacter, stripping blanks should use "<>W", looping through globals should use $ORDER, and so on.

Prompts can be improved, but I am not particularly good at asking for favors, so I rushed through a few tweaks along those lines.

 

ClassMethod ConvertCSVToXML2(inputFile As %String, outputFile As %String) As %Status
{
    set tSC = $$$OK
    try {
        // Clear entries left over from any previous run
        kill ^lookupTable

        // Read in the CSV file and process the data
        set csvReader = ##class(%Stream.FileCharacter).%New()
        set tSC = csvReader.LinkToFile(inputFile)
        while (csvReader.AtEnd = 0) {
            set row = csvReader.ReadLine()
            set category = $ZSTRIP($PIECE(row, ",", 1), "<>W")
            set key = $ZSTRIP($PIECE(row, ",", 2), "<>W")
            // A null subscript is not allowed, so skip blank lines and empty keys
            continue:key=""
            set ^lookupTable(key) = category
        }
        //zw ^lookupTable
        do csvReader.%Close()

        // Generate the XML file
        set xmlWriter = ##class(%Stream.FileCharacter).%New()
        set tSC = xmlWriter.LinkToFile(outputFile)
        set tSC = xmlWriter.WriteLine("<?xml version=""1.0""?>")
        set tSC = xmlWriter.WriteLine("<lookupTable>")
        // Walk the global in key order with $ORDER
        set key = $ORDER(^lookupTable(""))
        while (key '= "") {
            set category = $GET(^lookupTable(key))
            write !, key, " ", category
            set tSC = xmlWriter.WriteLine("<entry table=""HIE.ICE.FilterOBR4"" key="""_key_""">"_category_"</entry>")
            set key = $ORDER(^lookupTable(key))
        }
        set tSC = xmlWriter.WriteLine("</lookupTable>")
        // %Save() takes no arguments; it writes the stream to the linked file
        set tSC = xmlWriter.%Save()
        do xmlWriter.%Close()
    }
    catch (ex) {
        write "Error: "_ex.DisplayString(), !
        // Convert the exception object into a %Status value
        set tSC = ex.AsStatus()
    }
    return tSC
}

 

So what are the perceived gaps here?

Some potential implications that came to mind:

1. Productivity: The productivity reward would depend on the developer's COS proficiency. This tool would further amplify any advantage in coding proficiency.

2. Gaps: I wonder how we could drive up its accuracy with few-shot learning via prompts, with or even without fine-tuning. If you happen to be exploring this front across LLMs, I'd love to hear your thoughts, or dreams.
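As one way to picture few-shot prompting for this task, here is a minimal sketch that assembles a prompt from the three corrections noted above (%Stream.FileCharacter, $ZSTRIP with "<>W", and $ORDER). The constant FEW_SHOT_EXAMPLES, the function build_few_shot_prompt and the prompt wording are all hypothetical, and whether such a prompt actually improves accuracy remains untested here.

```python
# Hypothetical few-shot examples: each pairs an intent with the ObjectScript
# idiom that the hand-corrected ConvertCSVToXML2 method ended up using.
FEW_SHOT_EXAMPLES = [
    ("Open a text file for reading or writing",
     'set stream = ##class(%Stream.FileCharacter).%New()\n'
     'set sc = stream.LinkToFile(path)'),
    ("Strip leading and trailing whitespace from a string",
     'set s = $ZSTRIP(s, "<>W")'),
    ("Iterate over the keys of a lookup structure",
     'set key = $ORDER(^lookupTable(""))\n'
     'while (key \'= "") { set key = $ORDER(^lookupTable(key)) }'),
]


def build_few_shot_prompt(task, examples=FEW_SHOT_EXAMPLES):
    """Return a plain-text prompt: instructions, worked examples, then the task."""
    parts = [
        "You convert Python to InterSystems ObjectScript.",
        "Prefer the idioms shown in the samples below.",
        "",
    ]
    for i, (intent, cos) in enumerate(examples, start=1):
        parts += [f"Example {i}: {intent}", cos, ""]
    parts += ["Task:", task]
    return "\n".join(parts)


print(build_few_shot_prompt("Convert convert_csv_to_xml to a ClassMethod."))
```

The resulting string can be pasted into a chat or sent through whichever API client is in use; no particular client is assumed.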
 

 

Unit Tests Auto-generations
 

When talking about development, we can't get away from testing.

Things seem to be changing now. Very "simple"-looking, sleek tools that leverage the power of GPTs, such as RubberDuck, have mushroomed to help.

So I tried the RubberDuck extension in VSCode and set it up with my OpenAI API key.

Then I opened the Python function mentioned above in VSCode, as below:

Then I selected the "Generate Unit Test ..." menu, and it auto-generated some unit test coverage in a few seconds, at least 100+ times quicker than typing. It gives a quick placeholder for tweaks.
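For readers who have not seen such placeholders, here is a hand-written sketch of the kind of unit test one might end up with for the Python function. It is not RubberDuck's actual output; the test class name and the test cases are assumptions, and the function is repeated in condensed form only so that the block is self-contained.

```python
import csv
import os
import tempfile
import unittest


def convert_csv_to_xml(input_file, output_file):
    """Condensed copy of the ChatGPT-generated function, for a self-contained test."""
    try:
        lookup_table = {}
        with open(input_file, "r") as csvfile:
            for row in csv.reader(csvfile):
                lookup_table[row[1].strip()] = row[0].strip()
        with open(output_file, "w") as xmlfile:
            xmlfile.write('<?xml version="1.0"?>\n<lookupTable>\n')
            for key, category in lookup_table.items():
                xmlfile.write(
                    f'<entry table="HIE.ICE.FilterOBR4" key="{key}">{category}</entry>\n'
                )
            xmlfile.write("</lookupTable>\n")
        return 0
    except Exception as e:
        print("Error:", e)
        return 1


class ConvertCsvToXmlTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own temporary directory, removed afterwards.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)
        self.src = os.path.join(self.tmp.name, "in.csv")
        self.dst = os.path.join(self.tmp.name, "out.xml")

    def test_strips_blanks_around_key_and_category(self):
        with open(self.src, "w") as f:
            f.write("Blood Transfusion, BT1 ,REJECT SAMPLE UNLABELLED\n")
        self.assertEqual(convert_csv_to_xml(self.src, self.dst), 0)
        with open(self.dst) as f:
            self.assertIn('key="BT1">Blood Transfusion</entry>', f.read())

    def test_missing_input_returns_error_code(self):
        # The input file was never created, so the function should return 1.
        self.assertEqual(convert_csv_to_xml(self.src, self.dst), 1)


# Run the suite without calling sys.exit(), so this also works in a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConvertCsvToXmlTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```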

 

 

Can we do the same for COS code too, even though neither RubberDuck nor ChatGPT truly understands COS yet (and that's not the LLM's fault)?

It did generate these unit test placeholders without understanding much COS yet. I'll leave the gaps open for advice for now, particularly on how, or whether, they need to be addressed with a sense of purpose.

It would need some tweaks, but by harnessing ChatGPT it now seems possible to auto-generate code, analyse code, auto-comment code, and auto-generate unit test placeholders, in today's reality. (Well, sort of, depending on which coding language we are using and what we really want it to do next.)

 

Conclusion?


Again, I don't have quick conclusions, since I have not been able to explore its boundaries and traps much yet.

 

Math foundation??


Maybe one day, just as the concept of "entropy" in information theory was cast in 1948, another math genius will come along and enlighten us with a concept that simply quantifies the "linguistic" gaps and distances among all sorts of languages, be they human or machine languages. Until then we will not really know the actual potential, limits, risks and traps of these LLMs. Nonetheless, that doesn't seem to stop LLMs from leaping forward on a monthly or weekly basis.

 

Other use cases??


Analytics dashboards driven by natural language queries: A few weeks ago I was attempting some R&D for an innovation project, and by chance I noticed another use case: it "can" turn clinical queries in human language into machine queries in SQL/MDX. Well, kind of, and without any fine-tuning as of today. It seems to be starting to bridge the abyss between human languages and machine tones, to a degree?

It is not difficult to imagine such a scenario: a clinician simply types a clinical query into a chat in her natural language, and a clinical dashboard is generated automatically, highlighting those patients who might have missed early detection of heart failure, and grouping and ranking them across regions, gender, and age groups. The differences in care quality would be highlighted in minutes instead of after months of engineering effort.

 

And certainly, a personalised AI care assistant. It seemed so remote and so complex this time last year, and with GPT-4 it is rapidly becoming a reality, it seems. Theoretically, nothing would stop GPT-4-like LLMs from looking into my care records, just bits and pieces of structured and semi-structured data (like lab tests and my medicines), unstructured data (my clinical notes, reports etc.) and image data (X-rays, CTs and MRIs), to start making the best sense of them, and being able to coordinate my care and appointments in the near future.


Disclaimer 
 

Sorry to state the obvious, but please note that the purpose of this post is not how to do XML Lookup tables quickly, manually or otherwise; actually, I am pretty good at Excel and Notepad++ too. Instead, it's meant to get in touch with minds who'd like to share use cases, reflections, challenges, risks and experiences on the journey to automation assisted by LLMs and the like.

And the power of LLMs comes from developers, from every public repository and every post anybody has made in public forums. GPTs are not born out of thin air. This has not been fully appreciated and acknowledged in recent months.

There are risks to humanity from AGI at this speed, but personally I feel slightly lucky, relieved and excused, since we are in healthcare services.

Other routine risks include data privacy compliance with HIPAA, GDPR and DPAs, as usual.

Announcement
· March 24, 2023

[Video] Git Source Control for InterSystems IRIS Interoperability with Docker and VSCode

Hi Developers,

Often we create and edit InterSystems IRIS Interoperability solutions via the set of UI tools provided with IRIS. But it is sometimes difficult to set up the development environment so that changes made in the UI are handled by source control.

This video illustrates how git-source-control helps to source-control Interoperability components while you change them in the UI.

⏯ Git Source Control for InterSystems IRIS Interoperability with Docker and VSCode

Add these two lines in your iris.script during docker build:

zpm "install git-source-control"
do ##class(%Studio.SourceControl.Interface).SourceControlClassSet("SourceControl.Git.Extension")

And Interoperability UI components will start working with git. 

Example application.

Announcement
· March 23, 2023

InterSystems Global Summit 2023!

We are delighted to invite all our customers, partners, developers and community members to our in-person InterSystems Global Summit 2023!

Our Global Summit user conference is your opportunity to connect with trailblazing product developers, fellow users pushing our technology into new frontiers, and the people whose original thinking is rocking our universities and board rooms. All in the same space. And registration is now open!

➡️ InterSystems Global Summit 2023

🗓 Dates: June 4-7, 2023

📍 Location: Diplomat Beach Resort, Hollywood, Florida, USA


Join us this year to learn how customers like you are using our technology to innovate, and which trends are shaping our future innovations, including new and enhanced products and product offerings.

Here is a short glimpse of the agenda.

Sunday, June 4

Golf outing or morning social activities
Badge pick-up
Technology bootcamp
Certification exams
Women's meeting
Welcome reception

Monday, June 5

Welcome and keynotes
Breakout sessions
Healthcare Leadership Conference
Partner Pavilion
1:1 meetings
Certification exams
Focus groups
Tech Exchange
Affinity sessions
Evening drinks and demos

Tuesday, June 6

Keynotes
Breakout sessions
Partner Pavilion
1:1 meetings
Certification exams
Focus groups
Tech Exchange
Affinity sessions
Evening social event

Wednesday, June 7

Keynotes
Breakout sessions
Partner Pavilion
1:1 meetings
Certification exams
Focus groups
Tech Exchange
Farewell reception

For more information on the agenda, see here.

We look forward to seeing you at InterSystems Global Summit 2023!

Please note that this post is outdated.
Announcement
· March 20, 2023

Sneak Peek (at least for this week)

 Hi Community:

As you may know, I am the person who does the program and scheduling for our Global Summit sessions, so I can give you a sneak peek at our agenda before it is published.

I'm happy to report that we will have two special, four-hour workshops on Sunday, June 4, which is the day you arrive to register in Hollywood, Florida.

One is the return of the standing-room only 2022 workshop with our FHIR guru, Russ Leftwich - "Getting to Know FHIR: The Best Explanation of FHIR They've Ever Heard." Needless to say, we've reserved a bigger room.

Plus, we are planning eight related breakout sessions:

  1. "FHIR to IntegratedML: Can You Get There from Here?"
  2. "Building a FHIR Façade" 
  3. "How to Customize Your FHIR Server"
  4. "Clinical Alerts and Notifications on FHIR: Putting the Healthcare Action Engine into the Workflow"
  5. "Performing Advanced FHIR Validation"
  6. "Clinical Research Data Pipeline using FHIR and OMOP"
  7. "FHIR in the Cloud"
  8. "Introducing Bulk FHIR Export"

Our second workshop is an introduction to containers, "Working with Containers and InterSystems Technology", led by Derek Robinson from our online learning team. It will prepare you for these two related sessions:

  1. "The Container Lifecycle: When is Your App Ready to Accept Work?"
  2. "Can you Autoscale This?  Lessons from the Field on Kubernetes"

Register TODAY for Global Summit 2023 to take advantage of Early Bird Prices. (You will be able to register for sessions, including the workshops, soon.)

Article
· March 16, 2023 · about a 6-minute read

Kinds of properties in IRIS

InterSystems IRIS has quite a few different kinds of properties. Let’s put them in order so that they make better sense.

First of all, I would divide them into categories:

  • Atomic or simple properties (all those %String, %Integer, %Date and other system or user datatypes)
  • References to stored objects
  • Built-in objects
  • Streams (both binary and character)
  • Collections (which are divided into arrays and lists)
  • Relationships (one-many and parent-children)

Some of these kinds of properties are quite straightforward. For example, atomic properties:

Property Name As %Name;
Property DoB As %Date;
Property Age As %Integer;

They are also easily created using Studio Wizard:

The concepts of references to stored objects and built-in objects are also quite easy to grasp. Basically, if the class of an object you’re trying to use as a property extends %Persistent, then it’s a reference to a stored object. If the related class extends %SerialObject – then it’s a built-in object, because such objects can’t be stored themselves, only inside other objects. And in a class definition they look exactly the same:

Property Human as Sample.Human;

To create this kind of property, on the second step of the Wizard in Studio enter the name of the class:

Streams are also quite easy to explain. You have a big chunk of unstructured data, be it binary or character, and here is your stream. It can be anything: audio, video, a document, text etc. Depending on what type of data you want to store, you may choose to use a Binary or a Character Stream. Another defining characteristic of a stream is the place where you want to store it: on disk (takes less space but is accessible from your OS unless you set up access rules for the folder) or in the database (secure, no unauthorized access, takes more space). To work with streams, use classes from the package %Stream. For example:

Property Log As %Stream.GlobalCharacter;
Property Instruction As %Stream.FileCharacter(LOCATION = "D:\Temp");
Property Doc As %Stream.FileBinary;
Property Audio As %Stream.GlobalBinary;

In this case input the classname of the stream you wish to use:

Then there are two types of collections:

  • Lists – a collection where each element is defined by its position in the collection
Property FavoriteColors as list of %String;
  • Arrays – key-value pairs, where value is defined by its key that is set by a user
Property Contacts as array of %String;

Both collections can be of simple types or of objects. And when we’re talking about a collection of objects, the objects inside collection aren’t “aware” that they are inside any collection. Meaning that it’s a one-way link.

When working with arrays, it’s necessary to remember that both the key and the value should be a new and useful piece of information. There is no point in making an integer key that imitates a position in a list. In the example above, the key of the Contacts array is the name of the contact type, e.g. “Facebook”, “LinkedIn”, “Twitter” etc., and the value is a link to a personal page.

Under the hood, when working with a list, you’re working with the class %Collection.ListOfDT for a list of simple types, %Collection.ListOfObj when working with a list of objects and %Collection.ListOfStream when dealing with a list of streams.

The same is true for arrays. %Collection.ArrayOfDT is used when working with an array of simple datatypes, %Collection.ArrayOfObj – when working with an array of objects and %Collection.ArrayOfStream – when working with an array of streams.

For the collection on the second step of the Wizard in Studio choose the second option “The collection of type” 

and then specify the type:

And probably the most challenging kind of property for people who switch from relational databases is the relationship. A relationship is a two-way one-to-many link between two stored objects. In a relational DBMS, people are taught that they need an additional table holding foreign keys to two other tables to realize a one-to-many link between independent entities. For example:

table Invoice – table Product – table Invoice_Product

There is no need for an additional table/class to do this in IRIS. If there’s no need to frequently query all the invoices in which a given Product is listed, you can make the products an array or list in an invoice. In this case you will have to ensure logical integrity manually, so that your invoice doesn’t reference products that are no longer in the DB.

To automatically check for the logical integrity of data in this case you can use a relationship between these two classes: Invoice, Product. There are two types of relationships:

  • Parent-children – it’s a dependent link between objects of two different persistent classes. It means that the dependent object can’t exist without the main object. For example, let’s say that a chapter in a book can’t exist without a book. This will be an example of parent-children relationship, where a book is a main object and chapter is a dependent object and if we delete a book all the chapters from this book will be deleted as well.
  • One-many – it’s an independent link between objects of one or two persistent classes. It means that both objects can exist and make sense one without the other and if you try to delete the aggregating object, you’ll get an error saying that you first have to unlink all the linked objects. If you try to delete the linked objects, they will disappear from the aggregating object. Let’s take our invoice and products as an example. We have many products in an invoice. If we try to delete an invoice, we first need to delete all the products from our invoice. If we delete a product, it will disappear from the invoice.
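The two deletion rules in this list can be pictured with a toy in-memory model. This is only a sketch of the semantics as described in the text, written in Python; it is not the IRIS API, and the class Store and all of its method names are made up for illustration.

```python
class Store:
    """Toy model of the parent-children and one-many deletion rules."""

    def __init__(self):
        self.children = {}  # parent id -> set of dependent child ids
        self.many = {}      # aggregating ("one" side) id -> set of linked ids

    def add_child(self, parent, child):
        self.children.setdefault(parent, set()).add(child)

    def link(self, one, many):
        self.many.setdefault(one, set()).add(many)

    def delete_parent(self, parent):
        # Parent-children: deleting the parent deletes its children too.
        return self.children.pop(parent, set())

    def delete_one(self, one):
        # One-many: refuse to delete while linked objects still exist.
        if self.many.get(one):
            raise ValueError(f"{one} still has linked objects; unlink them first")
        self.many.pop(one, None)

    def delete_many(self, one, many):
        # Deleting a linked object removes it from the aggregating object.
        self.many.get(one, set()).discard(many)


store = Store()
store.add_child("book1", "chapter1")
store.add_child("book1", "chapter2")
print(sorted(store.delete_parent("book1")))  # chapters go away with the book

store.link("invoice1", "productA")
try:
    store.delete_one("invoice1")
except ValueError as err:
    print("refused:", err)
store.delete_many("invoice1", "productA")
store.delete_one("invoice1")  # succeeds once nothing is linked
```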

Since it’s a two-way link you need to define the properties in both classes. For example in a class Sample.Invoice you will have the definition:

Relationship Products As Sample.Product [ Cardinality = many, Inverse = Invoice ];


Note that the property is called Relationship and that it has two characteristics:

Cardinality = many, meaning that there are links to many objects inside this property

Inverse = Invoice, which is the name of the property on the other side of the relationship

At the same time in the other class (or it can be in the same class for one-many relationship) there should be the mirror property:

Relationship Invoice As Sample.Invoice [ Cardinality = one, Inverse = Products ];

Here cardinality “one” means that in this property there is a link only to one object.

To create a relationship using Wizard in Studio just choose on the second step Relationship:

Then choose the correct cardinality for the property in the class and fill in the name of the related class and property in it:

For the example with the book with chapters the properties would look as follows.

In a class Sample.Book:

Relationship Chapters As Sample.Chapter [ Cardinality = children, Inverse = Book ];

In a class Sample.Chapter:

Relationship Book As Sample.Book [ Cardinality = parent, Inverse = Chapters ];

This is the end of a brief overview of the different kinds of properties present in IRIS. I hope it makes things clearer.
