Article · March 11, 2024 · about 8 min read

Generating meaningful test data using Gemini

We all know that having a set of proper test data before deploying an application to production is crucial for ensuring its reliability and performance. It allows you to simulate real-world scenarios and identify potential issues or bugs before they impact end users. Testing with representative data sets also lets you optimize performance, identify bottlenecks, and fine-tune algorithms or processes as needed. Ultimately, a comprehensive set of test data helps you deliver a higher-quality product, reducing the likelihood of post-production issues and enhancing the overall user experience.

In this article, let's look at how one can use generative AI, namely Gemini by Google, to generate (hopefully) meaningful data for the properties of multiple objects. To do this, I will use Gemini's RESTful service to generate data in JSON format and then use the received data to create objects.

This leads to an obvious question: why not use the methods from %Library.PopulateUtils to generate all the data? Well, the answer is quite obvious as well if you've seen the list of methods of that class - there aren't many that generate meaningful data.

So, let's get to it.

Since I'll be using the Gemini API, I first need to generate an API key. To do this, just open aistudio.google.com/app/apikey and click on Create API key

and create an API key in a new project.

After this is done, you just need to write a REST client to fetch and transform the data, and come up with a query string for Gemini. Easy peasy 😁

For the sake of this example, let's work with the following simple class:

Class Restaurant.Dish Extends (%Persistent, %JSON.Adaptor)
{
Property Name As %String;
Property Description As %String(MAXLEN = 1000);
Property Category As %String;
Property Price As %Float;
Property Currency As %String;
Property Calories As %Integer;
}

In general, it would be really simple to use the built-in %Populate mechanism and be done with it. But in bigger projects you will have a lot of properties that are not so easily populated with meaningful data automatically.

Anyway, now that we have the class, let's think about the wording of a query to Gemini. Let's say we write the following query:

{"contents": [{
    "parts":[{
      "text": "Write a json object that contains a field Dish which is an array of 10 elements. Each element contains Name, Description, Category, Price, Currency, Calories of the Restaurant Dish."}]}]}

If we send this request to https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=APIKEY we will get something like:

 

Already not bad. Not bad at all! Now that I have the wording of my query, I need to generate it as automatically as possible, call it and process the result.
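As a quick sanity check before wiring anything into IRIS, the request itself can be exercised with a few lines of Python. This is just a sketch using only the standard library; `build_payload` and `call_gemini` are hypothetical helper names, while the endpoint and body shape are the ones shown above.

```python
import json
import urllib.request

# Endpoint from the article; the API key is appended as a query parameter.
GEMINI_URL = ("https://generativelanguage.googleapis.com/v1beta/"
              "models/gemini-pro:generateContent?key=")

def build_payload(prompt: str) -> dict:
    """Wrap a plain-text prompt in the body shape Gemini expects."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

def call_gemini(prompt: str, api_key: str) -> dict:
    """POST the prompt and return the decoded JSON response."""
    req = urllib.request.Request(
        GEMINI_URL + api_key,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```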

Next step - generating the query. Using the very useful article on how to get the list of properties of a class, we can generate most of the query automatically.

ClassMethod GenerateClassDesc(classname As %String) As %String
{
    set cls = ##class(%Dictionary.CompiledClass).%OpenId(classname,,.status)
    set x = cls.Properties
    set profprop = $lb()
    // start at 3 to skip the first two (inherited system) properties
    for i=3:1:x.Count() {
        set prop = x.GetAt(i)
        set $list(profprop, i-2) = prop.Name
    }
    quit $listtostring(profprop, ", ")
}

ClassMethod GenerateQuery(qty As %Numeric) As %String [ Language = objectscript ]
{
    set classname = ..%ClassName(1)
    set str = "Write a json object that contains a field "_$piece(classname, ".", 2)_
        " which is an array of "_qty_" elements. Each element contains "_
        ..GenerateClassDesc(classname)_" of a "_$translate(classname, ".", " ")_". "
    quit str
}

When dealing with complex relationships between classes, it may be easier to use the object constructor to link different objects together, or to use the built-in mechanism of %Library.Populate.

The next step is to call the Gemini RESTful service and process the resulting JSON.

ClassMethod CallService() As %String
{
    set request = ..GetLink()
    set query = "{""contents"": [{""parts"":[{""text"": """_..GenerateQuery(20)_"""}]}]}"
    do request.EntityBody.Write(query)
    set request.ContentType = "application/json"
    set sc = request.Post("v1beta/models/gemini-pro:generateContent?key=<YOUR KEY HERE>")
    if $$$ISOK(sc) {
        set response = request.HttpResponse.Data.Read()
        set p = ##class(%DynamicObject).%FromJSON(response)
        // take the first candidate...
        set iter = p.candidates.%GetIterator()
        do iter.%GetNext(.key, .value, .type)
        // ...and the first part of its content
        set iter = value.content.parts.%GetIterator()
        do iter.%GetNext(.key, .value, .type)
        // strip the markdown fence Gemini wraps around the generated JSON
        set obj = ##class(%DynamicObject).%FromJSON($extract(value.text, 8, *-3))

        set dishes = obj.Dish
        set iter = dishes.%GetIterator()
        while iter.%GetNext(.key, .value, .type) {
            set dish = ##class(Restaurant.Dish).%New()
            set sc = dish.%JSONImport(value.%ToJSON())
            set sc = dish.%Save()
        }
    }
}

Of course, since it's just an example, don't forget to add status checks where necessary.
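The response handling (picking the first candidate's text and stripping the markdown fence) can also be sketched outside IRIS, for instance in Python. This is a sketch against the response shape shown above; `extract_dishes` is a hypothetical helper name.

```python
import json

def extract_dishes(response_json: str) -> list:
    """Parse a Gemini generateContent response and return the Dish array.

    Mirrors the ObjectScript above: take the first candidate, take the
    first part of its content, strip the markdown fence around the
    generated JSON, then parse it.
    """
    response = json.loads(response_json)
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    # Equivalent of $Extract(value.text, 8, *-3): drop the fence markers
    text = text.removeprefix("```json").removesuffix("```").strip()
    return json.loads(text)["Dish"]
```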

Now, when I run it, I get a pretty impressive result in my database. Let's run a SQL query to see the data.

The description and category correspond to the name of the dish. Moreover, the prices and calories look correct as well, which means that I actually get a database filled with reasonably realistic-looking data, and the results of the queries I'm going to run will resemble real results.

Of course, a huge drawback of this approach is the need to query a generative AI and the time it takes to generate the result. But the quality of the data may be worth it. Anyway, it is for you to decide 😉

 
P.S.

P.P.S. The first image is how Gemini imagines the "AI that writes a program to create test data" 😆

4 Comments
Article · March 8, 2024 · about 3 min read

IKO - Lessons Learned (Part 4 - The Storage Class)

The IKO will dynamically provision storage in the form of persistent volumes and pods will claim them via persistent volume claims.

But storage can come in different shapes and sizes. The blueprint for the details of those persistent volumes comes in the form of the storage class.

This raises the question: we've deployed the IrisCluster, and haven't specified a storage class yet. So what's going on?

You'll notice that with a simple

kubectl get storageclass

you'll find the storage classes that exist in your cluster. Note that storage classes are a cluster-wide resource, not per-namespace like other objects, such as our pods and services.

You'll also notice that one of the storage classes is marked as default. This is the one that the IKO takes when we do not specify any. What if none are marked as default? In this case we have the following problem:

Persistent volumes cannot be created, which in turn means the persistent volume claims are not bound, and therefore the pod is stuck in a pending state. It's like going to a restaurant and, after looking at the menu, telling the waiter/waitress that you'd like to order food, closing the menu, handing it back to your server, and saying thanks. We need to be more specific - instructions that vague mean nothing.

To solve this problem you can either set a default storage class in your cluster, or set the storage class name field in the CRD (this way you don't need to change your cluster's default storage class if you choose to use a non-default one):

apiVersion: intersystems.com/v1alpha1
kind: IrisCluster
metadata:
  name: simple
spec:
  licenseKeySecret:
    #; to activate ISC license key
    name: iris-key-secret
  configSource:
    #; contains CSP-merge.ini, which is merged into IKO's
    #; auto-generated configuration.
    name: iris-cpf
  imagePullSecrets:
    - name: intersystems-pull-secret
  storageClassName: your-sc

  topology:
    data:
      image: containers.intersystems.com/intersystems/irishealth:2023.3
      compatibilityVersion: "2023.3"
      mirrored: true
      webgateway:
        image: containers.intersystems.com/intersystems/webgateway:2023.3
        type: apache
        replicas: 1
        applicationPaths:
          - /csp/sys
          - /csp/healthshare
          - /api/atelier
          - /csp/broker
          - /isc
          - /oauth2
          - /ui
        loginSecret:
           name: iris-webgateway-secret
           
    arbiter:
      image: containers.intersystems.com/intersystems/arbiter:2023.3
    webgateway:
      replicas: 1
      image: containers.intersystems.com/intersystems/webgateway:2023.3
      applicationPaths:
        #; All of the IRIS instance's system default applications.
        #; For Management Portal only, just use '/csp/sys'.
        #; To support other applications, please add them to this list.
        - /csp/sys
        - /csp/broker
        - /api
        - /isc
        - /oauth2
        - /ui
        - /csp/healthshare
      alternativeServers: LoadBalancing
      loginSecret:
        name: iris-webgateway-secret

  serviceTemplate:
    # ; to enable external IP addresses
    spec:
      type: LoadBalancer

Note that there are specific requirements for the storage class, as stated in the documentation:

"Any storage class you define must include the Kubernetes setting volumeBindingMode: WaitForFirstConsumer for correct operation of the IKO."

Furthermore, I like to use allowVolumeExpansion: true.

Note that the provisioner of your storage class is platform specific.
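Putting those requirements together, a matching storage class definition might look like the sketch below. The provisioner shown here is GKE's CSI driver, purely as an example; substitute the provisioner for your own platform.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: your-sc
provisioner: pd.csi.storage.gke.io      # platform-specific; GKE's CSI driver shown as an example
volumeBindingMode: WaitForFirstConsumer # required for correct operation of the IKO
allowVolumeExpansion: true              # optional, but handy if volumes need to grow later
```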

The storage class pops up all over the CRD, so remember to set it when you are customizing the storage for your cluster, to make sure you use the storage class that's right for you.

1 Comment
Article · March 6, 2024 · about 9 min read

Connecting to DynamoDB Using Embedded Python: A Tutorial for Using Boto3 and ObjectScript to Write to DynamoDB

Introduction

As the health interoperability landscape expands to include data exchange across on-premise as well as hosted solutions, we are seeing an increased need to integrate with services such as cloud storage. One of the most widely used and well-supported tools is the NoSQL database DynamoDB (Dynamo), provided by Amazon Web Services (AWS).

4 Comments
Article · March 6, 2024 · about 3 min read

IKO - Lessons Learned (Part 3 - Services 101 and The Sidecars)

The IKO allows for sidecars. The idea behind them is to have direct access to a specific instance of IRIS. If we have mirrored data nodes, the web gateway will (correctly) only give us access to the primary node. But perhaps we need access to a specific instance. The sidecar is the solution.

Building on the example from the previous article, we introduce the sidecar by using a mirrored data node and, of course, an arbiter.

apiVersion: intersystems.com/v1alpha1
kind: IrisCluster
metadata:
  name: simple
spec:
  licenseKeySecret:
    #; to activate ISC license key
    name: iris-key-secret
  configSource:
    #; contains CSP-merge.ini, which is merged into IKO's
    #; auto-generated configuration.
    name: iris-cpf
  imagePullSecrets:
    - name: intersystems-pull-secret

  topology:
    data:
      image: containers.intersystems.com/intersystems/irishealth:2023.3
      compatibilityVersion: "2023.3"
      mirrored: true
      webgateway:
        image: containers.intersystems.com/intersystems/webgateway:2023.3
        type: apache
        replicas: 1
        applicationPaths:
          - /csp/sys
          - /csp/healthshare
          - /api/atelier
          - /csp/broker
          - /isc
          - /oauth2
          - /ui
        loginSecret:
           name: iris-webgateway-secret
           
    arbiter:
      image: containers.intersystems.com/intersystems/arbiter:2023.3
    webgateway:
      replicas: 1
      image: containers.intersystems.com/intersystems/webgateway:2023.3
      applicationPaths:
        #; All of the IRIS instance's system default applications.
        #; For Management Portal only, just use '/csp/sys'.
        #; To support other applications, please add them to this list.
        - /csp/sys
        - /csp/broker
        - /api
        - /isc
        - /oauth2
        - /ui
        - /csp/healthshare
      alternativeServers: LoadBalancing
      loginSecret:
        name: iris-webgateway-secret

  serviceTemplate:
    # ; to enable external IP addresses
    spec:
      type: LoadBalancer

 

Notice how the sidecar is nearly identical to the 'maincar' webgateway; it's just placed within the data node. That's because it's a second container that sits in the pod alongside the IRIS image. This all sounds great, but how do we actually access it? The IKO nicely creates services for us, but for the sidecar that responsibility falls on us.

So how do we expose this webgateway? With a service like this:

apiVersion: v1
kind: Service
metadata:
  name: sidecar-service
spec:
  ports:
  - name: http
    port: 81
    protocol: TCP
    targetPort: 80
  selector:
    intersystems.com/component: data
    intersystems.com/kind: IrisCluster
    intersystems.com/mirrorRole: backup
    intersystems.com/name: simple
    intersystems.com/role: iris
  type: LoadBalancer

Now our 'maincar' service always points at the primary and the sidecar service at the backup. But we could just as well have created one sidecar service to expose data-0-0 and another to expose data-0-1, regardless of which is the primary or backup. Services make it possible to expose any pod we want and target it via the selector, which identifies a pod (or multiple pods) by its labels.
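For instance, a per-pod service could select on the standard pod-name label that the StatefulSet controller applies. This is a sketch: it assumes the data pods are named simple-data-0-0 and simple-data-0-1 and carry that label, so verify with `kubectl get pods --show-labels` before relying on it.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sidecar-data-0-0
spec:
  ports:
  - name: http
    port: 81
    protocol: TCP
    targetPort: 80
  selector:
    # targets one specific pod, regardless of its mirror role
    statefulset.kubernetes.io/pod-name: simple-data-0-0
  type: LoadBalancer
```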

We've barely scratched the surface on services and haven't even mentioned their more sophisticated partner, ingress. You can read up more about that here in the meantime.

In the next bite sized article we'll cover the storage class.

1 Comment
Article · March 5, 2024 · about 2 min read

Columnar vs. Row Storage with IRIS native

  • The idea of this package is to compare the performance of columnar storage inside IRIS without wrapping it in some foreign platform that is not my world.
  • In addition, I do not want to measure network performance between 2 containers, but rather stay inside a closed IRIS environment that I have fully under my control.
  • Even the use of the SMP or some other browser-based presentation has some influence that I want to avoid.
  • Measuring should be as close to the core as possible. So I flagged it NATIVE. Some people might feel it is ABORIGINAL.

How to use it

All tests run in the USER namespace and are initiated exclusively from the command prompt.

USER>do ^Demo
Test Columnar vs. Row Storage
=============================
     1 - Initialize Tables
     2 - Generate Data
     3 - Compare SELECT
     4 - Loop SELECT
     5 - Auto Loop
Select Function or * to exit :
  • 1 creates/clears the tables (package name A, so it stays on top in searches)
  • 2 fills them with EXACTLY the same data (INSERT ... SELECT)
  • 3 runs SELECT AVG(Amount) FROM A.??? WHERE Status = 'SENT'
  • 4 allows adding data between SELECT cycles
  • 5 does the same in a larger loop

Being curious, I also added DemoB, where row storage is more advanced, using a Bitmap Index and a Bitslice Index. This was not so impressive.

Summary

The gain in speed is significant.
Data generated by option 5 provided the basis for this Excel diagram.
No surprises!

Special thanks to @Luis Angel Pérez Ramos for the test data layout!
 
GitHub

2 Comments