Question · May 15, 2024

FHIR Profile validation Java issues

Hello Community,

I'm trying to run FHIR R4 profile validation and got the error below. How can I fix it?

http://localhost:52773/csp/healthshare/learn1/fhir/r4/Patient/$validate?profile=http://examples.com/fhir/StructureDefinition/aktestpatient|4.0.1

3 Comments
Article · May 15, 2024 · about 9 min read

DNA Similarity and Classification: An Analysis

DNA Similarity and Classification was developed as a REST API utilizing InterSystems Vector Search technology to investigate genetic similarities and efficiently classify DNA sequences. The application uses machine learning techniques, enhanced by vector search capabilities, to classify gene families and to identify known DNA sequences similar to an unknown input sequence.

K-mer Analysis: Fundamentals in DNA Sequence Analysis

Fragmentation of a DNA sequence into k-mers is a fundamental technique in genetic data processing. This approach involves breaking down the DNA sequence into smaller subsequences of fixed size, known as k-mers. For example, if we take the sequence "ATCGTAGCTA" and define k as 3, we would obtain the following k-mers: "ATC", "TCG", "CGT", "GTA", "TAG", "AGC", "GCT", and "CTA".

This process is carried out by sliding a window of size k along the sequence, extracting each subsequence of size k, and recording it as a k-mer. The window then moves to the next set of bases, allowing overlap between the k-mers. This overlap is crucial to ensure that no information is lost during the fragmentation process.
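
As a minimal illustration of this sliding-window fragmentation (the helper name below is only for this example, not part of the project code):

def fragment_into_kmers(sequence, k=3):
    # Slide a window of size k along the sequence, one base at a time
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

print(fragment_into_kmers("ATCGTAGCTA"))
# ['ATC', 'TCG', 'CGT', 'GTA', 'TAG', 'AGC', 'GCT', 'CTA']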

The resulting k-mers retain crucial local information about the structure of the DNA sequence. This is critical because many biologically relevant pieces of information are contained within specific regions of DNA. Subsequent analysis of these k-mers can reveal patterns, identify conserved or similar regions between different DNA sequences, and assist in the investigation of genetic functions and interactions.

In summary, DNA Similarity and Classification offers an approach to identifying genetic similarities and classifying DNA sequences.

Application

The following code initializes the project, performing the necessary setup before building:

ClassMethod init()
{
    write "START INIT"
    // Train and save the DNA classification model
    Do ##class(dc.data.DNAClassification).CreateTheModelML()
    // Fragment, vectorize, and store the DNA dataset
    Do ##class(dc.data.HumanDNA).UploadData()
}

The UploadData method implemented in the code is responsible for fragmenting DNA sequences into k-mers, converting these k-mers into vectors using a specific encoding model, and then storing these vectors in the database.

print("Reading dataset")
human_df = pd.read_table('/opt/irisbuild/data/human_data.txt')
human_df = human_df.sample(n=1500, random_state=42)

To enhance performance and reduce processing time during the project's build phase, a sampling step limits processing to 1,500 randomly selected records (random_state=42 keeps the sample reproducible). While this speeds up the build, analyzing the full file would give a more thorough picture of the data; in the interest of efficiency and build time, sampling was adopted.

def getKmers(sequence, size=6, max_length=5000):
    kmers = [sequence[x:x+size].lower() for x in range(len(sequence) - size + 1)]
    kmers = [kmer for kmer in kmers if len(kmer) > 0]
    if len(kmers) == 0:
        return [sequence]
    return kmers[:max_length]

The max_length parameter in getKmers limits the number of k-mers retained per sequence in order to reduce the runtime of the process. However, this truncation may not fully represent very long DNA sequences.
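
For instance, a 6,000-base sequence yields 5,995 possible 6-mers, but only the first 5,000 are kept (a quick check, assuming the getKmers function above is in scope):

print(len(getKmers("A" * 6000)))  # 5000 rather than 5995: truncated by max_length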

 print("Creating K-mers groups")
    human_df['K_mers'] = human_df['sequence'].apply(getKmers)

    print("Combining K-mers into strings")
    human_df['K_mers_str'] = human_df['K_mers'].apply(lambda x: ' '.join(x))

    print("Download stsb-roberta-base-v2 model")
    model = SentenceTransformer('stsb-roberta-base-v2')

    print("Encode K_mers")
    embeddings = model.encode(human_df['K_mers_str'].tolist(), normalize_embeddings=True)

    print("Creating column sequence_vectorized")
    human_df['sequence_vectorized'] = embeddings.tolist()

The code vectorizes the DNA sequences using the stsb-roberta-base-v2 encoding model from the SentenceTransformer library. This step converts the k-mer strings into numerical vectors, making them amenable to vector operations and computational analysis. After vectorization, the data is stored in the database for future use, ensuring that the DNA sequence vectors are available for queries and further analysis.
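
The storage step itself is not shown in the excerpt above. A minimal sketch of how the vectors could be written from embedded Python is given below; the table and column names (dc_data.HumanDNA, sequence, dnaClass, kMersVector) are inferred from the query used later in this article, and the 'class' dataframe column name is an assumption about the dataset:

import iris

# Hypothetical storage sketch: insert each sequence, its class label, and its vector
stmt = iris.sql.prepare(
    "INSERT INTO dc_data.HumanDNA (sequence, dnaClass, kMersVector) "
    "VALUES (?, ?, TO_VECTOR(?))"
)
for _, row in human_df.iterrows():
    stmt.execute(row['sequence'], row['class'], str(row['sequence_vectorized']))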

Subsequently, the findSimilarity method is responsible for finding the most similar DNA sequences for a given input DNA sequence. It fragments the input sequence into k-mers and queries the database for the five closest DNA sequences.

ClassMethod findSimilarity(pSequence As %String) As %String [ Language = python ]
{
    import iris
    from sentence_transformers import SentenceTransformer

    results = []

    def getKmers(sequence, size=6, max_length=5000):
        kmers = [sequence[x:x+size].lower() for x in range(len(sequence) - size + 1)]
        kmers = [kmer for kmer in kmers if len(kmer) > 0]
        if len(kmers) == 0:
            return [sequence]
        return kmers[:max_length]
        

    model = SentenceTransformer('stsb-roberta-base-v2') 
    kmers = getKmers(pSequence)
    kmers_str = ' '.join(kmers)
    search_vector = model.encode(kmers_str, normalize_embeddings=True).tolist()

    stmt = iris.sql.prepare("SELECT TOP 5 ID FROM dc_data.HumanDNA ORDER BY VECTOR_DOT_PRODUCT(kMersVector, TO_VECTOR(?)) DESC ")
    rs = stmt.execute(str(search_vector))

    class_mapping = {
            0: 'G protein coupled receptors',
            1: 'tyrosine kinase',
            2: 'tyrosine phosphatase',
            3: 'synthetase',
            4: 'synthase',
            5: 'ion channel',
            6: 'transcription factor',
    }

    for idx, row in enumerate(rs):
        humanDNA = iris.cls("dc.data.HumanDNA")._OpenId(row[0])    

        results.append({
            "sequence": humanDNA.sequence,
            "dnaClass": class_mapping[humanDNA.dnaClass]
        })
    return results
}
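
Because the embeddings are encoded with normalize_embeddings=True, ordering by VECTOR_DOT_PRODUCT in descending order ranks results by cosine similarity: for unit-length vectors the dot product and the cosine similarity coincide. A small standalone illustration (not part of the project code):

import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 1.0, 2.0])

# Normalize to unit length, as normalize_embeddings=True does
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(np.dot(a_n, b_n), cosine))  # True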

These processes are essential for the project as they enable the comparative analysis of DNA sequences and the identification of patterns or similarities among them.

Applied Machine Learning

Machine learning has been utilized to perform DNA sequence classification. The DNAClassification class contains methods for training a multinomial Naive Bayes classification model and for classifying DNA sequences based on this model.

The CreateTheModelML method is responsible for training the classification model. First, DNA sequences are preprocessed, transformed into k-mers, and grouped. These k-mer groups are then converted into a numerical representation with a bag-of-words vectorization (CountVectorizer over 4-grams of the k-mer text). The multinomial Naive Bayes model is trained on this vectorized data, and after training the model's performance is evaluated with accuracy, precision, recall, and F1 score.

ClassMethod CreateTheModelML() [ Language = python ]
{
    import pandas as pd

    def getKmers(sequence, size=6):
        kmers = [sequence[x:x+size].lower() for x in range(len(sequence) - size + 1)]
        kmers = [kmer for kmer in kmers if len(kmer) > 0]
        if len(kmers) == 0:
            return [sequence]
        return kmers

    print("Reading dataset")
    human_df = pd.read_table('/opt/irisbuild/data/human_data.txt')

    print("Creating K-mers groups")
    human_df['K_mers'] = human_df['sequence'].apply(getKmers)
    
    human_df['words'] = human_df.apply(lambda x: getKmers(x['sequence']), axis=1)
    human_df = human_df.drop('sequence', axis=1)

    human_texts = list(human_df['words'])
    for item in range(len(human_texts)):
        human_texts[item] = ' '.join(human_texts[item])
    y_data = human_df.iloc[:, 0].values        
    
    from sklearn.feature_extraction.text import CountVectorizer
    cv = CountVectorizer(ngram_range=(4,4))
    X_human_dna = cv.fit_transform(human_texts)

    print(X_human_dna.shape)

    print("PREPARING FIT DATA")
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X_human_dna, 
                                                        y_data, 
                                                        test_size = 0.20, 
                                                        random_state=42)
    
    print("FIT THE MODEL")
    from sklearn.naive_bayes import MultinomialNB
    classifier = MultinomialNB(alpha=0.1)
    classifier.fit(X_train, y_train)

    y_pred = classifier.predict(X_test)

    print("VALIDATING THE MODEL")

    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
    print("Confusion matrix\n")
    print(pd.crosstab(pd.Series(y_test, name='Actual'), pd.Series(y_pred, name='Predicted')))
    def get_metrics(y_test, y_predicted):
        accuracy = accuracy_score(y_test, y_predicted)
        precision = precision_score(y_test, y_predicted, average='weighted')
        recall = recall_score(y_test, y_predicted, average='weighted')
        f1 = f1_score(y_test, y_predicted, average='weighted')
        return accuracy, precision, recall, f1
    accuracy, precision, recall, f1 = get_metrics(y_test, y_pred)
    print("accuracy = %.3f \nprecision = %.3f \nrecall = %.3f \nf1 = %.3f" % (accuracy, precision, recall, f1))


    print("SAVE THE MODEL")
    import joblib
    joblib.dump(classifier, '/opt/irisbuild/data/multinomial_nb_model.pkl')
    joblib.dump(cv, '/opt/irisbuild/data/cv_to_multinomial_nb_model.pkl')
}

On the other hand, the ClassifyDnaSequence method is responsible for classifying a DNA sequence received as input. First, the sequence is preprocessed and transformed into k-mers. These k-mers are then vectorized using the same process used during model training. The trained model is loaded and used to predict the class of the DNA sequence. The classification result includes the predicted class and the probabilities associated with each class.

ClassMethod ClassifyDnaSequence(sequence As %String) As %Status [ Language = python ]
{
    import joblib
    from sklearn.feature_extraction.text import CountVectorizer

    classifier = joblib.load('/opt/irisbuild/data/multinomial_nb_model.pkl')
    cv = joblib.load('/opt/irisbuild/data/cv_to_multinomial_nb_model.pkl')

    def getKmers(sequence, size=6):
        kmers = [sequence[x:x+size].lower() for x in range(len(sequence) - size + 1)]
        kmers = [kmer for kmer in kmers if len(kmer) > 0]
        if len(kmers) == 0:
            return [sequence]
        return kmers
    k_mers = getKmers(sequence)

    k_mers_vec = cv.transform([' '.join(k_mers)])

    predicted_class = classifier.predict(k_mers_vec)[0]
    probabilities = classifier.predict_proba(k_mers_vec)[0]

    class_mapping = {
            0: 'G protein coupled receptors',
            1: 'tyrosine kinase',
            2: 'tyrosine phosphatase',
            3: 'synthetase',
            4: 'synthase',
            5: 'ion channel',
            6: 'transcription factor',
    }
    

    result = {
        "Classification": class_mapping[predicted_class],
        "Probabilities": {
            class_mapping[class_index]: float(probability)
            for class_index, probability in enumerate(probabilities)
        }
    }
    
    return result
}
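
One detail worth noting: predict_proba returns its columns in the order of classifier.classes_, so the dictionary comprehension above is correct only because the labels are exactly the integers 0 through 6. The toy example below (illustrative only, not part of the project) shows that classes_ defines the column order:

from sklearn.naive_bayes import MultinomialNB

# Labels deliberately given out of order; classes_ is the sorted set of labels
clf = MultinomialNB()
clf.fit([[1, 0], [0, 1], [1, 1]], [6, 0, 3])
print(clf.classes_)                    # [0 3 6]
print(clf.predict_proba([[1, 0]])[0])  # one probability per entry of classes_, in that order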

Usage of the Application

Users interact with the system by sending a DNA sequence as a query parameter and receive a JSON-formatted response containing the similar sequences found and their classifications, simplifying the process of genetic analysis.

Below is the API to be used: http://localhost:52773/api/dna/find?&dna=<YOUR_DNA>

  • dna: Enter the DNA sequence you want to search

Example: http://localhost:52773/api/dna/find?&dna=ATGAACTGTCCAGCCCCTGTGGAGATCTCCT...
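
The endpoint can also be exercised from a short Python script; a minimal sketch (the dna value is a placeholder and should be replaced with a full sequence, and authentication may be required depending on the web application settings):

import requests

response = requests.get(
    "http://localhost:52773/api/dna/find",
    params={"dna": "ATGAACTGTCCAGCCCCTGTGGAGATCTCCT"},
    # auth=("user", "password"),  # uncomment if the endpoint requires authentication
)
print(response.json())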

Postman can be used to visualize the API response (a browser or other applications such as Insomnia also work).

Project: GitHub

Voting: DNA-similarity-and-classify

2 Comments
InterSystems Official · May 15, 2024

InterSystems announces InterSystems IRIS 2024.1.0.267.2, including support for Ubuntu 24.04

InterSystems is pleased to announce the general availability of:

  • InterSystems IRIS Data Platform 2024.1.0.267.2
  • InterSystems IRIS for Health 2024.1.0.267.2
  • HealthShare Health Connect 2024.1.0.267.2

This release adds support for the Ubuntu 24.04 operating system. Ubuntu 24.04 includes Linux kernel 6.8 and security improvements, along with installer and user interface improvements. InterSystems IRIS IntegratedML is not yet available on Ubuntu 24.04.

Additionally, this release addresses two defects for all platforms:

  • A fix for some SQL queries using “NOT %INLIST” returning incorrect results. We previously issued an alert on this error.
  • A fix for incomplete stack traces in certain circumstances. 

How to get the software

As usual, Extended Maintenance (EM) releases come with classic installation packages for all supported platforms, as well as container images in Docker container format. For a complete list, refer to the Supported Platforms page.

Classic installation packages

Installation packages are available from the WRC's Extended Maintenance Releases page. Kits can also be found on the Evaluation Services website.

Containers

Container images for both Enterprise and Community Editions of InterSystems IRIS and IRIS for Health and all corresponding components are available from the InterSystems Container Registry web interface.

Containers are tagged as both "2024.1" and "latest-em".

Question · May 15, 2024

End of File handling in Cache and IRIS

In Cache, end of file throws an error, but in IRIS there is no indication of end of file; I have to check $ZEOF explicitly. How are you handling/detecting end of file in IRIS?

In Cache, this line will throw an end-of-file error: F PREC=1:1 U FILE R REC  D SOMETHING

But in IRIS this loops forever. Has anyone noticed this behaviour in IRIS?

2 Comments
Question · May 15, 2024

Modifying a dynamic object created from JSON

Hello,

I want to write a generic method to modify the properties of a dynamic class. When I use JSONImport() it works very well for some objects, but for an object containing a list it adds an extra element instead of modifying the existing one. I tried to check the type while iterating over the JSON so that I could do an Insert, but I cannot manage to use $METHOD for the Insert, or even for %Set.

Here is the class:

Class Epc.conf.poste Extends (%Persistent, %JSON.Adaptor)
{

Property NomPoste As %String(%JSONFIELDNAME = "NomPoste", MAXLEN = "");
Property tPrinter As list Of EpErp.conf.printer(%JSONFIELDNAME = "tPrinter", SQLPROJECTION = "table/column", STORAGEDEFAULT = "array");
}
Class EpErp.conf.printer Extends (%SerialObject, %JSON.Adaptor)
{

Property Service As %String(%JSONFIELDNAME = "NomService", MAXLEN = "");
Property Print As %String(%JSONFIELDNAME = "NomImprimante", MAXLEN = "");
}

My current data:

Here is the file I send in order to modify my list; I want to modify one of the elements of the list:

{
	"NomPoste":"Poste",
	"tPrinter":
	[
		{
			"NomService":"Petite",
			"NomImprimante":"EPFR_12"
		},
		{
			"NomService":"Grande",
			"NomImprimante":"Modif"
		}
	]
}

My generic update method:

ClassMethod update()
{
    set payload = {}.%FromJSON(%request.Content)
    set itr = payload.%GetIterator()

    if ##class(%Dictionary.ClassDefinition).%ExistsId(className) {
        set class = $CLASSMETHOD(className, "%OpenId", id)
        if $ISOBJECT(class) {
            if payload '= "" {
                while itr.%GetNext(.prop, .val) {
                    if prop.%IsA("%DynamicArray") {
                        ?
                    } else {
                        Set $PROPERTY(class, prop) = val
                    }
                    do class.%Save()
                }
            }
        }
    }
}

Thank you in advance for your help.

2 Comments