Article
· June 13, 2023 · 3 min read

LangChain on InterSystems PDF documentation

Yet another example of applying LangChain, to give some inspiration for the new community Grand Prix contest.

I was initially looking to build a chain to dynamically search the HTML of the documentation site, but in the end it was simpler to ingest the static PDFs instead.

Create new virtual environment

mkdir chainpdf

cd chainpdf

python -m venv .

scripts\activate 

pip install openai
pip install langchain
pip install wget
pip install lancedb
pip install tiktoken
pip install pypdf

set OPENAI_API_KEY=[ Your OpenAI Key ]

python

Prepare the docs

import glob
import wget

url = 'https://docs.intersystems.com/irisforhealth20231/csp/docbook/pdfs.zip'
wget.download(url)

# extract docs
import zipfile
with zipfile.ZipFile('pdfs.zip', 'r') as zip_ref:
  zip_ref.extractall('.')

# get a list of the extracted PDF files
pdfFiles = glob.glob("./pdfs/pdfs/*")

Load docs into Vector Store

import lancedb
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import LanceDB
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts.prompt import PromptTemplate
from langchain import OpenAI
from langchain.chains import LLMChain


embeddings = OpenAIEmbeddings()
db = lancedb.connect('lancedb')
table = db.create_table("my_table", data=[
    {"vector": embeddings.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")

documentsAll = []
for file_name in pdfFiles:
  loader = PyPDFLoader(file_name)
  pages = loader.load_and_split()
  # Strip unwanted padding
  for page in pages:
    del page.lc_kwargs
    page.page_content = "".join(page.page_content.split('\xa0'))
  documents = CharacterTextSplitter().split_documents(pages)
  # Ignore the cover pages
  for document in documents[2:]:
    documentsAll.append(document)

# This will take a couple of minutes to complete
docsearch = LanceDB.from_documents(documentsAll, embeddings, connection=table)
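The padding cleanup inside the loop above splits each page on the non-breaking space character `'\xa0'` and re-joins the pieces with an empty string, which simply deletes those characters; on a plain string:

```python
# PDF extraction leaves non-breaking spaces (\xa0) behind as layout padding
raw = "File\xa0Adapter"
clean = "".join(raw.split("\xa0"))
print(clean)  # FileAdapter
```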

Prep the search template

_GetDocWords_TEMPLATE = """Answer the Question: {question}

By considering the following documents:
{docs}
"""

PROMPT = PromptTemplate(
     input_variables=["docs","question"], template=_GetDocWords_TEMPLATE
)
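`PromptTemplate` here is doing ordinary `{name}` substitution; a minimal stand-in without LangChain, using `str.format` (the variable names match the template above, the filled-in values are just placeholders):

```python
# Stand-in for PromptTemplate.format; {question} and {docs} are the input variables
template = """Answer the Question: {question}

By considering the following documents:
{docs}
"""

prompt = template.format(question="What is a File adapter?", docs="<retrieved pages>")
print(prompt.splitlines()[0])  # Answer the Question: What is a File adapter?
```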

llm = OpenAI(temperature=0, verbose=True)

chain = LLMChain(llm=llm, prompt=PROMPT)

Are you sitting down? Let's talk with the documentation.

"What is a File adapter?"

# Ask the question
# First query the vector store for matching content
query = "What is a File adapter"
docs = docsearch.similarity_search(query)
# Only using the first two documents to reduce the token count sent to OpenAI
chain.run(docs=docs[:2], question=query)

Answer:

'\nA file adapter is a type of software that enables the transfer of data between two different systems. It is typically used to move data from one system to another, such as from a database to a file system, or from a file system to a database. It can also be used to move data between different types of systems, such as from a web server to a database.'

"What is a lock table?"  

# Ask the question
# First query the vector store for matching content
query = "What is a lock table"
docs = docsearch.similarity_search(query)
# Only using the first two documents to reduce the token count sent to OpenAI
chain.run(docs=docs[:2], question=query)

Answer:

'\nA lock table is a system-wide, in-memory table maintained by InterSystems IRIS that records all current locks and the processes that have owned them. It is accessible via the Management Portal, where you can view the locks and (in rare cases, if needed) remove them.'

 

Formatting a user interface for this functionality is left as a future exercise.

InterSystems Official
· June 13, 2023

June 13, 2023 - Advisory: Increased Process Memory Usage

InterSystems has corrected a defect that causes increased process memory usage.

Specifically, the increased consumption of local process partition memory occurs when executing $Order,  $Query, or Merge on local variables.  While this will have no detrimental impact for most running environments, environments that support a large number of processes or closely limit Maximum Per-Process Memory could be impacted.  Some processes may experience <STORE> errors.

The defect exists in 2023.1.0.229.0, but that release has been reposted as 2023.1.0.235.1 with the fixes included, to expedite the correction without clients needing to wait for a maintenance release.

The corrections for this defect are identified as DP-423127 and DP-423237.  These will be included in all future versions.

The defect appears in versions 2022.2, 2022.3, and 2023.1 (build 229) of InterSystems IRIS®, InterSystems IRIS for Health™, and HealthShare® Health Connect.  If you are running one of these versions, we suggest upgrading to 2023.1 (build 235).
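For scripted environments, the build number can be read from the full dotted version string quoted above; a minimal sketch, assuming the `2023.1.0.229.0` format where the build is the fourth component:

```python
def build_number(version: str) -> int:
    # "2023.1.0.229.0" -> 229 (major.minor.patch.build.sub)
    return int(version.split(".")[3])

# An instance on build 229 needs the corrected build 235
print(build_number("2023.1.0.229.0") < 235)  # True
```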

This correction is also available via Ad hoc distribution.

If you have any questions regarding this alert, please contact the Worldwide Response Center.

Article
· June 12, 2023 · 3 min read

LangChain fixed the SQL for me

This article is a simple quick-starter (simply what I did) with SQLDatabaseChain.

Hope this ignites some interest.

Many thanks to:

sqlalchemy-iris author @Dmitry Maslennikov

Your project made this possible today.

 

The article script uses the OpenAI API, so take care not to share table information and records externally that you didn't intend to.

A local model could be plugged in instead, if needed.

 

Creating a new virtual environment

mkdir chainsql

cd chainsql

python -m venv .

scripts\activate

pip install langchain

pip install wget

# Need to connect to IRIS so installing a fresh python driver
python -c "import wget;url='https://raw.githubusercontent.com/intersystems-community/iris-driver-distribution/main/DB-API/intersystems_irispython-3.2.0-py3-none-any.whl';wget.download(url)"

# And for more magic
pip install sqlalchemy-iris

pip install openai

set OPENAI_API_KEY=[ Your OpenAI Key ]

python

 

Initial Test

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

db = SQLDatabase.from_uri("iris://superuser:******@localhost:51775/USER")
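As an aside, the `iris://` connection string follows standard URL syntax, so its parts can be inspected with the standard library before handing it to SQLAlchemy (a sketch; the masked password is kept verbatim):

```python
from urllib.parse import urlsplit

# Break the connection string into user, host, port, and namespace (path)
u = urlsplit("iris://superuser:******@localhost:51775/USER")
print(u.username, u.hostname, u.port, u.path)  # superuser localhost 51775 /USER
```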

llm = OpenAI(temperature=0, verbose=True)

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

db_chain.run("How many Tables are there")

Result error

sqlalchemy.exc.DatabaseError: (intersystems_iris.dbapi._DBAPI.DatabaseError) [SQLCODE: <-25>:<Input encountered after end of query>]
[Location: <Prepare>]
[%msg: < Input (;) encountered after end of query^SELECT COUNT ( * ) FROM information_schema . tables WHERE table_schema = :%qpar(1) ;>]
[SQL: SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';]
(Background on this error at: https://sqlalche.me/e/20/4xp6)
SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';

Inter-developer dialogue

IRIS didn't like being given SQL queries that end with a semicolon.

What to do now?

Idea: how about I tell LangChain to fix it for me?

Cool. Let's do this!
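For completeness, the trailing semicolon could also have been stripped in code before the query reaches IRIS; a hypothetical post-processing helper, rather than the prompt-based route the article actually takes:

```python
def strip_trailing_semicolon(sql: str) -> str:
    # IRIS rejects a trailing statement terminator, so drop ";" and surrounding whitespace
    return sql.rstrip().rstrip(";").rstrip()

print(strip_trailing_semicolon("SELECT COUNT(*) FROM information_schema.tables ;"))
# SELECT COUNT(*) FROM information_schema.tables
```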

 

Test Two

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

from langchain.prompts.prompt import PromptTemplate

_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.

Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

The SQL query should NOT end with semi-colon
Question: {input}"""

PROMPT = PromptTemplate(
     input_variables=["input", "dialect"], template=_DEFAULT_TEMPLATE
)
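The template fixes the reply layout (Question / SQLQuery / SQLResult / Answer) so the chain can locate the query inside the completion. A hypothetical parser for that layout, purely to illustrate the contract (not SQLDatabaseChain's internals):

```python
def extract_sql(completion: str) -> str:
    # Return the text after the "SQLQuery:" marker, per the fixed reply layout
    for line in completion.splitlines():
        if line.startswith("SQLQuery:"):
            return line[len("SQLQuery:"):].strip()
    return ""

sample = ('Question: "How many Tables are there"\n'
          'SQLQuery: SELECT COUNT(*) FROM information_schema.tables\n'
          'SQLResult: [(499,)]\n'
          'Answer: There are 499 tables.')
print(extract_sql(sample))  # SELECT COUNT(*) FROM information_schema.tables
```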

db = SQLDatabase.from_uri("iris://superuser:******@localhost:51775/USER")

llm = OpenAI(temperature=0, verbose=True)

db_chain = SQLDatabaseChain(llm=llm, database=db, prompt=PROMPT, verbose=True) 

db_chain.run("How many Tables are there")

 

Result Two

SQLQuery: SELECT COUNT(*) FROM information_schema.tables
SQLResult: [(499,)]
Answer: There are 499 tables.
> Finished chain.
'There are 499 tables.'

I said it would be quick.

 

References:

https://walkingtree.tech/natural-language-to-query-your-sql-database-usi...

https://python.langchain.com/en/latest/modules/chains/examples/sqlite.ht...

https://python.langchain.com/en/latest/modules/agents/plan_and_execute.html

Question
· June 12, 2023

Folks, has anyone performed DB or back-end automation on the IRIS Data Platform?

Has anyone performed DB or back-end automation on the IRIS Data Platform?

Question
· June 12, 2023

Performed API Automation on the IRIS Data Platform?

Has anyone performed API automation on the IRIS Data Platform?
