This is a demonstration example for the current Grand Prix contest. It uses a more complex parameterized prompt template to test the AI.
Interview Questions
Documentation is available. A recruitment consultant wants to quickly challenge candidates with technical questions relevant to a role.
Can they automate making a list of questions and answers from the available documentation?
Interview Answers and Learning
One of the most effective ways to cement new facts into accessible long-term memory is phased recall.
In essence, you take a block of text and reorganize it into a series of self-contained Questions and Facts.
Now imagine two questions:
- What day of the week is the trash-bin placed outside for collection?
- When is the marriage anniversary?
Quickly recalling correct answers can mean a happier life!!
Recalling the answer to each question IS the mechanism that reinforces a fact in memory.
Phased Recall re-asks each question with longer and longer time gaps each time the correct answer is recalled.
For example:
- You consistently get the right answer: The question is asked again tomorrow, in 4 days, in 1 week, in 2 weeks, in 1 month.
- You consistently get the answer wrong: The question will be asked every day until it starts to be recalled.
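The schedule in the example above can be sketched in a few lines of Python. This is only an illustration of the idea, using the gaps listed above (1 day, 4 days, 1 week, 2 weeks, 1 month, with a month taken as 30 days). It is not Anki's actual scheduling algorithm, and the names `INTERVALS` and `next_review` are made up for this sketch.

```python
from datetime import date, timedelta

# Gaps in days taken from the example above: tomorrow, 4 days, 1 week,
# 2 weeks, 1 month (assumed to be 30 days). Illustrative values only.
INTERVALS = [1, 4, 7, 14, 30]


def next_review(streak, correct, today):
    """Return (new_streak, due_date) after one review of a question.

    streak  -- number of consecutive correct answers before this review
    correct -- True if the answer was recalled this time
    today   -- date on which the review took place
    """
    if correct:
        streak += 1
        # Each further correct answer moves to the next, longer gap;
        # once the longest gap is reached it is reused.
        gap = INTERVALS[min(streak, len(INTERVALS)) - 1]
    else:
        # A wrong answer resets the streak: ask again tomorrow.
        streak = 0
        gap = 1
    return streak, today + timedelta(days=gap)
```

Starting from a streak of 0, two correct answers in a row schedule the question one day later and then four days after that. A wrong answer at any point brings it back to a daily review.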
If you can easily see which answers are challenging, it is productive to rework them to make them more memorable.
There is a free software package called Anki that provides this full phased recall process for you.
If you can automate the creation of questions and answers into a text file, Anki will create new flashcards for you.
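As a rough sketch of what such a text file could look like, the standard `csv` module can write one quoted, comma-separated record per card. The three-column layout (question, answer, label), the file name `cards.txt`, and the sample card are assumptions made up for this illustration.

```python
import csv


def write_flashcards(path, cards):
    """Write (question, answer, label) tuples as one quoted CSV record each."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # QUOTE_ALL wraps every field in double quotes; embedded double
        # quotes are doubled, which is the csv module's default escaping.
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerows(cards)


# Hypothetical sample data, made up for illustration.
cards = [
    ("What day is the trash-bin collected?",
     'Tuesday, also known as "bin day"',
     "Household"),
]
write_flashcards("cards.txt", cards)
```

Letting a library handle the quoting avoids hand-escaping mistakes when an answer contains commas or double quotes.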
Hypothesis
We can use LangChain to transform InterSystems PDF documentation into a series of Questions and answers to:
- Make interview questions and answers
- Make Learner Anki flash cards
Create new virtual environment
    mkdir chainpdf
    cd chainpdf
    python -m venv .
    scripts\activate
    pip install openai
    pip install langchain
    pip install wget
    pip install lancedb
    pip install tiktoken
    pip install pypdf
    set OPENAI_API_KEY=[ Your OpenAI Key ]
    python
Prepare the docs
    import glob
    import wget

    url='https://docs.intersystems.com/irisforhealth20231/csp/docbook/pdfs.zip'
    wget.download(url)

    # extract docs
    import zipfile
    with zipfile.ZipFile('pdfs.zip','r') as zip_ref:
        zip_ref.extractall('.')
Extract PDF text
    from langchain.document_loaders import PyPDFLoader
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.prompts.prompt import PromptTemplate
    from langchain import OpenAI
    from langchain.chains import LLMChain

    # Limit the input for this example to two documentation sets.
    # From the documentation site I could see that:
    # GCOS = Using ObjectScript
    # RCOS = ObjectScript Reference
    pdfFiles=['./pdfs/pdfs/GCOS.pdf','./pdfs/pdfs/RCOS.pdf']

    # The prompt will be really big and needs to leave space for the answer to be constructed.
    # Therefore reduce the size of each input chunk.
    text_splitter = CharacterTextSplitter(
        separator = "\n\n",
        chunk_size = 200,
        chunk_overlap = 50,
        length_function = len,
    )

    # split document text into chunks
    documentsAll=[]
    for file_name in pdfFiles:
        loader = PyPDFLoader(file_name)
        pages = loader.load_and_split()
        # Strip unwanted padding (non-breaking spaces)
        for page in pages:
            del page.lc_kwargs
            page.page_content=("".join((page.page_content.split('\xa0'))))
        documents = text_splitter.split_documents(pages)
        # Ignore the cover pages
        for document in documents[2:]:
            # skip table of contents
            if '........' in document.page_content:
                continue
            documentsAll.append(document)
Prep search template
    _GetDocWords_TEMPLATE = """From the following documents create a list of distinct facts.
    For each fact create a concise question that is answered by the fact.
    Do NOT restate the fact in the question.

    Output format:
    Each question and fact should be output on a separate line delimited by a comma character
    Escape every double quote character in a question with two double quotes
    Add a double quote to the beginning and end of each question
    Escape every double quote character in a fact with two double quotes
    Add a double quote to the beginning and end of each fact
    Each line should end with {labels}

    The documents to reference to create facts and questions are as follows:
    {docs}
    """

    PROMPT = PromptTemplate(
        input_variables=["docs","labels"],
        template=_GetDocWords_TEMPLATE
    )

    llm = OpenAI(temperature=0, verbose=True)
    chain = LLMChain(llm=llm, prompt=PROMPT)
Process each document and place output in file
    # open an output file
    with open('QandA.txt','w') as file:
        # iterate over each text chunk
        for document in documentsAll:
            # set the label for the Anki flashcard
            source=document.metadata['source']
            if 'GCOS.pdf' in source:
                label='Using ObjectScript'
            else:
                label='ObjectScript Reference'
            output=chain.run(docs=document,labels=label)
            file.write(output+'\n')
            file.flush()
There were some retry and force-close messages during the loop.
I anticipate this is the OpenAI API limiting requests to fair use.
Alternatively, a local LLM could be used instead.
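The retry messages suggest the client library already retries on its own. If a call still fails, one option is to wrap it in a small retry helper that waits longer after each failure. The following is a minimal sketch of that idea, not part of the original experiment. The helper name `call_with_backoff` and its parameters are made up for illustration.

```python
import time


def call_with_backoff(func, *args, retries=5, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call func, pausing for exponentially longer periods between failures.

    retries    -- maximum number of attempts (hypothetical default)
    base_delay -- pause in seconds after the first failure; doubles each time
    sleep      -- injectable so the pauses can be skipped in tests
    """
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

In the loop above, the call would then read `output = call_with_backoff(chain.run, docs=document, labels=label)`. Catching every `Exception` is deliberately broad for the sketch. In practice it would be narrowed to the rate-limit error raised by the client in use.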
Examine the output file
The model makes a good attempt at formatting the answers, but some lines deviate from the requested format.
By reviewing the file manually, I can pick some questions and answers to continue the experiment.
Importing FlashCards into Anki
Reviewed text file:
"What is a global?", "A global is a sparse, multidimensional database array.", "Using ObjectScript",
"What is the effect of the ##; comment on INT code line numbering?", "It does not change INT code line numbering.", "Using ObjectScript",
"What characters can be used in an explicit namespace name after the first character?", "letters, numbers, hyphens, or underscores", "Using ObjectScript"
"Are string equality comparisons case-sensitive?", "Yes", "Using ObjectScript",
"What happens when the number of references to an object reaches 0?", "The system automatically destroys the object.","Using ObjectScript"
"What operations can take an undefined or defined variable?", "The READ command, the $INCREMENT function, the $BIT function, and the two-argument form of the $GET function.", "Using ObjectScript"
Creating new Anki card deck
Open Anki and select File -> Import
Select the reviewed text file
Optionally create a new Card Deck for "Object Script"
A basic card type is fine for this format
The import dialog mentioned a "Field 4", so the records should be checked for a stray extra field.
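One possible cause, which is an assumption rather than something the import dialog confirmed, is the trailing comma at the end of some records: a comma with nothing after it reads as an empty fourth field. A quick way to check is to count the fields per record with Python's `csv` module. This is a sketch only. The function name `field_counts` is made up, and Anki's own parser may treat spaces after commas differently from `skipinitialspace=True`.

```python
import csv
import io


def field_counts(text):
    """Return (line_number, field_count) for every non-empty record."""
    # skipinitialspace lets a quoted field start after ", " as in the file above
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [(reader.line_num, len(row)) for row in reader if row]


# Two records copied from the reviewed file: the first ends with a comma.
sample = (
    '"What is a global?", "A global is a sparse, multidimensional '
    'database array.", "Using ObjectScript",\n'
    '"Are string equality comparisons case-sensitive?", "Yes", '
    '"Using ObjectScript"\n'
)

# Records that do not have exactly three fields deserve a closer look.
suspicious = [line for line, count in field_counts(sample) if count != 3]
```

Here the first record parses into four fields because of its trailing comma, so `suspicious` contains line 1 only.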
Anki import success
Let's Study
Now choose the reinforcement schedule
Happy Learning !!
References
Anki software is available from https://apps.ankiweb.net/