Prompt Compression Using LLMLingua

In the program below, we explore LLMLingua's prompt-compression capabilities, using LlamaIndex for retrieval and OpenAI for the answers.

# The imports below use the pre-0.10 llama-index package layout, so pin the version.
!pip install llmlingua "llama-index<0.10"
# Using the OpenAI API: read the key from the environment instead of hard-coding it
import os
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]
!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)

# load documents
documents = SimpleDirectoryReader(
    input_files=["paul_graham_essay.txt"]
).load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=10)
question = "Where did the author go for art school?"
answer = "RISD"
contexts = retriever.retrieve(question)
context_list = [n.get_content() for n in contexts]
len(context_list)
#10
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-16k")
prompt = "\n\n".join(context_list + [question])

response = llm.complete(prompt)
print(str(response))
# Output: The author went to the Rhode Island School of Design (RISD) for art school.
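
The baseline prompt in the code above is simply the retrieved chunks joined by blank lines, with the question appended last. Here is a minimal sketch of that assembly, using placeholder chunks instead of real retriever output. The `build_prompt` helper and the sample strings are illustrative assumptions, not part of LlamaIndex.

```python
def build_prompt(context_list, question):
    """Join retrieved chunks with blank lines and put the question last."""
    return "\n\n".join(context_list + [question])


# Placeholder chunks standing in for retriever output (hypothetical data).
sample_contexts = ["First retrieved chunk.", "Second retrieved chunk."]
sample_question = "Where did the author go for art school?"

sample_prompt = build_prompt(sample_contexts, sample_question)
print(sample_prompt)
```

Every retrieved chunk is sent to the model in full, so the prompt grows with `similarity_top_k`. That cost is what the compression step targets.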

Set Up LLMLingua

# Setup LLMLingua
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import CompactAndRefine
from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor

node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reorder,
        "dynamic_context_compression_ratio": 0.3,
    },
)
retrieved_nodes = retriever.retrieve(question)
synthesizer = CompactAndRefine()
from llama_index.indices.query.schema import QueryBundle

# outline steps in RetrieverQueryEngine for clarity:
# postprocess (compress), synthesize
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)
original_contexts = "\n\n".join([n.get_content() for n in retrieved_nodes])
compressed_contexts = "\n\n".join([n.get_content() for n in new_retrieved_nodes])

original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)
compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)

print(compressed_contexts)
print()
print("Original Tokens:", original_tokens)
print("Compressed Tokens:", compressed_tokens)
print("Compressed Ratio:", f"{original_tokens/(compressed_tokens + 1e-5):.2f}x")
# Output:
What should I do next? Rtm's advice hadn't included anything about that. I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I could get if I focused on it. So the day after I stopped working on YC, I started painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging. [18]

Our Ulivi, was a guy. He could see I worked hard, and gave me, wrote down in a sort of pass each student But Accademia wasn't me anything Italian, and my money was running out, so at the end of the first year I back to US

I wanted back to RISD, but I was now broke and RISD very expensive decided to a job for year return RISD the I got one at called Interleaf, which made software. You Microsoft Word? Exactly That was learned end software tends to high. But Interleaf still had a few years to live. [] in ID, but was basically myself to I for free99 I out around my friend Nancy Parmet did big Aled building in York becomingant. Did I It wasn my place was be where the. wanted it! [7]

Original Tokens: 10703
Compressed Tokens: 275
Compressed Ratio: 38.92x
response = synthesizer.synthesize(question, new_retrieved_nodes)
print(str(response))
# Output: The author went to RISD for art school.
retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)
response = retriever_query_engine.query(question)
print(str(response))
# Output: The author went to RISD for art school.

The original 10,703 tokens were compressed to 275, a compression ratio of 38.92x, yet the model still gave the same answer.
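
To make the arithmetic explicit, here is a small sketch that recomputes the ratio from the two token counts reported in the run and also expresses it as a share of tokens saved. The `compression_stats` helper is a hypothetical name introduced for illustration. The program's own code adds `1e-5` to the denominator to avoid dividing by zero, while this sketch rejects non-positive counts instead.

```python
def compression_stats(original_tokens, compressed_tokens):
    """Return (ratio, fraction_saved) for a compressed prompt."""
    if original_tokens <= 0 or compressed_tokens <= 0:
        raise ValueError("token counts must be positive")
    ratio = original_tokens / compressed_tokens
    saved = 1 - compressed_tokens / original_tokens
    return ratio, saved


# Token counts reported in the run above.
ratio, saved = compression_stats(10703, 275)
print(f"Compression ratio: {ratio:.2f}x")
print(f"Tokens saved: {saved:.1%}")
```

For these counts the ratio works out to 38.92x, which means about 97.4% fewer context tokens are sent to the model.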

The complete program in one place:

# The imports below use the pre-0.10 llama-index package layout, so pin the version.
!pip install llmlingua "llama-index<0.10"

# Using the OpenAI API: read the key from the environment instead of hard-coding it
import os
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]

!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt

from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)

# load documents
documents = SimpleDirectoryReader(
    input_files=["paul_graham_essay.txt"]
).load_data()

index = VectorStoreIndex.from_documents(documents)

retriever = index.as_retriever(similarity_top_k=10)

question = "Where did the author go for art school?"

answer = "RISD"

contexts = retriever.retrieve(question)

context_list = [n.get_content() for n in contexts]
len(context_list)

from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-16k")
prompt = "\n\n".join(context_list + [question])

response = llm.complete(prompt)
print(str(response))

# Setup LLMLingua
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import CompactAndRefine
from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor

node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reorder,
        "dynamic_context_compression_ratio": 0.3,
    },
)

retrieved_nodes = retriever.retrieve(question)
synthesizer = CompactAndRefine()

from llama_index.indices.query.schema import QueryBundle

# outline steps in RetrieverQueryEngine for clarity:
# postprocess (compress), synthesize
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)

original_contexts = "\n\n".join([n.get_content() for n in retrieved_nodes])
compressed_contexts = "\n\n".join([n.get_content() for n in new_retrieved_nodes])

original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)
compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)

print(compressed_contexts)
print()
print("Original Tokens:", original_tokens)
print("Compressed Tokens:", compressed_tokens)
print("Compressed Ratio:", f"{original_tokens/(compressed_tokens + 1e-5):.2f}x")

response = synthesizer.synthesize(question, new_retrieved_nodes)

print(str(response))

retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)

response = retriever_query_engine.query(question)
print(str(response))
