Using GPTCache with LiteLLM
GPTCache is a library for creating a semantic cache for LLM queries.
GPTCache Docs: https://gptcache.readthedocs.io/en/latest/index.html#
GPTCache Github: https://github.com/zilliztech/GPTCache
In this document we cover:
- Quick Start Usage
- Advanced Usage - Set Custom Cache Keys
Quick Start Usage
👉 Jump to Colab Notebook Example
Install GPTCache
pip install gptcache
Using GPTCache with LiteLLM completion()
Using GPTCache
To use GPTCache, initialize it with the following lines:
from gptcache import cache
# set API keys in .env / os.environ
cache.init()
cache.set_openai_key()
Full Code using GPTCache and LiteLLM
By default, GPTCache uses the content of the messages as the cache key.
import os
import time

from gptcache import cache
from litellm.gpt_cache import completion  # GPTCache-backed completion from litellm.gpt_cache

# Set your API key here (or load it from .env)
os.environ['OPENAI_API_KEY'] = ""
cache.init()
cache.set_openai_key()
question = "what's LiteLLM"
for _ in range(2):
    start_time = time.time()
    response = completion(
      model='gpt-3.5-turbo',
      messages=[
        {
            'role': 'user',
            'content': question
        }
      ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
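Why the second iteration returns faster can be illustrated without any API calls. The sketch below is a toy exact-match cache keyed on the last message's content. `ToyCache`, `fake_llm`, and the call counter are hypothetical stand-ins for illustration only, not GPTCache's or LiteLLM's implementation.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ToyCache:
    """Toy exact-match cache keyed on the last message's content (illustration only)."""

    def __init__(self, backend: Callable[[str, List[Message]], str]) -> None:
        self.backend = backend           # the "expensive" call we want to avoid repeating
        self.store: Dict[str, str] = {}  # cache key -> stored answer
        self.backend_calls = 0           # how many times the backend was really called

    def completion(self, model: str, messages: List[Message]) -> str:
        key = messages[-1]["content"]    # assumed key: content of the last message
        if key not in self.store:        # miss: call the backend and remember the answer
            self.backend_calls += 1
            self.store[key] = self.backend(model, messages)
        return self.store[key]           # hit: served from memory


def fake_llm(model: str, messages: List[Message]) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"{model} answer to: {messages[-1]['content']}"


toy = ToyCache(fake_llm)
question = [{"role": "user", "content": "what's LiteLLM"}]
first = toy.completion("gpt-3.5-turbo", question)
second = toy.completion("gpt-3.5-turbo", question)
print(first == second, toy.backend_calls)
```

In this toy version the backend runs once for the two identical requests; the second request is answered from the dictionary, which is the effect the timing loop above is meant to show.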
Advanced Usage - Set Custom Cache Keys
By default, GPTCache uses the messages as the cache key. To set a custom cache key, pass a pre_func when initializing the cache:
cache.init(pre_func=pre_cache_func)
The snippet below defines a pre_func that returns the message content plus the model name as the key.
Defining a pre_func for GPTCache
### using / setting up GPTCache
from gptcache import cache
from gptcache.processor.pre import last_content_without_prompt
from typing import Dict, Any

# Use this function to set your cache keys for GPTCache.
# data holds all the args passed to your completion call.
def pre_cache_func(data: Dict[str, Any], **params: Dict[str, Any]) -> Any:
    # build the cache key from the last message content plus the model name
    print("in pre_cache_func")
    last_content_without_prompt_val = last_content_without_prompt(data, **params)
    print("last content without prompt", last_content_without_prompt_val)
    print("model", data["model"])
    cache_key = last_content_without_prompt_val + data["model"]
    print("cache_key", cache_key)
    return cache_key  # GPTCache uses this value as the cache key
Init Cache with pre_func to set custom keys
# init GPTCache with the custom pre_func
cache.init(pre_func=pre_cache_func)
cache.set_openai_key()
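Before wiring a pre_func into cache.init, it can help to check offline which keys it would produce. The sketch below mirrors the key composition of pre_cache_func without importing GPTCache. `last_content` is a simplified, hypothetical stand-in for `last_content_without_prompt` (it returns the last message's content and ignores any prompt stripping), and `preview_key` is an illustrative name.

```python
from typing import Any, Dict


def last_content(data: Dict[str, Any]) -> str:
    # Simplified stand-in (assumption): just return the last message's content.
    return data["messages"][-1]["content"]


def preview_key(data: Dict[str, Any]) -> str:
    # Same composition as pre_cache_func: message content + model name.
    return last_content(data) + data["model"]


messages = [{"role": "user", "content": "why should I use LiteLLM"}]
openai_key = preview_key({"model": "gpt-3.5-turbo", "messages": messages})
cohere_key = preview_key({"model": "command-nightly", "messages": messages})

print(openai_key)
print(cohere_key)
print(openai_key != cohere_key)  # different models give different keys
```

The two keys differ only in the model suffix, which is what keeps a request to one model from reusing an answer cached for another.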
Using Cache
The cache key is message content + model.
We make 3 LLM API calls:
- 2 to OpenAI (gpt-3.5-turbo)
- 1 to Cohere (command-nightly)
from litellm.gpt_cache import completion

# set OPENAI_API_KEY and COHERE_API_KEY in your environment first
messages = [{"role": "user", "content": "why should I use LiteLLM for completions()"}]
response1 = completion(model="gpt-3.5-turbo", messages=messages)
response2 = completion(model="gpt-3.5-turbo", messages=messages)
response3 = completion(model="command-nightly", messages=messages)  # calling Cohere command-nightly

if response1["choices"] != response2["choices"]:  # same model + prompt should hit the cache
    print("Error occurred: Caching for same model+prompt failed")
if response3["choices"] == response2["choices"]:  # different model should not hit the cache
    # if models are different, it should not return cached response
    print("Error occurred: Caching for different model+prompt failed")
print("response1", response1)
print("response2", response2)
print("response3", response3)
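The expected hit/miss pattern of the three calls can be dry-run without API keys. In the sketch below, `cached_call`, `fake_llm`, and `content_plus_model` are hypothetical stand-ins, and the cache is a plain dict rather than GPTCache.

```python
from typing import Any, Callable, Dict, List

store: Dict[str, str] = {}   # cache key -> stored answer
backend_log: List[str] = []  # models that actually reached the backend


def content_plus_model(data: Dict[str, Any]) -> str:
    # Custom key: last message content + model name.
    return data["messages"][-1]["content"] + data["model"]


def fake_llm(data: Dict[str, Any]) -> str:
    # Hypothetical stand-in for a real provider call.
    backend_log.append(data["model"])
    return f"answer from {data['model']}"


def cached_call(key_func: Callable[[Dict[str, Any]], str], **data: Any) -> str:
    key = key_func(data)
    if key not in store:      # miss: call the backend once and store the result
        store[key] = fake_llm(data)
    return store[key]         # hit: reuse the stored result


messages = [{"role": "user", "content": "why should I use LiteLLM for completions()"}]
r1 = cached_call(content_plus_model, model="gpt-3.5-turbo", messages=messages)
r2 = cached_call(content_plus_model, model="gpt-3.5-turbo", messages=messages)
r3 = cached_call(content_plus_model, model="command-nightly", messages=messages)

print(r1 == r2)     # same model + prompt: second call is a cache hit
print(r3 != r2)     # different model: separate cache entry
print(backend_log)  # models that reached the backend
```

Only two of the three calls reach the stand-in backend here, one per model, which matches the two checks in the real example: identical responses for the repeated OpenAI call and a distinct response for the Cohere call.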