Prompt caching (also called context caching) and Retrieval-Augmented Generation (RAG) are two cost-effective ways to supply additional context to LLMs. Google and Anthropic announced prompt caching for their models in June and August 2024, respectively, and OpenAI has since followed suit. These announcements have led many to ask whether prompt caching has killed RAG. It has not: RAG is still alive and well. This article walks AI engineers through the differences between prompt caching and RAG and the use cases where each fits best.
Prompt Caching vs RAG
Prompt caching stores a large, reusable block of content (such as documents or long system instructions) in a provider-side cache, and that cached content is reused as context for subsequent user prompts so it does not have to be re-processed from scratch on every request. RAG instead stores information as embeddings in a vector database: when a user prompt arrives, RAG retrieves the most relevant pieces of information from the vector database and sends only those pieces as context, along with the user prompt, to the LLM (a minimal sketch of both flows appears below). Tables 1 and 2 show the advantages and …
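To make the contrast concrete, here is a minimal, vendor-neutral Python sketch of the two flows. Everything in it is a hypothetical stand-in: `embed`, `call_llm`, the sample documents, and the local `cache` dict are placeholders for a real embedding model, a real chat-completion API, and a provider's server-side cache. The point is only the shape of each pipeline, not a working implementation of either feature.

```python
import hashlib
import numpy as np

# Hypothetical stand-ins (not a real embedding model or LLM API).
def embed(text: str) -> np.ndarray:
    """Deterministic fake embedding; a real system would call an embedding model."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).standard_normal(8)

def call_llm(prompt: str, context: str) -> str:
    """Placeholder for a chat-completion call that receives prompt + context."""
    return f"[LLM answer to {prompt!r} using {len(context)} chars of context]"

documents = [
    "Refund policy: customers may return items within 30 days.",
    "Shipping policy: standard delivery takes 5-7 business days.",
]

# --- RAG flow: index documents as vectors, retrieve the closest chunk, ---
# --- and send only that chunk as context with the user prompt.         ---
index = [(doc, embed(doc)) for doc in documents]  # the "vector database"

def rag_answer(question: str) -> str:
    q_vec = embed(question)
    best_doc = max(
        index,
        key=lambda item: np.dot(item[1], q_vec)
        / (np.linalg.norm(item[1]) * np.linalg.norm(q_vec)),
    )[0]
    return call_llm(question, context=best_doc)

# --- Prompt-caching flow: the full corpus is processed once and cached; ---
# --- later prompts reuse the cached context instead of re-processing it. ---
cache: dict[str, str] = {}

def cached_answer(question: str) -> str:
    if "corpus" not in cache:           # first call pays the full processing cost
        cache["corpus"] = "\n".join(documents)
    return call_llm(question, context=cache["corpus"])

print(rag_answer("What is the refund window?"))
print(cached_answer("What is the refund window?"))
```

The structural difference is what each approach sends to the model: the RAG sketch selects a small, relevant slice of the corpus per query, while the caching sketch keeps the whole corpus available as context and simply avoids paying its processing cost more than once.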