In 2023, GitHub hosted 1.8 million AI projects, up from approximately 650,000 in 2020. AI applications are proliferating thanks to better models and easy API access: APIs let developers integrate LLMs into their applications with little effort, speeding up development and deployment. Cost, however, is a major concern with LLM API calls. AI service providers typically bill API usage by the number of tokens processed, so applications that repeatedly send requests containing largely the same content pay to reprocess that content every time. Prompt caching (also called context caching) addresses this by caching context that the developer designates, reducing the number of tokens that must be reprocessed and billed at the full rate on each call. AI engineers can turn to prompt caching to cut inference fees and reduce latency in their AI applications.
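As a concrete illustration, here is a minimal sketch of what designating context for caching can look like, assuming the Anthropic Python SDK and its Messages API cache_control option; the model name and the LARGE_CONTEXT placeholder are illustrative only, and other providers expose similar features under different names.

```python
# Minimal sketch: mark a large, reusable system prompt as cacheable so that
# repeated calls reusing the same prefix are billed at a reduced rate.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_CONTEXT = "..."  # e.g. a long product manual reused across many requests

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model choice
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LARGE_CONTEXT,
            # Mark this block as cacheable; later requests that start with the
            # same prefix can reuse the cached tokens instead of reprocessing them.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the warranty terms."}],
)
print(response.content[0].text)
```

In a sketch like this, only the short user question changes from call to call, while the large system prompt is served from the cache.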
How Context Was Handled Before Prompt Caching
LLM APIs from major AI service providers are usually stateless …