In 2023, GitHub hosted 1.8 million AI projects, up from approximately 650,000 in 2020. AI applications are proliferating thanks to better models and easy API access: APIs let developers integrate LLMs into their applications with little effort, speeding up development and deployment. Cost, however, is a major concern with LLM API calls. AI service providers typically bill API usage by the number of tokens processed, so applications that repeatedly send requests containing largely the same content pay to reprocess that content every time. Prompt caching (also called context caching) addresses this by caching context that the developer designates, reducing the number of tokens that must be reprocessed and billed at the full rate on each call. AI engineers can turn to prompt caching to cut inference fees and reduce latency in their AI applications.
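As a concrete illustration, here is a minimal sketch of what designating context for caching can look like, assuming the Anthropic Python SDK and its Messages API cache_control option; the model name and the LARGE_CONTEXT placeholder are illustrative only, and other providers expose similar features under different names.

```python
# Minimal sketch: mark a large, reusable system prompt as cacheable so that
# repeated calls reusing the same prefix are billed at a reduced rate.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_CONTEXT = "..."  # e.g. a long product manual reused across many requests

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model choice
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LARGE_CONTEXT,
            # Mark this block as cacheable; later requests that start with the
            # same prefix can reuse the cached tokens instead of reprocessing them.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the warranty terms."}],
)
print(response.content[0].text)
```

In a sketch like this, only the short user question changes from call to call, while the large system prompt is served from the cache.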
How Context Was Handled Before Prompt Caching
LLM APIs from major AI service providers are usually stateless …