Published : Nov 05, 2025
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Nov 2025
Assess

LMCache is a key-value (KV) caching solution for accelerating LLM serving infrastructure. It acts as a dedicated caching layer across a pool of LLM inference engines, storing precomputed KV cache entries for text that is likely to be processed multiple times, such as chat histories or document collections. By persisting these values to disk, prefill computation can be offloaded from the GPU, reducing time to first token (TTFT) and lowering inference costs for demanding workloads such as RAG pipelines, multi-turn chat applications and agentic systems. You can integrate LMCache with major inference servers (such as vLLM or NVIDIA Dynamo), and we think it's worth assessing its impact on your setup.
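As a rough illustration of the integration described above, the sketch below shows how LMCache is typically wired into vLLM: a small YAML config enables CPU/disk offloading of KV cache entries, and vLLM is launched with LMCache as its KV connector. This is a hedged sketch based on LMCache's documentation at the time of writing; the exact option names, connector name and size limits are assumptions that may differ across LMCache and vLLM versions, and the model and paths are placeholders.

```shell
# lmcache-config.yaml — hypothetical example values; check the LMCache
# docs for the options supported by your installed version.
#
#   chunk_size: 256                      # tokens per KV cache chunk
#   local_cpu: true                      # keep a hot tier in CPU memory
#   max_local_cpu_size: 5.0              # GB of CPU memory for the cache
#   local_disk: "file:///tmp/lmcache/"   # persist chunks to local disk
#   max_local_disk_size: 20.0            # GB of disk for the cache

# Point LMCache at the config, then start vLLM with LMCache as the
# KV connector so prefill results are stored and reused across requests.
export LMCACHE_CONFIG_FILE=lmcache-config.yaml

vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'
```

With this in place, repeated prefixes (for example, a long system prompt or a shared document in a RAG pipeline) hit the cache instead of being recomputed, which is where the TTFT and cost improvements come from.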
