Small language models

Technology Radar

Last updated: Nov 05, 2025

Nov 2025

Assess

We’ve observed steady progress in the development of small language models (SLMs) across several volumes of the Technology Radar. With growing interest in building agentic solutions, we’re seeing increasing evidence that SLMs can power agentic AI efficiently. Most current agentic workflows are focused on narrow, repetitive tasks that don’t require advanced reasoning, making them a good match for SLMs. Continued advancements in SLMs such as Phi-3, SmolLM2 and DeepSeek suggest that SLMs offer sufficient capability for these tasks — with the added benefits of lower cost, reduced latency and lower resource consumption compared to LLMs. It’s worth considering SLMs as the default choice for agentic workflows, reserving larger, more resource-intensive LLMs only when necessary.
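The "SLM by default, LLM only when necessary" idea can be illustrated with a small routing sketch. This is only one possible shape, not something the Radar prescribes: `run_task`, `slm`, `llm` and `is_acceptable` are hypothetical names, and the two models are passed in as plain callables so that no particular vendor API is assumed.

```python
from typing import Callable, Tuple

# A "model" here is any function that maps a prompt to a text answer.
# Real clients (local runtime, hosted API) would be wrapped to fit this shape.
Model = Callable[[str], str]


def run_task(
    prompt: str,
    slm: Model,
    llm: Model,
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the small model first and escalate only if its answer is rejected.

    Returns (model_used, answer). The acceptance check is task-specific,
    e.g. schema validation for a narrow, repetitive agent step.
    """
    answer = slm(prompt)
    if is_acceptable(answer):
        return "slm", answer
    # Fallback path: pay the larger model's cost only when needed.
    return "llm", llm(prompt)


if __name__ == "__main__":
    # Stub models for illustration only; they stand in for real inference calls.
    def stub_slm(prompt: str) -> str:
        return "42" if "number" in prompt else ""

    def stub_llm(prompt: str) -> str:
        return "a longer, more carefully reasoned answer"

    def non_empty(text: str) -> bool:
        return bool(text.strip())

    print(run_task("Extract the number", stub_slm, stub_llm, non_empty))
    print(run_task("Explain the trade-offs", stub_slm, stub_llm, non_empty))
```

The design choice worth noting is that the escalation decision lives in an explicit, testable check rather than in the prompt, which keeps the cost of the larger model visible per task.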

Apr 2025

Trial

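The recent release of DeepSeek R1 is a good illustration of why small language models (SLMs) continue to attract attention. The full-size R1 has 671 billion parameters and needs about 1,342GB of VRAM to run, which is typically only achievable with a "mini cluster" of eight state-of-the-art NVIDIA GPUs. However, DeepSeek is also available in "distilled" form: its capabilities have been transferred into smaller open-weight models such as Qwen and Llama, which can run on far more modest hardware. Although these smaller versions give up some performance, they still represent a huge leap over previous SLMs. The SLM space continues to innovate elsewhere, too. Since the last Radar, Meta has introduced Llama 3.2 at 1 billion and 3 billion parameters; Microsoft has released Phi-4, whose 14-billion-parameter model delivers high-quality results; and Google has released PaliGemma 2, a vision-language model available at 3 billion, 10 billion and 28 billion parameters. These are only a few of the recently released smaller models, but they show that this trend remains well worth watching.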
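The 1,342GB figure is consistent with a simple back-of-the-envelope calculation: parameter count multiplied by bytes per parameter. The sketch below assumes 16-bit weights (2 bytes each) and counts weights only, ignoring the KV cache, activations and runtime overhead; the text does not state which precision its figure is based on, so treat that as an assumption. The helper name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes.

    Assumption: 2 bytes per parameter (16-bit weights) unless stated otherwise.
    Real deployments also need memory for the KV cache and activations.
    """
    return num_params * bytes_per_param / 1e9


# Parameter counts as quoted in the text above.
MODELS = {
    "DeepSeek R1 (full size)": 671e9,
    "Phi-4": 14e9,
    "Llama 3.2 3B": 3e9,
    "Llama 3.2 1B": 1e9,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name}: ~{weight_memory_gb(params):,.0f} GB at 16-bit")
```

Under the same assumption, the smaller models listed above need only single-digit or low double-digit gigabytes for their weights, which is why they fit on far more ordinary hardware.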

Oct 2024

Trial

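Large language models (LLMs) have proven useful in many application areas, but their sheer size can cause problems: responding to a prompt requires a lot of compute, which makes queries slow and expensive; the models are proprietary and so large that they must be hosted in the cloud by a third party, which can be problematic for sensitive data; and in most cases training a model is prohibitively expensive. The last issue can be addressed with the RAG pattern, which sidesteps the need to train and fine-tune foundation models, but the cost and privacy concerns often remain. In response, we're now seeing growing interest in small language models (SLMs). Compared with the more popular LLMs, SLMs have fewer weights and lower precision, usually between 3.5 billion and 10 billion parameters. Recent research suggests that, in the right context and when set up correctly, SLMs can match or even outperform LLMs. Their size also makes it possible to run them on edge devices. We've previously mentioned Google's Gemini Nano, but the field is evolving quickly, with Microsoft introducing its Phi-3 series, for example.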

Published: Oct 23, 2024