Published: Nov 05, 2025
Assess

Intel's AutoRound is an advanced quantization algorithm for compressing large AI models, such as LLMs and vision language models (VLMs), with minimal loss of accuracy. It reduces model size to ultra-low bit widths (2–4 bits) using sign-gradient descent optimization and applies mixed bit widths across layers for optimal efficiency. This quantization process is also remarkably fast: You can quantize a 7-billion-parameter model in just minutes on a single GPU. Since AutoRound integrates with popular inference engines such as vLLM and Transformers, it's an attractive option for quantizing models.
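AutoRound is available as the open-source auto-round Python package. The sketch below illustrates the typical weight-only quantization flow it documents: load a Hugging Face model, run the rounding search and export a quantized checkpoint that compatible runtimes can serve. The model name, output path and exact keyword arguments are illustrative assumptions and may differ between package versions.

```python
# Minimal sketch of 4-bit weight-only quantization with Intel's auto-round
# package. Exact constructor arguments and method names may vary by version;
# treat this as illustrative rather than canonical.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # any Hugging Face causal LM; chosen here for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Configure the rounding search: 4-bit weights, group size 128, symmetric quantization.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# Export the quantized checkpoint; the output directory is a placeholder.
autoround.save_quantized("./opt-125m-autoround", format="auto_round")
```

The exported checkpoint can then be loaded through Transformers or served with vLLM, as noted above.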
