Published: Nov 05, 2025
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it appeared in one of the last few editions, it is likely still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Nov 2025
Assess

Intel's AutoRound is an advanced quantization algorithm for compressing large AI models, such as LLMs and vision language models (VLMs), with minimal loss of accuracy. It reduces model weights to ultra-low bit widths (2–4 bits) using sign-gradient descent optimization and applies mixed bit widths across layers for optimal efficiency. The quantization process is also remarkably fast: a 7-billion-parameter model can be quantized in just minutes on a single GPU. Since AutoRound integrates with popular inference engines such as vLLM and with Hugging Face Transformers, it's an attractive option for quantizing models.
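The size reductions implied by those bit widths can be sanity-checked with simple arithmetic. The sketch below is a rough estimate that ignores quantization metadata (per-group scales and zero points); the commented AutoRound call at the end is a hypothetical outline, not a verified API, so consult the project's documentation for exact names and parameters.

```python
# Back-of-the-envelope: weight-only quantization shrinks a model's
# weights roughly in proportion to the bit width (metadata ignored).

def quantized_size_gb(n_params: float, bits: int) -> float:
    """Approximate on-disk weight size in GB for n_params weights
    stored at the given bit width."""
    return n_params * bits / 8 / 1e9

# A 7-billion-parameter model at different precisions:
print(quantized_size_gb(7e9, 16))  # FP16 baseline: 14.0 GB
print(quantized_size_gb(7e9, 4))   # 4-bit: 3.5 GB
print(quantized_size_gb(7e9, 2))   # 2-bit: 1.75 GB

# Hypothetical AutoRound usage (names and parameters are assumptions,
# based loosely on the project's README; verify before use):
#
#   from auto_round import AutoRound
#   ar = AutoRound(model, tokenizer, bits=4, group_size=128)
#   ar.quantize_and_save("./quantized-model")
```

In practice the real quantized artifact is slightly larger than these figures because each group of weights carries a scale (and sometimes a zero point), but the 4x savings from FP16 to 4-bit holds to a good approximation.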
