Technology Radar

Docling

Last updated: Apr 15, 2026

Apr 2026

Trial

Docling is an open-source Python and TypeScript library for converting unstructured documents into clean, machine-readable outputs. Using a computer vision-based approach to layout and semantic understanding, it processes complex inputs, including PDFs and scanned documents, into structured formats such as JSON and Markdown. That makes it a strong fit for retrieval-augmented generation (RAG) pipelines and for producing structured outputs from LLMs. This contrasts with vision-first retrieval approaches such as ColPali, which feed page images directly to a vision-language model.
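As a rough illustration of the conversion step described above, the following Python sketch converts a document to Markdown and splits the result into heading-based chunks for a RAG index. The `DocumentConverter`, `convert` and `export_to_markdown` calls follow Docling's published quickstart at the time of writing and may change between releases. The `split_by_heading` helper and its chunk shape are hypothetical, introduced here only for illustration, and are not part of Docling.

```python
import re
from typing import Dict, List


def convert_to_markdown(source: str) -> str:
    """Convert a document (local path or URL) to Markdown with Docling.

    The import is deferred so the chunking helper below can be used
    without Docling installed.
    """
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return result.document.export_to_markdown()


# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_heading(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into one chunk per heading.

    Hypothetical helper, not a Docling API. Text that appears before
    the first heading is returned with an empty title.
    """
    chunks: List[Dict[str, str]] = []
    title = ""
    body: List[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk before starting a new one.
            text = "\n".join(body).strip()
            if title or text:
                chunks.append({"title": title, "text": text})
            title, body = match.group(2), []
        else:
            body.append(line)
    text = "\n".join(body).strip()
    if title or text:
        chunks.append({"title": title, "text": text})
    return chunks
```

A pipeline might call `split_by_heading(convert_to_markdown("report.pdf"))`, where `report.pdf` is a placeholder path, and embed each chunk separately. The splitter is deliberately naive: it would treat a `#` comment line inside a fenced code block as a heading, so a production pipeline would need a real Markdown parser or a structure-aware chunker.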

Docling provides an open-source, self-hostable alternative to proprietary cloud-managed services such as Azure Document Intelligence, Amazon Textract and Google Document AI, while integrating well with frameworks such as LangGraph. In our experience, it performs well in production-scale extraction workloads across digital and scanned PDFs, including very large files containing text, tables and images. It delivers a strong quality-to-cost balance for downstream agentic RAG workflows. Based on these results, we’re moving Docling to Trial.

Nov 2025

Assess

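Docling is an open-source Python and TypeScript library for advanced document processing of unstructured data. It addresses the often-overlooked "last mile" problem of converting real-world documents, such as PDFs and PowerPoint files, into clean, machine-readable formats. Unlike traditional extractors, Docling uses a computer vision-based approach to parse a document's layout and semantic structure, which makes its output especially valuable for retrieval-augmented generation (RAG) pipelines. It converts complex documents into structured formats such as JSON or Markdown and supports techniques such as structured output from LLMs. This differs from ColPali, which feeds page images directly into a vision-language model for retrieval. Docling's open-source nature and Python-based core, built on custom Pydantic data models, give teams a flexible, self-hosted alternative to proprietary cloud tools such as Azure Document Intelligence, Amazon Textract and Google Document AI. Backed by IBM Research, the project is evolving rapidly and offers a plug-and-play architecture that integrates with other frameworks such as LangGraph, making it well worth assessing for teams building production-grade AI data pipelines.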

Published: Nov 05, 2025
