Technology Radar
Apache Iceberg is an open table format for large-scale analytical datasets that defines how data files, metadata and schemas are organized on storage systems such as S3. Having evolved significantly in recent years, it has become a foundational building block for technology-agnostic lakehouse architectures.
Iceberg is now supported by all major data platform providers, including AWS (Athena, EMR, Redshift), Snowflake, Databricks and Google BigQuery, making it a strong option for avoiding vendor lock-in. What distinguishes Iceberg from other open table formats is its openness in both features and governance; some alternatives have capabilities that are limited or controlled by a single vendor.
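From a reliability perspective, Iceberg's snapshot-based design provides serializable isolation, safe concurrent writes through optimistic concurrency and version history with rollback. Writers commit by atomically swapping the table's metadata pointer to a new snapshot rather than by holding locks, so readers keep working against a consistent snapshot and are not blocked. These capabilities deliver strong correctness guarantees while avoiding the bottlenecks of lock-based coordination.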
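The snapshot and optimistic-concurrency ideas can be illustrated with a small in-memory model. This is a toy sketch, not Iceberg's API: `ToyTable`, `Snapshot`, `commit_append`, `rollback_to` and `append_with_retry` are hypothetical names, and the plain equality check stands in for the atomic compare-and-swap that a real catalog would perform.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when the table changed after the writer read its base snapshot."""


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # immutable set of data file names visible in this snapshot


@dataclass
class ToyTable:
    # Hypothetical stand-in for a table's metadata; snapshot ids equal list indices.
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])
    current_id: int = 0

    def current(self) -> Snapshot:
        return self.snapshots[self.current_id]

    def commit_append(self, base_id: int, new_files: tuple) -> Snapshot:
        # Optimistic check: succeed only if nobody committed since base_id.
        # A real catalog would do this as one atomic compare-and-swap.
        if base_id != self.current_id:
            raise CommitConflict(f"expected {base_id}, found {self.current_id}")
        snap = Snapshot(len(self.snapshots),
                        self.current().data_files + tuple(new_files))
        self.snapshots.append(snap)  # old snapshots are kept, never mutated
        self.current_id = snap.snapshot_id
        return snap

    def rollback_to(self, snapshot_id: int) -> None:
        # Rollback only moves the pointer back to an existing snapshot.
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current_id = snapshot_id


def append_with_retry(table: ToyTable, new_files: tuple, max_attempts: int = 3) -> Snapshot:
    # On conflict, re-read the current snapshot and try the commit again.
    for _ in range(max_attempts):
        base = table.current_id
        try:
            return table.commit_append(base, new_files)
        except CommitConflict:
            continue
    raise CommitConflict("gave up after repeated conflicts")


if __name__ == "__main__":
    table = ToyTable()
    base_a = base_b = table.current_id           # two writers read the same base
    table.commit_append(base_a, ("a.parquet",))  # writer A wins
    try:
        table.commit_append(base_b, ("b.parquet",))  # writer B holds a stale base
    except CommitConflict as exc:
        print("conflict:", exc)
    append_with_retry(table, ("b.parquet",))     # B retries on the new base
    print(table.current().data_files)
    table.rollback_to(1)                         # return to the state after A
    print(table.current().data_files)
```

Because every commit produces a new immutable snapshot, version history and rollback come almost for free in this model: rolling back is a pointer move, and no data is rewritten.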
While Apache Spark remains the most common engine used with Iceberg, it’s also well supported by Trino, Flink, DuckDB and others, making it suitable for a wide range of use cases, from enterprise data platforms to lightweight local analytics. Across many of our teams, Iceberg has earned strong trust as a stable, open data format; we recommend it as a default choice for organizations building modern data platforms.
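Apache Iceberg is an open table format for very large analytical datasets. Iceberg supports modern analytical data operations such as record-level inserts, updates and deletes, time-travel queries, ACID transactions, hidden partitioning and full schema evolution. It supports multiple underlying file storage formats, such as Apache Parquet, Apache ORC and Apache Avro. Many data processing engines already support Apache Iceberg, including SQL engines such as Dremio and Trino as well as (structured) streaming engines such as Apache Spark and Apache Flink.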
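Hidden partitioning, mentioned above, means the table derives partition values from a column through a transform, so readers filter on the original column and never reference a partition column. The sketch below is a toy model of that idea under stated assumptions: `day_transform`, `write` and `scan` are hypothetical helpers, a plain dict stands in for partitioned data files, and none of this is Iceberg's actual API.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Days since the Unix epoch, in the spirit of a day partition transform."""
    return (ts - EPOCH).days


def write(partitions: dict, ts: datetime, row: str) -> None:
    # The writer supplies only the timestamp; the partition value is derived.
    partitions.setdefault(day_transform(ts), []).append((ts, row))


def scan(partitions: dict, start: datetime, end: datetime):
    # The reader filters on the timestamp; partition pruning is derived
    # from the same transform, so no partition column appears in the query.
    lo, hi = day_transform(start), day_transform(end)
    touched = [day for day in partitions if lo <= day <= hi]
    rows = [row for day in touched
            for ts, row in partitions[day] if start <= ts <= end]
    return rows, len(touched)


if __name__ == "__main__":
    parts = {}
    write(parts, datetime(2024, 1, 1, 9, tzinfo=timezone.utc), "login")
    write(parts, datetime(2024, 1, 2, 10, tzinfo=timezone.utc), "purchase")
    write(parts, datetime(2024, 1, 3, 11, tzinfo=timezone.utc), "logout")
    rows, touched = scan(parts,
                         datetime(2024, 1, 2, 0, tzinfo=timezone.utc),
                         datetime(2024, 1, 2, 23, 59, tzinfo=timezone.utc))
    print(rows, "partitions read:", touched)
```

The design point is that the partitioning scheme lives in table metadata rather than in every query, which is also what lets a scheme change later without rewriting the queries that read the table.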