Last updated: Oct 26, 2022
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Oct 2022
Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.

Delta Lake is an open-source storage layer, implemented by Databricks, that attempts to bring ACID transactions to big data processing. In our Databricks-enabled data lake or data mesh projects, our teams prefer using Delta Lake storage over the direct use of file storage types such as AWS S3 or ADLS. Until recently, Delta Lake was a closed, proprietary product from Databricks, but it's now open source and accessible to non-Databricks platforms. However, our recommendation of Delta Lake as a default choice currently extends only to Databricks projects that use Parquet file formats. Delta Lake facilitates concurrent data read/write use cases where file-level transactionality is required. We find Delta Lake's seamless integration with Apache Spark batch and micro-batch APIs very helpful, particularly features such as time travel (accessing data at a particular point in time or commit reversion) as well as schema evolution support on write.
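To make the workflow concrete, here is a minimal PySpark sketch of writing a Delta table over two commits and reading it back at an earlier version via time travel. It assumes the Delta Lake jars (the delta-spark package) are on the Spark classpath; the path and data are purely illustrative.

```python
from pyspark.sql import SparkSession

# Minimal sketch; assumes the Delta Lake jars (delta-spark) are on the Spark classpath.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/delta/events"  # illustrative location

# First commit: write a DataFrame as a Delta table (Parquet files plus a transaction log).
spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"]) \
    .write.format("delta").mode("overwrite").save(path)

# Second commit: append more rows as a separate, ACID-isolated transaction.
spark.createDataFrame([(3, "carol")], ["id", "name"]) \
    .write.format("delta").mode("append").save(path)

# Time travel: read the table as it was at version 0 (the first commit).
spark.read.format("delta").option("versionAsOf", 0).load(path).show()
```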

Apr 2021
Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.

Delta Lake is an open-source storage layer, implemented by Databricks, that attempts to bring ACID transactions to big data processing. In our Databricks-enabled data lake or data mesh projects, our teams continue to prefer using Delta Lake storage over the direct use of file storage types such as S3 or ADLS. Of course, this is limited to projects that use storage platforms that support Delta Lake when using Parquet file formats. Delta Lake facilitates concurrent data read/write use cases where file-level transactionality is required. We find Delta Lake's seamless integration with Apache Spark batch and micro-batch APIs very helpful, particularly features such as time travel (accessing data at a particular point in time or commit reversion) as well as schema evolution support on write, though these features come with some limitations.
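As a rough sketch of schema evolution on write: a later batch that adds a column can evolve the table schema through the mergeSchema write option instead of failing the write. The path and column names below are illustrative and assume the same Delta-enabled Spark setup described above.

```python
from pyspark.sql import SparkSession

# Sketch of schema evolution on write; assumes Delta Lake jars are on the Spark classpath.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/delta/orders"  # illustrative location

# Initial table with two columns.
spark.createDataFrame([(1, 9.99)], ["order_id", "amount"]) \
    .write.format("delta").mode("overwrite").save(path)

# A later batch adds a column; mergeSchema evolves the table schema rather than rejecting the write.
spark.createDataFrame([(2, 4.50, "EUR")], ["order_id", "amount", "currency"]) \
    .write.format("delta").mode("append").option("mergeSchema", "true").save(path)

# The table schema now includes the new column; older rows read it back as null.
spark.read.format("delta").load(path).printSchema()
```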

Nov 2019
Assess: Worth exploring with the goal of understanding how it will affect your enterprise.

Delta Lake is an open-source storage layer by Databricks that attempts to bring transactions to big data processing. One of the problems we often encounter when using Apache Spark is the lack of ACID transactions. Delta Lake integrates with the Spark API and addresses this problem by its use of a transaction log and versioned Parquet files. With its serializable isolation, it allows concurrent readers and writers to operate on Parquet files. Other welcome features include schema enforcement on write and versioning, which allows us to query and revert to older versions of data if necessary. We've started to use it in some of our projects and quite like it.
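As a rough illustration of these behaviors, the sketch below shows a schema-mismatched write being rejected by schema enforcement, the transaction log being inspected for commit versions, and an older version being read back and written over the current state as a simple revert. It assumes a table already written at the illustrative path above; exact exception types may vary by Spark and Delta Lake version.

```python
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException
from delta.tables import DeltaTable  # from the delta-spark package

# Assumes the Delta Lake jars are on the Spark classpath; path and data are illustrative.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/delta/events"  # table written in the earlier sketch

# Schema enforcement: a write whose schema does not match the table is rejected.
bad = spark.createDataFrame([(4, "dave", "extra")], ["id", "name", "unexpected"])
try:
    bad.write.format("delta").mode("append").save(path)
except AnalysisException as e:
    print("write rejected:", e)

# Inspect the transaction log: every commit and its version number.
DeltaTable.forPath(spark, path).history().select("version", "operation").show()

# Revert by reading an older version and overwriting the current state with it.
spark.read.format("delta").option("versionAsOf", 0).load(path) \
    .write.format("delta").mode("overwrite").save(path)
```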

Published: Nov 20, 2019
