Platforms

Delta Lake

Nov 2019
Assess

Delta Lake is an open-source storage layer by Databricks that aims to bring ACID transactions to big data processing. One of the problems we often encounter when using Apache Spark is the lack of ACID transactions. Delta Lake integrates with the Spark API and addresses this problem through its use of a transaction log and versioned Parquet files. Its serializable isolation allows concurrent readers and writers to operate on Parquet files. Other welcome features include schema enforcement on write and versioning, which allows us to query or revert to older versions of the data if necessary. We've started to use it in some of our projects and quite like it.
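To give a feel for how this looks in practice, here is a minimal PySpark sketch, assuming the delta-core package is available to the Spark session (for example, started with --packages io.delta:delta-core_2.11:0.4.0); the table path, column names and version number are illustrative only.

from pyspark.sql import SparkSession

# A plain Spark session; the Delta format becomes available once the
# delta-core package is on the classpath.
spark = SparkSession.builder.appName("delta-lake-sketch").getOrCreate()

# Write a DataFrame as a Delta table; each write is committed through the
# transaction log alongside versioned Parquet files.
events = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "type"])
events.write.format("delta").mode("overwrite").save("/tmp/delta/events")

# Schema enforcement on write: appending data with a mismatched schema
# is rejected instead of silently corrupting the table.
bad = spark.createDataFrame([(3, "click", "oops")], ["id", "type", "unexpected"])
try:
    bad.write.format("delta").mode("append").save("/tmp/delta/events")
except Exception as e:
    print("append rejected by schema enforcement:", e)

# Versioning ("time travel"): read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/events")
v0.show()

The same DataFrame reader and writer API is used throughout; switching an existing Spark job to Delta Lake is largely a matter of replacing the "parquet" format with "delta".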