Techniques

Big Data envy

NOT ON THE CURRENT EDITION
This blip is not on the current edition of the radar. If it was on one of the last few editions it is likely that it is still relevant. If the blip is older it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the radar.
Mar 2017
hold

We continue to see organizations chasing "cool" technologies, taking on unnecessary complexity and risk when a simpler choice would be better. One particular theme is using distributed Big Data systems for relatively small data sets. This behavior prompts us to put Big Data envy on hold once more, with some additional data points from our recent experience. The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra; the cost of the operational team you'll need to keep it running just isn't worth it. While creating this edition of the Radar, we discussed several new database technologies, many offering "10x" performance improvements over existing systems. We're always skeptical until new technology, especially something as critical as a database, has been properly proven. Jepsen provides analysis of database behavior under difficult conditions, such as network partitions, and has found numerous bugs in various NoSQL databases. We recommend maintaining a healthy dose of skepticism and keeping an eye on sites such as Jepsen when you evaluate database tech.

Nov 2016
hold

We continue to see organizations chasing "cool" technologies, taking on unnecessary complexity and risk when a simpler choice would be better. One particular theme is using distributed Big Data systems for relatively small data sets. This behavior prompts us to put Big Data envy on hold once more, with some additional data points from our recent experience. The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra; the cost of the operational team you'll need to keep it running just isn't worth it. While creating this edition of the Radar, we discussed several new database technologies, many offering "10x" performance improvements over existing systems. We're always skeptical until new technology, especially something as critical as a database, has been properly proven. Jepsen provides analysis of database behavior under difficult conditions, such as network partitions, and has found numerous bugs in various NoSQL databases. We recommend maintaining a healthy dose of skepticism and keeping an eye on sites such as Jepsen when you evaluate database tech.

Apr 2016
hold

While we've long understood the value of Big Data to better understand how people interact with us, we've noticed an alarming trend of Big Data envy: organizations using complex tools to handle "not-really-that-big" data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single-node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node. So we urge you, before you spin up your clusters, to make a realistic assessment of what you need to process, and if it fits, maybe even in RAM, to use the simple option.
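
As a rough illustration of the "simple option", the sketch below streams a data set, keeps only the columns the analysis needs, and aggregates them in an in-memory SQLite database on a single node. The sample data, the column names and the helper functions `load_needed_columns` and `clicks_per_country` are hypothetical and exist only for this example; a real data set would be read from a file rather than a string.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a much larger event log.
RAW_EVENTS = """user_id,country,action,payload
1,DE,click,aaaa
2,US,view,bbbb
1,DE,click,cccc
3,US,click,dddd
"""


def load_needed_columns(lines, conn):
    """Stream rows and keep only the columns the analysis needs."""
    conn.execute("CREATE TABLE events (country TEXT, action TEXT)")
    reader = csv.DictReader(lines)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        ((row["country"], row["action"]) for row in reader),
    )


def clicks_per_country(conn):
    """Aggregate with plain SQL instead of a distributed map-reduce job."""
    cur = conn.execute(
        "SELECT country, COUNT(*) FROM events "
        "WHERE action = 'click' GROUP BY country ORDER BY country"
    )
    return cur.fetchall()


# ":memory:" keeps the whole database in RAM on a single machine.
conn = sqlite3.connect(":memory:")
load_needed_columns(io.StringIO(RAW_EVENTS), conn)
result = clicks_per_country(conn)
print(result)
```

Dropping the unused `user_id` and `payload` columns while loading is the "pick out the data you need" step; whether the remainder fits comfortably on one node is still something to measure for your own data.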