menu
Techniques

Big Data envy

HOLD?

We continue to see organizations chasing "cool" technologies, taking on unnecessary complexity and risk when a simpler choice would be better. One particular theme is using distributed, Big Data systems for relatively small data sets. This behavior prompts us to put Big Data envy on hold once more, with some additional data points from our recent experience. The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra. The operational team you'll need to keep the thing running just isn't worth it. While creating this edition of the Radar, we discussed several new database technologies, many offering "10x" performance improvements over existing systems. We're always skeptical until new technology—especially something as critical as a database—has been properly proven. Jepsen provides analysis of database performance under difficult conditions and has found numerous bugs in various NoSQL databases. We recommend maintaining a healthy dose of skepticism and keeping an eye on sites such as Jepsen when you evaluate database tech.

History for Big Data envy

Mar 2017
Hold?

We continue to see organizations chasing "cool" technologies, taking on unnecessary complexity and risk when a simpler choice would be better. One particular theme is using distributed, Big Data systems for relatively small data sets. This behavior prompts us to put Big Data envy on hold once more, with some additional data points from our recent experience. The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra. The operational team you'll need to keep the thing running just isn't worth it. While creating this edition of the Radar, we discussed several new database technologies, many offering "10x" performance improvements over existing systems. We're always skeptical until new technology—especially something as critical as a database—has been properly proven. Jepsen provides analysis of database performance under difficult conditions and has found numerous bugs in various NoSQL databases. We recommend maintaining a healthy dose of skepticism and keeping an eye on sites such as Jepsen when you evaluate database tech.

Nov 2016
Hold?

We continue to see organizations chasing "cool" technologies, taking on unnecessary complexity and risk when a simpler choice would be better. One particular theme is using distributed, Big Data systems for relatively small data sets. This behavior prompts us to put Big Data envy on hold once more, with some additional data points from our recent experience. The Apache Cassandra database promises massive scalability on commodity hardware, but we have seen teams overwhelmed by its architectural and operational complexity. Unless you have data volumes that require a 100+ node cluster, we recommend against using Cassandra. The operational team you’ll need to keep the thing running just isn’t worth it. While creating this edition of the Radar, we discussed several new database technologies, many offering "10x" performance improvements over existing systems. We’re always skeptical until new technology—especially something as critical as a database—has been properly proven. Jepsen provides analysis of database performance under difficult conditions and has found numerous bugs in various NoSQL databases. We recommend maintaining a healthy dose of skepticism and keeping an eye on sites such as Jepsen when you evaluate database tech.

Apr 2016
Hold?

While we've long understood the value of Big Data to better understand how people interact with us, we've noticed an alarming trend of Big Data envy: organizations using complex tools to handle "not-really-that-big” Data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single-node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node. So we urge that before you spin up your clusters, take a realistic assessment of what you need to process, and if it fits—maybe in RAM—use the simple option.

Learn more about our approach to assessing technology & build your own radar Get StartedNo thanks