When using techniques such as 'instrument all the things' and semantic logging, you may end up with a huge amount of log data. Collecting, aggregating, and moving this data can be problematic. Flume is a distributed system built for exactly this purpose. It has a flexible architecture based on streaming data flows. With built-in support for HDFS, Flume can reliably move multi-terabyte volumes of log data from many different sources into a centralized data store for further processing.
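
A Flume data flow is built from agents, each wired together as a source, a channel, and a sink in a plain properties file. As a rough sketch of what such a flow looks like, the configuration below tails an application log and ships it to HDFS; the agent name, log path, and HDFS URL are illustrative placeholders, not values from this text.

```
# Illustrative Flume agent: exec source -> memory channel -> HDFS sink
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail a local application log (hypothetical path)
a1.sources.r1.type     = exec
a1.sources.r1.command  = tail -F /var/log/myapp/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# Sink: write events into HDFS, partitioned by date (hypothetical namenode)
a1.sinks.k1.type                    = hdfs
a1.sinks.k1.channel                 = c1
a1.sinks.k1.hdfs.path               = hdfs://namenode:8020/flume/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType           = DataStream
a1.sinks.k1.hdfs.rollInterval       = 300
a1.sinks.k1.hdfs.useLocalTimeStamp  = true
```

An agent like this is typically started with something along the lines of `flume-ng agent --conf conf --conf-file example.conf --name a1`, and the same source/channel/sink pattern can be chained across machines to fan many log producers into one HDFS cluster.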