Tools

Azure Data Factory for orchestration

Nov 2019
Hold

Azure Data Factory (ADF) is currently Azure's default product for orchestrating data-processing pipelines. It supports data ingestion, copying data to and from different storage types on premises or on Azure, and executing transformation logic. While we've had a reasonable experience with ADF for simple migrations of data stores from on premises to the cloud, we discourage the use of Azure Data Factory for orchestrating complex data-processing pipelines. Our experience has been challenging for several reasons: limited coverage of capabilities that can be implemented code first, as ADF appears to prioritize low-code platform capabilities; poor debuggability and error reporting; limited observability, because ADF's logging doesn't integrate with other products such as Azure Data Lake Storage or Databricks, which makes it difficult to put end-to-end observability in place; and data source-triggering mechanisms that are available only in certain regions. For now, we encourage using other open-source orchestration tools (e.g., Airflow) for complex data pipelines and limiting ADF to data copying or snapshotting. We hope ADF will address these concerns to support more complex data-processing workflows and prioritize access to capabilities through code first.
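To illustrate the code-first orchestration we have in mind, here is a minimal sketch of an Airflow DAG (assuming Airflow 1.10.x, which was current at the time of writing). The extract, transform and load callables and the task names are hypothetical placeholders, not a prescribed pipeline; the point is that the workflow definition lives in version-controlled, testable code.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def extract():
    # Hypothetical step: pull the day's partition from the source store.
    pass


def transform():
    # Hypothetical step: run transformation logic, e.g. submit a Spark job.
    pass


def load():
    # Hypothetical step: write results to the target store.
    pass


default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

# A daily pipeline with three sequential tasks; failures surface per task
# in the Airflow UI and logs, which is where we found debuggability easier
# than in ADF.
with DAG(
    dag_id="daily_processing_pipeline",
    default_args=default_args,
    start_date=datetime(2019, 11, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```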