Metaflow is a user-friendly Python library and back-end service that helps data scientists and engineers build and manage production-ready data processing, ML training and inference workflows. Metaflow provides Python APIs that structure the code as a directed graph of steps. Each step can be decorated with flexible configurations such as the required compute and storage resources. Code and data artifacts for each step's run (aka task) are stored and can be retrieved either for future runs or the next steps in the flow, enabling you to recover from errors, repeat runs and track versions of models and their dependencies across multiple runs.
The value proposition of Metaflow is the simplicity of its idiomatic Python library: it fully integrates with the build and run-time infrastructure to enable running data engineering and science tasks in local and scaled production environments. At the time of writing, Metaflow is heavily integrated with AWS services such as S3 for its data store service and step functions for orchestration. Metaflow supports R in addition to Python. Its core features are open sourced.
If you're building and deploying your production ML and data-processing pipelines on AWS, Metaflow is a lightweight full-stack alternative framework to more complex platforms such as MLflow.
 
  
                        
                    
                    
                 
    
    
  