
Versioning data for reproducible analytics

Published: Nov 14, 2018
Not on the current edition
This blip is not on the current edition of the Radar. If it appeared in one of the last few editions, it is likely still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Nov 2018
Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.

When it comes to large-scale data analysis or machine intelligence problems, being able to reproduce different versions of an analysis done on different data sets and parameters is immensely valuable. To achieve reproducible analysis, both the data and the model (including algorithm choice, parameters and hyperparameters) need to be version controlled. Versioning data for reproducible analytics is a trickier problem than versioning models because of the size of the data. Tools such as DVC help in versioning data by allowing users to commit and push data files to a remote cloud storage bucket using a git-like workflow. This makes it easy for collaborators to pull a specific version of the data to reproduce an analysis.
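
The general idea behind this kind of workflow can be sketched in a few lines of Python. This is an illustration only, not DVC's implementation or API: the large file is stored under its content hash, and a small pointer file (which could be committed to git alongside the model code) records which version an analysis used. The names `commit_data`, `checkout_data` and the `.ptr.json` suffix are hypothetical, and a local directory stands in for the remote storage bucket.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def commit_data(data_file: Path, store: Path) -> Path:
    """Copy data_file into the store under its digest and write a pointer file.

    The store is an assumed stand-in for a remote bucket; the small pointer
    file is what would be tracked in git instead of the data itself.
    """
    digest = file_digest(data_file)
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(
        json.dumps({"path": data_file.name, "sha256": digest}, indent=2)
    )
    return pointer


def checkout_data(pointer: Path, store: Path) -> Path:
    """Restore the data version named by a pointer file and verify it."""
    meta = json.loads(pointer.read_text())
    dest = pointer.with_name(meta["path"])
    shutil.copyfile(store / meta["sha256"], dest)
    if file_digest(dest) != meta["sha256"]:
        raise ValueError("restored file does not match the recorded digest")
    return dest
```

Under these assumptions, checking out an older revision of the pointer file and calling `checkout_data` brings back exactly the data that revision was analysed with, which is the property the git-like workflow described above relies on.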
