Technology Radar

dbldatagen

Published : Oct 23, 2024

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more

Oct 2024

Assess

Preparar los datos de prueba para ingeniería de datos es un gran desafío. Transferir datos desde producción a ambientes de prueba puede ser riesgoso, por lo que los equipos a menudo optan por utilizar datos falsos o sintéticos en su lugar. En este Radar, exploramos enfoques novedosos como datos sintéticos para pruebas y entrenamiento de modelos. Sin embargo, en muchas ocasiones, la generación procedural de bajo costo es suficiente.

dbldatagen (Generador de Datos de Databricks Labs) es una de esas herramientas; se trata de una biblioteca de Python para generar datos sintéticos dentro del entorno de Databricks, utilizada para pruebas, benchmarking, demos y muchos otros usos. dbldatagen puede generar datos sintéticos a gran escala, alcanzando hasta miles de millones de filas en cuestión de minutos, y soporta varios escenarios como múltiples tablas, change data capture y operaciones de merge/join. Maneja bien los tipos primitivos de Spark SQL, genera rangos y valores discretos, y aplica distribuciones específicas. Al crear datos sintéticos utilizando el ecosistema de Databricks, dbldatagen es una opción que vale la pena evaluar.

Download the PDF

English | Português

Sign up for the Technology Radar newsletter

Subscribe now

Industrias

Publicaciones Digitales y Herramientas

Todos los Insights

dbldatagen

Download the PDF

Sign up for the Technology Radar newsletter

Visit our archive to read the previous volumes