Technology Radar

AWS Data Wrangler

Published: Apr 13, 2021

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.

Apr 2021

Trial

AWS Data Wrangler is an open-source library that extends the capabilities of Pandas to AWS by connecting data frames to AWS data-related services. In addition to Pandas, the library builds on Apache Arrow and Boto3 to expose several APIs for loading, transforming and saving data in data lakes and data warehouses. An important limitation is that you can't run large distributed data pipelines with this library. However, you can leverage the native data services, such as Athena, Redshift and Timestream, to do the heavy lifting and then pull the results into data frames to express the complex transformations that data frames are well suited for. We've used AWS Data Wrangler in production and like that it lets you focus on writing transformations without spending too much time on connectivity to AWS data services.
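As a rough illustration of that division of labour, the sketch below lets Athena filter the data, applies a small transformation in Pandas, and writes the result back to S3 as Parquet. It is a minimal sketch under stated assumptions, not a recommended pipeline: the `orders` table, its `created_at` and `amount` columns, and the `daily_revenue` and `run_pipeline` helpers are hypothetical names invented for this example. The library is imported as `awswrangler`, and the sketch uses its `athena.read_sql_query` and `s3.to_parquet` functions.

```python
import pandas as pd


def daily_revenue(orders: pd.DataFrame) -> pd.DataFrame:
    """Pure Pandas step: total order amount per calendar day.

    Assumes hypothetical columns `created_at` and `amount`.
    """
    with_day = orders.assign(day=pd.to_datetime(orders["created_at"]).dt.date)
    return with_day.groupby("day", as_index=False)["amount"].sum()


def run_pipeline(database: str, output_path: str) -> None:
    """Query Athena, transform in Pandas, write Parquet to S3.

    Requires AWS credentials and the awswrangler package.
    """
    # Imported lazily so daily_revenue can be exercised without AWS access.
    import awswrangler as wr

    # Athena does the heavy lifting; only the filtered rows come back
    # as a DataFrame. The `orders` table is a hypothetical example.
    orders = wr.athena.read_sql_query(
        sql=(
            "SELECT created_at, amount FROM orders "
            "WHERE created_at >= DATE '2021-01-01'"
        ),
        database=database,
    )

    result = daily_revenue(orders)

    # Save the transformed frame back to the data lake as a Parquet dataset.
    wr.s3.to_parquet(df=result, path=output_path, dataset=True)
```

Calling `run_pipeline` with a database name and an `s3://` prefix of your own would run the whole flow. Keeping the transformation in a separate function that only depends on Pandas means it can be tested locally on a small data frame, without touching AWS.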