Languages & Frameworks

AWS Data Wrangler

Published: Apr 13, 2021

AWS Data Wrangler is an open-source library that extends Pandas to AWS by connecting data frames to AWS data services. In addition to Pandas, it builds on Apache Arrow and Boto3 to expose APIs for loading, transforming and saving data in data lakes and data warehouses. An important limitation is that you can't build large distributed data pipelines with this library alone. However, you can let the native data services, such as Athena, Redshift and Timestream, do the heavy lifting and then pull in the results to express complex transformations that are well suited to data frames. We've used AWS Data Wrangler in production and like that it lets you focus on writing transformations rather than on connectivity to AWS data services.
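
As a rough illustration of that division of labor, the sketch below lets Athena aggregate the data, applies a data-frame transformation locally and writes the result back to S3 as Parquet. It is a minimal sketch rather than a recommended pipeline: the `orders` table, its columns, and the `database` and `output_path` arguments are hypothetical names chosen for the example, and running `run_pipeline` assumes AWS credentials and an existing Athena database.

```python
import pandas as pd


def add_revenue_share(orders: pd.DataFrame) -> pd.DataFrame:
    """Pure pandas step: each row's share of its customer's total revenue."""
    out = orders.copy()
    totals = out.groupby("customer_id")["revenue"].transform("sum")
    out["revenue_share"] = out["revenue"] / totals
    return out


def run_pipeline(database: str, output_path: str) -> pd.DataFrame:
    """Aggregate in Athena, transform locally, write Parquet back to S3."""
    # Imported lazily so the pandas step above can be used without AWS access.
    import awswrangler as wr

    # Athena does the heavy lifting; only the aggregated rows come back.
    # The table and column names here are hypothetical.
    orders = wr.athena.read_sql_query(
        sql=(
            "SELECT customer_id, order_month, SUM(amount) AS revenue "
            "FROM orders GROUP BY customer_id, order_month"
        ),
        database=database,
    )
    result = add_revenue_share(orders)

    # Write the transformed frame to S3 as a Parquet dataset.
    wr.s3.to_parquet(df=result, path=output_path, dataset=True)
    return result
```

Keeping the transformation in a plain function that takes and returns a data frame means it can be exercised locally on a small hand-built frame, with the AWS calls confined to one place.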