Security and S3 Multipart Upload

infrastructure
Posted by Xiao Li

15 June 2015

We have built Mingle SaaS on Amazon Web Services (AWS) with security in mind since day one. One of AWS's security best practices is to use temporary credentials on Amazon EC2 instances, and we implemented it across all of our internal services. It can complicate solving specific problems, but we think it's worth it because securing customer data is important.

Recently we ran into an issue with the Amazon Simple Storage Service (S3) multipart upload API when we tried to support customers moving their large projects onto Mingle SaaS. In short, we needed to support customers uploading files larger than 40 GB.

The problem

Our original file upload for the project import feature allowed file sizes up to 5 GB, but recently we found that some of our customers' project import files exceeded 40 GB. As we store the files on S3, it became obvious that we should implement a client-side solution that uploads files to S3 directly from users' browsers. The original 5 GB limit was S3's limit for uploading a file in a single request.

Amazon S3 offers a multipart upload API for files up to 5 TB in size. There are three steps to complete an upload:

  1. Initiate the multipart upload and get an upload id from S3.
  2. Upload the file parts with the upload id from step 1. Each request needs to be signed with a secret access key.
  3. After all parts are uploaded, complete (or abort) the upload using the ETags returned in step 2.

These three steps have to be done in sequence, and every request sent to the S3 service must be signed with the secret key of a set of AWS credentials. Hence a client-side solution needs to call back to the server side to get each of these requests signed.
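
To make the three steps concrete, here is a rough sketch of the same calls made server-side with the AWS SDK for Ruby (v2). In our setup the requests are actually sent from the browser by EvaporateJS; the bucket name and object key below are placeholders:

```ruby
require 'aws-sdk' # AWS SDK for Ruby v2

s3     = Aws::S3::Client.new(region: 'us-east-1')
bucket = 'upload-bucket'          # placeholder bucket name
key    = 'project-export.mingle'  # placeholder object key

# 1. Initiate the multipart upload and get an upload id from S3.
upload = s3.create_multipart_upload(bucket: bucket, key: key)

# 2. Upload the file in parts (every part except the last must be at
#    least 5 MB) and remember the ETag S3 returns for each part.
parts = []
File.open(key, 'rb') do |file|
  part_number = 1
  while (chunk = file.read(5 * 1024 * 1024))
    resp = s3.upload_part(bucket: bucket, key: key,
                          upload_id: upload.upload_id,
                          part_number: part_number, body: chunk)
    parts << { part_number: part_number, etag: resp.etag }
    part_number += 1
  end
end

# 3. Complete the upload with the collected part numbers and ETags.
s3.complete_multipart_upload(bucket: bucket, key: key,
                             upload_id: upload.upload_id,
                             multipart_upload: { parts: parts })
```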

What we learned on the way

We used to upload files with a pre-signed URL, but that approach doesn't work for multipart uploads, so we needed a completely new solution. We found that EvaporateJS does the job. Our first implementation simply followed the EvaporateJS documentation, but it didn't work when we tested it in an environment that had multiple EC2 instances behind an ELB.

It turned out we had used the temporary access key id and session token from one EC2 instance to set up EvaporateJS, while the ELB routed EvaporateJS's signing requests to different EC2 instances. Different EC2 instances have different temporary credentials, so the multipart upload requests were signed, but with the wrong secret keys.

The second attempt had better results. We changed the flow so that when EvaporateJS asks the server to sign a multipart upload request's headers, the server responds with a full AWS Authorization header, including the correct session token. Thus each multipart upload request gets signed by whichever temporary credentials belong to the EC2 instance that handled the signing request.

Now every request sent to S3 was signed with the correct secret key, the correct access key id, and the correct session token. But it still didn't work: S3 responds with a 403 Forbidden error when the requests of one multipart upload are signed by different credentials. We couldn't find documentation about this, but we later talked with AWS technical support, and they confirmed it is by design.

So you have to use the same credentials to sign all requests of a multipart upload. We really didn't want to hard-code an AWS account user's access key id and secret key into our EC2 instances, so we looked into creating temporary credentials from an EC2 instance's temporary credentials and sharing them across all EC2 instances for signing the multipart upload requests. However, temporary credentials created from another temporary credential expire after an hour, and one hour is just not enough time to upload a large file under varying network conditions. Later we also confirmed with AWS support that the only way to make this work is to use long-term access keys to sign all multipart upload requests.

Solution: a full example

This example comes from our implementation, so it is a Ruby on Rails application and uses eb_deployer as the deployment tool to provision AWS resources.

Create an S3 upload user with the permissions needed for the multipart upload API, and its credentials
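
We provision a dedicated IAM user whose only permissions are the ones the multipart upload API needs on the upload bucket, plus a long-term access key for it. A sketch of the relevant CloudFormation resources follows; the resource names and the exact action list are illustrative:

```json
{
  "Resources": {
    "S3MultipartUploadUser": {
      "Type": "AWS::IAM::User",
      "Properties": {
        "Policies": [
          {
            "PolicyName": "s3-multipart-upload",
            "PolicyDocument": {
              "Statement": [
                {
                  "Effect": "Allow",
                  "Action": [
                    "s3:PutObject",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload"
                  ],
                  "Resource": {
                    "Fn::Join": ["", ["arn:aws:s3:::", { "Ref": "UploadBucketName" }, "/*"]]
                  }
                }
              ]
            }
          }
        ]
      }
    },
    "S3MultipartUploadAccessKey": {
      "Type": "AWS::IAM::AccessKey",
      "Properties": { "UserName": { "Ref": "S3MultipartUploadUser" } }
    }
  },
  "Outputs": {
    "S3MultipartUploadAccessKeyId": {
      "Value": { "Ref": "S3MultipartUploadAccessKey" }
    },
    "S3MultipartUploadSecretAccessKey": {
      "Value": { "Fn::GetAtt": ["S3MultipartUploadAccessKey", "SecretAccessKey"] }
    }
  }
}
```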

As we use eb_deployer to deploy our services, the above CloudFormation snippet is part of the eb_deployer deployment configuration that provisions resources shared by our Elastic Beanstalk environments. The UploadBucketName used inside can be an input parameter, or the bucket can be created inside the same template.

Pass the credentials to the application environment

The basic idea is to pass the S3 upload credentials into the application environment so that we can reference them through environment variables. Here we named them S3_MULTIPART_UPLOAD_ACCESS_KEY_ID and S3_MULTIPART_UPLOAD_SECRET_ACCESS_KEY.
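
One way to do this is with Elastic Beanstalk option settings in the aws:elasticbeanstalk:application:environment namespace; a sketch, assuming eb_deployer's option_settings pass straight through to Elastic Beanstalk (how the key values are looked up from the CloudFormation stack outputs depends on your setup and is only indicated here):

```yaml
# eb_deployer.yml (sketch): expose the upload user's key pair as
# environment variables in every Elastic Beanstalk environment.
common:
  option_settings:
    - namespace: aws:elasticbeanstalk:application:environment
      option_name: S3_MULTIPART_UPLOAD_ACCESS_KEY_ID
      value: "<access key id from the shared CloudFormation stack outputs>"
    - namespace: aws:elasticbeanstalk:application:environment
      option_name: S3_MULTIPART_UPLOAD_SECRET_ACCESS_KEY
      value: "<secret access key from the shared CloudFormation stack outputs>"
```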

Initialize data attributes for EvaporateJS:
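
A sketch of what this could look like in a Rails view; the element id, data attribute names, bucket environment variable, and sign_multipart_upload_path route helper are all illustrative:

```erb
<%# Sketch of the upload tag; EvaporateJS reads its settings from these data attributes. %>
<div id="project-import-upload"
     data-aws-key="<%= ENV['S3_MULTIPART_UPLOAD_ACCESS_KEY_ID'] %>"
     data-bucket="<%= ENV['S3_UPLOAD_BUCKET'] %>"
     data-signer-url="<%= sign_multipart_upload_path %>">
</div>
```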

The above data attributes are set on an HTML tag in the file upload UI. Note that only the access key id is exposed to the browser; the secret access key never leaves the server.

Set up EvaporateJS with the upload user's credentials:
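
A sketch of the client-side wiring, assuming the data attributes from the previous section; the element ids and object key prefix are illustrative, and EvaporateJS option names vary between versions (this follows the 1.x configuration):

```javascript
// Sketch: read the settings from the data attributes rendered above
// and hand files over to EvaporateJS.
var tag = document.getElementById('project-import-upload');

var evaporate = new Evaporate({
  signerUrl: tag.getAttribute('data-signer-url'), // server endpoint that signs each request
  aws_key:   tag.getAttribute('data-aws-key'),    // the S3 upload user's access key id
  bucket:    tag.getAttribute('data-bucket')
});

document.getElementById('project-import-file').addEventListener('change', function () {
  var file = this.files[0];
  evaporate.add({
    name: 'imports/' + file.name,                 // S3 object key; the prefix is illustrative
    file: file,
    progress: function (ratio) { console.log('upload progress', ratio); },
    complete: function () { console.log('upload complete'); }
  });
});
```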

Nothing special here; see the EvaporateJS documentation for full details.

Sign multipart upload request headers with S3_MULTIPART_UPLOAD_SECRET_ACCESS_KEY
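
With EvaporateJS's default (signature version 2) signing, the signerUrl receives the string to sign in a to_sign parameter and expects the Base64-encoded HMAC-SHA1 signature back. A sketch of that endpoint; the controller and route names are illustrative:

```ruby
require 'base64'
require 'openssl'

class MultipartUploadSignaturesController < ApplicationController
  # GET /multipart_upload_signatures/sign?to_sign=<string to sign>
  # Every request of every multipart upload is signed with the same
  # long-term secret key, so S3 accepts the whole upload.
  def sign
    signature = Base64.encode64(
      OpenSSL::HMAC.digest(
        OpenSSL::Digest.new('sha1'),
        ENV['S3_MULTIPART_UPLOAD_SECRET_ACCESS_KEY'],
        params[:to_sign]
      )
    ).delete("\n")
    render plain: signature
  end
end
```

The route for this action is what the data-signer-url attribute above points at.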

