Read Files on AWS S3 Without Copying As A Local File

Lots of people use Amazon’s S3 buckets. I tend to use them to save some random csv data to look play around with later. Every once and a while I’d come across a different key path in one of my S3 buckets and just be completely puzzled as to what kind of stuff I had placed in there. Selecting a file and taking a look isn’t terribly hard. Essentially, I would aws s3 cp the file from the s3 bucket to my local machine. From there, one can use any variety of Unix tools to get a better idea of the data contained in said file. See…pretty simple, but it got annoying having to eventually remove the copied local file after I would look through it and see that its contents were not what I was looking for.

For a while I simply cursed the heavens for the lack of AWS S3 commands that would perform similarly to the way cat does. These were truly hard times. Fortunately, I was pleasantly surprised that this behavior could still be recreated without having to actually create a local version of the s3 file to your file system.

aws s3 cp <S3Uri> - will allow you to skip copying the file to your local machine prior to checking inside with head, tail, cat, etc.

Let’s see it in action by peeking into some Amazon customer review data that is stored in a public S3 bucket. The code snippet below allows me to unzip the compressed tab separated value file and print out only the first row of the file.

aws s3 cp s3://amazon-reviews-pds/tsv/amazon_reviews_us_Software_v1_00.tsv.gz - | gunzip -c | head -n1

code3

#TODO data engineering

Read Files on AWS S3 Without Copying As A Local File