AWS S3 / boto3 Exercise
-------------------------

The aim of this exercise is to reinforce the use of AWS S3 with the boto3 Python library and pandas.

Steps
-------

-------------------------------------------------------------
1. Put a CSV dataset into S3
-------------------------------------------------------------

Create a new folder in an S3 bucket. You may create a new S3 bucket if you don't already have one from the training session.

Download the latest COVID data as a CSV file from here:
https://covid.ourworldindata.org/data/owid-covid-data.csv

Add this CSV file to the S3 folder. You could try using the AWS CLI to do this.

For example, if you open PowerShell in the directory where you downloaded the OWID COVID CSV, this command would copy it to S3, into a folder called "frank-exercise" in the "data-eng-21" bucket:

aws s3 cp owid-covid-data.csv s3://data-eng-21/frank-exercise/owid-covid-data.csv

------------------------------------------------------------
2. Write a Python notebook to interface with S3
------------------------------------------------------------

Create a new Python 3 Jupyter notebook.

In this notebook, write code that uses boto3 to list the contents of your new S3 folder.

Below is some demonstration code to get you started. It prints a list of the buckets in S3.

#############################
import boto3

s3_client = boto3.client("s3")

# buckets_dict will be a Python dictionary containing info about my buckets
buckets_dict = s3_client.list_buckets()

# loop through the list of buckets and print the name of each one
for bucket in buckets_dict['Buckets']:
    print('Bucket Name:', bucket['Name'])
#############################

----------------------------------------------------------
3. Read the CSV from S3 directly into a dataframe
----------------------------------------------------------

Use pandas to load the contents of the CSV file DIRECTLY from S3 into a pandas dataframe (e.g. use an s3:// URL in pandas read_csv).

Create a simple plot with Python showing the progress of COVID cases over time.
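
If you get stuck on steps 2 and 3, the sketch below shows one possible approach. It is a starting point, not the expected solution. The helper names (list_folder_keys, s3_url, run_exercise) are made up for illustration. The bucket and folder names are taken from the example command in step 1, so swap in your own. The column names "location", "date" and "total_cases", and the choice of "United Kingdom", are assumptions about the OWID file; check them against df.columns. Reading an s3:// URL with pandas also assumes the s3fs package is installed alongside pandas.

```python
def list_folder_keys(s3_client, bucket, prefix):
    """Return the keys of every object under `prefix` in `bucket`."""
    keys = []
    # list_objects_v2 returns results in pages, so use a paginator
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches the prefix
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def s3_url(bucket, key):
    """Build the s3:// URL that pandas read_csv can open (via s3fs)."""
    return f"s3://{bucket}/{key}"


def run_exercise():
    # Imports live here so the helpers above can be tried without
    # AWS credentials or the third-party packages installed.
    import boto3
    import pandas as pd
    import matplotlib.pyplot as plt

    # Assumed names, taken from the example in step 1: replace with your own
    bucket = "data-eng-21"
    prefix = "frank-exercise/"

    # Step 2: list the contents of the folder
    s3_client = boto3.client("s3")
    for key in list_folder_keys(s3_client, bucket, prefix):
        print("Key:", key)

    # Step 3: read the CSV straight from S3 into a dataframe
    url = s3_url(bucket, prefix + "owid-covid-data.csv")
    df = pd.read_csv(url, parse_dates=["date"])

    # Assumed column names and country: adjust after inspecting df.columns
    country = df[df["location"] == "United Kingdom"]
    country.plot(x="date", y="total_cases", title="Total COVID cases over time")
    plt.show()
```

In a notebook cell, call run_exercise() to list the folder and draw the plot. Passing the client into list_folder_keys, rather than creating it inside the function, makes the helper easy to try with a stand-in object before pointing it at a real bucket.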