
Boto3 – Amazon S3 As Python Object Store

Use Amazon Simple Storage Service (S3) as an object store to manage Python data structures.

1. Introduction

Amazon S3 is widely used as a file storage system to store and share files across the internet. S3 is a simple key-value store, so it can hold objects of any type, including objects created in any programming language, such as Java, JavaScript, or Python. AWS recommends storing DynamoDB items larger than 400 KB (the maximum DynamoDB item size) in S3 instead. This article focuses on using S3 as an object store from Python.

2. Prerequisites

Boto3 is the official AWS SDK for accessing AWS services from Python code. Ensure that Boto3 and the AWS CLI (awscli) are installed on your system.

$ pip install boto3
$ pip install awscli

Also configure the AWS credentials, either with the “aws configure” command or by setting the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to store your keys in the environment. Do NOT hard-code your AWS keys inside your Python program.

To configure AWS credentials, first install awscli and then run the “aws configure” command shown below. For more details, refer to AWS CLI Setup [iii] and Boto3 Credentials [iv].

$ aws configure

Do a quick check to ensure you can reach AWS.

$ aws s3 ls

The above command should list the S3 buckets created in your AWS account. The account is selected based on the configured credentials. If multiple AWS accounts are configured, use the “--profile” option of the AWS CLI to choose one. If you omit the “--profile” option, the CLI uses the profile named “default”.

Use the commands below to configure a development profile named “dev” and validate the settings.

$ aws configure --profile dev
$ aws s3 ls --profile dev

The above commands show the S3 buckets present in the account that belongs to the “dev” profile.
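The profiles that “aws configure” creates are stored in an INI-style file, by default ~/.aws/credentials, with one section per profile. As a minimal sketch, the hypothetical helper list_profiles below uses the standard-library configparser to list those section names; boto3 itself exposes the same information through boto3.Session().available_profiles. The sample text is made up and contains placeholder values only.

```python
import configparser
import os


def list_profiles(text):
    """Return the profile names defined in INI-style AWS credentials/config text.

    ~/.aws/credentials names sections after the profile ("[dev]"), while
    ~/.aws/config uses "[profile dev]", so an optional "profile " prefix is stripped.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    names = []
    for section in parser.sections():
        if section.startswith("profile "):
            section = section[len("profile "):]
        names.append(section)
    return names


def read_local_profiles(path="~/.aws/credentials"):
    """Read profile names from the credentials file; returns [] if it is missing."""
    full_path = os.path.expanduser(path)
    if not os.path.exists(full_path):
        return []
    with open(full_path) as f:
        return list_profiles(f.read())


# Placeholder content in the same layout that "aws configure" writes.
sample = """
[default]
aws_access_key_id = EXAMPLE
aws_secret_access_key = EXAMPLE

[dev]
aws_access_key_id = EXAMPLE
aws_secret_access_key = EXAMPLE
"""
print(list_profiles(sample))  # ['default', 'dev']
```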

3. Connecting to S3

3.1 Connecting to Default Account (Profile)

The client() API creates a low-level client for the specified AWS service. The code snippet below connects to S3 using the default profile's credentials and lists all the S3 buckets.

import boto3

# Create an S3 client using the default profile's credentials
s3 = boto3.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
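list_buckets() returns a dictionary whose 'Buckets' entry is a list of dictionaries, each with a 'Name' and a 'CreationDate' (a datetime). The sketch below, built around a hypothetical helper format_buckets, sorts the buckets oldest first and formats them the way the loop above does. A hand-built response stands in for a real call so that the sketch runs without AWS access; real responses contain additional keys.

```python
from datetime import datetime, timezone


def format_buckets(response):
    """Return 'ctime name' lines for each bucket, oldest bucket first."""
    buckets = sorted(response.get("Buckets", []), key=lambda b: b["CreationDate"])
    return [f"{b['CreationDate'].ctime()} {b['Name']}" for b in buckets]


# Hand-built stand-in for s3.list_buckets(); bucket names are illustrative.
fake_response = {
    "Buckets": [
        {"Name": "mytestbucket",
         "CreationDate": datetime(2019, 5, 2, 10, 0, tzinfo=timezone.utc)},
        {"Name": "older-bucket",
         "CreationDate": datetime(2018, 1, 15, 8, 30, tzinfo=timezone.utc)},
    ]
}

lines = format_buckets(fake_response)
for line in lines:
    print(line)
```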

3.2 Connecting to Specific Account (Profile)

To connect to a specific account, first create a session using the Session() API. Session() accepts a profile name and a region, and it can also accept AWS credentials directly.

The code snippet below connects to the AWS account configured under the “dev” profile and lists all the S3 buckets.

import boto3

# Create a session for the "dev" profile in the us-west-2 region
session = boto3.Session(profile_name="dev", region_name="us-west-2")
s3 = session.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
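Rather than hard-coding "dev" and "us-west-2", a script can take them as optional flags, mirroring the CLI's --profile option. This is a minimal sketch; the flag names and the helpers parse_connection_args and session_kwargs are hypothetical. Flags that are not given are simply left out, so boto3.Session() falls back to its usual default profile and region lookup.

```python
import argparse


def parse_connection_args(argv=None):
    """Parse optional --profile and --region flags (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="List S3 buckets")
    parser.add_argument("--profile", default=None, help="AWS profile name, e.g. dev")
    parser.add_argument("--region", default=None, help="AWS region, e.g. us-west-2")
    return parser.parse_args(argv)


def session_kwargs(args):
    """Map parsed flags to boto3.Session() keyword arguments, skipping unset ones."""
    kwargs = {}
    if args.profile:
        kwargs["profile_name"] = args.profile
    if args.region:
        kwargs["region_name"] = args.region
    return kwargs


args = parse_connection_args(["--profile", "dev", "--region", "us-west-2"])
print(session_kwargs(args))  # {'profile_name': 'dev', 'region_name': 'us-west-2'}

# With boto3 installed and credentials configured:
#   session = boto3.Session(**session_kwargs(args))
#   s3 = session.client('s3')
```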

4. Storing and Retrieving a Python LIST

Boto3 provides the put_object() and get_object() APIs to store and retrieve objects in S3. However, Python objects must be serialized into bytes before they are stored. The Python pickle library supports serialization and deserialization of objects, and it is part of the standard library, so no extra installation is needed.

The pickle.dumps() and pickle.loads() APIs serialize and deserialize Python objects, respectively.
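A quick local round trip shows what these two calls do before S3 is involved: pickle.dumps() produces bytes, which put_object() accepts as its Body, and pickle.loads() rebuilds an equal but separate object. As the pickle documentation warns, only unpickle data you trust.

```python
import pickle

my_list = [1, 2, 3, 4, 5]

# pickle.dumps() returns bytes, suitable for put_object(Body=...)
serialized = pickle.dumps(my_list)
print(type(serialized))  # <class 'bytes'>

# pickle.loads() rebuilds an equal (but distinct) list from those bytes
restored = pickle.loads(serialized)
print(restored == my_list, restored is my_list)  # True False
```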

4.1 Storing a List in S3 Bucket

Serialize the Python object before writing it to the S3 bucket. The list object must be stored under a unique key; if the key already exists, the existing object is overwritten.

import boto3
import pickle

s3 = boto3.client('s3')
myList=[1,2,3,4,5]

#Serialize the object 
serializedListObject = pickle.dumps(myList)

#Write to Bucket named 'mytestbucket' and 
#Store the list using key myList001

s3.put_object(Bucket='mytestbucket',Key='myList001',Body=serializedListObject)

The put_object() API raises a “NoSuchBucket” error if the bucket does not exist in your account.

NOTE: Replace the bucket name with the name of your own S3 bucket. I don't own this bucket.
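Because put_object() silently overwrites an existing key, one way to avoid collisions is to append a random suffix to each key. This is a minimal sketch, assuming you keep track of the generated key somewhere (for example, in a database) so that you can fetch the object later; make_unique_key is a hypothetical helper.

```python
import uuid


def make_unique_key(prefix):
    """Return a key such as 'myList-<32 hex chars>' that is very unlikely to collide.

    uuid4() is random, so two calls practically never return the same key,
    which avoids accidentally overwriting an existing object.
    """
    return f"{prefix}-{uuid.uuid4().hex}"


key1 = make_unique_key("myList")
key2 = make_unique_key("myList")
print(key1 != key2)  # True (with overwhelming probability)

# Usage with the client from above:
#   s3.put_object(Bucket='mytestbucket', Key=key1, Body=serializedListObject)
```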

4.2 Retrieving a List from S3 Bucket

The get_object() response returns the stored list as a stream object in its Body field, which can be read using the stream's read() method. get_object() raises a “NoSuchKey” error if the key is not present.

import boto3
import pickle

#Connect to S3
s3 = boto3.client('s3')

#Read the object stored in key 'myList001'
response = s3.get_object(Bucket='mytestbucket',Key='myList001')
serializedObject = response['Body'].read()

#Deserialize the retrieved object
myList = pickle.loads(serializedObject)

print(myList)
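For large objects, the Body stream does not have to be consumed with a single read(); calling read() with a size reads it piece by piece. The sketch below computes a SHA-256 checksum of a stream that way, as an illustration of incremental processing. io.BytesIO stands in for the StreamingBody returned by get_object(), and sha256_of_stream is a hypothetical helper.

```python
import hashlib
import io
import pickle


def sha256_of_stream(body, chunk_size=64 * 1024):
    """Hash a file-like stream (e.g. get_object()['Body']) chunk by chunk.

    read(chunk_size) returns b"" once the stream is exhausted, which ends the loop.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: body.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# io.BytesIO stands in for the StreamingBody returned by get_object().
payload = pickle.dumps([1, 2, 3, 4, 5])
print(sha256_of_stream(io.BytesIO(payload)) == hashlib.sha256(payload).hexdigest())  # True
```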

5. Storing and Retrieving a Python Dictionary

Python dictionary objects can be stored and retrieved in the same way, using the put_object() and get_object() APIs.
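Since lists and dictionaries follow the same steps, the pattern can be wrapped in two small functions that accept any client exposing put_object() and get_object(). This is a sketch: put_pickled, get_pickled, and the in-memory FakeS3Client are hypothetical names, and FakeS3Client only exists so that the example runs without AWS; with credentials configured you would pass boto3.client('s3') instead.

```python
import io
import pickle


def put_pickled(client, bucket, key, obj):
    """Serialize any picklable object (list, dict, ...) and store it under key."""
    client.put_object(Bucket=bucket, Key=key, Body=pickle.dumps(obj))


def get_pickled(client, bucket, key):
    """Fetch the object stored under key and deserialize it."""
    response = client.get_object(Bucket=bucket, Key=key)
    return pickle.loads(response["Body"].read())


class FakeS3Client:
    """Hypothetical in-memory stand-in exposing only the two calls used above."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        # Mimic the real response shape: a readable stream under 'Body'
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


client = FakeS3Client()  # with AWS access: client = boto3.client('s3')
put_pickled(client, "mytestbucket", "EmpId007", {"firstName": "Saravanan", "empId": "007"})
print(get_pickled(client, "mytestbucket", "EmpId007"))  # {'firstName': 'Saravanan', 'empId': '007'}
```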

5.1 Storing a Python Dictionary Object in S3

import boto3
import pickle

#Connect to S3 default profile
s3 = boto3.client('s3')

myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
#Serialize the object
serializedMyData = pickle.dumps(myData)

#Write to S3 using unique key - EmpId007
s3.put_object(Bucket='mytestbucket',Key='EmpId007',Body=serializedMyData)

5.2 Retrieving Python Dictionary Object from S3 Bucket

Use the get_object() API to read the object. The data is returned as a stream in the Body field of the response, which can be read using its read() method.

import boto3
import pickle

s3 = boto3.client('s3')

response = s3.get_object(Bucket='mytestbucket',Key='EmpId007')
serializedObject = response['Body'].read()

myData = pickle.loads(serializedObject)

print(myData)

6. Working with JSON

When working with Python dictionaries, it is better to store them as JSON if the consumer applications are not written in Python or cannot use the pickle library.

The json.dumps() API converts a Python dictionary into a JSON string, and json.loads() converts a JSON string back into a Python dictionary.
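JSON is language-neutral, but it represents only a subset of Python types, so a round trip is not always exact. The local comparison below (no S3 involved, with a made-up "skills" field) shows a tuple coming back as a list from JSON while pickle preserves it; types such as set or datetime are rejected by json.dumps() with a TypeError unless you convert them first.

```python
import json
import pickle

my_data = {"empId": "007", "skills": ("python", "aws"), "level": 3}

# json.dumps() returns a str; encoding it gives bytes for put_object(Body=...)
json_bytes = json.dumps(my_data).encode("utf-8")
from_json = json.loads(json_bytes)  # json.loads() accepts bytes on Python 3.6+

from_pickle = pickle.loads(pickle.dumps(my_data))

print(from_json["skills"])    # ['python', 'aws']  (tuple became a list)
print(from_pickle["skills"])  # ('python', 'aws')  (tuple preserved)
```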

6.1 Storing a Python Dictionary Object As JSON in S3 bucket

import boto3
import json

s3 = boto3.client('s3')

myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
serializedMyData = json.dumps(myData)

s3.put_object(Bucket='mytestbucket',Key='EmpId007',Body=serializedMyData)

6.2 Retrieving a JSON from S3 bucket

import boto3
import json

s3 = boto3.client('s3')
response = s3.get_object(Bucket='mytestbucket',Key='EmpId007')
serializedObject = response['Body'].read()

myData = json.loads(serializedObject)

print(myData)

7. Upload and Download a Text File

Boto3 provides the upload_file() and download_file() APIs to transfer files between your local file system and S3. By S3 convention, a key that contains “/” (forward slash) characters is treated as a path of subfolders.
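Strictly speaking, the S3 namespace is flat: the “subfolders” come from listing keys with a delimiter. For example, list_objects_v2(Bucket=..., Prefix=..., Delimiter='/') returns the matching keys under 'Contents' and groups deeper keys under 'CommonPrefixes'. The local sketch below (split_by_delimiter is a hypothetical helper, not a boto3 API) imitates that grouping on a plain list of keys to show how “subdir/abc.txt” ends up looking like a file inside a folder.

```python
def split_by_delimiter(keys, prefix="", delimiter="/"):
    """Group keys the way a delimiter-based listing does.

    Returns (objects, common_prefixes): keys directly under `prefix`, and
    the distinct "subfolder" prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter becomes a "folder"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = ["abc.txt", "subdir/abc.txt", "subdir/nested/def.txt", "other/x.txt"]
print(split_by_delimiter(keys))             # (['abc.txt'], ['other/', 'subdir/'])
print(split_by_delimiter(keys, "subdir/"))  # (['subdir/abc.txt'], ['subdir/nested/'])
```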

7.1 Uploading a File

import boto3

s3 = boto3.client('s3')
s3.upload_file(Bucket='mytestbucket', Key='subdir/abc.txt', Filename='./abc.txt')

7.2 Downloading a File from S3 Bucket

import boto3

s3 = boto3.client('s3')
s3.download_file(Bucket='mytestbucket',Key='subdir/abc.txt',Filename='./abc.txt')
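download_file() writes to the Filename you give it, so when a key such as “subdir/abc.txt” is mirrored locally, the parent directory generally has to exist first. The sketch below, using a hypothetical helper local_path_for_key, maps a key to a path under a local root and creates the missing directories; it assumes keys come from a trusted source, since a key containing “..” would resolve outside the root.

```python
import os
import posixpath
import tempfile


def local_path_for_key(key, root="."):
    """Map an S3 key such as 'subdir/abc.txt' to a path under root and
    create the intermediate local directories if they are missing."""
    parts = posixpath.normpath(key).split("/")  # S3 keys always use '/'
    path = os.path.join(root, *parts)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    return path


# Demonstration in a throwaway directory instead of the current one.
with tempfile.TemporaryDirectory() as tmp:
    target = local_path_for_key("subdir/abc.txt", tmp)
    relative = os.path.relpath(target, tmp)
    folder_created = os.path.isdir(os.path.dirname(target))

print(relative, folder_created)

# With boto3 and credentials configured:
#   s3.download_file(Bucket='mytestbucket', Key='subdir/abc.txt',
#                    Filename=local_path_for_key('subdir/abc.txt', 'downloads'))
```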

8. Error Handling

Boto3 APIs can raise various exceptions depending on the condition, for example “DataNotFoundError”, “NoSuchKey”, “HTTPClientError”, “ConnectionError”, and “SSLError” [vi]. Errors returned by the S3 service itself, such as “NoSuchKey”, are raised as botocore.exceptions.ClientError or one of its subclasses. All Boto3 exceptions inherit from Python's Exception class, so they can be handled by catching Exception in your error-handling code.

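import boto3

try:
    s3 = boto3.client('s3')
    # get_object() raises NoSuchKey if the key is missing
    response = s3.get_object(Bucket='mytestbucket', Key='myList001')
except Exception as e:
    print("Exception:", e)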
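Catching Exception works, but it treats every failure the same way. A ClientError carries the parsed error response in its response attribute, so the error code can be read from response['Error']['Code']. The sketch below maps a few common S3 codes to readable messages; describe_failure and error_code are hypothetical helpers, and FakeClientError is a stand-in that only imitates the response attribute so that the example runs without botocore or AWS access.

```python
def error_code(exc):
    """Return the AWS error code carried by a ClientError-like exception, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def describe_failure(exc):
    """Turn common S3 error codes into a short, readable message."""
    code = error_code(exc)
    if code in ("NoSuchKey", "404"):  # head_object() reports a bare "404"
        return "object not found"
    if code == "NoSuchBucket":
        return "bucket does not exist"
    if code in ("AccessDenied", "403"):
        return "access denied"
    return f"unexpected error: {exc!r}"


class FakeClientError(Exception):
    """Hypothetical stand-in mimicking botocore ClientError's .response attribute."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code, "Message": "simulated"}}


print(describe_failure(FakeClientError("NoSuchKey")))  # object not found

# With boto3:
#   from botocore.exceptions import ClientError
#   try:
#       s3.get_object(Bucket='mytestbucket', Key='myList001')
#   except ClientError as e:
#       print(describe_failure(e))
```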

9. Summary

Storing Python objects in an external store has many use cases. For example, a game developer can store the intermediate state of objects and fetch it when the player resumes from where they left off, and an API developer can use S3 as a simple key-value store. Refer to the URLs in the References section to learn more. Thanks.

References

[i] Boto3 – https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

[ii] Boto3 S3 API – https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html

[iii] AWS CLI – https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html

[iv] AWS Boto3 Credentials – https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html

[v] Python Pickle Library – https://docs.python.org/3/library/pickle.html

[vi] Boto3 Exceptions – https://github.com/boto/botocore/blob/develop/botocore/exceptions.py

Published on Web Code Geeks with permission by Saravanan Subramanian, partner at our WCG program. See the original article here: Boto3 – Amazon S3 As Python Object Store

Opinions expressed by Web Code Geeks contributors are their own.

Saravanan Subramanian

Saravanan Subramanian is a System Architect cum Agile Product Owner for developing cloud based applications for service provider back office enterprise applications using open source technologies. He is passionate about Cloud Technology and Big Data Analytics.