Amazon S3 has a simple web services interface that you can
use to store and retrieve any amount of data, at any time, from anywhere on the
web.
Access control defines who can access objects and buckets
within Amazon S3, and the type of access (for example, READ and WRITE). The
authentication process verifies the identity of a user who is trying to access
Amazon Web Services (AWS).
Advantages of Amazon S3:
Amazon S3 is intentionally built with a minimal feature set
that focuses on simplicity and robustness. The following are some of the
advantages of the Amazon S3 service:
- Create Buckets – Create and name a bucket that stores data. Buckets are
the fundamental containers in Amazon S3 for data storage.
- Store data in Buckets – Store an infinite amount of data in a bucket.
Upload as many objects as you like into an Amazon S3 bucket. Each object can
contain up to 5 TB of data. Each object is stored and retrieved using a
unique developer-assigned key.
- Download data – Download your data, or enable others to do so, at any time.
- Permissions – Grant or deny access to others who want to upload or
download data into your Amazon S3 bucket. You can grant upload and download
permissions to three types of users. Authentication mechanisms help keep
data secure from unauthorized access.
- Standard interfaces – Use standards-based REST and SOAP interfaces
designed to work with any Internet-development toolkit.
Note: SOAP
support over HTTP is deprecated, but it is still available over HTTPS. New
Amazon S3 features will not be supported for SOAP. We recommend that you use
either the REST API or the AWS SDKs.
Key Concepts and Terminology:
Buckets: A
bucket is a container for objects stored in Amazon S3. Every object is
contained in a bucket. For example, if the object named photos/jigi.jpg is stored in the nitanshi bucket, then it is addressable using the
URL http://nitanshi.s3.amazonaws.com/photos/jigi.jpg
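As a sketch, the mapping from bucket name and key to an addressable URL can be written as two small helpers. The virtual-hosted form matches the example URL above; a path-style form also exists. The helper names are my own, not part of any S3 API:

```python
# Sketch: how a bucket name and object key combine into an S3 URL.
# The bucket "nitanshi" and key "photos/jigi.jpg" come from the example above.
from urllib.parse import quote

def virtual_hosted_url(bucket: str, key: str) -> str:
    """Virtual-hosted-style URL: the bucket name is part of the hostname."""
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}"

def path_style_url(bucket: str, key: str) -> str:
    """Path-style URL: the bucket name is the first path segment."""
    return f"http://s3.amazonaws.com/{bucket}/{quote(key)}"

print(virtual_hosted_url("nitanshi", "photos/jigi.jpg"))
# http://nitanshi.s3.amazonaws.com/photos/jigi.jpg
```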
Buckets
serve several purposes: they organize the Amazon S3 namespace at the highest
level, they identify the account responsible for storage and data transfer
charges, they play a role in access control, and they serve as the unit of
aggregation for usage reporting. You can configure buckets so that they
are created in a specific region.
You can
also configure a bucket so that every time an object is added to it, Amazon S3
generates a unique version ID and assigns it to the object.
Objects
Objects
are the fundamental entities stored in Amazon S3. Objects consist of object
data and metadata. The data portion is opaque to Amazon S3. The metadata is a
set of name-value pairs that describe the object. These include some default
metadata, such as the date last modified, and standard HTTP metadata, such as
Content-Type. You can also specify custom metadata at the time the object is
stored. An object is uniquely identified within a bucket by a key (name)
and a version ID.
Keys
A key
is the unique identifier for an object within a bucket. Every object in a
bucket has exactly one key. Because the combination of a bucket, key, and
version ID uniquely identify each object, Amazon S3 can be thought of as a
basic data map between "bucket + key + version" and the object
itself. Every object in Amazon S3 can be uniquely addressed through the
combination of the web service endpoint, bucket name, key, and optionally, a
version.
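The "bucket + key + version" data map can be sketched with an ordinary dictionary. Everything below is an illustration of the addressing model, not real S3 behavior; the names and version IDs are invented:

```python
# Toy data map: (bucket, key, version) -> object data.
store = {}

def put(bucket, key, version, data):
    # Each distinct (bucket, key, version) triple addresses one object version.
    store[(bucket, key, version)] = data

put("nitanshi", "photos/jigi.jpg", "v1", b"first upload")
put("nitanshi", "photos/jigi.jpg", "v2", b"second upload")

# The full triple retrieves one specific version of the object:
print(store[("nitanshi", "photos/jigi.jpg", "v2")])  # b'second upload'
```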
Regions
You can
choose the geographical region where Amazon S3 will store the buckets you
create. You might choose a region to optimize latency, minimize costs, or
address regulatory requirements. Objects stored in a region never leave the
region unless you explicitly transfer them to another region. For example, objects stored in
the EU (Ireland) region never leave it.
Amazon S3 Data Consistency
Model
Amazon
S3 provides read-after-write consistency for PUTS of new objects in your S3
bucket in all regions with one caveat. The caveat is that if you make a HEAD or
GET request to the key name (to find if the object exists) before creating the
object, Amazon S3 provides eventual consistency for read-after-write.
Amazon
S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.
Updates
to a single key are atomic. For example, if you PUT to an existing key, a
subsequent read might return the old data or the updated data, but you will
never receive corrupted or partial data.
Amazon
S3 achieves high availability by replicating data across multiple servers
within Amazon's data centers. If a PUT request is successful, your data is
safely stored. However, information about the changes must replicate across
Amazon S3, which can take some time, and so you might observe the following
behaviors:
- A process writes a new object
to Amazon S3 and immediately lists keys within its bucket. Until the
change is fully propagated, the object might not appear in the list.
- A process replaces an existing
object and immediately attempts to read it. Until the change is fully
propagated, Amazon S3 might return the prior data.
- A process deletes an existing
object and immediately attempts to read it. Until the deletion is fully
propagated, Amazon S3 might return the deleted data.
- A process deletes an existing
object and immediately lists keys within its bucket. Until the deletion is
fully propagated, Amazon S3 might list the deleted object.
Note:
1. Amazon S3 does not currently support object locking. If two PUT requests
are simultaneously made to the same key, the request with the latest time
stamp wins. If this is an issue, you will need to build an object-locking
mechanism into your application.
2. Updates are key-based; there is no way to make atomic updates across keys.
For example, you cannot make the update of one key dependent on the update of
another key unless you design this functionality into your application.
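The last-writer-wins behavior for simultaneous PUTs can be sketched as a small resolver. The function name and timestamps are illustrative, not anything S3 exposes:

```python
# Last-writer-wins for concurrent PUTs to the same key: the request with the
# latest timestamp determines the value that ends up stored.
def resolve_concurrent_puts(puts):
    """puts: iterable of (timestamp, data) pairs racing on one key."""
    return max(puts, key=lambda p: p[0])[1]

racing_puts = [(1000.2, b"writer A"), (1000.9, b"writer B")]
print(resolve_concurrent_puts(racing_puts))  # b'writer B' (latest timestamp)
```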
Amazon S3 Features:
Storage Classes: Amazon S3 offers a range of
storage classes designed for different use cases. These include Amazon S3
STANDARD for general-purpose storage of frequently accessed data, Amazon S3
STANDARD_IA for long-lived, but less frequently accessed data, and GLACIER for
long-term archive.
Bucket Policies: Bucket policies provide
centralized access control to buckets and objects based on a variety of
conditions, including Amazon S3 operations, requesters, resources, and aspects
of the request (e.g., IP address). The policies are expressed in our access
policy language and enable centralized management of permissions. The
permissions attached to a bucket apply to all of the objects in that bucket.
Individuals as well as companies can use bucket policies.
When companies register with Amazon S3 they create an account. Thereafter, the
company becomes synonymous with the account. Accounts are financially
responsible for the Amazon resources they (and their employees) create.
Accounts have the power to grant bucket policy permissions and assign employees
permissions based on a variety of conditions. For example, an account could
create a policy that gives a user write access:
- To a particular S3 bucket
- From an account's corporate
network
- During business hours
An account can grant one user limited read and write access,
but allow another to create and delete buckets as well. An account could allow
several field offices to store their daily reports in a single bucket, allowing
each office to write only to a certain set of names (e.g., "Noida/*"
or "Kolka/*") and only from the office's IP address range.
Unlike access control lists (described below), which can add
(grant) permissions only on individual objects, policies can either add or deny
permissions across all (or a subset) of objects within a bucket. With one
request an account can set the permissions of any number of objects in a
bucket. An account can use wildcards (similar to regular expression operators)
on Amazon resource names (ARNs) and other values, so that an account can
control access to groups of objects that begin with a common prefix or end with
a given extension such as .html.
Only the
bucket owner is allowed to associate a policy with a bucket. Policies, written
in the access policy language, allow or deny requests based on:
- Amazon S3 bucket operations and object operations
- Requester
- Conditions specified in the
policy
An account can control access based on specific Amazon S3
operations, such as
GetObject, GetObjectVersion, DeleteObject, or DeleteBucket.
The
conditions can be such things as IP addresses, IP address ranges in CIDR
notation, dates, user agents, the HTTP referrer, and transports (HTTP and HTTPS).
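As a sketch, a policy of the kind described above (allowing one operation on objects matching a wildcard ARN, subject to an IP-range condition) can be expressed as JSON in the access policy language. The bucket name and CIDR range below are made up for illustration:

```python
import json

# Sketch of a bucket policy: allow anonymous GetObject on any .html object
# in a hypothetical bucket, but only from one corporate IP range.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::nitanshi/*.html",   # wildcard on the extension
        "Condition": {"IpAddress": {"aws:SourceIp": "192.0.2.0/24"}},
    }],
}
print(json.dumps(policy, indent=2))
```

Only the bucket owner could attach such a policy, and the single statement governs every object in the bucket that matches the wildcard.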
AWS Identity and Access
Management
You can use IAM with Amazon S3 to control the
type of access a user or group of users has to specific parts of an Amazon S3
bucket your AWS account owns.
Common Operations:
- Create a Bucket – Create and name your own bucket in which to
store your objects.
- Write an Object – Store data by creating or overwriting an
object. When you write an object, you specify a unique key in the
namespace of your bucket. This is also a good time to specify any access
control you want on the object.
- Read an Object – Read data back. You can download the data via
HTTP or BitTorrent.
- Delete an Object – Delete some of your data.
- List Keys – List the keys contained in one of your buckets.
You can filter the key list based on a prefix.
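The common operations above can be mimicked with a toy in-memory model. This only illustrates the shape of the operations (real access goes through the REST API or an AWS SDK), and the class, bucket, and key names are invented:

```python
# Toy in-memory model of the common operations: create bucket, write an
# object (create or overwrite), read, delete, and list keys by prefix.
class ToyS3:
    def __init__(self):
        self.buckets = {}

    def create_bucket(self, name):
        self.buckets.setdefault(name, {})

    def put_object(self, bucket, key, data):
        self.buckets[bucket][key] = data   # creating or overwriting

    def get_object(self, bucket, key):
        return self.buckets[bucket][key]

    def delete_object(self, bucket, key):
        self.buckets[bucket].pop(key, None)

    def list_keys(self, bucket, prefix=""):
        # Listing can be filtered by a key prefix, as described above.
        return sorted(k for k in self.buckets[bucket] if k.startswith(prefix))

s3 = ToyS3()
s3.create_bucket("reports")
s3.put_object("reports", "Noida/day1.csv", b"...")
s3.put_object("reports", "Noida/day2.csv", b"...")
print(s3.list_keys("reports", prefix="Noida/"))
# ['Noida/day1.csv', 'Noida/day2.csv']
```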
Amazon S3 Application
Programming Interfaces (API)
The Amazon S3 architecture is designed to be programming
language-neutral, using AWS-supported
interfaces to store and retrieve objects.
Amazon S3
provides a REST and a SOAP interface. They are similar, but there are some
differences. For example, in the REST interface, metadata is returned in HTTP
headers. Because we only support HTTP requests of up to 4 KB (not including the
body), the amount of metadata you can supply is restricted.
The REST Interface
The REST API is an HTTP interface to Amazon S3. Using REST, you
use standard HTTP requests to create, fetch, and delete buckets and objects.
You can use
any toolkit that supports HTTP to use the REST API. You can even use a browser
to fetch objects, as long as they are anonymously readable.
The REST API
uses the standard HTTP headers and status codes, so that standard browsers and
toolkits work as expected. In some areas, we have added functionality to HTTP
(for example, we added headers to support access control). In these cases, we
have done our best to add the new functionality in a way that matched the style
of standard HTTP usage.
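Because the REST API is plain HTTP, the request for fetching an anonymously readable object is just a standard GET. The helper below only builds the request text and never sends anything; the bucket and key reuse the earlier example, and the helper name is my own:

```python
# Sketch of the raw HTTP request the REST API expects when fetching an
# anonymously readable object (virtual-hosted style: bucket in the Host header).
def build_get_request(bucket: str, key: str) -> str:
    return (
        f"GET /{key} HTTP/1.1\r\n"
        f"Host: {bucket}.s3.amazonaws.com\r\n"
        f"\r\n"
    )

print(build_get_request("nitanshi", "photos/jigi.jpg"))
```

Any toolkit that can emit a request of this shape, including a plain browser, can fetch such an object.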
The SOAP Interface
SOAP support over HTTP is deprecated, but it is still
available over HTTPS. New Amazon S3 features will not be supported for SOAP. We
recommend that you use either the REST API or the AWS SDKs.
The SOAP API provides a SOAP 1.1 interface using document
literal encoding.
Use a SOAP toolkit such as Apache Axis or Microsoft .NET to
create bindings, and then write code that uses the bindings to call Amazon S3.
Paying for Amazon S3
Pricing for Amazon S3 is designed so that you don't have to plan
for the storage requirements of your application. Most storage providers force
you to purchase a predetermined amount of storage and network transfer
capacity: If you exceed that capacity, your service is shut off or you are
charged high overage fees. If you do not exceed that capacity, you pay as
though you used it all.
Amazon S3
charges you only for what you actually use, with no hidden fees and no overage
charges. This gives developers a variable-cost service that can grow with their
business while enjoying the cost advantages of Amazon's infrastructure.
Before
storing anything in Amazon S3, you need to register with the service and
provide a payment instrument that will be charged at the end of each month.
There are no set-up fees to begin using the service. At the end of the month,
your payment instrument is automatically charged for that month's usage.