Friday 13 May 2022

AWS Simple Storage Service (S3)


 
 
 
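AWS S3 is an object storage service with virtually unlimited capacity. It provides high durability and availability by storing data redundantly across multiple devices and, for most storage classes, across multiple Availability Zones (separate data centers) within a region.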
 
S3 is used for storing any kind of data, in its native format.

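Data objects are stored within S3 buckets, which are containers for grouping stored objects. An account can create many buckets (subject to a per-account bucket quota). Everything inside an S3 bucket is an object: flat files (text, binary, images, videos, etc.). S3 has no real folders. The namespace inside a bucket is flat, and "folders" are prefixes of object keys such as my-folder/ (when a folder is created in the console, it is stored as a zero-byte object whose key ends with "/").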

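The maximum size of a single object is 5 TB. A single PUT request can upload at most 5 GB; larger objects must be uploaded with multipart upload.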

How to create an S3 bucket


One way is to use the AWS Management Console, where we choose a bucket name and a region.

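The S3 bucket name must be globally unique (across all AWS accounts) because AWS creates a DNS name for each new bucket. It comes in the form:

https://<bucket_name>.s3.<region>.amazonaws.com

This DNS name is publicly resolvable. The name therefore also needs to be DNS-compliant: between 3 and 63 characters long, made only of lowercase letters, digits, dots and hyphens (no uppercase letters or underscores), and it must begin and end with a letter or a digit (so it cannot end with a dash).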
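The naming rules can be checked locally before calling AWS. Below is a minimal Python sketch: it covers the rules listed here plus two more from the AWS naming rules (no two adjacent dots, not formatted like an IP address), but not the complete AWS list. The helpers `is_valid_bucket_name` and `bucket_endpoint` are hypothetical, not part of any AWS SDK, and the region eu-west-1 is just an example. `bucket_endpoint` builds the virtual-hosted-style name https://<bucket_name>.s3.<region>.amazonaws.com.

```python
import re

# Basic S3 bucket naming rules (a subset of the full AWS list):
# 3-63 characters; lowercase letters, digits, dots and hyphens only;
# first and last character must be a letter or a digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names must not look like an IPv4 address, e.g. "192.168.5.4".
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic DNS-compliant naming rules."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False
    if ".." in name:  # two adjacent dots are not allowed
        return False
    if _IPV4_RE.fullmatch(name):
        return False
    return True


def bucket_endpoint(name: str, region: str) -> str:
    """Build the virtual-hosted-style DNS name of a bucket."""
    return f"https://{name}.s3.{region}.amazonaws.com"


for candidate in ["my-bucket", "My_Bucket", "ab", "bucket-", "192.168.5.4"]:
    print(f"{candidate!r}: {is_valid_bucket_name(candidate)}")
print(bucket_endpoint("my-bucket", "eu-west-1"))
```

Passing this check does not guarantee the name is free: global uniqueness can only be confirmed by AWS when the bucket is created.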


How to add objects to a bucket?


Objects can be uploaded:
  • manually via the AWS Management Console (web page)
    • for files up to 160 GB in size
  • with the AWS CLI
  • with an AWS SDK
  • with the Amazon S3 REST API
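For the SDK route, here is a minimal sketch in Python with boto3 (the AWS SDK for Python). The helper `upload` is hypothetical; it assumes `s3` is a boto3 S3 client, e.g. created with boto3.client("s3"), and calls the client's `upload_file(Filename, Bucket, Key, ExtraArgs=...)` method, which switches to multipart upload automatically for large files. The client is passed in as a parameter so the function can be tried with a stand-in object, without AWS credentials.

```python
import os
from typing import Optional


def upload(s3, path: str, bucket: str, prefix: str = "",
           storage_class: Optional[str] = None) -> str:
    """Upload the local file `path` to `bucket` and return the object key.

    `s3` is assumed to be a boto3 S3 client (boto3.client("s3")).
    `prefix` plays the role of a folder, e.g. "my-folder/".
    """
    key = prefix + os.path.basename(path)  # e.g. "my-folder/my-file.txt"
    # Optional storage class such as "STANDARD_IA"; None keeps the default.
    extra_args = {"StorageClass": storage_class} if storage_class else None
    s3.upload_file(path, bucket, key, ExtraArgs=extra_args)
    return key


# Usage (requires boto3 and configured AWS credentials):
# import boto3
# key = upload(boto3.client("s3"), "my-file.txt", "my-bucket", "my-folder/")
```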

Manual upload:




How to access objects in the bucket?


Objects are addressed by URLs like:

https://<bucket_name>.s3.<region>.amazonaws.com/my-folder/my-file.txt
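The URL of an object is the bucket's endpoint (https://<bucket_name>.s3.<region>.amazonaws.com) followed by the object key. A minimal Python sketch, where the helper name `object_url` and the region eu-west-1 are illustrative assumptions: keys containing spaces or other special characters must be percent-encoded, while "/" is kept so folder-like prefixes stay readable. Such a URL only works for anonymous requests when the object is publicly readable; otherwise the request has to be signed.

```python
from urllib.parse import quote


def object_url(bucket: str, region: str, key: str) -> str:
    """Build the virtual-hosted-style URL of an object.

    The key is percent-encoded (spaces, non-ASCII characters, ...),
    but "/" is left as-is so that folder-like prefixes stay readable.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(object_url("my-bucket", "eu-west-1", "my-folder/my-file.txt"))
print(object_url("my-bucket", "eu-west-1", "my folder/a b.txt"))
```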

Every object in S3 bucket has:
  • data
    • Key - the full name of the object, including any folder prefix, e.g. my-folder/my-file.txt
    • Value - the actual data (the object's content)
  • metadata
    • Owner
    • Size
    • Last Modified
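The object structure listed above can be pictured as a small record type. This is a toy Python model for illustration only: the class `S3ObjectSketch` is hypothetical and is not how the AWS SDKs represent objects. It shows that the size is derived from the value and that the key carries the full object name.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class S3ObjectSketch:
    """Toy model of an S3 object: data (key + value) plus metadata."""

    key: str      # full object name, e.g. "my-folder/my-file.txt"
    value: bytes  # the object's content
    owner: str    # owner of the object, e.g. an AWS account
    # Timestamp of the last write, stored in UTC.
    last_modified: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def size(self) -> int:
        # Size is derived from the content, in bytes.
        return len(self.value)


obj = S3ObjectSketch(key="my-folder/my-file.txt", value=b"hello", owner="adam")
print(obj.key, obj.size, obj.last_modified.isoformat())
```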
By default, after a bucket is created and objects are uploaded, no one apart from the bucket owner can access them. Access is controlled via bucket policies and access control lists (ACLs). Bucket policies are attached at the bucket level (and can target all or some of its objects through the Resource element), while ACLs can also be set on individual objects.

Just like IAM policies, bucket policies are JSON documents. With them we can grant access to IAM users and roles, to other AWS accounts, or to everyone (public access). IAM groups cannot be used as principals in a bucket policy.

Example: a bucket policy which allows IAM user adam to retrieve (s3:GetObject) all objects in bucket my-bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123456789012:user/adam"
                ]
            }
        }
    ]
}
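Instead of writing the JSON by hand, the policy can be generated. A minimal Python sketch, where the helper `get_object_policy` is hypothetical and account ID 123456789012 is a placeholder. Note that "Version" is the policy language version and must be "2012-10-17" (or the older "2008-10-17"), not the date the policy was written; that action names contain no space ("s3:GetObject"); and that the "/*" suffix in the resource ARN targets the objects in the bucket rather than the bucket itself.

```python
import json


def get_object_policy(bucket: str, principal_arn: str) -> dict:
    """Bucket policy granting s3:GetObject on every object in `bucket`."""
    return {
        "Version": "2012-10-17",  # fixed policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": [principal_arn]},
                "Action": ["s3:GetObject"],
                # "/*" targets the objects, not the bucket itself
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy = get_object_policy("my-bucket", "arn:aws:iam::123456789012:user/adam")
print(json.dumps(policy, indent=4))
```

The printed document could be saved as policy.json and attached with aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json.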


S3 Bucket Storage Classes and Lifecycle


By default, all objects added to the bucket get the S3 Standard storage class. If we don't need to retrieve these objects frequently, we can transition them to other storage classes, which are cheaper for storage but charge for retrievals and, for archive classes such as S3 Glacier Flexible Retrieval and Deep Archive, need minutes to hours to restore objects.

A bucket lifecycle configuration can transition all objects (or only those matching a prefix or tag filter) to a given storage class a set number of days after they were created.
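A lifecycle configuration is itself a JSON document. Below is a minimal Python sketch of one rule that moves every object to Glacier a given number of days after creation. The structure (Rules, ID, Filter, Status, Transitions) follows the S3 lifecycle configuration format, while the helper name `glacier_transition_rule` and the rule ID are hypothetical. In this format the "GLACIER" storage class means S3 Glacier Flexible Retrieval.

```python
import json


def glacier_transition_rule(rule_id: str, days: int, prefix: str = "") -> dict:
    """Lifecycle configuration with one rule moving objects to Glacier.

    `days` counts from each object's creation; an empty prefix
    makes the rule apply to every object in the bucket.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_transition_rule("archive-after-30-days", 30)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved as lifecycle.json and applied with aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json.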

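It is also possible to specify the storage class of objects as they are uploaded, e.g. via the --storage-class argument of the aws s3 sync command (the local directory and bucket path below are placeholders):

aws s3 sync ./my-folder s3://my-bucket/my-folder --storage-class STANDARD_IA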


If we set 0 days for the rule which transitions objects to Glacier, this is what the console shows:


If we set 1 day for the rule which transitions objects to Glacier, this is what the console shows:





S3 Bucket Deletion


Delete bucket

You can delete an empty bucket. Deleting a bucket permanently removes the bucket. If you delete a bucket, your bucket name is immediately available for reuse by any AWS account. Before you reuse a previously deleted bucket name, you can use the bucket owner condition to verify that a different AWS account is not using the bucket name. If you want to keep your bucket name, consider emptying the bucket instead.

If your bucket is empty and you still can't delete it, confirm that you have s3:DeleteBucket permissions and that your bucket policy doesn't have a deny statement for s3:DeleteBucket. A service control policy can also deny the delete permission on a bucket.
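Since only an empty bucket can be deleted, a script has to remove the objects first. A minimal Python sketch, assuming `s3` is a boto3 S3 client (boto3.client("s3")); the helper name `empty_and_delete_bucket` is hypothetical. It lists objects page by page with list_objects_v2, deletes each page with delete_objects (at most 1000 keys per request, which matches the maximum page size), then calls delete_bucket. The optional ExpectedBucketOwner argument is the bucket owner condition mentioned above: S3 rejects the request if the bucket belongs to another account. For a versioned bucket this is not enough, because list_objects_v2 returns only current versions; older versions and delete markers would also have to be removed (e.g. via list_object_versions).

```python
from typing import Optional


def empty_and_delete_bucket(s3, bucket: str,
                            expected_owner: Optional[str] = None) -> int:
    """Delete all (current) objects in `bucket`, then the bucket itself.

    `s3` is assumed to be a boto3 S3 client. Returns the number of
    deleted objects. Not suitable for versioned buckets as-is.
    """
    # Bucket owner condition, sent with every request when given.
    owner_args = {"ExpectedBucketOwner": expected_owner} if expected_owner else {}
    list_args = {"Bucket": bucket, **owner_args}
    deleted = 0
    while True:
        page = s3.list_objects_v2(**list_args)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A page holds at most 1000 keys, the delete_objects limit.
            # Per-key failures come back in the response's "Errors" list
            # (not checked in this sketch).
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys}, **owner_args)
            deleted += len(keys)
        if not page.get("IsTruncated"):
            break
        list_args["ContinuationToken"] = page["NextContinuationToken"]
    s3.delete_bucket(Bucket=bucket, **owner_args)
    return deleted
```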

Route 53 hosted zone settings

If the bucket hosts a static website that uses an Amazon Route 53 hosted zone, clean up the Route 53 hosted zone settings related to the bucket.

Elastic Load Balancing

If the bucket receives log data from Elastic Load Balancing (ELB), stop log delivery before deleting the bucket. If another user creates a bucket with the same name, your log data could be delivered to that bucket.


Questions


How to list all buckets that have public access? 
How to list all publicly available objects?


