
Thursday, 19 May 2022

Using AWS S3 as Terraform Backend

 

The Terraform backend is the place where Terraform stores the state file. By default, it is local storage (the local machine), but it can also be a remote one (AWS S3, GCS etc.).

In Terraform State | My Public Notepad we discussed why it's better to use a remote Terraform backend rather than a local one or a version control system (e.g. a Git repository).
 
AWS-based remote backend comprises:
  • S3 bucket which stores TF state file
    • bucket name e.g. tf-state-bucket
    • key (of the stored resource) is the object path where the state file is stored e.g. path/to/terraform.tfstate
    • region e.g. eu-south-1
       
  • DynamoDB table which implements state locking and consistency checks
    • name e.g. tf-state-locking
    • this table must have a primary (hash) key named LockID (see the provisioning sketch after this list)
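The S3 bucket and the DynamoDB table must already exist before Terraform can use them as a backend (the backend block itself cannot create them). A minimal sketch of provisioning them with Terraform, using the example names from the list above (the resource labels are illustrative):

resource "aws_s3_bucket" "tf_state" {
    bucket = "tf-state-bucket"
}

resource "aws_dynamodb_table" "tf_state_locking" {
    name         = "tf-state-locking"
    billing_mode = "PAY_PER_REQUEST"
    hash_key     = "LockID"      # the S3 backend expects exactly this attribute name

    attribute {
        name = "LockID"
        type = "S"
    }
}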
 
To configure a remote backend in Terraform, we need to use the terraform block in the configuration file. We already mentioned this block in Terraform Providers | My Public Notepad when we wanted to pin an exact plugin version. There we used the required_providers block nested inside it, but here, to specify the TF backend, we need to use the backend block:

main.tf:
 
resource "local_file" "foo" {
    filename = "/root/foo.txt"
    content = "This is a content of foo.txt."
}


It is good practice to keep the terraform block in a separate file, e.g. terraform.tf:

terraform {
    backend "s3"  {
        bucket = "tf-state-bucket"
        key = "path/to/terraform.tfstate"
        region = "eu-south-1"
        dynamodb_table = "tf-state-locking"
    }
}

backend block "s3" has 3 mandatory attributes: bucket, key and region. dynamodb_table is an optional argument. 


If we ran terraform init before switching to the remote backend, terraform apply will issue an error stating that backend reinitialization is required. We simply need to re-run terraform init, which will migrate the pre-existing state from the local to the new s3 backend (the state file will be copied from the local disk into the S3 bucket). After this we can delete the local state file:

$ rm -rf terraform.tfstate

Any future executions of terraform plan or terraform apply will use the state file stored remotely, in the S3 bucket. Pulling and pushing the terraform.tfstate file is automatic. Prior to each of these operations the state lock is acquired and after them it is released, which preserves the integrity of the remotely stored state file.
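To double-check that the remote backend is actually being used, we can re-run the initialization and inspect the state Terraform now sees (a sketch; terraform state pull prints the state returned by the configured backend):

$ terraform init          # reinitializes and offers to copy the existing local state to the new "s3" backend
$ terraform state pull    # fetches and prints the state file now stored in the S3 bucket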
 



Monday, 16 May 2022

Managing AWS S3 using Terraform

 



Configuration for provisioning an S3 bucket:

main.tf:

resource "aws_s3_bucket" "images" {
    bucket = "images-123456"
    tags = {
        Description = "Images for training ML model"
    }
}

Both bucket and tags are optional. If bucket is not provided, TF will generate a random, unique bucket name.
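If we want the generated name to start with a known prefix, the optional bucket_prefix argument can be used instead of bucket (a sketch; the resource label and prefix are illustrative):

resource "aws_s3_bucket" "generated" {
    # no bucket argument: the AWS provider generates a unique, DNS-compliant name
    bucket_prefix = "images-"
}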
 
The bucket name must be DNS-compliant: it cannot contain underscores (but it can contain dashes/hyphens). If the name contains underscores (e.g. bucket = "images_123456"), terraform apply will issue an error:
 
aws_s3_bucket.images: Creating...

Error: error creating S3 Bucket (images_123456): InvalidBucketName: The specified bucket is not valid.
        status code: 400, request id: 2D91CDB6D21FDEF1, host id: MzRISOwyjmnup2D91CDB6D21FDEF17/JypPGXLh0OVFGcJaaO3KW/hRAqKOpIEEp

  on main.tf line 1, in resource "aws_s3_bucket" "images":
   1: resource "aws_s3_bucket" "images" {
 
To specify objects we want to upload to the bucket, we need to use the aws_s3_bucket_object resource type:

resource "aws_s3_bucket_object" {
    content = "/root/images/cat001.jpg"
    key = "cat001.jpg"
    bucket = aws_s3_bucket.images.id
}

source is the local path to the file. key is the name of the object being uploaded. bucket is the bucket ID, and we use a reference expression to point to the previously created bucket. terraform apply uploads the file to the S3 bucket.

Bucket access policy defines who can access this bucket.

If there is an IAM group named "ML Engineers" that was created manually, terraform.tfstate is not aware of it. To make this resource visible to TF we need to create a data source for it, so TF can read its attributes:

data "aws_iam_group" "data-ml-engineers" {
    group_name = "ml-engineers"
}

Finally, let's define an S3 bucket policy which allows the ML Engineers group access to all files in the images bucket:

resource "aws_s3_bucket_policy" "policy-images" {
    bucket = aws_s3_bucket. images.id
    policy = <<EOF
    {
        "Version": "2022-05-16",
        "Statement": [
            {
                "Action": "*",
                "Effect": "Allow",
                "Resource": "arn:aws:s3:::${aws_s3_bucket.images.id}/*",
                "Principal": {
                    "AWS": [
                        "${data.aws_iam_group.data-ml-engineers.arn}"
                    ]
                }
            }
        ] 
    }
    EOF
}
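An alternative to the heredoc is Terraform's jsonencode() function, which builds the same policy document and avoids manual JSON syntax mistakes (a sketch of an equivalent resource; the label is illustrative):

resource "aws_s3_bucket_policy" "policy-images-alt" {
    bucket = aws_s3_bucket.images.id

    policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
            {
                Action    = "*"
                Effect    = "Allow"
                Resource  = "${aws_s3_bucket.images.arn}/*"
                Principal = {
                    AWS = [data.aws_iam_group.data-ml-engineers.arn]
                }
            }
        ]
    })
}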

---

Friday, 13 May 2022

AWS Simple Storage Service (S3)


 
 
 
AWS S3 is an infinitely scalable storage solution which provides high data availability, achieved by storing data across multiple servers at multiple locations.
 
S3 is used for storing any kind of data, in its native format.

Data objects are stored within S3 buckets, which are containers for grouping stored objects. We can create as many buckets as we need. Everything inside an S3 bucket is an object: flat files (text, binary, images, videos etc.) and folders.

The maximum allowed object size is 5 TB.

How to create an S3 bucket


One way is to use the AWS Management Console. We need to choose a name and a region.

The S3 bucket name must be globally unique because AWS creates a DNS name for each new bucket. It comes in the form:

https://<bucket_name>.s3.<region>.amazonaws.com

The DNS name is publicly accessible. The name also needs to be DNS-compliant: no uppercase letters or underscores, between 3 and 63 characters long, and it must not end with a dash.


How to add objects to a bucket?


Objects can be added via:
  • AWS Console (web page) - manual upload, for files up to 160 GB in size
  • AWS CLI (see the example after this list)
  • AWS SDK
  • Amazon S3 REST API
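For example, uploading with the AWS CLI (bucket and file names are illustrative):

# copy a single local file into a bucket
aws s3 cp ./cat001.jpg s3://images-123456/cat001.jpg

# or sync a whole local folder
aws s3 sync ./images s3://images-123456/images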

Manual upload:




How to access objects in the bucket?


Objects are accessed via URLs like:

https://<bucket_name>.s3.<region>.amazonaws.com/my-folder/my-file.txt

Every object in an S3 bucket has:
  • data
    • Key - actual name of the object e.g. my-file.txt
    • Value - actual data
  • metadata
    • Owner
    • Size
    • Last Modified
By default, upon creating a bucket and uploading objects, no one can access them apart from the bucket owner. Access is controlled via bucket policies and access control lists (ACLs). Bucket policies operate at the bucket level, and access control lists at the level of individual objects.

Just like IAM policies, bucket policies are JSON documents. With them we can grant access to users, groups, users from other AWS accounts or public access.

Example: bucket policy which allows user adam to retrieve all objects in a bucket

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::1234567890:user/adam"
                ]
            }
        }
    ]
}


S3 Bucket Storage Classes and Lifecycle


By default, all objects added to a bucket have the Standard storage class. But if we don't need to retrieve these files frequently, we can transition them to different storage classes which might be cheaper but offer longer retrieval times.

A bucket's lifecycle policy can be set to transition all objects to a certain storage class at a certain time.

It is also possible to specify the storage class for an object being uploaded to the bucket via the --storage-class argument of the aws s3 cp and aws s3 sync commands.
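For example (a sketch; the bucket and file names below are illustrative):

# upload directly into a cheaper storage class
aws s3 cp ./archive.tar s3://my-bucket/archive.tar --storage-class GLACIER

aws s3 sync ./logs s3://my-bucket/logs --storage-class STANDARD_IA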


If we set 0 days for the rule which transitions the storage class to Glacier, this is what the console shows:


If we set 1 day for the rule which transitions the storage class to Glacier, this is what the console shows:





S3 Bucket Deletion


Delete bucket

You can delete an empty bucket. Deleting a bucket permanently removes the bucket. If you delete a bucket, your bucket name is immediately available for reuse by any AWS account. Before you reuse a previously deleted bucket name, you can use bucket owner condition to verify that a different AWS account is not using the bucket name. If you want to keep your bucket name, consider emptying the bucket instead.

If your bucket is empty and you still can't delete it, confirm that you have s3:DeleteBucket permissions and that your bucket policy doesn't have a deny statement for s3:DeleteBucket. A service control policy can also deny the delete permission on a bucket.
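For reference, a bucket can be deleted from the AWS CLI roughly like this (the bucket name is illustrative):

# delete an empty bucket
aws s3 rb s3://my-bucket

# empty the bucket first, then delete it
aws s3 rb s3://my-bucket --force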

Route 53 hosted zone settings

If the bucket hosts a static website that uses an Amazon Route 53 hosted zone, clean up the Route 53 hosted zone settings related to the bucket.

Elastic Load Balancing

If the bucket receives log data from Elastic Load Balancing (ELB), stop log delivery before deleting the bucket. If another user creates a bucket with the same name, your log data could be delivered to that bucket.


Questions


How to list all buckets that have public access? 
How to list all objects that are publicly available?
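One possible approach to the first question, sketched with the AWS CLI (get-bucket-policy-status fails for buckets without a policy, hence the fallback; treat this as a starting point rather than a complete answer):

# print buckets whose policy status reports them as public
for b in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    public=$(aws s3api get-bucket-policy-status --bucket "$b" \
        --query 'PolicyStatus.IsPublic' --output text 2>/dev/null || echo "False")
    [ "$public" = "True" ] && echo "$b"
done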



---