Thursday 9 March 2023

Amazon Elastic File System (EFS)


Network File System (NFS) is a distributed file system protocol that allows you to share remote directories over a network. With NFS, you can mount remote directories on your system and work with the remote files as if they were local files.

On Linux and UNIX operating systems, you can use the mount command (see 
Ubuntu Manpage: mount - mount a filesystem) to mount a shared NFS directory on a particular mount point in the local directory tree.

Amazon Elastic File System (EFS) is:
  • one such shared, remote, network directory that can be mounted to Unix/Linux OS
  • cloud-native data store
  • shared file storage - can be accessed by multiple computers at the same time
    • can be made available to VPC
      • EC2 instances can then securely mount EFS to store and access data
      • applications running on multiple EC2 instances can access the EFS at the same time
    • EFS can also be mounted on on-premises data center servers when connected to Amazon VPC with AWS Direct Connect or VPN making it easy to:
      • migrate data to EFS
      • enable cloud bursting
      • back up on-premises data to EFS
  • supports low latency applications and also highly-parallelized scale out jobs requiring high throughput (read here what's the difference between latency and throughput: File System Performance Metrics | My Public Notepad)
  • high throughput
    • throughput for a file system scales automatically as capacity grows
    • for workloads with high throughput and low capacity requirements, throughput can be provisioned independent of capacity 
  • there are 2 storage classes: 
    • Standard
    • EFS IA (Infrequent Access) - for less frequently accessed data we can configure EFS to store data in a cost-optimized IA storage class
      • LifeCycle Management automatically and transparently moves files access less frequently to EFS IA
  • has 2 performance modes so we can tailor EFS to our application needs
    • General Purpose
    • Max I/O

Benefits of using EFS

  • file storage system which is:
    • simple - supports Network File System (NFS) versions 4.0 and 4.1 (NFSv4) protocol. This means that computers can access files on EFS by using standard file system tools and interfaces provided by OS. This is the reason why nfs is specified as the filesystem type supported by kernel when using mount command to mount EFS device on the EC2 instance (mount -t nfs ...).
    • serverless - no need to provision infrastructure
    • scalable performance - lifecycle management
    • elastic - automatically grow or shrink as we add/remove files
      • can grow to petabytes (PB)
  • fully managed - no need to manage it
  • easy to set up via AWS Management Console, API or CLI
    • "set and forget"
  • cost-effective data store: you pay for the storage you use
  • access data securely, via existing AWS security infrastructure (IAM)
EFS symbol

Drawbacks of EFS

  • supports Linux only (it doesn't support Windows)


When to use EFS?

  • when thousands of EC2 instances from multiple availability zones or on-premises servers need concurrently to access data
    • EFS provides concurrent access for tens of thousands of connections for EC2 instances, containers and lambda functions
  • designed for high availability and durability, for storing data redundantly across multiple (3) availability zones
  • ideal for machine learning, analytics, web serving, content management, media storage, DB backups

How to create EFS?

In AWS Console, go to EFS and click on Create file system.

 We can then set:
  • Name of our file system
  • VPC where we want EC2 instances to connect to our file system
  • Storage class [EFS storage classes - Amazon Elastic File System]
    • Standard (AWS used to name this Regional) - Stores data redundantly across multiple AZs (recommended)
    • One Zone - Stores data redundantly within a single AZ
      • we need to select desired availability zone



We can customize File system settings:


Note that by default Lifecycle management sets that files that haven't been access for 30 days will automatically be transferred from Standard to Standard-Infrequent Access storage (which is cheaper and so this is cost-effective measure).

We can then customize Network access:

Note that EFS is an entity connected to a network. EFS has assigned IP address in each availability zone. It is a mount target that provides an IP address for an NFSv4 endpoint at which we can mount an Amazon EFS file system.
So mount target provides a network interface (in the selected subnet in the AZ) for EFS mounted at it. 
When mount target state is available, our EFS system is mounted onto mount target and can be referred to via its url (or IP address). 

This does not mean it can still be accessible from EC2 instances. We need to mount EFS onto EC2. For that we need to specify a mount point (the local directory on the client where the EFS file system is mounted & accessible). This is one of settings that can be set when launching EC2 from AWS Console (the right-hand value in File system setting).

We can create a security group for EFS and use it everywhere - for each subnet/AZ. This security group can allow e.g. TCP traffic from anywhere.

Finally, we can customize File system policy:


Once EFS is created, it will take some more time for network interfaces to be created.


How to mount EFS on EC2 instance?

When creating EC2, we can select our EFS when setting File systems:

We also need to add EFS security group to the list of security groups used by this EC2 instance.

Once our EC2 instance is up and running, we can SSH to it and check mounted file systems with df tool:
$ df -T -h
-T - display file system types (Type column)
-h - display information about disk drives in human-readable format (kilobytes, megabytes, gigabytes and so on)



No comments: