Wednesday 4 May 2022

Terraform State


 

When we execute terraform apply for the first time, TF creates the resource and assigns it a unique id. It also creates a file named terraform.tfstate in the configuration directory. This is a JSON file which maps real resources to their definitions in the configuration files. It contains all the details about each resource (name, type, all attributes and their values, resource id etc.). State is a non-optional feature of Terraform: creating the state file cannot be disabled, and the state is the single source of truth about the deployed resources.

---

terraform show displays a state or plan file. The general form of the command is:
 
terraform show [options] [file]
 
If a path to a TF plan file is specified, we can inspect the plan to ensure that the planned operations are the expected ones.
 
If a path to a TF state file is specified, we can inspect the current state as Terraform sees it.
 
If no file path is specified, the latest state snapshot is shown: the resource attributes from the state file, in a human-readable format.
 
---
 
If terraform apply hasn't been executed yet, terraform show has no state to read from:

$ terraform show
No state.

If we execute terraform apply again, TF refreshes the persisted state, compares the configuration to it, detects that the resource has already been provisioned and performs no action.

If we change the resource configuration and execute terraform plan, TF computes the new desired state in memory only, compares it to the persisted state and shows the differences, i.e. what will be changed. If the change cannot be applied in place (as is the case for local_file), the next terraform apply deletes the previous resource, creates a completely new resource with a new id and then persists the new state in the .tfstate file. The configuration and the state file are now in sync.

The state file also tracks dependencies between resources: each resource instance in .tfstate has a "dependencies" attribute, which is a JSON array. Resources that do not depend on each other can be created in parallel. If ResourceA depends on ResourceB then TF knows that it needs to create ResourceB first. If we remove both ResourceA and ResourceB from the configuration file and execute terraform apply, TF uses the dependency information written in the .tfstate file to determine which resource to delete first (ResourceA, then ResourceB in this case).
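
A minimal sketch of how such a dependency arises, assuming the local provider used elsewhere in this post. The resource names and file paths are hypothetical. Referencing an attribute of one resource from another is enough for TF to order the two:

```hcl
# ResourceB: has no dependencies, so it can be created first
# (or in parallel with other independent resources).
resource "local_file" "resource_b" {
  filename = "${path.cwd}/temp/b.txt"
  content  = "I am created first"
}

# ResourceA: the reference to local_file.resource_b below creates an
# implicit dependency, so TF creates resource_b before resource_a
# and destroys them in the reverse order.
resource "local_file" "resource_a" {
  filename = "${path.cwd}/temp/a.txt"
  content  = "ResourceB lives at ${local_file.resource_b.filename}"
}
```

After terraform apply, the state entry for local_file.resource_a should list local_file.resource_b in its "dependencies" array.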

By default, terraform plan (and terraform apply) refreshes Terraform state (state of the infrastructure) before checking for configuration changes. This ensures that any changes that have happened out-of-band (e.g. manually, outside the usual TF workflow) are detected and that infrastructure converges back to what is defined in code. 

If we are certain that there were no changes outside the TF workflow, we can instruct TF to skip this preemptive refresh of the state by specifying -refresh=false:

$ terraform plan -refresh=false

New Terraform Planning Options: -refresh=false, -refresh-only, -replace

Every team member should have the latest .tfstate (state data) before running TF commands and should make sure that no one else runs TF commands at the same time, otherwise unpredictable errors could occur. .tfstate should therefore be saved in a remote store (AWS S3, Google Cloud Storage, HashiCorp Consul, Terraform Cloud) and not locally. This allows state to be shared among team members.
 
Here is a state file sharing scenario which involves a Git repository:
 
Developer A writes a configuration file and performs terraform apply, which creates a local state file and provisions the desired resources. They then push the config and state files to a Git repository. Developer B pulls these files, edits the config file and performs terraform apply, which changes the infrastructure and updates the state file. Developer B then pushes the changed config and state files to the Git repository. This all works fine as long as the developers are not performing terraform apply at the same time on the same config and state files, which could bring the infrastructure into an undefined state.

On the local machine Terraform has a state locking feature which prevents terraform apply from being executed twice at the same time (e.g. from two terminals). This prevents corruption of the state file by two concurrent state-changing operations. With a local state file, however, Terraform has no mechanism to prevent this when terraform apply is executed from two different machines. And if developer B forgets to pull the latest config and state files and works on obsolete ones, they can end up unintentionally removing or changing infrastructure.
 
For these reasons the state file should not be saved in a Git repository but in secure shared storage provided by a remote backend (AWS S3, GCS, Terraform Cloud, HashiCorp Consul). These storage solutions offer state locking. TF automatically loads the state file from the remote storage and uploads it back after it changes. They are also secure: data in transit goes through an encrypted channel.
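
As an illustration, a remote backend on AWS S3 with locking could be configured roughly like this. It is a sketch only: the bucket, key, region and table names are hypothetical, and both the bucket and the DynamoDB table (with a string partition key named LockID) are assumed to exist already:

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # hypothetical bucket name
    key            = "my-project/terraform.tfstate"
    region         = "eu-west-1"                 # hypothetical region
    dynamodb_table = "example-terraform-locks"   # hypothetical table used for state locking
    encrypt        = true                        # encrypt the state object at rest
  }
}
```

After adding or changing a backend block, terraform init has to be run again so that TF can initialise the backend and, if needed, migrate the existing local state to it.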
 
terraform.tfstate contains the values of all attributes that belong to resources. This also includes data considered sensitive (DB passwords, private IPs etc.). This means .tfstate files must be kept in secure storage.
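
To make this concrete, here is a small sketch with a hypothetical variable and file name. Marking a variable as sensitive hides its value in the CLI output of plan and apply, but the value is still written to the state file in plain text:

```hcl
# Hypothetical input variable; sensitive = true only redacts it in CLI output.
variable "db_password" {
  type      = string
  sensitive = true
}

# The rendered content, including the password, ends up as an attribute
# of this resource in terraform.tfstate.
resource "local_file" "db_config" {
  filename = "${path.cwd}/temp/db.conf"
  content  = "password=${var.db_password}"
}
```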

TF configuration files (.tf) can be stored in repositories like GitHub or Bitbucket, but state files (.tfstate) must be stored in secure remote backend systems (AWS S3, Google Cloud Storage etc.).

State files should never be modified manually. Should we need to modify the state, it should be done via the terraform state command.
 
To prevent accidental modifications when we only need to check the values in the file, we should not open it in a text editor, like:

$ vi terraform.tfstate

...but we should use terraform state command:

$ terraform state show local_file.foo

The general syntax of this command is:

terraform state <subcommand> [options] [args]
 
Subcommands:
  • list - list resource(s)
    • terraform state list [options] [address]
  • mv - used to move items in the state file
    • terraform state mv [options] SOURCE DESTINATION
    • items can be moved within the same state file (which is effectively a resource renaming) or from one to another state file (in another configuration directory)
  • pull - to view the content of the remote state
    • terraform state pull [options]
  • rm - to remove items from the state file
    • terraform state rm ADDRESS
  • show - show the attributes of a single resource
    • terraform state show [options] ADDRESS
 
To list all the resources (just their names e.g. local_file.foo):
 
$ terraform state list
 
To list only a specific resource (which can be used to verify that this resource is present in the state file) we can pass the resource address/name (in the form resource_type.resource_name):

$ terraform state list local_file.foo
 
To get detailed information (attributes) about a specific resource:
 
$ terraform state show local_file.foo

To rename a resource without recreating it:

$ terraform state mv local_file.foo local_file.bar  
 
This needs to be followed by manually renaming the resource in the configuration file!
 
This also applies to e.g. DynamoDB tables. To rename one without re-creating it:
  
$ terraform state mv aws_dynamodb_table.foo aws_dynamodb_table.bar  
 
(This needs to be followed by manually renaming the resource in the configuration file!) 
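
For reference, this is roughly what the configuration looks like after the rename, using the local_file example. The moved block is an assumption on my side that Terraform v1.1 or later is used: it is a declarative alternative to running terraform state mv by hand, and TF applies the rename to the state on the next terraform apply.

```hcl
# The resource block, renamed from "foo" to "bar".
resource "local_file" "bar" {
  filename = "${path.cwd}/temp/foo.txt"
  content  = "This is a text content of the foo file!"
}

# Optional alternative to `terraform state mv` (assumes Terraform >= 1.1):
# tells TF that the object tracked as local_file.foo is now local_file.bar,
# so it is not destroyed and recreated.
moved {
  from = local_file.foo
  to   = local_file.bar
}
```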

To download and display the remote state (all items and their attributes):
 
$ terraform state pull 
 
As the output is in JSON format, we can use a tool like jq to process it further and e.g. extract only the information that we are interested in.
 
$ terraform state pull | jq '.resources[] | select(.name == "foo") | .instances[].attributes.hash_key'
 
To remove some resource from TF management (but not to destroy it!):
 
$ terraform state rm local_file.foo
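
The same can be expressed in the configuration itself, assuming Terraform v1.7 or later (an assumption, as this post otherwise uses only the CLI command). In this sketch the resource block for local_file.foo is deleted from the .tf file and replaced with a removed block:

```hcl
# Replaces the deleted `resource "local_file" "foo"` block.
# On the next terraform apply TF forgets the resource (removes it from
# the state) but does not destroy the real object.
removed {
  from = local_file.foo

  lifecycle {
    destroy = false
  }
}
```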
 
 
The terraform refresh command syncs the TF state with the actual, real-world infrastructure. If a change was made to the infrastructure outside Terraform (like a manual update), this command picks it up and updates the state file. The same refresh runs automatically as part of terraform plan and terraform apply, before the execution plan is created. Passing -refresh=false to these commands prevents this.

Benefits of the state file:

  • mapping TF configuration to the actual (real world) infrastructure
  • tracking metadata 
    • e.g. dependencies. This allows TF to create and destroy resources in the correct order.
  • improving the performance of TF operations when working with large configurations
  • allowing teams to collaborate, provided the state is stored in a shared remote backend

Drawbacks of a local state file:

  • the configuration might refer to resources available only on the local dev machine
  • it might contain sensitive data so it can't be checked into a public Git repository
  • it prevents collaborative work


Destroying Resources


$ terraform state list

module.nginx-eks-karpenter.data.aws_caller_identity.account

$ terraform destroy -target module.nginx-eks-karpenter.data.aws_caller_identity.account

 

Mutable and Immutable Infrastructure

 

If we configure and provision a local_file resource, then change its permissions in the configuration and re-execute terraform apply, we'll see from the output that TF first deletes the previously created file and then creates a completely new file, with the new permissions. This is an example of immutable infrastructure. Terraform takes this approach whenever a changed attribute cannot be updated in place; attributes that the provider can update in place are changed without replacing the resource.

An example of mutable (editable, changeable) infrastructure is an in-place update of a pool of NGINX servers (which exists for high availability). The update is done manually, on each machine, during a maintenance window. It is called an in-place update because the underlying infrastructure remains the same but the software and configuration change. A problem occurs when the upgrade of one or more servers fails (e.g. disk full, networking issues, incompatible OS versions etc.), in which case our pool of servers ends up with configuration drift: web servers have different NGINX versions, OS versions and configurations, which makes maintenance and issue fixing difficult.

The better approach is provisioning an immutable (non-changeable) infrastructure: spin up new web servers with upgraded versions and then remove old web servers. 


LifeCycle Rules


As mentioned above, if we edit the resource configuration in a way that forces replacement and re-execute terraform apply, TF will first delete the previously created resource and then create a completely new one, with the new attributes. What if we want TF to create the new resource first and only then remove the old one? Or if we don't want the old resource to be deleted at all?


To change the default behaviour we can use lifecycle rules. They are specified in a lifecycle block inside the resource block. lifecycle is a meta-argument (like depends_on for example).

create_before_destroy = true instructs TF to create the new resource before destroying the old one.

prevent_destroy = true makes TF reject, with an error, any plan that would destroy the existing resource.


resource "local_file" "foo" {
    filename = "${path.cwd}/temp/foo.txt"
    content = "This is a text content of the foo file!"
    # file_permission = "0700"
    file_permission = "0777"
    lifecycle {
        # create_before_destroy = true
        prevent_destroy = true
    }
}

If we have prevent_destroy set to true and perform the following sequence of actions:

terraform init
terraform apply
modify file_permission ("0700" -> "0777")
terraform apply

...we'll get the following error:

% terraform apply 
local_file.foo: Refreshing state... [id=db5ca40b5588d44e9ec6c1b4005e11a6fd0c910e]
Error: Instance cannot be destroyed
│ 
│   on main.tf line 1:
│    1: resource "local_file" "foo" {
│ 
│ Resource local_file.foo has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. To avoid this
│ error and continue with the plan, either disable lifecycle.prevent_destroy or reduce the scope of the plan using the -target flag.

This is useful when dealing with resources that must not be deleted accidentally, like DB instances.

Note that prevent_destroy only has an effect while the lifecycle block is present in the configuration. terraform destroy is rejected with the same error, but if the whole resource block is removed from the configuration, the rule is gone with it and the next terraform apply will destroy the resource.

ignore_changes - makes TF ignore changes to the listed attributes, so editing any of them in the configuration does not cause the resource to be updated:

If we have:

resource "local_file" "foo" {
    filename = "${path.cwd}/temp/foo.txt"
    content = "This is a text content of the foo file!"
    file_permission = "0777"

    lifecycle {
        ignore_changes = [
            content, file_permission
        ]
    }
}

...and execute terraform apply, Terraform will create the file:


  # local_file.foo will be created
  + resource "local_file" "foo" {
      + content             = "This is a text content of the foo file!"
      + directory_permission = "0777"
      + file_permission      = "0777"
      + filename             = "/path/to/temp/foo.txt"
      + id                   = (known after apply)
    }

But if we then change content attribute to:

content = "Use this to test ignore_changes rule"

...and execute terraform apply again, no changes in the resource would happen as this attribute is listed within ignore_changes list:

% terraform apply
local_file.foo: Refreshing state... [id=db5ca40b5511d44e9ec6c1b4005e11a6fd0c910e]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

To ignore changes in any attribute use the following syntax:

ignore_changes = all
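
In context, with the same local_file resource as above, the rule would be placed like this (a sketch; note that all is a keyword here, not a string or a list):

```hcl
resource "local_file" "foo" {
  filename        = "${path.cwd}/temp/foo.txt"
  content         = "This is a text content of the foo file!"
  file_permission = "0777"

  lifecycle {
    # After the resource is created, later edits to any of its
    # attributes in the configuration are ignored.
    ignore_changes = all
  }
}
```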


Data Sources


TF uses data sources to read attributes of resources which were provisioned outside TF (e.g. manually). To define a data source, a data block is used:

resource "local_file" "foo" {
    filename = "${path.cwd}/temp/foo.txt"
    content = data.local_file.my_data_source.content
}

# https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file
data "local_file" "my_data_source" {
    filename = "${path.cwd}/data_source.txt"
}

While a managed resource (resource block) can be created, updated and destroyed, a data resource (data block) can only be read from.

Example: reading the public IP address of an EC2 instance provisioned manually

data "aws_instance" "my-other-server" {
    # ID of the existing, manually provisioned instance
    instance_id = "i-0123456789"
}

output "my-other-server-public-ip" {
    value = data.aws_instance.my-other-server.public_ip
}

This EC2 instance my-other-server is not managed by TF.
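
If the instance id is not known, the same data source can look the instance up with a filter instead. The sketch below assumes the instance carries a Name tag with the hypothetical value my-other-server and that exactly one instance matches, since the lookup fails otherwise:

```hcl
data "aws_instance" "by_name" {
  # Look the instance up by its Name tag instead of by instance id.
  filter {
    name   = "tag:Name"
    values = ["my-other-server"]   # hypothetical tag value
  }
}

output "by-name-public-ip" {
  value = data.aws_instance.by_name.public_ip
}
```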


Tainted Resources


If resource creation fails during terraform apply (errors are reported) for any reason (e.g. a local-exec provisioner fails), Terraform marks that resource as tainted. The next execution of terraform plan will show a message like:

aws_instance.my-web-server is tainted so must be replaced

Subsequent terraform apply will try to recreate this resource. 

Sometimes we want to force the re-creation of a resource although its configuration in the TF script has not changed. For example: the Nginx version on an EC2 instance was changed manually (outside TF). To revert this change we have 2 options:
 
1) execute terraform destroy and then terraform apply 
 
2) better approach: manually mark this resource as tainted and then execute terraform apply: 

$ terraform taint aws_instance.my-web-server

To untaint the resource, we need to use the command of the same name:

$ terraform untaint aws_instance.my-web-server

After the resource is untainted, terraform apply will not recreate that instance.

As of Terraform v0.15.2 the terraform taint command is DEPRECATED. Using terraform apply -replace=ADDRESS is recommended instead.
 
$ terraform apply -replace="aws_instance.my-web-server"

This replacement will be reflected in the TF plan. 

