Tuesday 30 April 2024

VS Code Extension for YAML & Kubernetes

In VS Code's Extensions Marketplace, search for YAML and install YAML by Red Hat. 

The following steps show how to enable support for Kubernetes.

Upon installation, on the extension's tab page, click the cog icon and choose Extension Settings.

In Settings, find Yaml: Schemas and click on Edit in settings.json

... and in that file add the following:

    "yaml.schemas": {  
        "kubernetes": "*.yaml"
    }
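
The "*.yaml" glob above applies the Kubernetes schema to every YAML file in the workspace. If you keep Kubernetes manifests in a dedicated folder, you can narrow the pattern instead; the extension also accepts an array of globs (the folder names below are just an illustration):

    "yaml.schemas": {
        "kubernetes": ["k8s/*.yaml", "manifests/*.yaml"]
    }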



After this, when we type apiVersion (the first required property of a Kubernetes YAML definition file) in a YAML file, auto-completion and auto-indentation for Kubernetes YAML files kick in.

In case of errors in YAML formatting, or if the file does not match the Kubernetes YAML schema, the extension shows an error.





Monday 29 April 2024

YAML Ain't Markup Language (YAML)




The YAML file format is used to represent data, like other data-serialization formats such as XML or JSON.

The same data can be represented in all three formats:
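
For example, a minimal record with a single field looks like this in each format:

XML:

    <server>
        <name>Server1</name>
    </server>

JSON:

    {
        "name": "Server1"
    }

YAML:

    name: Server1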


Files that use the YAML format can have the extension .yaml or .yml.

Key-Value Pair


Data in its simplest form is a key-value pair, and that's how it's defined in YAML: a key and a value separated by a colon (the colon must be followed by a space).

name: Server1

The key above is name
The value is Server1.


A YAML file is basically a collection of one or more key-value pairs, where the key is a string (without quotation marks) and the value is either a string or a more complex data structure such as a list, a dictionary, or a list of dictionaries.


Array/List


The array name is followed by a colon, and each item then goes on its own line with a dash in front:


Servers:
- Server1
- Server2
...
- ServerN


The dash indicates that it's an element of an array.


The example above is actually a key-value pair whose value is a list. Since a YAML document is a collection of key-value pairs, we can have a file like this:

foo.yaml:

Servers:
- Server1
- Server2

DataCentres:
- London
- Frankfurt

Dictionary (Map)


A dictionary is a set of properties grouped together under an item.

Technically, the example below is a key-value pair where the key is the name of the dictionary and the value is the dictionary itself:

Server1:
    name: Server1
    owner: John
    created: 123456
    status: active

Notice the spaces before each property. All properties of a single item must be indented by the same number of spaces so they are aligned, meaning that they are all siblings, children of their parent, which is the key Server1.

The exact number of indentation spaces before these properties doesn't matter, but it must be the same for all siblings.

A YAML file uses spaces for indentation; you can use 2 or 4 spaces, but not tabs. In other words, tab indentation is forbidden. [A YAML file cannot contain tabs as indentation - Stack Overflow]

Why does YAML forbid tabs?

Tabs have been outlawed since they are treated differently by different editors and tools. And since indentation is so critical to proper interpretation of YAML, this issue is just too tricky to even attempt. Indeed Guido van Rossum of Python has acknowledged that allowing TABs in Python source is a headache for many people and that were he to design Python again, he would forbid them. [YAML Ain't Markup Language]

Notice how the number of spaces before each property indicates that these key-value pairs fall within Server1.

Let's suppose that we have:

Server1:
    name: Server1
    owner: John
       created: 123456
    status: active

In this case created has more spaces on the left than owner, so it is now a child of the owner property instead of being its sibling, which is incorrect.

These properties must also be indented further than their parent, which is Server1.

What if we had extra spaces for created and status?  

Server1:
    name: Server1
    owner: John
       created: 123456
       status: active

Then they would fall under owner and thus become properties of owner. This results in a syntax error saying that mapping values are not allowed here, because owner already has a value set, which is John.

The value of a key-value pair can be either a direct (scalar) value or a map, but not both. So the number of spaces before each property is crucial in YAML.


Complex Data Types


A list containing dictionaries 


Here we have a key-value pair whose value is a list of key-value pairs, each of whose values is in turn a dictionary (we can say that we have a list of dictionaries where each dictionary has a name):

Servers:
- Server1:
    name: Server1
    owner: John
    created: 123456
    status: active
- Server2:
    name: Server2
    owner: Jack
    created: 789012
    status: shutdown


Here we have a list of servers whose elements are the key-value pairs Server1 and Server2; their values are dictionaries containing server information.

We can also have a list of (unnamed) dictionaries, where each element of the list is not a key-value pair but a dictionary itself:

Servers:
  - name: Server1
    owner: John
    created: 123456
    status: active
  - name: Server2
    owner: Jack
    created: 789012
    status: shutdown

A list containing dictionaries that contain lists


Servers:
- Server1:
    name: Server1
    owner: John
    created: 123456
    status: active
    applications:
       - web server
       - authentication database
- Server2:
    name: Server2
    owner: Jack
    created: 789012
    status: shutdown
    applications:
       - caching database

When to use a list, a dictionary, or a list of dictionaries?


Use a dictionary if you need to represent information or multiple properties of a single object.

A dictionary is a collection of key-value pairs grouped together:

name: Server1
owner: John
created: 123456
status: active

If we need to split the owner further into name and surname, we can represent it as a dictionary within another dictionary:

name: Server1
owner: 
   name: John
   surname: Smith
created: 123456
status: active

In this case the single value of owner is replaced by a small dictionary with two properties, name and surname. So this is a dictionary within another dictionary.


Use a list/array to represent multiple items of the same type of object.  
E.g. that type could be a string.

Here we have a key-value pair whose value is a list of strings:

Servers:
- Server1
- Server2

What if we want to store all the information about each server? We expand each item in the array, replacing the name with a dictionary. This way we can represent all the information about multiple servers in a single YAML file, using a list of dictionaries.

Here we have a key-value pair whose value is a list of dictionaries:

Servers:
- name: Server1
  owner: John
  created: 123456
  status: active
- name: Server2
  owner: Jack
  created: 789012
  status: shutdown


When does the order of items matter?



A dictionary is an unordered collection.
Lists/arrays are ordered collections.

This dictionary:

name: Server1
owner: John

is the same as:

owner: John
name: Server1

But this list:

- Server1
- Server2

is not the same as this list:

- Server2
- Server1


Comments in YAML

Any line beginning with a hash (#) is ignored and treated as a comment.

# List of servers
- Server1
- Server2
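
A comment can also follow a value on the same line:

Servers:     # our server inventory
- Server1    # primary
- Server2    # backup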



Sunday 28 April 2024

Introduction to Kubernetes

These are custom notes that extend my notes from a Udemy course, "Kubernetes for the Absolute Beginners - Hands-on". All course content rights belong to the course creators.

Introduction

Kubernetes (k8s) is:
  • Platform for managing application containers (containerised applications, container-oriented applications) across multiple hosts (one or more host clusters)
  • Container orchestration technology - a system for automating the operations and management of application containers in complex, multi-container workloads:
    • Container creation
    • Container deployment
    • Rolling deployment *)
    • Auto-scaling
    • Load balancing
    • Container health monitoring
    • Compute resource management
    • Volume management
    • Persistent storage
    • Networking
    • High availability by cluster federation
  • Open-source
  • Originally designed by Google, based upon their running of containers in production. Now maintained by the Cloud Native Computing Foundation
  • Supports hosting enhanced and complex applications on various kinds of architectures; it is designed to run anywhere:
    • on bare metal
    • in our data center
    • on the public cloud - supported on any cloud platform
    • on the hybrid cloud
*) Rolling deployment:
  • A deployment strategy that slowly replaces previous versions of an application with new versions by gradually replacing the infrastructure on which the application is running
  • It is renowned for its ability to update applications without downtime: incrementally updating nodes or replicas ensures that the service remains available to users throughout the deployment process
  • Rolling deployments use the concept of a window size: the number of servers that are updated at any given time. For example, if a Kubernetes cluster is running 10 instances of an application (10 pods), and you want to update two of them at a time, you can perform a rolling deployment with a window size of 2 (see the sketch below).
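
In Kubernetes, this window corresponds to the rollingUpdate settings of a Deployment (a Deployment is created in the minikube demo later in this post). A minimal sketch of the relevant fields; the name and image are hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical name
spec:
  replicas: 10                # 10 instances (pods) of the application
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2       # the "window size": update at most 2 pods at a time
      maxSurge: 0             # don't create extra pods above the desired count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:2.0     # hypothetical new image version being rolled out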

Containers: Docker overview

Let's say we need to deploy a stack of various technologies: 
  • web server Node.js Express
  • MongoDB
  • Redis messaging system
  • Ansible as orchestration tool
Each of these components needs to be compatible with the host's hardware, OS, and installed dependencies and libraries, which is usually not the case. This problem is therefore named the matrix from hell.

Docker helps prevent these dependency issues: we can run each of these components in its own container, which contains the libraries and dependencies that the component is compatible with. Docker runs on top of the OS (Windows, Mac, Linux, etc.).

Containers are completely isolated environments. They have their own processes, network interfaces, mounts...just like virtual machines except they all share the same OS kernel (which is interfacing the hardware).

Docker adds an abstraction layer over LXC (LinuX Containers). Docker is like an extension of LXC. [LXC vs Docker: Why Docker is Better | UpGuard]

Ubuntu, Fedora, SUSE and CentOS share the same OS kernel (Linux) but have different software (GUI, drivers, compilers, file systems, ...) above it. This custom software differentiates OSes between each other.

Docker containers share the underlying OS kernel. For example, Docker on Ubuntu can run any flavour of Linux that runs on the same Linux kernel as Ubuntu. This is why we can't run a Windows-based container on Docker running on a Linux OS - they don't share the same kernel.
Hypervisor:
  • Abstracts away hardware for the virtual machines so they can run an operating system
  • Coordinates between the machine's physical hardware and virtual machines.
Container engine (e.g. Docker Engine):
  • Abstracts away an operating system so containers can run applications
  • Coordinates between the operating system and (Docker) containers
  • Docker containers are process-isolated and don't require a hardware hypervisor. This means Docker containers are much smaller and require far fewer resources than a VM.
Unlike hypervisors, Docker is not meant to virtualize and run different operating systems and kernels on the same hardware.

The main purpose of Docker is to containerise applications, ship them and run them.

In case of Docker we have: 
  • Containers (one or more) containing:
    • Application
    • Libraries & Dependencies 
  • Docker
  • OS
  • Hardware
In case of virtual machine we have:
  • Virtual Machines (one or more) containing:
    • Application
    • Libraries & Dependencies
    • OS
  • Hypervisor
  • OS
  • Hardware
Docker uses less processing power, less disk space and has faster boot up time than VMs.

Docker containers share the same kernel while different VMs are completely isolated. We can run VM with Linux on the host with Windows.

Many companies release and ship their software products as Docker images, published on Docker Hub, the public Docker registry.

We can run each application from the example above in its own container:

$ docker run nodejs
$ docker run mongodb
$ docker run redis
$ docker run ansible

A Docker image is a template used to create one or more containers.

Containers are running instances of images that are isolated and have their own environments and set of processes.

A Dockerfile describes how the image is built.
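
As a sketch, a Dockerfile for the Node.js web server from the stack above might look like this (the file names, port, and base image tag are assumptions):

FROM node:20-alpine        # base image with Node.js preinstalled
WORKDIR /app
COPY package*.json ./      # copy dependency manifests first to leverage layer caching
RUN npm install
COPY . .                   # copy the application source
EXPOSE 3000                # the port the Express app listens on (assumed)
CMD ["node", "server.js"]  # start the web server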

Container Orchestration


Applications run in their own containers. 

What if one application depends on another, e.g. a web server running in one container depends on the DB running in another container?
What if the number of users increases and we need to scale out our application?
How do we scale down when the load decreases?
How do we build services across multiple machines without dealing with cumbersome network and storage settings?
How do we manage and roll out our microservices, each with a different service cycle?

We should have an underlying platform that takes care of these dependencies and scaling. This process of deploying and managing containers is called container orchestration.

Container orchestration technologies:
  • Docker Swarm
    • easy to set up
    • lacks advanced features
  • Kubernetes (Google)
    • most popular
    • difficult to set up
    • has lots of options to support deployments of complex architecture setups
    • supported on all main public cloud service providers like GCP, Azure, AWS
  • Mesos (Apache)
    • difficult to set up
    • has advanced features
Kubernetes advantages:
  • Used to deploy and manage hundreds or thousands of containers in a clustered environment
  • Kubernetes is designed for high availability (HA). We have multiple instances of our application running on different nodes, so hardware failures on some nodes won't impact availability. We are able to create multiple master nodes to prevent a single point of failure. 
  • Traffic is load balanced across multiple containers.
  • Scaling is done by scaling the number of containers running on a single host but also increasing the number of hosts (hardware scaling) if processing demands reach maximum thresholds on existing nodes.
  • The lifetime of containers might be short. They may be killed or stopped at any time, e.g. when they exceed their resource limits, so how do we ensure our services always keep a certain number of containers running? A ReplicationController or ReplicaSet in Kubernetes will ensure that a certain number of containers in a group are up. 
  • Kubernetes even supports liveness probe to help you define your application health.
  • For better resource management, we can also define the maximum capacity on Kubernetes nodes and the resource limit for each group of containers (a.k.a pod). Kubernetes scheduler will then select a node that fulfills the resource criteria to run the containers. 
  • Kubernetes provides an optional horizontal pod auto-scaling feature. With this feature, we could scale a pod horizontally by resource or custom metrics.
  • Perfect match for microservices, where it helps their CD (Continuous Delivery). We can create a Deployment to roll out, roll over, or roll back selected containers. 
  • Containers are considered as ephemeral - they can quickly and/or often die. We can mount the volume into a container to preserve the data in a single host world. In the cluster world, a container might be scheduled to run on any host. Kubernetes Volumes and Persistent Volumes make the volume mounting work as permanent storage seamlessly.
  • This is all achieved with the set of declarative object configuration files.


Kubernetes Architecture

  • Node (worker node, minion)
    • machine, physical or virtual, on which Kubernetes is installed
    • worker machine on which containers will be launched by Kubernetes; workers run containers
    • if node fails, our application will go down => we need to have more nodes
  • Cluster
    • Set of nodes grouped together
    • Even if one node fails, application is still accessible from other nodes
    • Having multiple nodes also helps sharing the load
    • Kubernetes cluster consists of two types of nodes, master nodes and worker nodes. 
  • Master (master node)
    • responsible for managing the cluster
    • controls and schedules all activities in the cluster
    • stores the information about all members of the cluster
    • monitors nodes
    • when node fails, moves workload of the failed node to other worker nodes
    • Master is a node with Kubernetes installed on it and is configured as a master node
    • Master watches over the nodes in the cluster and is responsible for orchestration of containers on the worker nodes
    • Master nodes host the K8s control plane components. The master node will hold configuration and state data used to maintain the desired state.

When we install Kubernetes on the host, we install multiple components on it.

There are two types of nodes/servers: master and worker. And there is a set of components that make up Kubernetes. How are these components distributed across the different types of servers? How does one server become a master and another a worker? 

master (controller) server (node):
  • API Server (kube-api-server)
    • this is what makes node a master
    • acts as the front end of Kubernetes
    • users, management devices, command line interfaces talk to it in order to interact with Kubernetes cluster 
  • etcd service
    • All the information gathered is stored in a key-value store based on the popular etcd framework
    • the name comes from the Unix /etc directory (which holds configuration data) plus "d" for distributed
    • distributed reliable key-value store used by Kubernetes to store all data used to manage the cluster
    • when we have multiple nodes and multiple masters in the cluster, etcd stores all that information on all the nodes in the cluster in the distributed manner
    • responsible for implementing locks within the cluster to ensure there are no conflicts between the masters
  • controller
    • also known as the controller manager
    • brain behind the orchestration
    • responsible for noticing and responding when nodes, containers or endpoints go down
    • makes decisions to bring up new containers in such cases
  • scheduler
    • responsible for distributing work of containers across multiple nodes
    • it looks for newly created containers and assigns them to nodes
Master components
(credit: DevOps with Kubernetes by Hideto Saito, Hui-Chuan Chloe Lee and Cheng-Yang Wu)

(I/F = Interface)


This article describes the control plane of the master node well:


API Server and its clients
(image credit: Rini Thomas; source: https://medium.com/@rinithomas/the-kubernetes-api-server-430a39aec2d7)

All communications and operations between the control plane components and external clients, such as kubectl, are translated into RESTful API calls that are handled by the API server. 
Effectively, the API server is a RESTful web application that processes RESTful API calls over HTTP to store and update API objects in the etcd datastore.   
Control Plane on the master/controller node(s) consists of the API server, controller manager, and scheduler.  
API server is the central management entity and the only component that talks directly with the distributed storage component etcd. 
API server has the following core responsibilities:
  • To serve the Kubernetes API. This API is used:
    • cluster-internally by the:
      • master components 
      • worker nodes
      • our Kubernetes-native apps
    • externally by clients such as kubectl
  • To proxy cluster components, such as the Kubernetes dashboard, or to stream logs, service ports, or serve kubectl exec sessions.  
Serving the API means:
  • Reading state: getting single objects, listing them, and streaming changes
  • Manipulating state: creating, updating, and deleting objects.  
A kubectl command is translated into an HTTP API request in JSON format and sent to the API server. The API server then returns a response to the client, along with any requested information.  
API server is stateless (that is, its behavior will be consistent regardless of the state of the cluster) and is designed to scale horizontally. Usually, for the high availability of clusters, it is recommended to have at least three instances to handle the load and fault tolerance better.  

 

API Server internal processes 
(image credit: Rini Thomas; source: https://medium.com/@rinithomas/the-kubernetes-api-server-430a39aec2d7)

 

worker node (minion):
  • is where the containers are hosted e.g. Docker containers. 
  • kubelet service (agent)
    • the agent that runs on each node in the cluster
    • interacts with a master to provide health information of the worker node and carry out actions requested by the master on the worker nodes
    • makes sure that containers are running as expected
  • Container Runtime 
    • underlying software required for running containers on a system
    • Container Runtime can be Docker, rkt or CRI-O
    • in our case it's Docker but there are other options as well 

Node components 
(credit: DevOps with Kubernetes by Hideto Saito, Hui-Chuan Chloe Lee and Cheng-Yang Wu)



Understanding what components constitute the master and worker nodes will help us install and configure the right components on different systems when we set up our infrastructure. 

kubectl:
  • command line (CLI) tool for Kubernetes
  • command line utility known as the kube command line tool or kubectl or kube control 
  • kubectl tool is used to:
    • interact with the Kubernetes cluster(s)
    • enables the interaction (to run commands against) the clusters in order to manage and inspect them
    • create pods, services and other components
    • deploy and manage applications on a Kubernetes cluster
      • kubectl run command is used to deploy an application on the cluster
      • example: kubectl run hello-minikube
    • inspect and manage cluster resources e.g. get cluster information
      • kubectl cluster-info command is used to view information about the cluster
    • get the status of other nodes in the cluster
      • kubectl get nodes command is used to list all the nodes part of the cluster
    • view logs
    • manage many other things

$ kubectl --help 
kubectl controls the Kubernetes cluster manager.

 Find more information at: https://kubernetes.io/docs/reference/kubectl/

Basic Commands (Beginner):
  create          Create a resource from a file or from stdin
  expose          Take a replication controller, service, deployment or pod and expose it as a new Kubernetes service
  run             Run a particular image on the cluster
  set             Set specific features on objects

Basic Commands (Intermediate):
  explain         Get documentation for a resource
  get             Display one or many resources
  edit            Edit a resource on the server
  delete          Delete resources by file names, stdin, resources and names, or by resources and label selector

Deploy Commands:
  rollout         Manage the rollout of a resource
  scale           Set a new size for a deployment, replica set, or replication controller
  autoscale       Auto-scale a deployment, replica set, stateful set, or replication controller

Cluster Management Commands:
  certificate     Modify certificate resources
  cluster-info    Display cluster information
  top             Display resource (CPU/memory) usage
  cordon          Mark node as unschedulable
  uncordon        Mark node as schedulable
  drain           Drain node in preparation for maintenance
  taint           Update the taints on one or more nodes

Troubleshooting and Debugging Commands:
  describe        Show details of a specific resource or group of resources
  logs            Print the logs for a container in a pod
  attach          Attach to a running container
  exec            Execute a command in a container
  port-forward    Forward one or more local ports to a pod
  proxy           Run a proxy to the Kubernetes API server
  cp              Copy files and directories to and from containers
  auth            Inspect authorization
  debug           Create debugging sessions for troubleshooting workloads and nodes
  events          List events

Advanced Commands:
  diff            Diff the live version against a would-be applied version
  apply           Apply a configuration to a resource by file name or stdin
  patch           Update fields of a resource
  replace         Replace a resource by file name or stdin
  wait            Experimental: Wait for a specific condition on one or many resources
  kustomize       Build a kustomization target from a directory or URL

Settings Commands:
  label           Update the labels on a resource
  annotate        Update the annotations on a resource
  completion      Output shell completion code for the specified shell (bash, zsh, fish, or powershell)

Subcommands provided by plugins:

Other Commands:
  api-resources   Print the supported API resources on the server
  api-versions    Print the supported API versions on the server, in the form of "group/version"
  config          Modify kubeconfig files
  plugin          Provides utilities for interacting with plugins
  version         Print the client and server version information

Usage:
  kubectl [flags] [options]

Use "kubectl <command> --help" for more information about a given command.
Use "kubectl options" for a list of global command-line options (applies to all commands).


Quiz:

What is a worker machine in Kubernetes known as?
A Node in Kubernetes can only be a physical machine and can never be a virtual machine.
Multiple nodes together form what?
Which of the following processes runs on Kubernetes Master Node?
Which of the following is a distributed reliable key-value store used by kubernetes to store all data used to manage the cluster?
Which of the following services is responsible for distributing work of containers across multiple nodes?
Which of the following is the underlying framework that is responsible for running applications in containers like Docker?
Which is the command line utility used to manage a kubernetes cluster?


Pods

  • Let's assume that the following have been set up already:
    • The application is already developed and built into Docker images and it is available on a Docker repository like Docker Hub so Kubernetes can pull it down
    • Kubernetes cluster has already been set up and is working
      • this could be a single node setup or a multi node setup
      • all the services need to be in a running state
  • With Kubernetes, our ultimate aim is to deploy our application in the form of containers on a set of machines that are configured as worker nodes in a cluster
  • Kubernetes does not deploy containers directly on the worker nodes. The containers are encapsulated into a Kubernetes object known as a pod.
  • A pod is:
    • a single instance of an application
    • the smallest object that you can create in Kubernetes
    • the most basic and the smallest unit in Kubernetes
  • The simplest case: a single node Kubernetes cluster with a single instance of the application running in a single Docker container encapsulated in a pod
  • What if the number of users accessing our application increases and we need to scale the application? 
    • We need to add additional instances of our web application to share the load.
  • Where would we spin up additional instances? 
    • We don't bring up a new container instance within the same pod. 
    • We create a new pod altogether with a new instance of the same application. 
    • We now have two instances of our web application running on two separate pods on the same Kubernetes system or node.
  • What if the user base increases further and the current node doesn't have sufficient capacity? 
    • We can always deploy additional pods on a new node in the cluster
    • We will have a new node added to the cluster to expand the cluster's physical capacity
  • Pods usually have a 1 to 1 relationship with containers running our application. 
    • To scale up, we create new pods (see the note after this list)
    • To scale down, we delete existing pods
    • We do NOT add additional containers to an existing pod to scale our application. 
  • We can also achieve load balancing between the containers.
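
Note: when pods are managed by a higher-level object such as a ReplicaSet or Deployment (mentioned in the advantages above), scaling is a single command; the deployment name here is hypothetical:

$ kubectl scale deployment my-app --replicas=4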

Multi-container pods

Pods usually have a 1 to 1 relationship with the containers, but we are NOT restricted to having a single container in a single pod. A single pod can have multiple containers, although they are usually not multiple containers of the same kind.

To scale our application, we would need to create additional pods.

Sometimes we might have a scenario where a helper container does some kind of supporting task for our web application, such as processing user-entered data or a file uploaded by the user, and we want this helper container to live alongside our application container. In that case, we can make both of these containers part of the same pod (see the sketch after this list) so that:
  • when a new application container is created, the helper is also created
  • when it dies, the helper also dies since they are part of the same pod
  • The two containers can also communicate with each other directly by referring to each other as local host since they share the same network space
  • They can easily share the same storage space as well
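
A minimal sketch of such a multi-container pod definition (the names and images are hypothetical; the emptyDir volume illustrates the shared storage mentioned above):

apiVersion: v1
kind: Pod
metadata:
  name: my-web-pod
spec:
  volumes:
  - name: shared-data            # scratch volume visible to both containers
    emptyDir: {}
  containers:
  - name: web-app
    image: my-web-app:1.0        # hypothetical application image
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: helper                 # helper container living alongside the app
    image: my-helper:1.0         # hypothetical helper image
    volumeMounts:
    - name: shared-data
      mountPath: /data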

Why do we need Kubernetes?

  • Let's for a moment keep Kubernetes out of our discussion and talk about simple Docker containers. Let's assume we were developing a process or a script to deploy our application on a Docker host. Then we would first simply deploy our application using a simple docker run command, and the application runs fine and our users are able to access it:
    • docker run python-app
  • When the load increases, we deploy more instances of our application by running the docker run command many more times (options come before the image name): 
    • docker run --name app1 python-app
    • docker run --name app2 python-app
    • docker run --name app3 python-app
    • docker run --name app4 python-app
  • Sometime in the future our application is further developed, undergoes architectural changes, and grows more complex. We now have a new helper container that helps our web application by processing or fetching data from elsewhere (NOTE: --link is a legacy option for docker run; it is recommended to use user-defined networks to facilitate communication between two containers instead of --link; see Legacy container links | Docker Docs):
    • docker run --link app1 helper
    • docker run --link app2 helper
    • docker run --link app3 helper
    • docker run --link app4 helper
  • These helper containers maintain a 1 to 1 relationship with our application containers and thus need to communicate with them directly and access data from them. For this, we need to (manually):
    • maintain a map of what app and helper containers are connected to each other
    • establish network connectivity between these containers ourselves using links and custom networks
    • create shareable volumes and share it among the containers. We would need to maintain a map of that as well. 
    • monitor the state of the application container
      • When it dies, manually kill the helper container as well, as it's no longer required.
      • When a new application container is deployed, we would need to deploy a new helper container as well.
  • Kubernetes does all of this for us automatically. We just need to define what containers a pod consists of, and the containers in a pod will by default have access to the same storage and the same network namespace, and share the same fate: they will be created together and destroyed together.
  • Even if our application didn't happen to be so complex and we could live with a single container, Kubernetes still requires you to create pods, but this is good in the long run as your application is now equipped for architectural changes and scale in the future.
  • However, multi-container pods are a rare use case. A single container per pod is the most common use case.

My observation: Kubernetes pods seem to be doing a job very similar to docker-compose. What are the similarities and what are the differences between the two?



How to deploy/create pods?

kubectl run command 
  • e.g. kubectl run nginx
  • deploys a Docker container by creating a pod named nginx
    • it first creates a pod automatically
    • then deploys an instance of the Nginx Docker image
      • we need to specify the application image name using the image parameter:
      • kubectl run nginx --image nginx
      • The application image, in this case the nginx image, is downloaded from the Docker Hub repository. Docker Hub is a public repository where the latest Docker images of various applications are stored.
      • We can configure Kubernetes to pull the image from the public Docker Hub or from a private repository within the organization.
      • In the current state, we haven't made the web server accessible to external users, but we can access it internally from the node.

$ kubectl run --help
Create and run a particular image in a pod.

Examples:
  # Start a nginx pod
  kubectl run nginx --image=nginx
  
  # Start a hazelcast pod and let the container expose port 5701
  kubectl run hazelcast --image=hazelcast/hazelcast --port=5701
  
  # Start a hazelcast pod and set environment variables "DNS_DOMAIN=cluster" and
"POD_NAMESPACE=default" in the container
  kubectl run hazelcast --image=hazelcast/hazelcast --env="DNS_DOMAIN=cluster"
--env="POD_NAMESPACE=default"
  
  # Start a hazelcast pod and set labels "app=hazelcast" and "env=prod" in the container
  kubectl run hazelcast --image=hazelcast/hazelcast --labels="app=hazelcast,env=prod"
  
  # Dry run; print the corresponding API objects without creating them
  kubectl run nginx --image=nginx --dry-run=client
  
  # Start a nginx pod, but overload the spec with a partial set of values parsed from JSON
  kubectl run nginx --image=nginx --overrides='{ "apiVersion": "v1", "spec": { ... } }'
  
  # Start a busybox pod and keep it in the foreground, don't restart it if it exits
  kubectl run -i -t busybox --image=busybox --restart=Never
  
  # Start the nginx pod using the default command, but use custom arguments (arg1 .. argN) for that
command
  kubectl run nginx --image=nginx -- <arg1> <arg2> ... <argN>
  
  # Start the nginx pod using a different command and custom arguments
  kubectl run nginx --image=nginx --command -- <cmd> <arg1> ... <argN>

Options:
    --allow-missing-template-keys=true:
        If true, ignore any errors in templates when a field or map key is missing in the
        template. Only applies to golang and jsonpath output formats.

    --annotations=[]:
        Annotations to apply to the pod.

    --attach=false:
        If true, wait for the Pod to start running, and then attach to the Pod as if 'kubectl
        attach ...' were called.  Default false, unless '-i/--stdin' is set, in which case the
        default is true. With '--restart=Never' the exit code of the container process is
        returned.

    --command=false:
        If true and extra arguments are present, use them as the 'command' field in the container,
        rather than the 'args' field which is the default.

    --dry-run='none':
        Must be "none", "server", or "client". If client strategy, only print the object that
        would be sent, without sending it. If server strategy, submit server-side request without
        persisting the resource.

    --env=[]:
        Environment variables to set in the container.

    --expose=false:
        If true, create a ClusterIP service associated with the pod.  Requires `--port`.

    --field-manager='kubectl-run':
        Name of the manager used to track field ownership.

    --image='':
        The image for the container to run.

    --image-pull-policy='':
        The image pull policy for the container.  If left empty, this value will not be specified
        by the client and defaulted by the server.

    -l, --labels='':
        Comma separated labels to apply to the pod. Will override previous values.

    --leave-stdin-open=false:
        If the pod is started in interactive mode or with stdin, leave stdin open after the first
        attach completes. By default, stdin will be closed after the first attach completes.

    -o, --output='':
        Output format. One of: (json, yaml, name, go-template, go-template-file, template,
        templatefile, jsonpath, jsonpath-as-json, jsonpath-file).

    --override-type='merge':
        The method used to override the generated object: json, merge, or strategic.

    --overrides='':
        An inline JSON override for the generated object. If this is non-empty, it is used to
        override the generated object. Requires that the object supply a valid apiVersion field.

    --pod-running-timeout=1m0s:
        The length of time (like 5s, 2m, or 3h, higher than zero) to wait until at least one pod
        is running

    --port='':
        The port that this container exposes.

    --privileged=false:
        If true, run the container in privileged mode.

    -q, --quiet=false:
        If true, suppress prompt messages.

    --restart='Always':
        The restart policy for this Pod.  Legal values [Always, OnFailure, Never].

    --rm=false:
        If true, delete the pod after it exits.  Only valid when attaching to the container, e.g.
        with '--attach' or with '-i/--stdin'.

    --save-config=false:
        If true, the configuration of current object will be saved in its annotation. Otherwise,
        the annotation will be unchanged. This flag is useful when you want to perform kubectl
        apply on this object in the future.

    --show-managed-fields=false:
        If true, keep the managedFields when printing objects in JSON or YAML format.

    -i, --stdin=false:
        Keep stdin open on the container in the pod, even if nothing is attached.

    --template='':
        Template string or path to template file to use when -o=go-template, -o=go-template-file.
        The template format is golang templates
        [http://golang.org/pkg/text/template/#pkg-overview].

    -t, --tty=false:
        Allocate a TTY for the container in the pod.

Usage:
  kubectl run NAME --image=image [--env="key=value"] [--port=port] [--dry-run=server|client]
[--overrides=inline-json] [--command] -- [COMMAND] [args...] [options]

Use "kubectl options" for a list of global command-line options (applies to all commands).

How do we see the list of pods available?

kubectl get pods command:
  • lists all pods in our cluster
  • also shows their current state, e.g. a pod can be in the ContainerCreating state and soon change to the Running state when it is actually running (see the example below)
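
For example, after the kubectl run nginx command from earlier, the output looks something like this (the age and restart count are illustrative):

$ kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          98s
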
$ kubectl get --help
Display one or many resources.

 Prints a table of the most important information about the specified resources. You can filter the
list using a label selector and the --selector flag. If the desired resource type is namespaced you
will only see results in your current namespace unless you pass --all-namespaces.

 By specifying the output as 'template' and providing a Go template as the value of the --template
flag, you can filter the attributes of the fetched resources.

Use "kubectl api-resources" for a complete list of supported resources.

Examples:
  # List all pods in ps output format
  kubectl get pods
  
  # List all pods in ps output format with more information (such as node name)
  kubectl get pods -o wide
  
  # List a single replication controller with specified NAME in ps output format
  kubectl get replicationcontroller web
  
  # List deployments in JSON output format, in the "v1" version of the "apps" API group
  kubectl get deployments.v1.apps -o json
  
  # List a single pod in JSON output format
  kubectl get -o json pod web-pod-13je7
  
  # List a pod identified by type and name specified in "pod.yaml" in JSON output format
  kubectl get -f pod.yaml -o json
  
  # List resources from a directory with kustomization.yaml - e.g. dir/kustomization.yaml
  kubectl get -k dir/
  
  # Return only the phase value of the specified pod
  kubectl get -o template pod/web-pod-13je7 --template={{.status.phase}}
  
  # List resource information in custom columns
  kubectl get pod test-pod -o
custom-columns=CONTAINER:.spec.containers[0].name,IMAGE:.spec.containers[0].image
  
  # List all replication controllers and services together in ps output format
  kubectl get rc,services
  
  # List one or more resources by their type and names
  kubectl get rc/web service/frontend pods/web-pod-13je7
  
  # List the 'status' subresource for a single pod
  kubectl get pod web-pod-13je7 --subresource status

Options:
    -A, --all-namespaces=false:
        If present, list the requested object(s) across all namespaces. Namespace in current
        context is ignored even if specified with --namespace.

    --allow-missing-template-keys=true:
        If true, ignore any errors in templates when a field or map key is missing in the
        template. Only applies to golang and jsonpath output formats.

    --chunk-size=500:
        Return large lists in chunks rather than all at once. Pass 0 to disable. This flag is beta
        and may change in the future.

    --field-selector='':
        Selector (field query) to filter on, supports '=', '==', and '!='.(e.g. --field-selector
        key1=value1,key2=value2). The server only supports a limited number of field queries per
        type.

    -f, --filename=[]:
        Filename, directory, or URL to files identifying the resource to get from a server.

    --ignore-not-found=false:
        If the requested object does not exist the command will return exit code 0.

    -k, --kustomize='':
        Process the kustomization directory. This flag can't be used together with -f or -R.

    -L, --label-columns=[]:
        Accepts a comma separated list of labels that are going to be presented as columns. Names
        are case-sensitive. You can also use multiple flag options like -L label1 -L label2...

    --no-headers=false:
        When using the default or custom-column output format, don't print headers (default print
        headers).

    -o, --output='':
        Output format. One of: (json, yaml, name, go-template, go-template-file, template,
        templatefile, jsonpath, jsonpath-as-json, jsonpath-file, custom-columns,
        custom-columns-file, wide). See custom columns
        [https://kubernetes.io/docs/reference/kubectl/#custom-columns], golang template
        [http://golang.org/pkg/text/template/#pkg-overview] and jsonpath template
        [https://kubernetes.io/docs/reference/kubectl/jsonpath/].

    --output-watch-events=false:
        Output watch event objects when --watch or --watch-only is used. Existing objects are
        output as initial ADDED events.

    --raw='':
        Raw URI to request from the server.  Uses the transport specified by the kubeconfig file.

    -R, --recursive=false:
        Process the directory used in -f, --filename recursively. Useful when you want to manage
        related manifests organized within the same directory.

    -l, --selector='':
        Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l
        key1=value1,key2=value2). Matching objects must satisfy all of the specified label
        constraints.

    --server-print=true:
        If true, have the server return the appropriate table output. Supports extension APIs and
        CRDs.

    --show-kind=false:
        If present, list the resource type for the requested object(s).

    --show-labels=false:
        When printing, show all labels as the last column (default hide labels column)

    --show-managed-fields=false:
        If true, keep the managedFields when printing objects in JSON or YAML format.

    --sort-by='':
        If non-empty, sort list types using this field specification.  The field specification is
        expressed as a JSONPath expression (e.g. '{.metadata.name}'). The field in the API
        resource specified by this JSONPath expression must be an integer or a string.

    --subresource='':
        If specified, gets the subresource of the requested object. Must be one of [status scale].
        This flag is beta and may change in the future.

    --template='':
        Template string or path to template file to use when -o=go-template, -o=go-template-file.
        The template format is golang templates
        [http://golang.org/pkg/text/template/#pkg-overview].

    -w, --watch=false:
        After listing/getting the requested object, watch for changes.

    --watch-only=false:
        Watch for changes to the requested object(s), without listing/getting first.

Usage:
  kubectl get
[(-o|--output=)json|yaml|name|go-template|go-template-file|template|templatefile|jsonpath|jsonpath-as-json|jsonpath-file|custom-columns|custom-columns-file|wide]
(TYPE[.VERSION][.GROUP] [NAME | -l label] | TYPE[.VERSION][.GROUP]/NAME ...) [flags] [options]

Use "kubectl options" for a list of global command-line options (applies to all commands).

minikube

A Kubernetes cluster can be deployed on either physical or virtual machines. To get started with Kubernetes development, you can use Minikube. Minikube is a lightweight Kubernetes implementation that creates a VM on your local machine and deploys a simple cluster containing only one node. Minikube is available for Linux, macOS, and Windows systems. The Minikube CLI provides basic bootstrapping operations for working with your cluster, including start, stop, status, and delete. [Using Minikube to Create a Cluster | Kubernetes]
  • Our goal is to install / set up a basic Kubernetes cluster on our local machine using the minikube utility
  • minikube creates a single-node cluster, where the master and worker processes run on the same machine
  • minikube creates a non-production sandbox environment that we can use to test out Kubernetes

Before installing minikube:

1) We should first install the kubectl utility locally
    • kubectl command line tool is what we will use to manage our Kubernetes resources and our cluster after it is set up using minikube
    • Installing the kubectl utility before installing minikube will allow minikube to configure the kubectl utility to work with the cluster when it provisions it
    • kubectl utility can work with multiple clusters, local or remote, at the same time
    • if kubectl is not installed locally, minikube already includes kubectl which can be used like this: minikube kubectl -- <kubectl commands>
    • if kubectl is already installed, minikube will automatically take care of configuring it when it starts, when it provisions a Kubernetes cluster
    • To install kubectl (on Linux), follow the instructions here: Install and Set Up kubectl on Linux | Kubernetes. A quick check of the installation is shown below.
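
Once kubectl is installed, a quick sanity check is to print the client version (the exact output varies between versions):

$ kubectl version --client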

2) We need to make sure that virtualization is enabled.
On Linux, it is enabled if the following command returns non-empty output: 

$ grep -E --color 'vmx|svm' /proc/cpuinfo

If it's not enabled, there should be an option in BIOS to enable virtualization.


3) We need to install a container or virtual machine manager (hypervisor) for minikube, such as: Docker, QEMU, Hyperkit, Hyper-V, KVM, Parallels, Podman, VirtualBox, or VMware Fusion/Workstation. 

We can use e.g. the VirtualBox virtualization solution, but minikube can also run without a hypervisor, directly on the host, using Docker.
The Docker driver allows you to install Kubernetes into an existing Docker install. On Linux, this does not require virtualization to be enabled. [docker | minikube]
VirtualBox is minikube’s original driver. It may not provide the fastest start-up time, but it is the most stable driver available for users of Microsoft Windows Home. [virtualbox | minikube]

VirtualBox installation: Linux_Downloads – Oracle VM VirtualBox

When we provision a cluster using minikube, it will automatically create a virtual machine as required.

Minikube installation: minikube start | minikube

Let's explore minikube CLI:

$ minikube --help
minikube provisions and manages local Kubernetes clusters optimized for development workflows.

Basic Commands:
  start            Starts a local Kubernetes cluster
  status           Gets the status of a local Kubernetes cluster
  stop             Stops a running local Kubernetes cluster
  delete           Deletes a local Kubernetes cluster
  dashboard        Access the Kubernetes dashboard running within 
                   the minikube cluster
  pause            pause Kubernetes
  unpause          unpause Kubernetes

Images Commands:
  docker-env       Provides instructions to point your terminal's docker-cli
                   to the Docker Engine inside minikube
                   (useful for building docker images directly inside minikube)
  podman-env       Configure environment to use minikube's Podman service
  cache            Manage cache for images
  image            Manage images

Configuration and Management Commands:
  addons           Enable or disable a minikube addon
  config           Modify persistent configuration values
  profile          Get or list the current profiles (clusters)
  update-context   Update kubeconfig in case of an IP or port change

Networking and Connectivity Commands:
  service          Returns a URL to connect to a service
  tunnel           Connect to LoadBalancer services

Advanced Commands:
  mount            Mounts the specified directory into minikube
  ssh              Log into the minikube environment (for debugging)
  kubectl          Run a kubectl binary matching the cluster version
  node             Add, remove, or list additional nodes
  cp               Copy the specified file into minikube

Troubleshooting Commands:
  ssh-key          Retrieve the ssh identity key path of the specified node
  ssh-host         Retrieve the ssh host key of the specified node
  ip               Retrieves the IP address of the specified node
  logs             Returns logs to debug a local Kubernetes cluster
  update-check     Print current and latest version number
  version          Print the version of minikube
  options          Show a list of global command-line options (applies to all commands).

Other Commands:
  completion       Generate command completion for a shell
  license          Outputs the licenses of dependencies to a directory

Use "minikube <command> --help" for more information about a given command.


To test minikube installation we can provision a Kubernetes cluster:

$ minikube start
😄  minikube v1.33.0 on Ubuntu 22.04
🆕  Kubernetes 1.30.0 is now available. If you would like to upgrade, specify: --kubernetes-version=v1.30.0
✨  Using the virtualbox driver based on existing profile
💿  Downloading VM boot image ...
    > minikube-v1.33.0-amd64.iso....:  65 B / 65 B [---------] 100.00% ? p/s 0s
    > minikube-v1.33.0-amd64.iso:  314.16 MiB / 314.16 MiB  100.00% 3.72 MiB p/
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🔄  Restarting existing virtualbox VM for "minikube" ...
❗  Image was not built for the current minikube version. To resolve this you can delete and recreate your minikube cluster using the latest images. Expected minikube version: v1.32.0 -> Actual minikube version: v1.33.0
🐳  Preparing Kubernetes v1.28.3 on Docker 24.0.7 ...| Bad local forwarding specification '0:localhost:8443'

🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass

❗  /usr/local/bin/kubectl is version 1.30.0, which may have incompatibilities with Kubernetes 1.28.3.
    ▪ Want kubectl v1.28.3? Try 'minikube kubectl -- get pods -A'
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default


We can also explicitly specify the driver (virtualization tool) to be used e.g. VirtualBox:

$ minikube start --driver=virtualbox

minikube downloaded the minikube ISO boot image. This image is then used to provision a VM in VirtualBox. It also downloaded Kubernetes version 1.28.3 and any other required binaries.

If we open the VirtualBox UI, we can see that a virtual machine named minikube has been created and is in a running state.



The kubectl utility is now configured to use the Kubernetes cluster provisioned by minikube.

To ensure that everything has been set up correctly we'll run:

$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured


Our cluster is now set up. We will deploy some applications on the cluster in order to make sure it's working as expected. 

To check if kubectl commands are working we can run:

$ kubectl get node
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   10d   v1.28.3


ROLES can also have the value master.

We can see that it is a single node cluster with node running Kubernetes v1.28.3.

To create a deployment using this cluster:

$ kubectl create deployment hello-minikube --image=kicbase/echo-server:1.0
deployment.apps/hello-minikube created

To check it:

$ kubectl get deployments
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
hello-minikube   1/1     1            1           68s

To expose this deployment as a service on port 8080:

$ kubectl expose deployment hello-minikube --type=NodePort --port=8080
service/hello-minikube exposed

To get info on the service:

$ kubectl get services hello-minikube
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hello-minikube   NodePort   10.104.92.181   <none>        8080:30235/TCP   4m5s

To get the URL of the exposed service:

$ minikube service hello-minikube --url
http://192.168.59.100:30235

If we copy this URL and paste it into a browser, we'll see the echo server's response.
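
The same check can be done from the terminal, using the URL returned by the previous command:

$ curl http://192.168.59.100:30235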


Cleanup:

$ kubectl delete services hello-minikube
service "hello-minikube" deleted

$ kubectl delete deployment hello-minikube
deployment.apps "hello-minikube" deleted

$ kubectl get pods
No resources found in default namespace.


To continue your Kubernetes learning journey, read the next article in this series: Managing Pods In A Minikube Cluster | My Public Notepad
---