Friday, 17 January 2025

Introduction to Elasticsearch





What is Elasticsearch?

  • An open-source analytics and full-text search engine.
  • Commonly used to add search functionality to applications such as blogs or webshops. Example: searching a site for blog posts, products, or categories.

Capabilities of Elasticsearch:

  • Supports complex search functionality similar to Google:
    • Autocompletion.
    • Typo correction.
    • Highlighting matches.
    • Synonym handling.
    • Relevance adjustment.
  • Enables filtering and sorting, such as by price, brand, or other attributes.

Advanced Use Cases:

  • Full-text search and relevance boosting (e.g., highly-rated products).
  • Filtering and sorting by various factors (price, size, brand, etc.).
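
As a rough illustration of combining full-text search with filtering and sorting, the request body below is built as a plain Python dictionary. The field names (`name`, `brand`, `price`) and the helper function are assumptions made for this sketch; a real index would use its own mapping.

```python
import json


def build_product_search(text, brand, max_price):
    """Build a search body: full-text match, exact filters, price sort."""
    return {
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score.
                "must": [{"match": {"name": text}}],
                # "filter" clauses only include or exclude documents.
                "filter": [
                    {"term": {"brand": brand}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        },
        # Cheapest first; ties are broken by relevance score.
        "sort": [{"price": "asc"}, "_score"],
    }


body = build_product_search("running shoes", "acme", 100)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request.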

Analytics Platform:

  • Allows querying structured data (e.g., numbers) and aggregating results.
  • Useful for creating pie charts, line charts, and other visualizations.
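
To make the aggregation idea concrete, here is a minimal sketch of a terms aggregation request and of turning its buckets into pie-chart slices. The field names (`brand`, `price`), the aggregation name `by_brand`, and the sample response are made up for illustration; only the general bucket shape (`key`, `doc_count`) is taken from how terms aggregations respond.

```python
def brand_breakdown_request():
    """Request body for a terms aggregation: document count per brand."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "by_brand": {
                "terms": {"field": "brand"},
                # Nested metric: average price inside each brand bucket.
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }


def pie_slices(response):
    """Turn aggregation buckets into (label, share) pairs for a pie chart."""
    buckets = response["aggregations"]["by_brand"]["buckets"]
    total = sum(b["doc_count"] for b in buckets)
    return [(b["key"], b["doc_count"] / total) for b in buckets]


# Hand-written sample shaped like a terms aggregation response.
sample_response = {
    "aggregations": {
        "by_brand": {
            "buckets": [
                {"key": "acme", "doc_count": 30, "avg_price": {"value": 42.0}},
                {"key": "globex", "doc_count": 10, "avg_price": {"value": 55.5}},
            ]
        }
    }
}
print(pie_slices(sample_response))
```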

Application Performance Management (APM):

  • A common use case is monitoring logs, errors, and server metrics.
  • Examples include tracking web application errors or server CPU/memory usage, displayed on line charts.

Event and Sales Analysis:

  • Analyze events like sales from physical stores using aggregations.
  • Examples include identifying top-selling stores or forecasting sales using machine learning.

Machine Learning Capabilities:

  • Forecasting:
    • Sales predictions for capacity management.
    • Estimating staffing needs or server scaling based on historical data.
  • Anomaly detection:
    • Identifying significant deviations from normal behavior (e.g., drop in website traffic).
      • Machine learning learns the “norm” and lets you know when there is an anomaly, i.e. a significant deviation from that normal behavior.
    • Automates alerting for unusual activities without needing manual thresholds.
    • Alerting (email, Slack) can then be set up on top of this, so that we are notified whenever something unusual happens.
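
The idea of "learning the norm" and flagging deviations can be sketched with a simple z-score check. This is only an illustration of the concept, not the model Elastic's machine learning features actually use; the function name, the threshold of 3 standard deviations, and the sample traffic numbers are all assumptions.

```python
from statistics import mean, stdev


def find_anomalies(history, recent, threshold=3.0):
    """Return the values in `recent` that deviate strongly from `history`."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value is a deviation.
        return [v for v in recent if v != baseline]
    # Flag values more than `threshold` standard deviations from the mean.
    return [v for v in recent if abs(v - baseline) / spread > threshold]


hourly_visits = [980, 1010, 995, 1005, 1020, 990, 1000]  # the learned "norm"
latest = [1003, 420, 998]  # 420 simulates a sudden drop in traffic
print(find_anomalies(hourly_visits, latest))
```

No fixed traffic limit is hard-coded here: the acceptable range is derived from the historical data itself.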

How Elasticsearch Works:

  • Data is stored as documents (JSON objects), analogous to rows in a relational database.
  • Each document has fields, similar to columns in a database table.
  • Uses a RESTful API for querying and interacting with the data.
  • Queries are written in JSON, making the API straightforward to use.
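
A minimal sketch of what "documents over a RESTful API" looks like in practice, using only the Python standard library. The base URL assumes an unsecured local test node, and the index name `posts` and its fields are hypothetical. The requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed local, unsecured test node


def index_request(index, doc_id, document):
    """PUT /<index>/_doc/<id> stores one JSON document."""
    return Request(
        f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """POST /<index>/_search runs a JSON query against the index."""
    query = {"query": {"match": {field: text}}}
    return Request(
        f"{BASE_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = index_request("posts", 1, {"title": "Hello", "tags": ["intro"]})
print(req.get_method(), req.full_url)
# Sending is left out on purpose; urllib.request.urlopen(req) would do it.
```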

Technology and Scalability:

  • Written in Java and built on Apache Lucene.
  • Highly scalable and distributed by nature, handling massive data volumes and high query throughput.
  • Supports lightning-fast searches, even for millions of documents.

Community and Adoption:

  • Widely adopted by large companies and has a vibrant community for support and collaboration.

Thursday, 9 January 2025

ELK Stack Interview Questions




Elasticsearch (ES)





Kibana


  • How to install Kibana on bare metal?
    • How to install Kibana in a k8s cluster?
  • What are Dashboards?
  • What are Alerts?
  • How to back up Elastic objects like dashboards and alerts? How to restore them in another Elastic instance?
  • TBC


Wednesday, 8 January 2025

How to locally run Helm from a Docker container


Instead of managing a local installation of Helm, I prefer running its latest version from a Docker container, using the alpine/helm image from Docker Hub.

% docker run -it --rm  -v ~/.helm:/root/.helm -v ~/.config/helm:/root/.config/helm -v ~/.cache/helm:/root/.cache/helm alpine/helm
The Kubernetes package manager

Common actions for Helm:

- helm search:    search for charts
- helm pull:      download a chart to your local directory to view
- helm install:   upload the chart to Kubernetes
- helm list:      list releases of charts

Environment variables:

| Name                               | Description                                                                                                |
|------------------------------------|------------------------------------------------------------------------------------------------------------|
| $HELM_CACHE_HOME                   | set an alternative location for storing cached files.                                                      |
| $HELM_CONFIG_HOME                  | set an alternative location for storing Helm configuration.                                                |
| $HELM_DATA_HOME                    | set an alternative location for storing Helm data.                                                         |
| $HELM_DEBUG                        | indicate whether or not Helm is running in Debug mode                                                      |
| $HELM_DRIVER                       | set the backend storage driver. Values are: configmap, secret, memory, sql.                                |
| $HELM_DRIVER_SQL_CONNECTION_STRING | set the connection string the SQL storage driver should use.                                               |
| $HELM_MAX_HISTORY                  | set the maximum number of helm release history.                                                            |
| $HELM_NAMESPACE                    | set the namespace used for the helm operations.                                                            |
| $HELM_NO_PLUGINS                   | disable plugins. Set HELM_NO_PLUGINS=1 to disable plugins.                                                 |
| $HELM_PLUGINS                      | set the path to the plugins directory                                                                      |
| $HELM_REGISTRY_CONFIG              | set the path to the registry config file.                                                                  |
| $HELM_REPOSITORY_CACHE             | set the path to the repository cache directory                                                             |
| $HELM_REPOSITORY_CONFIG            | set the path to the repositories file.                                                                     |
| $KUBECONFIG                        | set an alternative Kubernetes configuration file (default "~/.kube/config")                                |
| $HELM_KUBEAPISERVER                | set the Kubernetes API Server Endpoint for authentication                                                  |
| $HELM_KUBECAFILE                   | set the Kubernetes certificate authority file.                                                             |
| $HELM_KUBEASGROUPS                 | set the Groups to use for impersonation using a comma-separated list.                                      |
| $HELM_KUBEASUSER                   | set the Username to impersonate for the operation.                                                         |
| $HELM_KUBECONTEXT                  | set the name of the kubeconfig context.                                                                    |
| $HELM_KUBETOKEN                    | set the Bearer KubeToken used for authentication.                                                          |
| $HELM_KUBEINSECURE_SKIP_TLS_VERIFY | indicate if the Kubernetes API server's certificate validation should be skipped (insecure)                |
| $HELM_KUBETLS_SERVER_NAME          | set the server name used to validate the Kubernetes API server certificate                                 |
| $HELM_BURST_LIMIT                  | set the default burst limit in the case the server contains many CRDs (default 100, -1 to disable)         |
| $HELM_QPS                          | set the Queries Per Second in cases where a high number of calls exceed the option for higher burst values |

Helm stores cache, configuration, and data based on the following configuration order:

- If a HELM_*_HOME environment variable is set, it will be used
- Otherwise, on systems supporting the XDG base directory specification, the XDG variables will be used
- When no other location is set a default location will be used based on the operating system

By default, the default directories depend on the Operating System. The defaults are listed below:

| Operating System | Cache Path                | Configuration Path             | Data Path               |
|------------------|---------------------------|--------------------------------|-------------------------|
| Linux            | $HOME/.cache/helm         | $HOME/.config/helm             | $HOME/.local/share/helm |
| macOS            | $HOME/Library/Caches/helm | $HOME/Library/Preferences/helm | $HOME/Library/helm      |
| Windows          | %TEMP%\helm               | %APPDATA%\helm                 | %APPDATA%\helm          |

Usage:
  helm [command]

Available Commands:
  completion  generate autocompletion scripts for the specified shell
  create      create a new chart with the given name
  dependency  manage a chart's dependencies
  env         helm client environment information
  get         download extended information of a named release
  help        Help about any command
  history     fetch release history
  install     install a chart
  lint        examine a chart for possible issues
  list        list releases
  package     package a chart directory into a chart archive
  plugin      install, list, or uninstall Helm plugins
  pull        download a chart from a repository and (optionally) unpack it in local directory
  push        push a chart to remote
  registry    login to or logout from a registry
  repo        add, list, remove, update, and index chart repositories
  rollback    roll back a release to a previous revision
  search      search for a keyword in charts
  show        show information of a chart
  status      display the status of the named release
  template    locally render templates
  test        run tests for a release
  uninstall   uninstall a release
  upgrade     upgrade a release
  verify      verify that a chart at the given path has been signed and is valid
  version     print the client version information

Flags:
      --burst-limit int                 client-side default throttling limit (default 100)
      --debug                           enable verbose output
  -h, --help                            help for helm
      --kube-apiserver string           the address and the port for the Kubernetes API server
      --kube-as-group stringArray       group to impersonate for the operation, this flag can be repeated to specify multiple groups.
      --kube-as-user string             username to impersonate for the operation
      --kube-ca-file string             the certificate authority file for the Kubernetes API server connection
      --kube-context string             name of the kubeconfig context to use
      --kube-insecure-skip-tls-verify   if true, the Kubernetes API server's certificate will not be checked for validity. This will make your HTTPS connections insecure
      --kube-tls-server-name string     server name to use for Kubernetes API server certificate validation. If it is not provided, the hostname used to contact the server is used
      --kube-token string               bearer token used for authentication
      --kubeconfig string               path to the kubeconfig file
  -n, --namespace string                namespace scope for this request
      --qps float32                     queries per second used when communicating with the Kubernetes API, not including bursting
      --registry-config string          path to the registry config file (default "/root/.config/helm/registry/config.json")
      --repository-cache string         path to the directory containing cached repository indexes (default "/root/.cache/helm/repository")
      --repository-config string        path to the file containing repository names and URLs (default "/root/.config/helm/repositories.yaml")

Use "helm [command] --help" for more information about a command.
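
The lookup order described in the help output (HELM_*_HOME first, then the XDG variables, then an OS default) can be modelled for the cache directory as below. This is a simplified sketch, not Helm's actual code: it treats XDG as available whenever `XDG_CACHE_HOME` is set, covers only the Linux and macOS defaults from the table, and the function and platform names are my own.

```python
import posixpath

# Defaults copied from the table in the help output (Linux and macOS only).
DEFAULT_CACHE = {
    "linux": ".cache/helm",
    "darwin": "Library/Caches/helm",
}


def helm_cache_home(env, platform, home):
    """Resolve Helm's cache directory following the documented order."""
    # 1. An explicit HELM_CACHE_HOME always wins.
    if env.get("HELM_CACHE_HOME"):
        return env["HELM_CACHE_HOME"]
    # 2. Otherwise fall back to the XDG cache directory, if one is set.
    if env.get("XDG_CACHE_HOME"):
        return posixpath.join(env["XDG_CACHE_HOME"], "helm")
    # 3. Otherwise use the per-OS default under the home directory.
    return posixpath.join(home, DEFAULT_CACHE[platform])


print(helm_cache_home({}, "linux", "/home/me"))
print(helm_cache_home({"XDG_CACHE_HOME": "/tmp/xdg"}, "linux", "/home/me"))
print(helm_cache_home({"HELM_CACHE_HOME": "/data/helm"}, "darwin", "/Users/me"))
```

Inside the alpine/helm container the process runs as root, which is why the defaults shown under Flags point at /root/.cache/helm and /root/.config/helm.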


Example: Adding a Helm chart repository

% docker run -it --rm  -v ~/.helm:/root/.helm -v ~/.config/helm:/root/.config/helm -v ~/.cache/helm:/root/.cache/helm alpine/helm repo add elastic https://helm.elastic.co
"elastic" has been added to your repositories


Example: Updating a Helm chart repository

% docker run -it --rm  -v ~/.helm:/root/.helm -v ~/.config/helm:/root/.config/helm -v ~/.cache/helm:/root/.cache/helm alpine/helm repo update                             
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "elastic" chart repository
Update Complete. ⎈Happy Helming!⎈
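
Every example in this post repeats the same long docker run prefix. One way to avoid retyping it is a small wrapper that builds the argument list once. The sketch below uses the same image and volume mounts as the commands in this post; the function names are my own, and `run_helm` assumes a working Docker installation.

```python
import os
import subprocess

# Host directory -> container directory, as mounted in this post's commands.
MOUNTS = {
    "~/.helm": "/root/.helm",
    "~/.config/helm": "/root/.config/helm",
    "~/.cache/helm": "/root/.cache/helm",
}


def helm_docker_argv(*helm_args, image="alpine/helm"):
    """Build the `docker run` argument list for one Helm invocation."""
    argv = ["docker", "run", "-it", "--rm"]
    for host_dir, container_dir in MOUNTS.items():
        # Expand "~" here because no shell is involved when using a list.
        argv += ["-v", f"{os.path.expanduser(host_dir)}:{container_dir}"]
    return argv + [image, *helm_args]


def run_helm(*helm_args):
    """Run Helm in the container; raises if the command fails."""
    return subprocess.run(helm_docker_argv(*helm_args), check=True)


print(" ".join(helm_docker_argv("repo", "update")))
```

With this in place, `run_helm("repo", "add", "elastic", "https://helm.elastic.co")` would correspond to the repository example above.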



Example: View all configurable values in a chart

% docker run -it --rm  -v ~/.helm:/root/.helm -v ~/.config/helm:/root/.config/helm -v ~/.cache/helm:/root/.cache/helm alpine/helm show values elastic/eck-operator
# nameOverride is the short name for the deployment. Leave empty to let Helm generate a name using chart values.
nameOverride: "elastic-operator"

# fullnameOverride is the full name for the deployment. Leave empty to let Helm generate a name using chart values.
fullnameOverride: "elastic-operator"

# managedNamespaces is the set of namespaces that the operator manages. Leave empty to manage all namespaces.
managedNamespaces: []

# installCRDs determines whether Custom Resource Definitions (CRD) are installed by the chart.
# Note that CRDs are global resources and require cluster admin privileges to install.
# If you are sharing a cluster with other users who may want to install ECK on their own namespaces, setting this to true can have unintended consequences.
# 1. Upgrades will overwrite the global CRDs and could disrupt the other users of ECK who may be running a different version.
# 2. Uninstalling the chart will delete the CRDs and potentially cause Elastic resources deployed by other users to be removed as well.
installCRDs: true

# replicaCount is the number of operator pods to run.
replicaCount: 1

image:
  # repository is the container image prefixed by the registry name.
  repository: docker.elastic.co/eck/eck-operator
  # pullPolicy is the container image pull policy.
  pullPolicy: IfNotPresent
  # tag is the container image tag. If not defined, defaults to chart appVersion.
  tag: null
  # fips specifies whether the operator will use a FIPS compliant container image for its own StatefulSet image.
  # This setting does not apply to Elastic Stack applications images.
  # Can be combined with config.ubiOnly.
  fips: false

# priorityClassName defines the PriorityClass to be used by the operator pods.
priorityClassName: ""

# imagePullSecrets defines the secrets to use when pulling the operator container image.
imagePullSecrets: []

# resources define the container resource limits for the operator.
resources:
  limits:
    cpu: 1
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 150Mi

# statefulsetAnnotations define the annotations that should be added to the operator StatefulSet.
statefulsetAnnotations: {}

# statefulsetLabels define additional labels that should be added to the operator StatefulSet.
statefulsetLabels: {}

# podAnnotations define the annotations that should be added to the operator pod.
podAnnotations: {}

## podLabels define additional labels that should be added to the operator pod.
podLabels: {}

# podSecurityContext defines the pod security context for the operator pod.
podSecurityContext:
  runAsNonRoot: true

# securityContext defines the security context of the operator container.
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true

# nodeSelector defines the node selector for the operator pod.
nodeSelector: {}

# tolerations defines the node tolerations for the operator pod.
tolerations: []

# affinity defines the node affinity rules for the operator pod.
affinity: {}

# podDisruptionBudget configures the minimum or the maxium available pods for voluntary disruptions,
# set to either an integer (e.g. 1) or a percentage value (e.g. 25%).
podDisruptionBudget:
  enabled: false
  minAvailable: 1
  # maxUnavailable: 3

# additional environment variables for the operator container.
env: []

# additional volume mounts for the operator container.
volumeMounts: []

# additional volumes to add to the operator pod.
volumes: []

# createClusterScopedResources determines whether cluster-scoped resources (ClusterRoles, ClusterRoleBindings) should be created.
createClusterScopedResources: true

# Automount API credentials for the Service Account into the pod.
automountServiceAccountToken: true

serviceAccount:
  # create specifies whether a service account should be created for the operator.
  create: true
  # Specifies whether a service account should automount API credentials.
  automountServiceAccountToken: true
  # annotations to add to the service account
  annotations: {}
  # name of the service account to use. If not set and create is true, a name is generated using the fullname template.
  name: ""

tracing:
  # enabled specifies whether APM tracing is enabled for the operator.
  enabled: false
  # config is a map of APM Server configuration variables that should be set in the environment.
  config:
    ELASTIC_APM_SERVER_URL: http://localhost:8200
    ELASTIC_APM_SERVER_TIMEOUT: 30s

refs:
  # enforceRBAC specifies whether RBAC should be enforced for cross-namespace associations between resources.
  enforceRBAC: false

webhook:
  # enabled determines whether the webhook is installed.
  enabled: true
  # caBundle is the PEM-encoded CA trust bundle for the webhook certificate. Only required if manageCerts is false and certManagerCert is null.
  caBundle: Cg==
  # certManagerCert is the name of the cert-manager certificate to use with the webhook.
  certManagerCert: null
  # certsDir is the directory to mount the certificates.
  certsDir: "/tmp/k8s-webhook-server/serving-certs"
  # failurePolicy of the webhook.
  failurePolicy: Ignore
  # manageCerts determines whether the operator manages the webhook certificates automatically.
  manageCerts: true
  # namespaceSelector corresponds to the namespaceSelector property of the webhook.
  # Setting this restricts the webhook to act only on objects submitted to namespaces that match the selector.
  namespaceSelector: {}
  # objectSelector corresponds to the objectSelector property of the webhook.
  # Setting this restricts the webhook to act only on objects that match the selector.
  objectSelector: {}
  # port is the port that the validating webhook binds to.
  port: 9443
  # secret specifies the Kubernetes secret to be mounted into the path designated by the certsDir value to be used for webhook certificates.
  certsSecret: ""

# hostNetwork allows a Pod to use the Node network namespace.
# This is required to allow for communication with the kube API when using some alternate CNIs in conjunction with webhook enabled.
# CAUTION: Proceed at your own risk. This setting has security concerns such as allowing malicious users to access workloads running on the host.
hostNetwork: false

softMultiTenancy:
  # enabled determines whether the operator is installed with soft multi-tenancy extensions.
  # This requires network policies to be enabled on the Kubernetes cluster.
  enabled: false

# kubeAPIServerIP is required when softMultiTenancy is enabled.
kubeAPIServerIP: null

telemetry:
  # disabled determines whether the operator periodically updates ECK telemetry data for Kibana to consume.
  disabled: false
  # distributionChannel denotes which distribution channel was used to install the operator.
  distributionChannel: "helm"

# config values for the operator.
config:
  # logVerbosity defines the logging level. Valid values are as follows:
  # -2: Errors only
  # -1: Errors and warnings
  #  0: Errors, warnings, and information
  #  number greater than 0: Errors, warnings, information, and debug details.
  logVerbosity: "0"

  # (Deprecated: use metrics.port: will be removed in v2.14.0) metricsPort defines the port to expose operator metrics. Set to 0 to disable metrics reporting.
  metricsPort: 0

  metrics:
    # port defines the port to expose operator metrics. Set to 0 to disable metrics reporting.
    port: "0"
    # secureMode contains the options for enabling and configuring RBAC and TLS/HTTPs for the metrics endpoint.
    secureMode:
      # secureMode.enabled specifies whether to enable RBAC and TLS/HTTPs for the metrics endpoint.
      # * This option makes most sense when using a ServiceMonitor to scrape the metrics and is therefore mutually exclusive with the podMonitor.enabled option.
      # * This option also requires using cluster scoped resources (ClusterRole, ClusterRoleBinding) to
      #   grant access to the /metrics endpoint. (createClusterScopedResources: true is required)
      #
      enabled: false
      tls:
        # certificateSecret is the name of the tls secret containing the custom TLS certificate and key for the secure metrics endpoint.
        #
        # * This is an optional setting and is only required if you are using a custom TLS certificate. A self-signed certificate will be generated by default.
        # * TLS secret key must be named tls.crt.
        # * TLS key's secret key must be named tls.key.
        # * It is assumed to be in the same namespace as the ServiceMonitor.
        #
        # example: kubectl create secret tls eck-metrics-tls-certificate -n elastic-system \
        #            --cert=/path/to/tls.crt --key=/path/to/tls.key
        certificateSecret: ""

  # containerRegistry to use for pulling Elasticsearch and other application container images.
  containerRegistry: docker.elastic.co

  # containerRepository to use for pulling Elasticsearch and other application container images.
  # containerRepository: ""

  # containerSuffix suffix to be appended to container images by default. Cannot be combined with -ubiOnly flag
  # containerSuffix: ""

  # maxConcurrentReconciles is the number of concurrent reconciliation operations to perform per controller.
  maxConcurrentReconciles: "3"

  # caValidity defines the validity period of the CA certificates generated by the operator.
  caValidity: 8760h

  # caRotateBefore defines when to rotate a CA certificate that is due to expire.
  caRotateBefore: 24h

  # caDir defines the directory containing a CA certificate (tls.crt) and its associated private key (tls.key) to be used for all managed resources.
  # Setting this makes caRotateBefore and caValidity values ineffective.
  caDir: ""

  # certificatesValidity defines the validity period of certificates generated by the operator.
  certificatesValidity: 8760h

  # certificatesRotateBefore defines when to rotate a certificate that is due to expire.
  certificatesRotateBefore: 24h

  # disableConfigWatch specifies whether the operator watches the configuration file for changes.
  disableConfigWatch: false

  # exposedNodeLabels is an array of regular expressions of node labels which are allowed to be copied as annotations on Elasticsearch Pods.
  exposedNodeLabels: [ "topology.kubernetes.io/.*", "failure-domain.beta.kubernetes.io/.*" ]

  # ipFamily specifies the IP family to use. Possible values: IPv4, IPv6 and "" (auto-detect)
  ipFamily: ""

  # setDefaultSecurityContext determines whether a default security context is set on application containers created by the operator.
  # *note* that the default option now is "auto-detect" to attempt to set this properly automatically when both running
  # in an openshift cluster, and a standard kubernetes cluster.  Valid values are as follows:
  # "auto-detect" : auto detect
  # "true"        : set pod security context when creating resources.
  # "false"       : do not set pod security context when creating resources.
  setDefaultSecurityContext: "auto-detect"

  # kubeClientTimeout sets the request timeout for Kubernetes API calls made by the operator.
  kubeClientTimeout: 60s

  # elasticsearchClientTimeout sets the request timeout for Elasticsearch API calls made by the operator.
  elasticsearchClientTimeout: 180s

  # validateStorageClass specifies whether storage classes volume expansion support should be verified.
  # Can be disabled if cluster-wide storage class RBAC access is not available.
  validateStorageClass: true

  # enableLeaderElection specifies whether leader election should be enabled
  enableLeaderElection: true

  # Interval between observations of Elasticsearch health, non-positive values disable asynchronous observation.
  elasticsearchObservationInterval: 10s

  # ubiOnly specifies whether the operator will use only UBI container images to deploy Elastic Stack applications as well as for its own StatefulSet image. UBI images are only available from 7.10.0 onward.
  # Cannot be combined with the containerSuffix value.
  ubiOnly: false

# Prometheus PodMonitor configuration
# Reference: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#podmonitor
podMonitor:

  # enabled determines whether a podMonitor should deployed to scrape the eck metrics.
  # This requires the prometheus operator and the config.metrics.port not to be 0
  enabled: false

  # labels adds additional labels to the podMonitor
  labels: {}

  # annotations adds additional annotations to the podMonitor
  annotations: {}

  # namespace determines in which namespace the podMonitor will be deployed.
  # If not set the podMonitor will be created in the namespace where the Helm release is installed into
  # namespace: monitoring

  # interval specifies the interval at which metrics should be scraped
  interval: 5m

  # scrapeTimeout specifies the timeout after which the scrape is ended
  scrapeTimeout: 30s

  # podTargetLabels transfers labels on the Kubernetes Pod onto the target.
  podTargetLabels: []

  # podMetricsEndpointConfig allows to add an extended configuration to the podMonitor
  podMetricsEndpointConfig: {}
  # honorTimestamps: true

# Prometheus ServiceMonitor configuration
# Only used when config.enableSecureMetrics is true
# Reference: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#servicemonitor
serviceMonitor:
  # This option requires the following settings within Prometheus to function:
  # 1. RBAC settings for the Prometheus instance to access the metrics endpoint.
  #
  # - nonResourceURLs:
  #   - /metrics
  #   verbs:
  #   - get
  #
  # 2. If using the Prometheus Operator and your Prometheus instance is not in the same namespace as the operator you will need
  #    the Prometheus Operator configured with the following Helm values:
  #
  #   prometheus:
  #     prometheusSpec:
  #       serviceMonitorNamespaceSelector: {}
  #       serviceMonitorSelectorNilUsesHelmValues: false
  #
  # allows to disable the serviceMonitor, enabled by default for backwards compatibility
  enabled: true
  # namespace determines in which namespace the serviceMonitor will be deployed.
  # If not set the serviceMonitor will be created in the namespace where the Helm release is installed into
  # namespace: monitoring
  # caSecret is the name of the secret containing the custom CA certificate used to generate the custom TLS certificate for the secure metrics endpoint.
  #
  # * This *must* be the name of the secret containing the CA certificate used to sign the custom TLS certificate for the metrics endpoint.
  # * This secret *must* be in the same namespace as the Prometheus instance that will scrape the metrics.
  # * If using the Prometheus operator this secret must be within the `spec.secrets` field of the `Prometheus` custom resource such that it is mounted into the Prometheus pod at `caMountDirectory`, which defaults to /etc/prometheus/secrets/{secret-name}.
  # * This is an optional setting and is only required if you are using a custom TLS certificate.
  # * Key must be named ca.crt.
  #
  # example: kubectl create secret generic eck-metrics-tls-ca -n monitoring \
  #            --from-file=ca.crt=/path/to/ca.pem
  caSecret: ""
  # caMountDirectory is the directory at which the CA certificate is mounted within the Prometheus pod.
  #
  # * You should only need to adjust this if you are *not* using the Prometheus operator.
  caMountDirectory: "/etc/prometheus/secrets/"
  # insecureSkipVerify specifies whether to skip verification of the TLS certificate for the secure metrics endpoint.
  #
  # * If this setting is set to false, then the following settings are required:
  #   - certificateSecret
  #   - caSecret
  insecureSkipVerify: true

# Globals meant for internal use only
global:
  # manifestGen specifies whether the chart is running under manifest generator.
  # This is used for tasks specific to generating the all-in-one.yaml file.
  manifestGen: false
  # createOperatorNamespace defines whether the operator namespace manifest should be generated when in manifestGen mode.
  # Usually we do want that to happen (e.g. all-in-one.yaml) but, sometimes we don't (e.g. E2E tests).
  createOperatorNamespace: true
  # kubeVersion is the effective Kubernetes version we target when generating the all-in-one.yaml.
  kubeVersion: 1.21.0
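
Once the configurable values are known, a few of them can be overridden at install time with Helm's `--set` flag. The helper below is a small sketch that flattens a nested dictionary of overrides into `--set key.path=value` arguments. It handles only scalars and nested maps (no lists, no keys containing dots), and the release name `elastic-operator` and namespace `elastic-system` are assumptions chosen for the example.

```python
def to_set_flags(overrides, prefix=""):
    """Flatten nested overrides into Helm `--set key.path=value` arguments."""
    flags = []
    for key, value in overrides.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps, extending the dotted path.
            flags += to_set_flags(value, path)
        else:
            # Helm expects lowercase true/false for booleans.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            flags += ["--set", f"{path}={text}"]
    return flags


overrides = {"replicaCount": 2, "webhook": {"enabled": False}}
command = ["helm", "install", "elastic-operator", "elastic/eck-operator",
           "--namespace", "elastic-system", "--create-namespace"]
print(" ".join(command + to_set_flags(overrides)))
```

For more than a handful of overrides, a values file passed with `-f` is usually easier to keep under version control than a long list of `--set` flags.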