Friday, 17 January 2025

Introduction to Elasticsearch





What is Elasticsearch?

  • An open-source analytics and full-text search engine.
  • Commonly used to add search functionality to applications such as blogs and webshops. Example: searching a blog for posts, or a webshop for products and categories.

Capabilities of Elasticsearch:

  • Supports complex search functionality similar to Google:
    • Autocompletion.
    • Typo correction.
    • Highlighting matches.
    • Synonym handling.
    • Relevance adjustment.
  • Enables filtering and sorting, such as by price, brand, or other attributes.
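
As a rough illustration of how several of these capabilities combine in one request, the sketch below builds a search body with typo tolerance (fuzziness), highlighting, filtering, and sorting. The cluster URL, the "products" index, and the field names (name, brand, price) are assumptions for illustration only, and "brand" is assumed to be mapped as a keyword field.

```shell
# Hypothetical cluster URL and index name; adjust for your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="products"

# Request body: fuzzy full-text match on "name" (tolerates the typo
# "wireles"), filtered by brand and maximum price, with matched terms
# highlighted and hits sorted by ascending price.
search_body() {
  cat <<'EOF'
{
  "query": {
    "bool": {
      "must": {
        "match": { "name": { "query": "wireles headphones", "fuzziness": "AUTO" } }
      },
      "filter": [
        { "term": { "brand": "acme" } },
        { "range": { "price": { "lte": 100 } } }
      ]
    }
  },
  "highlight": { "fields": { "name": {} } },
  "sort": [ { "price": "asc" } ]
}
EOF
}

# Sends the search to the cluster; nothing is sent until this is called.
search_products() {
  search_body | curl -s -X POST "$ES_URL/$INDEX/_search" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Calling search_products sends the request. Filters do not affect relevance scoring, which is why the brand and price conditions sit in "filter" rather than "must". Authentication is omitted here; add -u "user:pass" if your cluster requires it.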

Advanced Use Cases:

  • Full-text search and relevance boosting (e.g., highly-rated products).
  • Filtering and sorting by various factors (price, size, brand, etc.).

Analytics Platform:

  • Allows querying structured data (e.g., numbers) and aggregating results.
  • Useful for creating pie charts, line charts, and other visualizations.
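
A minimal sketch of what such an aggregation request might look like: a terms aggregation (one bucket per category, the kind of data behind a pie chart) and a date histogram with a summed metric (one bucket per day, the kind of data behind a line chart). The "orders" index and the fields category, created_at, and price are assumed names, not taken from any real dataset.

```shell
# Hypothetical cluster URL and index name; adjust for your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="orders"

# Request body: "size": 0 skips the individual hits and returns only
# the aggregation results.
aggregation_body() {
  cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category" }
    },
    "revenue_per_day": {
      "date_histogram": { "field": "created_at", "calendar_interval": "day" },
      "aggs": {
        "revenue": { "sum": { "field": "price" } }
      }
    }
  }
}
EOF
}

# Sends the aggregation request; nothing is sent until this is called.
run_aggregation() {
  aggregation_body | curl -s -X POST "$ES_URL/$INDEX/_search" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

The response contains a list of buckets per aggregation, which a charting tool can plot directly.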

Application Performance Management (APM):

  • A common use case: monitoring logs, errors, and server metrics.
  • Examples include tracking web application errors or server CPU/memory usage, displayed on line charts.

Event and Sales Analysis:

  • Analyze events like sales from physical stores using aggregations.
  • Examples include identifying top-selling stores or forecasting sales using machine learning.
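
Identifying top-selling stores could be sketched as a terms aggregation ordered by a summed metric. The "sales" index and the fields store_id and amount are hypothetical, and store_id is assumed to be a keyword field.

```shell
# Hypothetical cluster URL and index name; adjust for your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="sales"

# Request body: group sales events by store, sum the sale amounts per
# store, and keep the five stores with the highest totals.
top_stores_body() {
  cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "top_stores": {
      "terms": {
        "field": "store_id",
        "size": 5,
        "order": { "total_sales": "desc" }
      },
      "aggs": {
        "total_sales": { "sum": { "field": "amount" } }
      }
    }
  }
}
EOF
}

# Sends the request; nothing is sent until this is called.
find_top_stores() {
  top_stores_body | curl -s -X POST "$ES_URL/$INDEX/_search" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Ordering the buckets by the total_sales sub-aggregation is what turns "sales per store" into "top-selling stores".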

Machine Learning Capabilities:

  • Forecasting:
    • Sales predictions for capacity management.
    • Estimating staffing needs or server scaling based on historical data.
  • Anomaly detection:
    • Identifying significant deviations from normal behavior (e.g., a drop in website traffic).
      • The machine learning model learns the “norm” and flags an anomaly whenever behavior deviates significantly from it.
    • Automates the detection of unusual activity without manually defined thresholds.
    • Alerting (e.g., email or Slack) can then be set up so that we are notified whenever something unusual happens.
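
As a hedged sketch of the website-traffic example, the body below defines an anomaly detection job that uses the low_count detector function to flag time buckets with unusually few events. The job name, the 15-minute bucket span, and the @timestamp field are assumptions, and whether the machine learning APIs are available depends on your subscription level.

```shell
# Hypothetical cluster URL and job name; adjust for your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
JOB_ID="website-traffic"

# Job definition: count events per 15-minute bucket and flag buckets
# whose count is unusually low compared with the learned baseline.
job_body() {
  cat <<'EOF'
{
  "description": "Flag unusually low request counts",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "low_count" } ]
  },
  "data_description": { "time_field": "@timestamp" }
}
EOF
}

# Creates the job; nothing is sent until this is called.
create_job() {
  job_body | curl -s -X PUT "$ES_URL/_ml/anomaly_detectors/$JOB_ID" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Creating the job is only the first step: it also needs a datafeed pointing at the index that holds the traffic data, and it must be opened before it analyzes anything. Notifications are configured separately.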

How Elasticsearch Works:

  • Data is stored as documents (JSON objects), analogous to rows in a relational database.
  • Each document has fields, similar to columns in a database table.
  • Uses a RESTful API for querying and interacting with the data.
  • Queries are written in JSON, making the API straightforward to use.
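
To make the document and query model concrete, here is a minimal sketch that stores one JSON document and then searches for it through the REST API. The "blog_posts" index, the document ID, and the field names are made up for illustration.

```shell
# Hypothetical cluster URL and index name; adjust for your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="blog_posts"

# A document is a JSON object; its keys are the fields.
document_body() {
  cat <<'EOF'
{
  "title": "Introduction to Elasticsearch",
  "category": "search",
  "published": "2025-01-17"
}
EOF
}

# A query is JSON too: full-text match on the "title" field.
query_body() {
  cat <<'EOF'
{
  "query": {
    "match": { "title": "elasticsearch" }
  }
}
EOF
}

# Stores the document under ID 1; nothing is sent until this is called.
index_document() {
  document_body | curl -s -X PUT "$ES_URL/$INDEX/_doc/1" \
    -H 'Content-Type: application/json' --data-binary @-
}

# Searches the index; nothing is sent until this is called.
search_posts() {
  query_body | curl -s -X POST "$ES_URL/$INDEX/_search" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Run index_document first, then search_posts. A newly indexed document becomes searchable after the next index refresh, which by default happens about once per second.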

Technology and Scalability:

  • Written in Java and built on Apache Lucene.
  • Highly scalable and distributed by nature, handling massive data volumes and high query throughput.
  • Searches remain fast even across millions of documents, since Lucene looks terms up in an inverted index instead of scanning every document.

Community and Adoption:

  • Widely adopted by large companies and has a vibrant community for support and collaboration.


Index Templates


Deletion:

curl -u "user:pass" -X DELETE "https://elasticsearch.my-corp.com:443/_index_template/index_template_name"

