Monday, 5 August 2024

Introduction to Amazon Simple Queue Service (SQS)



Amazon Simple Queue Service (SQS) is a fully managed message queuing service provided by Amazon Web Services (AWS). It enables decoupling and scaling of microservices, distributed systems, and serverless applications. 


Here's an overview of how Amazon SQS works:

Key Concepts


  • Queue:
    • A queue is a temporary storage location for messages waiting to be processed (polled by consumers). There are two types of queues in SQS:
      • Standard Queue: Offers maximum throughput, best-effort ordering, and at-least-once delivery.
      • FIFO Queue: Ensures exactly-once processing and preserves the exact order of messages.
  • Message:
    • A message is the data that is sent between different components. It can be up to 256 KB in size and contains the information needed for processing.
  • Producer:
    • The producer (or sender) sends messages to the queue. Producers can be applications, microservices and other AWS services.
  • Consumer:
    • The consumer (or receiver) retrieves and processes messages from the queue. Consumers can be Lambda functions, EC2 instances, and other AWS services.
  • Visibility Timeout:
    • A period during which a message is invisible to other consumers after a consumer retrieves it from the queue. This prevents other consumers from processing the same message concurrently.
  • Dead-Letter Queue (DLQ):
    • A queue for messages that could not be processed successfully after a specified number of attempts. This helps in isolating and analyzing problematic messages.

Workflow


  • Sending Messages:
    • A producer sends messages to an SQS queue using the SendMessage action. Each message is assigned a unique ID and placed in the queue.
  • Receiving Messages:
    • A consumer retrieves messages from the queue using the ReceiveMessage action. This operation can specify:
      • number of messages to retrieve (up to 10) 
      • duration to wait if no messages are available
  • Processing Messages:
    • After receiving a message, the consumer processes it. The message remains invisible to other consumers for a specified visibility timeout.
  • Deleting Messages:
    • Once processed, the consumer deletes the message from the queue using the DeleteMessage action. If not deleted within the visibility timeout, the message becomes visible again for other consumers to process.
  • Handling Failures:
    • If a message cannot be processed successfully within a specified number of attempts, it is moved to the Dead-Letter Queue for further investigation.
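The send, receive, process, and delete steps above can be sketched in Python. This is a minimal illustration, not a production consumer: it assumes `sqs` is a Boto3 SQS client (for example `boto3.client("sqs")`), and the function names `send_order` and `poll_once`, the JSON message body, and the default timeouts are hypothetical choices made for this example.

```python
import json


def send_order(sqs, queue_url, order):
    """Producer side: serialize the order and put it on the queue."""
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(order))
    return response["MessageId"]  # unique ID assigned by SQS


def poll_once(sqs, queue_url, handler, wait_seconds=20, visibility_timeout=30):
    """Consumer side: receive up to 10 messages, process each, delete on success."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,                # upper limit per ReceiveMessage call
        WaitTimeSeconds=wait_seconds,          # wait if the queue is empty
        VisibilityTimeout=visibility_timeout,  # hide the messages while we work
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handler(json.loads(message["Body"]))
        except Exception:
            # Do not delete: the message becomes visible again after the
            # visibility timeout and is retried (or moved to a DLQ, if configured).
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```

A real consumer would call `poll_once` in a loop. Deleting only after the handler returns is what gives the at-least-once behaviour described above.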


Additional Features

  • Long Polling:
    • Reduces the number of empty responses by allowing the ReceiveMessage action to wait for a specified amount of time until a message arrives in the queue.
  • Message Attributes:
    • Metadata about the message that can be used for filtering and routing.
  • Batch Operations:
    • SQS supports batch sending, receiving, and deleting of messages, which can improve efficiency and reduce costs.
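As an illustration of batch sending, the sketch below splits a list of message bodies into groups of at most 10 entries (the limit per SendMessageBatch call) and sends each group. It assumes `sqs` is a Boto3 SQS client; `build_batches` and `send_all` are hypothetical helper names, and the sketch does not check the combined payload size of a batch.

```python
def build_batches(bodies, batch_size=10):
    """Split message bodies into SendMessageBatch entry lists (max 10 per call)."""
    if not 1 <= batch_size <= 10:
        raise ValueError("SQS batch actions accept between 1 and 10 entries")
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        batches.append([
            # Id only needs to be unique within one batch request.
            {"Id": str(start + offset), "MessageBody": body}
            for offset, body in enumerate(chunk)
        ])
    return batches


def send_all(sqs, queue_url, bodies):
    """Send every body; return the Ids of entries that SQS reported as failed."""
    failed = []
    for entries in build_batches(bodies):
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed.extend(item["Id"] for item in response.get("Failed", []))
    return failed
```

A batch call can partially succeed, so the caller should inspect the returned Ids and retry those entries.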


Security and Access Control


  • IAM Policies:
    • Use AWS Identity and Access Management (IAM) policies to control access to SQS queues.
  • Encryption:
    • Messages can be encrypted in transit using SSL/TLS and at rest using AWS Key Management Service (KMS).

Use Cases


  • Decoupling Microservices:
    • SQS allows microservices to communicate asynchronously, improving scalability and fault tolerance.
  • Work Queues:
    • Distributing tasks to multiple workers for parallel processing.
  • Event Sourcing:
    • Storing a series of events to track changes in state over time.

Example Scenario


Order Processing System:

  • An e-commerce application has separate microservices for handling orders, inventory, and shipping.
  • The order service sends an order message to an SQS queue.
  • The inventory service retrieves the message, processes it (e.g., reserves stock), and then sends an updated message to another queue.
  • The shipping service retrieves the updated message and processes it (e.g., ships the item).

By using Amazon SQS, these microservices can operate independently and scale as needed, ensuring reliable and efficient order processing.


What is best-effort ordering?



Best-effort ordering is the default delivery logic for Amazon SQS Standard queues. Under this model, SQS attempts to deliver messages in the same order they were sent, but it does not guarantee it. 

How It Works

  • General Alignment: SQS uses a highly distributed architecture to achieve nearly unlimited throughput. While it tries to maintain a "loose FIFO" (First-In, First-Out) flow, messages may occasionally be delivered out of sequence.
  • Cause of Reordering: Out-of-order delivery typically occurs due to the way messages are stored across multiple servers and availability zones for redundancy. Factors like high throughput, network delays, or failure recovery can cause a message sent later to be available for retrieval before an earlier one. 

Comparison with FIFO Queues


If your application requires strict ordering, you must use SQS FIFO queues instead of Standard queues. 

Feature               Best-Effort Ordering (Standard)          Strict Ordering (FIFO)
--------------------  ---------------------------------------  ---------------------------------------------
Ordering Guarantee    No (messages may arrive out of order)    Yes (exact order preserved)
Throughput            Nearly unlimited                         Limited (unless High Throughput mode is used)
Delivery Model        At-least-once (duplicates possible)      Exactly-once (no duplicates)
Cost                  Lower                                    Slightly higher

Best Practices for Best-Effort Ordering

  • Idempotency: Ensure your application can handle the same message multiple times without unintended side effects.
  • Tolerance for Shuffle: Use Standard queues for workloads where order isn't critical, such as processing log data, real-time analytics, or distributing independent background tasks.
  • Application-Level Logic: If you need some ordering but want the high throughput of Standard queues, you can include sequence numbers in your message attributes and handle the reordering logic within your consumer application. 
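The idempotency and application-level ordering practices above can be combined in a small consumer-side buffer. The sketch below is one possible approach, assuming each message carries a gap-free integer sequence number (for example in a message attribute); the class name `OrderedIdempotentConsumer` is hypothetical, and state is kept in memory only, so a real consumer would need durable storage.

```python
import heapq


class OrderedIdempotentConsumer:
    """Applies messages in sequence order and ignores duplicates."""

    def __init__(self, handler, first_sequence=1):
        self._handler = handler
        self._next = first_sequence   # next sequence number we may apply
        self._pending = []            # min-heap of (sequence, payload)
        self._buffered = set()        # sequence numbers currently in the heap

    def receive(self, sequence, payload):
        # Duplicate: already applied, or already waiting in the buffer.
        if sequence < self._next or sequence in self._buffered:
            return
        heapq.heappush(self._pending, (sequence, payload))
        self._buffered.add(sequence)
        # Drain every message that is now contiguous with what was applied.
        while self._pending and self._pending[0][0] == self._next:
            seq, ready = heapq.heappop(self._pending)
            self._buffered.discard(seq)
            self._handler(ready)
            self._next += 1
```

Messages that arrive early wait in the buffer until the gap before them is filled, and redelivered messages are dropped without side effects.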


How can AWS services send messages to a queue?


AWS services send messages to an Amazon SQS queue through three primary methods: direct API calls, event-driven notifications, or as a downstream target for messaging services. 

1. Direct API Integration (Producer Model)


Many compute services act as "producers" by calling the SendMessage or SendMessageBatch API actions directly using an AWS SDK (like Boto3 for Python or the SDK for Node.js). 

  • AWS Lambda: A function can use an SDK to programmatically push results or tasks into a queue for further processing.
  • Amazon EC2 & ECS: Applications running on virtual machines or in containers can send messages to SQS to decouple from backend systems.
  • AWS Step Functions: You can use a "Task" state to publish a message directly to SQS as part of a workflow. 

2. Event-Driven Notifications


Certain services can be configured to automatically "push" notifications into a queue when specific events occur. 
  • Amazon S3: You can set up S3 Event Notifications to send a message to SQS whenever an object is created, deleted, or restored in a bucket.
  • Amazon EventBridge: You can create rules that match specific system events and route them to an SQS queue as a target. 

3. Messaging Service Fan-out


SQS often acts as a subscriber or target for other messaging and integration services. 
  • Amazon SNS: Using the "fan-out" pattern, a message published to an SNS topic can be automatically delivered to multiple SQS queues simultaneously.
  • Amazon API Gateway: You can integrate an API endpoint directly with SQS. This allows external clients to send messages to your queue via a REST API without needing a Lambda function in between. 

Crucial Requirement: Permissions 


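For any service to send messages, the sender must be allowed to perform the sqs:SendMessage action on the queue. Code that runs under an IAM role in the same account (Lambda, EC2, ECS, Step Functions) typically gets this permission from the role's IAM policy. AWS services that deliver messages on your behalf (such as S3, SNS, and EventBridge), as well as principals in other accounts, must be explicitly allowed by the SQS queue access policy.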
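As an example of a queue access policy, the sketch below builds a policy document that lets a single SNS topic send messages to a single queue. The ARNs are supplied by the caller, and the helper names `sns_to_sqs_policy` and `policy_attribute` as well as the statement ID are hypothetical; adapt the principal and condition for other sending services.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Queue access policy letting one SNS topic send messages to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowSnsTopicToSend",
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restrict the grant to messages that originate from this topic.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }


def policy_attribute(queue_arn, topic_arn):
    """Value for the queue's "Policy" attribute, which is a JSON string."""
    return {"Policy": json.dumps(sns_to_sqs_policy(queue_arn, topic_arn))}
```

With a Boto3 SQS client, the result could be applied with `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=policy_attribute(queue_arn, topic_arn))`.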


Dead Letter Queues (DLQ)



In the context of AWS SQS, DLQ stands for Dead Letter Queue. 

It is not a special type of queue; it is simply an ordinary SQS queue that is designated as a "holding pen" for messages that could not be processed successfully by your consumer application.

How it Works


When a message is picked up from your main queue, the consumer tries to process it. If the consumer fails (crashes, times out, or throws an error), the message returns to the main queue to be tried again. Without a DLQ, a "poison pill" message (one that causes a crash every single time) could cycle through your system until its retention period expires, wasting resources. A DLQ solves this through a Redrive Policy configured on the main queue:

  • Main Queue: Receives the message.
  • Maximum Receives: You define a limit (e.g., 3 or 5). Once a message has been received this many times without being deleted, SQS stops redelivering it.
  • The Move: SQS automatically moves that specific message from the main queue into the Dead Letter Queue.

Why use a DLQ?


  • Isolate Problematic Data: It separates "bad" messages from the "good" ones so your main pipeline can keep flowing.
  • Debugging: You can inspect the DLQ to see exactly what caused the failure (e.g., malformed JSON or an unexpected null value).
  • Manual Recovery: Once you fix the bug in your code, you can "redrive" the messages from the DLQ back into the main queue to be processed correctly.

Common SQS Attributes for DLQs


Term              Meaning
----------------  ------------------------------------------------------------
Source Queue      The original queue where messages arrive first.
Redrive Policy    The configuration that links the source queue to the DLQ.
maxReceiveCount   The number of times a message can be received without being deleted before it is moved to the DLQ.

If queues ending in -dlq show no activity (zero messages written in the past X days), this is actually a sign of a healthy system. It means:

  • Your consumers are processing messages successfully.
  • No "poison pill" messages have failed enough times to be kicked over to the dead letter storage.

Always set an alarm on your DLQs for ApproximateNumberOfMessagesVisible > 0. You want to know immediately when a message lands there, as it usually means a bug in your consumer code or a problem with your data.
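One way to define such an alarm is through CloudWatch's PutMetricAlarm action. The sketch below only builds the parameters; the helper name `dlq_alarm_params`, the alarm naming scheme, the 60-second period, and the use of an SNS topic as the alarm action are assumptions made for this example.

```python
def dlq_alarm_params(dlq_name, sns_topic_arn):
    """Keyword arguments for CloudWatch PutMetricAlarm on a DLQ's depth."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 60,                      # seconds per datapoint (assumed)
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # fires when depth > 0
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],   # where the notification goes
    }
```

With a Boto3 CloudWatch client, this could be applied as `cloudwatch.put_metric_alarm(**dlq_alarm_params("orders-queue-dlq", topic_arn))`, where the queue name and `topic_arn` are placeholders.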


Is DLQ created automatically for each SQS queue?


No, a Dead Letter Queue (DLQ) is not created automatically. It is a configuration you must set up yourself.

In AWS, a DLQ is just a regular SQS queue that you have "pointed" another queue toward. If you create a queue named orders-queue and don't explicitly tell it to use a DLQ, it simply won't have one.

The Two Steps to Creating a DLQ


To set up a DLQ, you have to perform two distinct actions:

  1. Create the DLQ itself: You create a second, standalone SQS queue (usually named with a -dlq suffix).
  2. Configure the Redrive Policy: You go back to your source queue and update its settings to point to the ARN (Amazon Resource Name) of the DLQ you just created.
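The two steps above can be sketched with Boto3 as follows. This is a minimal example for Standard queues, assuming `sqs` is a Boto3 SQS client; the function name `attach_dlq` and the default of 5 receives are hypothetical, and a FIFO source queue would additionally need a FIFO DLQ.

```python
import json


def attach_dlq(sqs, source_queue_url, dlq_name, max_receive_count=5):
    """Create a DLQ and point the source queue at it; returns the DLQ ARN."""
    # Step 1: create the standalone queue that will act as the DLQ.
    dlq_url = sqs.create_queue(QueueName=dlq_name)["QueueUrl"]
    dlq_arn = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Step 2: set the redrive policy on the source queue.
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={"RedrivePolicy": json.dumps(redrive_policy)},
    )
    return dlq_arn
```

The RedrivePolicy attribute is passed as a JSON string, which is why the dictionary is serialized before the call.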

Key Requirements


For a DLQ relationship to work, a few rules must be followed:
  • Same Region and Account: The source queue and the DLQ must be in the same AWS Region and the same AWS account.
  • Same Type: A FIFO queue must use a FIFO queue as its DLQ. A Standard queue must use a Standard queue as its DLQ.
  • Permissions: If you are using a custom KMS key for encryption, the source queue needs permission to use that key to move messages to the DLQ.

What happens if you don't have a DLQ?


If a message fails to process and you haven't configured a DLQ:
  1. The message returns to the source queue after the Visibility Timeout expires.
  2. The consumer picks it up again.
  3. If the message is "poison" (causes a crash), this loop repeats until the message's Message Retention Period (default 4 days) expires.
  4. Once the retention period is up, the message is simply deleted by AWS and lost forever.
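To get a feel for how costly this loop can be, the rough calculation below estimates an upper bound on the number of delivery attempts for a single poison message. It assumes the default 4-day retention period and the default 30-second visibility timeout, and that a consumer picks the message up again as soon as it becomes visible; the function name is hypothetical.

```python
import math

DEFAULT_RETENTION_SECONDS = 4 * 24 * 60 * 60   # 4 days, the SQS default
DEFAULT_VISIBILITY_TIMEOUT_SECONDS = 30        # SQS default visibility timeout


def max_delivery_attempts(retention_seconds=DEFAULT_RETENTION_SECONDS,
                          visibility_timeout_seconds=DEFAULT_VISIBILITY_TIMEOUT_SECONDS):
    """Upper bound on receives: one attempt per visibility-timeout window."""
    return math.ceil(retention_seconds / visibility_timeout_seconds)


print(max_delivery_attempts())  # 11520
```

With a maxReceiveCount of 5, the same message would instead be set aside after a handful of attempts.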

In professional environments, it is considered best practice to create a DLQ for every production queue. Our list of SQS queues should show a 1:1 ratio of queues to -dlq counterparts. Engineers should create and link them so that no data is lost during a processing failure.

Any queues in our list that don't have a matching -dlq might be candidates for a quick configuration update!
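A quick way to spot such candidates is to compare queue names. The sketch below is a name-based heuristic only, assuming the -dlq suffix convention described above and that a FIFO queue's DLQ keeps the .fifo ending (for example orders-dlq.fifo); the function name is hypothetical. The authoritative check is whether the source queue has a RedrivePolicy attribute.

```python
FIFO_SUFFIX = ".fifo"


def _split_fifo(name):
    """Return (base_name, extension) so FIFO queues compare on their base name."""
    if name.endswith(FIFO_SUFFIX):
        return name[:-len(FIFO_SUFFIX)], FIFO_SUFFIX
    return name, ""


def queues_missing_dlq(queue_names, dlq_suffix="-dlq"):
    """Names of non-DLQ queues that have no '<name>-dlq' counterpart in the list."""
    names = set(queue_names)
    missing = []
    for name in names:
        base, ext = _split_fifo(name)
        if base.endswith(dlq_suffix):
            continue  # this queue is itself a DLQ
        if f"{base}{dlq_suffix}{ext}" not in names:
            missing.append(name)
    return sorted(missing)
```

The input can be taken from the queue URLs returned by ListQueues, using the last path segment of each URL as the queue name.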

