This chapter covers
- Why we need a message queuing tier
- Understanding message durability
- How to accommodate offline consumers
- What are message delivery semantics
- Choosing the right technology
3.1 Why we need a message queuing tier
With a streaming system we want the same thing: to decouple the components in each tier and, more importantly, to decouple the tiers from each other. If you look up interprocess communication in the literature you will find a variety of models; in this chapter we are going to focus on the message-queuing model. By adopting this model our collection tier will be decoupled from our analysis tier.
- The decoupling allows our tiers to work at a higher level of abstraction, namely by passing messages rather than making explicit calls to the next layer.
- These are two very good properties to have in any system, let alone a distributed streaming one, and as we will see in this and the coming chapters, the decoupling of the tiers provides us with some wonderful benefits.
3.2 Core Concepts
With our motivation under our belt we are ready to take a look at the features of a message queuing product that are critical to the success of our streaming system.
Before we dive into the core features let's take a moment to make sure we have an understanding of the components of a message queuing product and how they map to our streaming architecture.
3.2.1 The Producer, The Broker, and the Consumer
In the message queuing world there are three main components: the producer, the broker, and the consumer. Each of these components plays a very important role in the overall functioning and design of the message queuing product.
The producer produces messages and the consumer consumes them. You may notice in figure 3.3 that the term broker is used and not the term message queue. A logical question is: why the change? Well, it is not so much a change as it is an abstraction. This abstraction is important because a broker may manage multiple queues.
When you take figure 3.4 into account, then the data flow starts to make more sense. If you follow the flow from left to right in figure 3.6 you will see the following steps taking place:
- The producer sends a message to a broker
- The broker puts the message into a queue
- The consumer reads the message from the broker
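The three steps above can be sketched in a few lines of Python. This is only a toy, in-memory illustration and not how any particular product works; the `Broker` class, its `publish` and `consume` methods, and the queue names are all made up for this example. It does show why we talk about a broker rather than a queue: a single broker manages several queues.

```python
from collections import deque


class Broker:
    """Toy in-memory broker that manages several named queues."""

    def __init__(self):
        self._queues = {}  # queue name -> deque of pending messages

    def publish(self, queue_name, message):
        # Steps 1 and 2: the producer hands the message to the broker,
        # and the broker puts it on the named queue.
        self._queues.setdefault(queue_name, deque()).append(message)

    def consume(self, queue_name):
        # Step 3: the consumer reads the oldest message from the broker.
        # Returns None when the queue is empty or does not exist.
        pending = self._queues.get(queue_name)
        return pending.popleft() if pending else None


broker = Broker()
broker.publish("clicks", {"user": 1, "page": "/home"})  # producer side
broker.publish("orders", {"user": 1, "total": 42})      # a second queue
first_click = broker.consume("clicks")                  # consumer side
```

Notice that the producer and the consumer never call each other; they only know about the broker and the name of a queue.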
To put this in perspective, let's see what it looks like if we overlay these terms and pieces onto our streaming architecture.
Looking at figure 3.5 I think you would agree that this seems pretty simple and straightforward, but as the saying goes, the devil is in the details. It is to these details – the subtle interactions between the producer, broker, and consumer, as well as various behaviors of the broker – that we will now turn our attention.
3.2.2 Isolating Producers from Consumers
As we talked about in section 3.1, one of our goals is to decouple the different tiers in the system. The message-queuing tier allows us to do this for the collection and analysis tiers.
Depending on your business need and the design of your streaming system you may find yourself in the situation where the producer (our collection tier) is generating messages faster than the consumer (our analysis tier) can consume them.
- Often this is because your analysis tier is more processing intensive than the collection tier and may not be able to process the data as fast. This presents an interesting challenge:
This is not a consumer problem at all, as it is perfectly acceptable in many use cases for consumers to read slowly or be offline from time to time.
- For example there may be consumers that only want to read all of the messages on an hourly basis to support a batch processing use case.
It is important to keep in mind that not all message-queuing systems provide this type of producer flow control, leaving it up to you, the application developer, to control the rate at which your collection tier is producing messages.
- If you do not, you may overwhelm both your consumer and the broker.
The ability to support this requirement is provided by message-queuing products that support durable messaging.
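If you do end up controlling the producer rate yourself, one simple approach is to put a bounded buffer between the producer and the consumer and have the producer react when it is full. The following is a minimal sketch using Python's standard `queue` module; the `try_produce` helper and the capacity of 2 are assumptions made for illustration, and the bounded `queue.Queue` merely stands in for a broker with limited capacity.

```python
import queue


def try_produce(buffer, message, timeout=0.01):
    """Try to enqueue a message; return False instead of growing without bound."""
    try:
        buffer.put(message, timeout=timeout)
        return True
    except queue.Full:
        # The consumer is behind. A real producer might slow down, spill to
        # disk, or drop the message, depending on the business need.
        return False


buffer = queue.Queue(maxsize=2)  # stands in for a broker with limited capacity
results = [try_produce(buffer, n) for n in range(3)]  # third message is refused
buffer.get()                                          # the consumer catches up by one
results.append(try_produce(buffer, 3))                # now there is room again
```

What the producer does when `try_produce` returns `False` is the real design decision, and it depends on whether your business can tolerate delayed or dropped messages.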
3.2.3 Durable Messaging
If this situation would have a negative impact on your business and you cannot tolerate losing potentially days of data, then you need to make sure the message queuing technology you choose has the ability to persist messages for a long time.
- Figure 3.9 below shows how durable messaging fits in with this tier and some of the types you may find.
Having durable messages not only provides us with a degree of fault tolerance, and thus disaster recovery, but often also allows for the offline consumer scenario we mentioned above.
If your business case requires, now or in the future, that you support offline consumers, or you want to make sure your producers and consumers can be completely decoupled, then you need to look for a product that supports durable messaging.
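To make the idea of durable messaging a little more concrete, here is a minimal sketch of an append-only log on disk that an offline consumer can read from later. It is an illustration only: the `DurableLog` class, the one-JSON-document-per-line file format, and the file name are assumptions for this example, and real brokers use far more sophisticated storage.

```python
import json
import os
import tempfile


class DurableLog:
    """Append-only message log stored as one JSON document per line."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the write to disk

    def read_from(self, offset):
        """Return all messages at position >= offset (0-based line index)."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f][offset:]


log_path = os.path.join(tempfile.mkdtemp(), "messages.log")
DurableLog(log_path).append({"id": 1})
DurableLog(log_path).append({"id": 2})  # the consumer is offline for both writes

# Later, possibly after a restart, a new object reopens the same file.
replayed = DurableLog(log_path).read_from(0)
```

Because the messages live in a file rather than in memory, the consumer can be offline while they are written and still read them later, starting from whatever position it chooses.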
3.2.4 Message Delivery Semantics
The following are the three common semantic guarantees you will run into when looking at message queuing products:
- At most once – A message may get lost, but it will never be re-read by a consumer.
- At least once – A message will never be lost, but it may be re-read by a consumer.
- Exactly once – A message is never lost and is read by a consumer once and only once.
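One way to see the difference between the first two guarantees is to look at when a consumer acknowledges a message relative to when it processes it. The following is a small simulation under simplifying assumptions: there is a single consumer, it crashes exactly once, and the `run_consumer` function and its parameters are invented for this sketch rather than taken from any product.

```python
def run_consumer(messages, ack_before_processing, crash_on):
    """Process messages, simulating one crash followed by a restart.

    Returns the list of messages whose processing completed.
    """
    processed = []
    committed = 0    # position the broker believes the consumer has reached
    crashed = False
    while committed < len(messages):
        position = committed
        message = messages[position]
        try:
            if ack_before_processing:
                committed = position + 1    # at most once: acknowledge first
                if message == crash_on and not crashed:
                    crashed = True
                    raise RuntimeError("crash before processing finished")
                processed.append(message)
            else:
                processed.append(message)   # at least once: do the work first
                if message == crash_on and not crashed:
                    crashed = True
                    raise RuntimeError("crash before the ack was sent")
                committed = position + 1
        except RuntimeError:
            continue    # restart and resume from the committed position
    return processed


at_most_once = run_consumer(["a", "b", "c"], True, crash_on="b")
at_least_once = run_consumer(["a", "b", "c"], False, crash_on="b")
```

With the acknowledgment sent first, message "b" is lost (`['a', 'c']`); with the work done first, "b" is processed twice (`['a', 'b', 'b', 'c']`). Exactly once requires closing both of these gaps.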
It seems like we have identified almost every spot in the diagram as a possible point of failure. Don't worry, it is not all doom and gloom. Let's walk through them and understand what the risks are and what each numbered item means.
- The network between the producer and broker
- Message Queue
- The network between the consumer and broker
In the context of a message queuing system we need to keep these failure scenarios in our back pocket so that when a messaging system claims to provide "exactly once" delivery semantics we can understand whether it truly does.
- As is the case with so many things, the choice of technology here will involve various tradeoffs, such as those in table 3.1 below.
The choice of where to compromise is going to be based on the business problem you are trying to solve with the streaming system.
- On the other hand, if you are building a streaming fraud detection system, then missing a message can have a very undesirable effect.
As you look at different messaging systems you may find that the one you want to use, such as Apache Kafka or Apache ActiveMQ, does not provide "exactly once" guarantees.
Figure 3.13 above identifies the producer and the consumer techniques.
- Do not retry sending messages – This is the first technique we must use. To do this you will need a way to track the messages your producer(s) send to the broker(s). If there is no response, or the network connection between your producer(s) and the broker(s) is interrupted, you can read data from the broker to verify whether the message you did not receive an acknowledgment for was in fact received, and resend it only if it was not. By having this type of message tracking in place you can be sure your producer sends each message exactly once.
- Store metadata for last message – This is the second technique we must use, and it involves storing some data about the last message we read.
- If you are using a JMS-based system you may store the JMSMessageID, or if you are using Apache Kafka you would store the message offset.
- In the end what you need is data about the message so that you can be sure that your consumer does not process a message a second time.
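The two techniques can be sketched together as follows. Everything here is hypothetical and simplified for illustration: `FlakyBroker` simulates a broker that stores a message but loses the acknowledgment, `send_once` is the producer-side check, and `Consumer` keeps its last offset in memory, whereas a real consumer would persist it, ideally together with the results of processing.

```python
class FlakyBroker:
    """Hypothetical broker that stores messages but may lose the ack."""

    def __init__(self, drop_ack_for):
        self.log = []  # list of (message_id, payload)
        self._drop_ack_for = set(drop_ack_for)

    def send(self, message_id, payload):
        self.log.append((message_id, payload))
        # False simulates an ack lost on the network after the write succeeded.
        return message_id not in self._drop_ack_for

    def contains(self, message_id):
        return any(stored_id == message_id for stored_id, _ in self.log)


def send_once(broker, message_id, payload):
    """Producer technique: verify with the broker instead of blindly retrying."""
    acked = broker.send(message_id, payload)
    if not acked and not broker.contains(message_id):
        broker.send(message_id, payload)  # resend only if it never arrived


class Consumer:
    """Consumer technique: remember the offset of the last message processed."""

    def __init__(self):
        self.last_offset = -1  # a real consumer would persist this value
        self.processed = []

    def poll(self, broker):
        for offset, (_, payload) in enumerate(broker.log):
            if offset <= self.last_offset:
                continue  # already handled, skip it
            self.processed.append(payload)
            self.last_offset = offset


broker = FlakyBroker(drop_ack_for={"m2"})
for message_id, payload in [("m1", "a"), ("m2", "b"), ("m3", "c")]:
    send_once(broker, message_id, payload)

consumer = Consumer()
consumer.poll(broker)
consumer.poll(broker)  # a second read of the same log changes nothing
```

Even though the acknowledgment for "m2" is lost, the broker ends up with exactly one copy of it, and re-reading the log does not cause the consumer to process anything twice.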
If you do implement these two techniques, you would be able to guarantee "exactly once" messaging. You may not have noticed during this discussion, but by doing this you also get two nice little bonuses. There may be more, but the ones I was thinking of are message auditing and duplicate detection.
- From a message auditing standpoint, since you are already going to keep track of the messages your producer sends via metadata, on the consumer side you can use this same metadata to keep track of not just messages arriving, but also perhaps the max, min, and average time it takes to process a message.
- Perhaps you can identify a slow producer or slow consumer.
- Regarding duplicate detection, we already decided that our producer was going to do the right thing to make sure a message was only sent to a broker one time, and on the consumer side we said it was going to check to see if a message has already been processed.
- One extra thing to keep in mind: in your consumer, don't just keep track of metadata related to the messaging system, but also be sure to keep track of metadata that you can use to distinctly identify the payload of a message.
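One way to distinctly identify a payload is to compute a fingerprint of its content and remember the fingerprints you have already processed. The sketch below assumes JSON-serializable payloads and uses a SHA-256 digest; the `payload_fingerprint` function and the `DedupingConsumer` class are illustrative names, and the in-memory set would need to be persisted and bounded in a real consumer.

```python
import hashlib
import json


def payload_fingerprint(payload):
    """Stable SHA-256 digest of a JSON-serializable payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupingConsumer:
    def __init__(self):
        self.seen = set()  # fingerprints of payloads already processed
        self.processed = []

    def handle(self, payload):
        fingerprint = payload_fingerprint(payload)
        if fingerprint in self.seen:
            return False  # duplicate payload, even if the message ID differs
        self.seen.add(fingerprint)
        self.processed.append(payload)
        return True


consumer = DedupingConsumer()
outcomes = [
    consumer.handle({"order": 17, "amount": 9.5}),
    consumer.handle({"amount": 9.5, "order": 17}),  # same payload, keys reordered
    consumer.handle({"order": 18, "amount": 9.5}),
]
```

This catches the case where the same payload arrives under two different message IDs, which metadata from the messaging system alone would not reveal.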
Up until now we have only been concerned with making sure we do not lose data or overwhelm the broker or consumers, and with understanding how we can recover from a failure. This is all very good, because now we have a great understanding of how to provide a robust message-queuing tier. However, we have one more hurdle to clear, that being security. In this day and age it is important not only that we secure the data in flight and at rest, but also that we can ensure that a producer is allowed to produce messages and a consumer is allowed to consume messages.
Looking at figure 3.14 it may seem like we have a daunting task ahead of us to secure the message-queuing tier. At a minimum I think you will need to think through these aspects, and if you work with a security group it would be good to engage them as you undertake securing this and the other tiers.
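As a small illustration of the authorization aspect only, here is a sketch of a broker that checks an access-control list before it lets anyone produce or consume. The `AuthorizingBroker` class, the principal names, and the ACL layout are all assumptions for this example. It does not cover authentication or encryption; in a real system the principal would come from an authenticated connection.

```python
class AuthorizingBroker:
    """Toy broker that checks an access-control list before each operation."""

    def __init__(self, acl):
        # acl maps (principal, queue name) to a set of allowed actions.
        self._acl = acl
        self._queues = {}

    def _check(self, principal, queue_name, action):
        if action not in self._acl.get((principal, queue_name), set()):
            raise PermissionError(f"{principal} may not {action} on {queue_name}")

    def publish(self, principal, queue_name, message):
        self._check(principal, queue_name, "produce")
        self._queues.setdefault(queue_name, []).append(message)

    def consume(self, principal, queue_name):
        self._check(principal, queue_name, "consume")
        pending = self._queues.get(queue_name, [])
        return pending.pop(0) if pending else None


acl = {
    ("collector", "events"): {"produce"},
    ("analytics", "events"): {"consume"},
}
secure_broker = AuthorizingBroker(acl)
secure_broker.publish("collector", "events", "login")
try:
    secure_broker.publish("analytics", "events", "forged")  # not allowed to produce
    rejected = False
except PermissionError:
    rejected = True
received = secure_broker.consume("analytics", "events")
```

Here the collection tier may only produce and the analysis tier may only consume, so a compromised consumer cannot inject messages into the queue.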
3.4 Fault tolerance
Now that our collection and analysis tiers are isolated and we are sending messages through the message queuing tier, it is very important that we understand what happens to our data when things go wrong.
It is not a matter of "if" things will go wrong, only a matter of "when".
If you think about the multiple data center architecture we have been talking about, how many places can you identify where we may lose data?
The network failing or being unavailable between data centers is a reality, and it can often be mitigated with a broker that uses durable storage. If the message queuing product that meets your business needs does not use a durable store, then you will need to either live with data loss or find another way to mitigate the risk of losing network connectivity.
The callout on the consumer side regarding the network failing while the analysis tier is consuming the message is something we will cover in chapter 4.
In the zoomed-in view we can see that we have three brokers. The number here is not as important as what is going on between them and where things can fail.
Applying the core concepts to business problems
Now that we have covered the core concepts we need to keep in mind when thinking about the message queuing tier, let's see if we can apply them to different business scenarios.