Data Lake for Enterprises

Messaging Layer - guaranteed data delivery

The messaging layer forms the message-oriented middleware (MOM) for the data lake architecture. It is therefore the primary layer for decoupling the various layers from each other, while still guaranteeing the delivery of messages.

To guarantee delivery, messages need to be persisted, and this is usually done on a storage drive. The drive used should be fit for purpose, based on the number and size of the messages to be stored. Because message-oriented middleware fundamentally queues messages for both writes and reads, its workload is largely sequential access, for which spinning disks may be adequate. However, for a very large-scale application with millions of messages streamed per second, SSDs could provide better I/O rates.

The other aspect of a messaging layer is its ability to enqueue and dequeue messages. Most messaging frameworks provide enqueue and dequeue mechanisms to manage the publishing and consumption of messages, respectively. Every messaging framework also provides its own set of libraries for connecting to its resources (queues/topics).

Figure 05: Message queue
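
The enqueue and dequeue operations described above can be sketched in a few lines. This is only an in-process illustration built on Python's standard `queue` module, not the API of any particular messaging framework; the names `message_queue`, `enqueue`, and `dequeue` are hypothetical and chosen for this example.

```python
import queue

# Hypothetical in-process stand-in for a broker-managed queue.
message_queue = queue.Queue()


def enqueue(message):
    """Publisher side: place a message at the tail of the queue."""
    message_queue.put(message)


def dequeue():
    """Consumer side: take the oldest message, or None if the queue is empty."""
    try:
        return message_queue.get_nowait()
    except queue.Empty:
        return None


# Messages come out in the order they were put in (first in, first out).
enqueue("order-1")
enqueue("order-2")
first = dequeue()
second = dequeue()
third = dequeue()  # nothing left to consume
```

A real messaging framework would expose similar operations through its own client library, with the queue living in a separate broker process rather than in the publisher's memory.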

Message-oriented middleware generally supports two types of communication, each with its own messaging structure: queues and topics. They are as follows:

  • Queues are mostly used for point-to-point communication, with every message consumed only once, by exactly one of the consumers.
  • Topics are mostly used for publish/subscribe mechanisms, wherein a message is published once but is consumed by multiple subscribers (consumers). Hence a message is consumed multiple times, once by every consumer. Internally, topics are based on queues; however, these internal queues are managed differently by the messaging engine to provide a publish/subscribe mechanism.
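
The difference between the two structures can be illustrated with a minimal sketch. It assumes a simplified in-memory model in which a topic keeps one internal queue per subscriber; the class names `PointToPointQueue` and `Topic` and the subscriber names are hypothetical, and real messaging engines manage their internal queues in their own ways.

```python
from collections import deque


class PointToPointQueue:
    """Each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer can see it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Each message is delivered once to every subscriber."""

    def __init__(self):
        # Assumption for this sketch: one internal queue per subscriber.
        self._subscriber_queues = {}

    def subscribe(self, name):
        self._subscriber_queues[name] = deque()

    def publish(self, message):
        # Published once, copied into every subscriber's internal queue.
        for subscriber_queue in self._subscriber_queues.values():
            subscriber_queue.append(message)

    def receive(self, name):
        subscriber_queue = self._subscriber_queues[name]
        return subscriber_queue.popleft() if subscriber_queue else None


q = PointToPointQueue()
q.send("m1")
queue_reads = [q.receive(), q.receive()]  # the second consumer finds nothing

t = Topic()
t.subscribe("billing")
t.subscribe("audit")
t.publish("m1")
topic_reads = [t.receive("billing"), t.receive("audit")]  # both get a copy
```

In this sketch a subscriber only receives messages published after it subscribed; whether late subscribers can see earlier messages depends on the messaging framework and its configuration.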

Both queues and topics can be configured to be non-persistent or persistent. For guaranteed delivery, it is imperative to use persistent queues so that messages are not lost.
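
To make the idea of persistence concrete, here is a minimal sketch of a file-backed queue. It assumes a simple append-only log of JSON records that is replayed on startup; the class name `PersistentQueue`, the record layout, and the file name are hypothetical, and production message stores are considerably more sophisticated. Note that every write is a sequential append, which matches the access pattern discussed earlier.

```python
import json
import os
import tempfile


class PersistentQueue:
    """Hypothetical file-backed queue: every operation is appended to a
    log on disk before it takes effect in memory."""

    def __init__(self, path):
        self._path = path
        self._pending = []
        # On startup, replay the log to rebuild the undelivered messages.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    if record["op"] == "enqueue":
                        self._pending.append(record["message"])
                    else:  # a "dequeue" record removes the oldest message
                        self._pending.pop(0)

    def _append(self, record):
        # Sequential append, forced to disk before returning.
        with open(self._path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def enqueue(self, message):
        self._append({"op": "enqueue", "message": message})
        self._pending.append(message)

    def dequeue(self):
        if not self._pending:
            return None
        self._append({"op": "dequeue"})
        return self._pending.pop(0)


log_path = os.path.join(tempfile.mkdtemp(), "queue.log")

broker = PersistentQueue(log_path)
broker.enqueue("m1")
broker.enqueue("m2")
delivered = broker.dequeue()

# Simulate a restart: a new instance rebuilds its state from the log.
restarted = PersistentQueue(log_path)
recovered = restarted.dequeue()
```

A non-persistent queue would keep `_pending` only in memory, so the undelivered message would be gone after the simulated restart.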

At a high level, message-oriented middleware can be abstracted into components such as a message broker, a message store, and queues/topics, all within a messaging framework/engine.

Figure 06: A messaging framework

Shown here are the high-level components of a messaging framework. Please note that the details have been abstracted to provide a simplified view. These components will be discussed in greater detail in Chapter 7, Messaging Layer using Apache Kafka, later in this book.