The Apache Thrift Framework can be organized into five layers:
* The RPC Server Library
* RPC Service Stubs
* User-Defined Type Serialization
* The Serialization Protocol Library
* The Transport Library
Applications requiring a common way to serialize data structures for storage or messaging may need nothing more than the bottom three layers of this model.
The top two layers, the Apache Thrift library of RPC servers and the IDL compiler-generated service stubs, add RPC support to the stack.
Apache Thrift is conceptually an object-oriented framework, though it supports both object-oriented and non-object-oriented languages. The Transport, Protocol, and Server libraries are often referred to as class libraries, though they may be implemented in other ways in non-object-oriented languages. The classes within the Apache Thrift libraries are typically named with a leading capital T, for example, TTransport, TProtocol, and TServer.
At the bottom of the stack we have transports (see figure 2.2). The Apache Thrift transport library insulates the upper layers of Apache Thrift from device-specific details. In particular, transports enable protocols to read and write byte streams without knowledge of the underlying device.
For example, imagine you developed a set of programs to move stock price quotations over the Sockets networking API. After the application is deployed, the requirements expand and you’re asked to add support for stock price transmission over an AMQP messaging system as well.
With Apache Thrift, the expanded capability will be fairly easy to implement. The new AMQP code can implement the existing Apache Thrift Transport interface, allowing the upper layer of code to use either the Socket solution or the AMQP solution without knowing the difference.
The modular nature of Apache Thrift transports allows them to be selected and changed at compile time or run time, giving applications plug-in support for a range of devices (see figure 2.4).
The Transport interface
The Apache Thrift transport layer exposes a simple byte-oriented I/O interface to upper layers of code. The interface is typically defined in an abstract base class called TTransport. Table 2.1 describes the TTransport methods present in most language implementations. Each Apache Thrift language implementation has its own subtleties. Apache Thrift language library implementations tend to play to the strengths of the language in question, making a level of variety across implementations the norm.
For example, certain languages define transport interfaces with additional methods for performance or other purposes. As a case in point, the C++ TTransport interface defines borrow() and consume() methods, which enable more efficient buffer processing. The examples here focus on the conceptual architecture of Apache Thrift.
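To make the idea of a byte-oriented transport interface concrete, here is a minimal sketch in Python. It doesn't use the Apache Thrift library; the `Transport`, `InMemoryTransport`, and `send_quote` names are hypothetical and exist only for illustration, and the method set is a simplified assumption rather than the full TTransport interface.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical stand-in for the TTransport abstraction."""

    @abstractmethod
    def write(self, data: bytes) -> None:
        """Queue bytes for the underlying device."""

    @abstractmethod
    def flush(self) -> None:
        """Push any queued bytes to the underlying device."""

    @abstractmethod
    def read(self, size: int) -> bytes:
        """Return up to `size` bytes from the underlying device."""


class InMemoryTransport(Transport):
    """Illustrative end point that stores bytes in memory, not on a device."""

    def __init__(self) -> None:
        self._pending = bytearray()    # written but not yet flushed
        self._delivered = bytearray()  # flushed and available to read()

    def write(self, data: bytes) -> None:
        self._pending += data

    def flush(self) -> None:
        self._delivered += self._pending
        self._pending.clear()

    def read(self, size: int) -> bytes:
        chunk = bytes(self._delivered[:size])
        del self._delivered[:size]
        return chunk


def send_quote(transport: Transport, symbol: str, price: str) -> None:
    """Upper-layer code: it sees only the Transport interface."""
    transport.write(f"{symbol}={price}\n".encode("ascii"))
    transport.flush()
```

Because `send_quote` depends only on the abstract interface, a socket-based or messaging-based implementation could be passed in its place without changing the function.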
End point transports
In this book we refer to Apache Thrift transports that write to a physical or logical device as “end point transports”. End point transports are always at the bottom of an Apache Thrift transport stack and most use cases require precisely one end point transport.
Apache Thrift languages supply end point transports for memory, file, and network devices.
* Memory oriented transports, such as TMemoryBuffer, are often used to collect multiple small write operations that are later transmitted as a single block.
* File-based transports, such as TSimpleFileTransport, are often used for logging and state persistence.
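The memory buffering use case above can be sketched as follows. This is an illustration built on Python's `io.BytesIO`, not the Apache Thrift TMemoryBuffer class; the `MemoryBuffer` name and its `drain()` method are assumptions made for this example.

```python
import io


class MemoryBuffer:
    """Hypothetical memory end point that collects small writes."""

    def __init__(self) -> None:
        self._buf = io.BytesIO()

    def write(self, data: bytes) -> None:
        # Each small write lands in memory; nothing is transmitted yet.
        self._buf.write(data)

    def drain(self) -> bytes:
        # Hand back everything collected so far as one block and reset.
        block = self._buf.getvalue()
        self._buf = io.BytesIO()
        return block


buffer = MemoryBuffer()
for field in (b"ACME", b"|", b"12.50", b"|", b"USD"):
    buffer.write(field)

# One block, suitable for a single write to a network or file device.
block = buffer.drain()
```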
The most important Apache Thrift Transport types are network oriented and used to support RPC operations. The most commonly used Apache Thrift network transport is TSocket. The TSocket transport uses the Socket API to transmit bytes over TCP/IP (see figure 2.5).
Other devices and networking protocols can be exposed through the TTransport interface as well. For example, many Apache Thrift language libraries provide HTTP transports to read and write using the HTTP protocol. Building a custom transport for an unsupported network protocol or device isn’t typically difficult, and doing so enables the entire framework to operate over the new end point type.
Because Apache Thrift transports are defined by the generic TTransport interface, client code is independent of the underlying transport implementation. This gives transports the ability to overlay anything, even other transports. Layering allows generic transport behavior to be separated into interoperable and reusable components.
Imagine you’re building a banking application that makes calls to a service hosted by another company and you need to encrypt all the bytes traveling between your client and the RPC server. If you create a layered transport to provide the encryption, the client and server code could use your new encryption layer on top of the original network transport. The benefits of isolating this new encryption feature in a layered transport are several, not the least of which is that it can be inserted between the existing client code and the old network transport with potentially no impact. The client code will see the encryption transport layer as another transport. The network end point transport will see the encryption transport as another client.
The encryption transport can be layered on top of any end point transport, allowing you to encrypt network I/O as well as file I/O and memory I/O. The layering approach allows the encryption concern to be separated from the device I/O concern.
In this book we refer to all Apache Thrift transports that aren’t end point transports as “layered transports.” Layered transports expose the standard Apache Thrift Transport interface to clients and depend on the Transport interface of the layer below. In this way one or more transport layers can be used to form a transport stack.
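The following sketch shows the layering idea in Python without using the Apache Thrift library. The `MemoryEndPoint` and `XorLayer` classes are hypothetical, and the single-byte XOR is only a stand-in for a real cipher; it provides no security and is used here because it makes the transformation easy to see.

```python
class MemoryEndPoint:
    """Hypothetical end point transport backed by a byte array."""

    def __init__(self) -> None:
        self.data = bytearray()

    def write(self, data: bytes) -> None:
        self.data += data

    def read(self, size: int) -> bytes:
        chunk = bytes(self.data[:size])
        del self.data[:size]
        return chunk

    def flush(self) -> None:
        pass  # nothing to push; bytes are already "on the device"


class XorLayer:
    """Layered transport sketch. XOR is a placeholder, NOT real encryption."""

    def __init__(self, inner, key: int) -> None:
        self._inner = inner  # any object with write/read/flush
        self._key = key

    def write(self, data: bytes) -> None:
        # Transform the bytes, then delegate to the layer below.
        self._inner.write(bytes(b ^ self._key for b in data))

    def read(self, size: int) -> bytes:
        # Read from the layer below, then reverse the transformation.
        return bytes(b ^ self._key for b in self._inner.read(size))

    def flush(self) -> None:
        self._inner.flush()


end_point = MemoryEndPoint()
stack = XorLayer(end_point, key=0x5A)
stack.write(b"balance")
stack.flush()
scrambled = bytes(end_point.data)  # what the end point actually holds
restored = stack.read(7)           # what the client reads back
```

The client of `stack` uses the same three methods the end point exposes, so the layer could sit on top of a file or socket end point just as easily.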
A commonly used Apache Thrift layered transport is the framing transport. The transport is called TFramedTransport in most language libraries and it adds a four-byte message size as a prefix to each Apache Thrift message. This enables more efficient message processing in certain scenarios; for example, a receiver can read the frame size and then provide a buffer of exactly the size the frame needs.
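As a rough illustration of how a receiver can use the size prefix, the sketch below frames and unframes messages with Python's `struct` module. The helper names `frame`, `read_exact`, and `read_frame` are hypothetical, and the sketch assumes the four-byte size is a signed integer in network (big-endian) byte order.

```python
import io
import struct


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a four-byte integer."""
    return struct.pack("!i", len(payload)) + payload


def read_exact(stream, size: int) -> bytes:
    """Read exactly `size` bytes or fail; a short read means a broken frame."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended in the middle of a frame")
    return data


def read_frame(stream) -> bytes:
    """Read the size prefix first, then exactly that many payload bytes."""
    (size,) = struct.unpack("!i", read_exact(stream, 4))
    return read_exact(stream, size)


# Two messages back to back; the prefix tells the reader where each ends.
wire = io.BytesIO(frame(b"hello") + frame(b"hi"))
first = read_frame(wire)
second = read_frame(wire)
```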
NOTE Clients and servers must use compatible transport stacks to communicate. If the server is using a TSocket transport the client will need to use a TSocket transport. If the server is using a TFramedTransport layer on top of a TSocket, the client will have to use a TFramedTransport layer on top of a TSocket. Apache Thrift doesn’t have a built-in runtime transport or protocol discovery mechanism, though custom discovery systems can be created on top of Apache Thrift.
Another important feature offered by layered transports is buffering. The TFramedTransport implicitly buffers writes until the flush() method is called, at which point the frame size and data are written to the layer below. The TBufferedTransport is an alternative to the TFramedTransport that can provide buffering when framing isn’t needed. Several languages build buffering into the end point solution and don’t provide a TBufferedTransport (Java is an example).
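The write side of this behavior can be sketched as follows. The `RecordingEndPoint` and `FramedWriter` classes are hypothetical and don't come from the Apache Thrift library; as an assumption, the frame size is written as a four-byte big-endian integer.

```python
import struct


class RecordingEndPoint:
    """Hypothetical end point that records each write it receives."""

    def __init__(self) -> None:
        self.writes = []

    def write(self, data: bytes) -> None:
        self.writes.append(bytes(data))

    def flush(self) -> None:
        pass


class FramedWriter:
    """Sketch of a framing layer that buffers writes until flush()."""

    def __init__(self, inner) -> None:
        self._inner = inner
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        # Nothing reaches the layer below yet.
        self._buf += data

    def flush(self) -> None:
        # Emit the frame size followed by the buffered data in one write.
        payload = bytes(self._buf)
        self._buf.clear()
        self._inner.write(struct.pack("!i", len(payload)) + payload)
        self._inner.flush()


end_point = RecordingEndPoint()
writer = FramedWriter(end_point)
writer.write(b"ab")
writer.write(b"cd")
writes_before_flush = list(end_point.writes)  # still empty
writer.flush()
```

Two small writes reach the end point as a single framed block, which is the property that makes buffering attractive for network devices.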
When two processes connect over a network to facilitate communications, the server must listen for clients to connect, accepting new connections as they arrive.
* The abstract interface for the server’s connection acceptor is usually named `TServerTransport`.
* The most popular implementation of `TServerTransport` is `TServerSocket` used for TCP/IP networking. The server transport wires each new connection to a `TTransport` to handle the individual connection’s I/O.
Server transports follow the factory pattern with TServerSockets manufacturing TSockets, TServerPipes manufacturing TPipes, and so on.
Server transports typically have only four methods (see table 2.2). The listen() and close() methods prepare the server transport for use and shut it down, respectively. Clients cannot connect before listen() is invoked or after close() is invoked. The accept() method blocks until a client connection arrives.
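The listen/accept/close life cycle and the factory relationship can be sketched with Python's standard `socket` module. The `ServerSocketSketch` and `SocketTransport` classes are hypothetical stand-ins, not the Apache Thrift classes, and only the three methods named in the text are shown.

```python
import socket


class SocketTransport:
    """Hypothetical per-connection transport wrapping one connected socket."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def read(self, size: int) -> bytes:
        return self._sock.recv(size)

    def write(self, data: bytes) -> None:
        self._sock.sendall(data)

    def close(self) -> None:
        self._sock.close()


class ServerSocketSketch:
    """Sketch of a server transport: a factory for connection transports."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self._address = (host, port)  # port 0 lets the OS pick a free port
        self._sock = None

    def listen(self) -> None:
        # Clients cannot connect until this has been called.
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.bind(self._address)
        self._sock.listen(5)

    @property
    def port(self) -> int:
        return self._sock.getsockname()[1]

    def accept(self) -> SocketTransport:
        # Blocks until a client connects, then manufactures a transport.
        connection, _peer = self._sock.accept()
        return SocketTransport(connection)

    def close(self) -> None:
        self._sock.close()
        self._sock = None


server = ServerSocketSketch()
server.listen()
client = socket.create_connection(("127.0.0.1", server.port))
connection = server.accept()
client.sendall(b"ping")
received = connection.read(4)
connection.close()
client.close()
server.close()
```

In this single-process demonstration the client connects before accept() is called, so accept() returns immediately; a real server would typically call accept() in a loop and hand each new transport to a worker.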
In the context of Apache Thrift, a protocol is a means for serializing types. Apache Thrift RPC doesn’t support every type defined in every language. Rather, the Apache Thrift type system includes all the important base types found in most languages (int, double, string, and so on), as well as a few heavily used and widely supported container types (map, set, list). All protocols must be capable of reading and writing all the types in the Apache Thrift type system.
Protocols sit on top of a transport stack (see figure 2.8). Labor is divided between the transport that’s responsible for manipulating bytes and the protocol that’s responsible for working with data types. Transports see only opaque byte streams; protocols turn data types into byte streams (see figure 2.9).
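The division of labor can be sketched in a few lines of Python. The `MemoryTransport` and `SketchProtocol` classes are hypothetical and the encoding is an assumption chosen for simplicity (four-byte big-endian integers and length-prefixed UTF-8 strings); it isn't presented as the wire format of any particular Apache Thrift protocol.

```python
import struct


class MemoryTransport:
    """Hypothetical transport: it moves bytes and knows nothing about types."""

    def __init__(self) -> None:
        self.data = bytearray()

    def write(self, data: bytes) -> None:
        self.data += data

    def read(self, size: int) -> bytes:
        chunk = bytes(self.data[:size])
        del self.data[:size]
        return chunk


class SketchProtocol:
    """Protocol sketch: it turns typed values into bytes and back."""

    def __init__(self, transport) -> None:
        self._transport = transport

    def write_i32(self, value: int) -> None:
        self._transport.write(struct.pack("!i", value))

    def read_i32(self) -> int:
        return struct.unpack("!i", self._transport.read(4))[0]

    def write_string(self, value: str) -> None:
        encoded = value.encode("utf-8")
        self.write_i32(len(encoded))  # length prefix, then the bytes
        self._transport.write(encoded)

    def read_string(self) -> str:
        length = self.read_i32()
        return self._transport.read(length).decode("utf-8")


transport = MemoryTransport()
protocol = SketchProtocol(transport)
protocol.write_i32(42)
protocol.write_string("ACME")
raw = bytes(transport.data)  # the opaque bytes the transport sees
number = protocol.read_i32()
text = protocol.read_string()
```

Swapping `MemoryTransport` for any other object with the same `read` and `write` methods leaves `SketchProtocol` unchanged, which mirrors the separation described above.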