The Unreasonable Effectiveness of Simple Binary Encoding (SBE)

Introduction

Simple Binary Encoding (SBE) is a codec format specified by the Financial Information eXchange (FIX) Trading Community for use in high-performance trading systems. It lives in Layer 6 (Presentation Layer) of the OSI model, and deals with how trading messages between different systems are represented on the wire. Below, we’ll see why it may make sense to use its compactness and performance in non-financial settings as well, provided its obscurity and some quirks are tolerable for the use case.

Why yet another codec?

For a lot of systems, passing around JSON data is perfectly suitable. It is self-describing, human-readable, and perhaps above all – has the benefit of being easy to integrate with almost anywhere. It is the lingua franca of interprocess communication nowadays, and one would be hard pressed to find a language or a library that doesn’t support it.

In some respects though, JSON is not as efficient as one would want a codec to be.

{
    "students": [
        {
            "name": "First Last",
            ...
        },
        {
            "name": "First1 Last1"
        }
        ...
    ]
    ...
}

Notice in the above snippet how the key "name" is repeated for every student element in the students array. If both sides already know the schema of the student model, there’s no need to repeat the key=value structure for every element, let alone transmit it over the wire. Imagine how odd it’d be if we announced the part of speech of each word we spoke: something like Subject:I Verb:have Article:an Object:apple. The density of information we’d pass on to each other via speech would be much lower, and we would be wasting precious time and brain cycles speaking like that.
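To put rough numbers on the repetition, here is a minimal sketch that encodes the same two records as JSON and as a fixed binary layout. The "age" field and the field widths (a 16-byte name, a 1-byte age) are assumptions made up for illustration; this is not the SBE wire format, only the underlying idea of leaving the keys out.

```python
import json
import struct

# Hypothetical student records; the "age" field is an assumption for illustration.
students = [
    {"name": "First Last", "age": 20},
    {"name": "First1 Last1", "age": 21},
]

# JSON carries the keys "name" and "age" once per record.
json_bytes = json.dumps({"students": students}).encode("utf-8")

# Fixed layout agreed ahead of time: 16-byte name (null padded) + 1-byte age.
# "<" selects little-endian with no alignment padding.
RECORD = struct.Struct("<16sB")
packed_bytes = b"".join(
    RECORD.pack(s["name"].encode("ascii"), s["age"]) for s in students
)

# Compare the two payload sizes in bytes.
print(len(json_bytes), len(packed_bytes))
```

The packed form costs 17 bytes per record regardless of how long the key names are, at the price of needing the layout to read it back.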

Instead, each language has some grammar (schema, in our case) that’s learned ahead of time. And so when the time comes to decode some prose or speech, the receiver can infer the role of each word from its position. Of course, this leads to some ambiguity in natural languages like English, which we should have no room for in computing. In a similar vein, SBE schemas are shared off-channel and the data on the wire is much more compact because there’s no key=value structure, just a series of value1|value2|.... This leads to challenges of its own, such as:

How do we know where value1 ends and value2 begins?
What if there’s a change in schema?
And many others.
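Both questions can be sketched with a toy layout. Fixed field widths answer the first: if the schema says the id is 4 bytes, the next value starts at offset 4. For the second, the sketch below prefixes each message with a small header holding the body length and a schema version, loosely modelled on the block length and version that an SBE message header carries. The two-field header, the field names, and the widths are simplifying assumptions, not the real SBE header.

```python
import struct

# Hypothetical header: body length and schema version, both uint16 little-endian.
HEADER = struct.Struct("<HH")
# Version 1 of a made-up message: uint32 id, int64 price.
BODY_V1 = struct.Struct("<Iq")
# Version 2 appends a uint16 quantity; existing fields keep their offsets.
BODY_V2 = struct.Struct("<IqH")


def encode_v2(msg_id, price, quantity):
    """Encode a version 2 message: header followed by the fixed-size body."""
    body = BODY_V2.pack(msg_id, price, quantity)
    return HEADER.pack(len(body), 2) + body


def decode_as_v1(buffer, offset=0):
    """Decode knowing only the version 1 layout; return fields and next offset."""
    block_length, version = HEADER.unpack_from(buffer, offset)
    msg_id, price = BODY_V1.unpack_from(buffer, offset + HEADER.size)
    # Advance by the sender's block length, not by our own idea of the body size,
    # so fields appended by a newer schema are skipped rather than misread.
    next_offset = offset + HEADER.size + block_length
    return {"id": msg_id, "price": price, "version": version}, next_offset


stream = encode_v2(7, 10050, 3) + encode_v2(8, 9975, 1)
first, position = decode_as_v1(stream)
second, _ = decode_as_v1(stream, position)
```

An older decoder still finds the second message in the stream because it trusts the length written by the sender. This only works when new fields are appended, never inserted in the middle.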

So what?

We looked at how JSON repeats the key=value structure, which keeps payloads human-readable but spends bytes and parsing effort on keys the receiver already knows. And we also introduced the notion of a schema that’s shared with the decoder separately from the data it needs to decode. The schema lays out the ordering and lengths of the different values that the decoder may encounter in a payload, and that’s how something like value1|value2|value3|... can be understood by the receiving system. The compact binary layout of SBE gives it desirable characteristics for low-latency and high-throughput applications, reducing bandwidth, compute, and memory utilization.

Advantages

Simple native binary types reduce complexity and improve performance during (de)serialization
Messages are compactly packed since keys (field names) are not sent over the wire
Off-channel schema exchange reduces overhead (only messages are sent over the wire, not their schema)
Strict schema versioning
Enables copy- and allocation-free encoding/decoding
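The last point is often implemented with a "flyweight": a reusable object that reads fields directly from the underlying buffer at fixed offsets instead of first building an intermediate object. Below is a rough sketch of that pattern; the OrderFlyweight class and its layout are hypothetical. Python still allocates the integers it returns, so this shows the shape of the technique rather than its real performance.

```python
import struct

# Hypothetical fixed layout: uint32 id at offset 0, int64 price at offset 4.
ID = struct.Struct("<I")
PRICE = struct.Struct("<q")
PRICE_OFFSET = ID.size
MESSAGE_SIZE = ID.size + PRICE.size


class OrderFlyweight:
    """Reads fields straight from a shared buffer; one instance is reused."""

    def __init__(self):
        self._view = None
        self._offset = 0

    def wrap(self, view, offset):
        # Point the flyweight at a message; no bytes are copied here.
        self._view = view
        self._offset = offset
        return self

    @property
    def id(self):
        return ID.unpack_from(self._view, self._offset)[0]

    @property
    def price(self):
        return PRICE.unpack_from(self._view, self._offset + PRICE_OFFSET)[0]


# Write two messages back to back into one preallocated buffer.
buffer = bytearray(MESSAGE_SIZE * 2)
struct.pack_into("<Iq", buffer, 0, 1, 10050)
struct.pack_into("<Iq", buffer, MESSAGE_SIZE, 2, 9975)

# Walk the buffer with a single flyweight, reading only the field we need.
view = memoryview(buffer)
order = OrderFlyweight()
prices = [order.wrap(view, i * MESSAGE_SIZE).price for i in range(2)]
```

Because only the requested field is decoded, a consumer that cares about one value never pays for parsing the rest of the message.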

Disadvantages

Fields in the schema cannot be reordered
Variable-length fields (such as strings) must follow fixed-length fields
Payloads are not self-describing, so a decoder and the schema are needed before a human can read them
SBE is largely confined to high-performance finance/trading applications
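The ordering constraints follow from how offsets are computed. A minimal sketch, assuming a made-up student message with a fixed block (uint32 id, uint8 age) followed by a length-prefixed name: every fixed field sits at an offset known from the schema alone, while anything after a variable-length value can only be located by reading the lengths before it. The layout and function names are illustrative assumptions, not SBE's actual encoding.

```python
import struct

FIXED = struct.Struct("<IB")  # hypothetical fixed block: uint32 id, uint8 age
LENGTH = struct.Struct("<H")  # uint16 length prefix for the variable part


def encode_student(student_id, age, name):
    """Fixed block first, then the length-prefixed UTF-8 name."""
    data = name.encode("utf-8")
    return FIXED.pack(student_id, age) + LENGTH.pack(len(data)) + data


def decode_student(buffer, offset=0):
    """Return (id, age, name) and the offset of the next message."""
    student_id, age = FIXED.unpack_from(buffer, offset)
    offset += FIXED.size
    # The name's position is fixed, but its end depends on the length prefix.
    (length,) = LENGTH.unpack_from(buffer, offset)
    offset += LENGTH.size
    name = bytes(buffer[offset:offset + length]).decode("utf-8")
    return (student_id, age, name), offset + length


stream = encode_student(1, 20, "First Last") + encode_student(2, 21, "First1 Last1")
first, position = decode_student(stream)
second, end = decode_student(stream, position)
```

Had the name come first, reading the id or age would require decoding the name's length every time, which gives up the direct offset access that makes the fixed block cheap.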
