I made a better struct alternative in Python

Introducing PackMan, a type-safe alternative to struct for binary packing and unpacking in Python

Find this project on GitHub

As a Python developer, you've likely encountered the need to work with binary data at some point in your career. Handling binary data is a crucial skill, whether you're parsing network protocols, working with file formats, or interfacing with hardware devices. For years, Python's built-in struct module has been the go-to solution for this task. But as our codebases grow more complex and our need for robustness increases, is it time for something better?

The Good Old struct Module

Let's start with a trip down memory lane. Python's struct module has been a faithful companion to developers for years. It's straightforward, built into the standard library, and gets the job done. Here's a simple example:

import struct

data = b"\x01\x02"
a, b = struct.unpack("BB", data)
print(a, b)  # Output: 1 2

In this example, we're unpacking two unsigned bytes from our binary data. Simple, right? For basic use cases, struct works wonderfully. It's fast, it's reliable, and it's been battle-tested over the years.
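Here is a quick sketch of the rest of the basic API, using made-up values. struct.pack is the inverse of struct.unpack. struct.calcsize reports how many bytes a format needs. Passing a buffer of the wrong size raises struct.error.

```python
import struct

# "BB" describes two unsigned bytes, so it occupies 2 bytes.
size = struct.calcsize("BB")

# pack() is the inverse of unpack().
packed = struct.pack("BB", 1, 2)
a, b = struct.unpack("BB", packed)
print(size, packed, a, b)  # 2 b'\x01\x02' 1 2

# The buffer length must match the format exactly.
try:
    struct.unpack("BB", b"\x01")
except struct.error as exc:
    print("struct.error:", exc)
```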

The Growing Pains

As Python has evolved, so have our expectations for code quality and maintainability. With the introduction of type hints in Python 3.5, we gained powerful tools for catching errors early and improving code readability. However, struct predates this era, and its API doesn't take advantage of these modern Python features.

Consider this slightly more complex example:

[Screenshot: the struct version of the example, with length, text and text_str all inferred as Any]

As the screenshot shows, length, text and text_str are all inferred as Any, because struct.unpack is annotated as returning tuple[Any, ...]. Their actual types are only known at runtime unless you annotate each variable by hand.
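The screenshot isn't reproduced here. The code below is a plausible reconstruction of that kind of code. It assumes a one-byte length prefix followed by that many bytes of text. The sample bytes are made up for illustration.

```python
import struct

data = b"\x05hello"  # hypothetical input: length prefix 5, then 5 bytes of text

# struct.unpack returns tuple[Any, ...], so a type checker infers `length` as Any.
(length,) = struct.unpack("B", data[:1])

# Read `length` bytes with a dynamically built format string ("5s"); also Any.
(text,) = struct.unpack(f"{length}s", data[1 : 1 + length])

# Because `text` is Any, `text_str` is inferred as Any as well.
text_str = text.decode()
print(length, text, text_str)  # 5 b'hello' hello
```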

The problem with Any is that your editor can't offer completions when you access a method such as decode:

[Screenshot: no completion suggestions when accessing decode]

Nor will you get an error if you call decode incorrectly:

[Screenshot: an incorrect call to decode that raises no error]
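Here is a minimal sketch of that failure mode, again with made-up sample bytes. Because text is Any, a type checker accepts text.decode(8). The mistake only surfaces as a TypeError at runtime.

```python
import struct

(text,) = struct.unpack("5s", b"hello")

# `text` is Any to a type checker, so passing an int as the encoding
# is not flagged statically; bytes.decode() rejects it only at runtime.
error = None
try:
    text.decode(8)
except TypeError as exc:
    error = exc

print("Runtime error:", error)
```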

Introducing PackMan

Born out of frustration with struct's limitations, PackMan aims to bring binary data handling in Python into the modern era. Let's look at how PackMan handles our previous example:

[Screenshot: the same example written with PackMan]

With PackMan, all unpacked values have static types.

This means better code completion:

[Screenshot: editor completion on a value unpacked with PackMan]

And static type checker errors if there's a type mismatch:

[Screenshot: a static type checker reporting a type mismatch]
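PackMan's internals aren't covered here. The underlying idea can be sketched with the standard library alone: wrap struct in functions whose return types are declared, so callers see int and bytes instead of Any. read_u8 and read_nbytes are hypothetical helpers, not PackMan's API.

```python
import struct


def read_u8(data: bytes) -> tuple[int, bytes]:
    """Hypothetical helper: read one unsigned byte, return (value, rest)."""
    (value,) = struct.unpack_from("B", data)
    return value, data[1:]


def read_nbytes(data: bytes, n: int) -> tuple[bytes, bytes]:
    """Hypothetical helper: read exactly n raw bytes, return (chunk, rest)."""
    if len(data) < n:
        raise ValueError(f"need {n} bytes, got {len(data)}")
    return data[:n], data[n:]


length, rest = read_u8(b"\x05hello")    # a type checker sees length: int
text, rest = read_nbytes(rest, length)  # text: bytes
text_str = text.decode()                # text_str: str, so completion works
# text_str.decode()  # a type checker would flag this: str has no decode()
print(length, text_str)  # 5 hello
```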

PackMan also offers an expressive syntax for defining binary formats:

[Screenshot: a PackMan format definition built from u8 and nbytes]

In this example, the format definition mirrors the structure of the data: an unsigned 8-bit integer, followed by a number of bytes determined by that integer. The names tell you what will be unpacked, u8 (an int) and nbytes (bytes), so the definition is both readable and a source of type information.
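The sketch below shows how such a syntax can carry type information. It is a minimal, stdlib-only composable format type. Format, u8 and nbytes are hypothetical stand-ins that borrow PackMan's names; this is not PackMan's implementation.

Adding two formats with + produces a format whose result type records both parts. A type checker therefore knows that u8 + nbytes(2) yields an (int, bytes) pair. This sketch only sequences fixed formats and leaves out the length-dependent case.

```python
from __future__ import annotations

import struct
from collections.abc import Callable
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Format(Generic[T]):
    """Hypothetical format: parses a value of type T off the front of a buffer."""

    parse: Callable[[bytes], tuple[T, bytes]]

    def __add__(self, other: Format[U]) -> Format[tuple[T, U]]:
        # Sequence two formats; the result type records both value types.
        def parse_both(data: bytes) -> tuple[tuple[T, U], bytes]:
            first, rest = self.parse(data)
            second, rest = other.parse(rest)
            return (first, second), rest

        return Format(parse_both)


def _parse_u8(data: bytes) -> tuple[int, bytes]:
    (value,) = struct.unpack_from("B", data)
    return value, data[1:]


# The name tells you the type: u8 produces an int ...
u8: Format[int] = Format(_parse_u8)


def nbytes(n: int) -> Format[bytes]:
    # ... and nbytes(n) produces exactly n bytes.
    def parse(data: bytes) -> tuple[bytes, bytes]:
        if len(data) < n:
            raise ValueError(f"need {n} bytes, got {len(data)}")
        return data[:n], data[n:]

    return Format(parse)


(dtype, payload), rest = (u8 + nbytes(2)).parse(b"\x01\x02\x03\x04")
print(dtype, payload, rest)  # 1 b'\x02\x03' b'\x04'
```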

A Real-World Example: Parsing BLE Advertisement Structures

To really appreciate the power and elegance of PackMan, let's look at a real-world example: parsing Bluetooth Low Energy (BLE) advertisement structures. BLE devices broadcast small packets of data to announce their presence and capabilities. These packets have a specific format that we can parse using PackMan.

Here's the code:

from collections.abc import Iterator

from packman import u8
from packman.formats import nbytes


def ble_ad_structs(data: bytes) -> Iterator[tuple[int, int, bytes]]:
    while data:
        length, data = u8.unpack(data).flatten()
        if length == 0:
            break
        dtype, payload, data = (u8 + nbytes(length - 1)).unpack(data).flatten()
        yield length, dtype, payload


for length, dtype, payload in ble_ad_structs(b"\x03\x01\x02\x03" + b"\x05\xff\x01\x02\x03\x04"):
    print(f"Length: {length}, Type: {dtype}, Payload: {payload.hex()}")

Let's break this down:

  1. We define a function ble_ad_structs that takes a bytes object as input and returns an iterator of tuples.
  2. Inside the function, a while loop consumes the data until none is left (this assumes the input is well-formed; we will talk about exceptions later).
  3. For each iteration:
    • We unpack a single byte (u8) to get the length of the current structure.
    • If the length is 0, we break the loop (BLE advertisements often contain trailing zeros).
    • We then unpack the type (another u8) and the payload (using nbytes(length - 1)), since the length byte counts the type byte plus the payload.
    • We yield a tuple of (length, type, payload) for each structure.
  4. In the main code, we iterate over the structures returned by ble_ad_structs, printing out the details of each.

The sample data b"\x03\x01\x02\x03" + b"\x05\xff\x01\x02\x03\x04" represents two BLE advertisement structures:

  • The first has a length of 3, type 1, and payload \x02\x03.
  • The second has a length of 5, type 255, and payload \x01\x02\x03\x04.

When we run this code, we get:

Length: 3, Type: 1, Payload: 0203
Length: 5, Type: 255, Payload: 01020304

This example showcases several strengths of PackMan:

  1. Type Safety: The function signature clearly indicates what types we're working with.
  2. Readability: The parsing logic is clear and concise.
  3. Flexibility: We can easily handle variable-length data structures.
  4. Performance: ble_ad_structs is a generator, so structures are yielded one at a time as they are parsed rather than built into a list up front.
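For comparison, here is a sketch of the same parser written with only the standard struct module. ble_ad_structs_struct is a hypothetical name. It tracks an offset with struct.unpack_from instead of re-slicing the buffer. Because unpack_from returns tuple[Any, ...], the int and bytes annotations have to be written by hand.

```python
import struct
from collections.abc import Iterator


def ble_ad_structs_struct(data: bytes) -> Iterator[tuple[int, int, bytes]]:
    offset = 0
    while offset < len(data):
        # unpack_from returns tuple[Any, ...], so we annotate by hand.
        length: int = struct.unpack_from("B", data, offset)[0]
        if length == 0:
            break
        # The length byte counts the type byte plus the payload.
        dtype: int
        payload: bytes
        dtype, payload = struct.unpack_from(f"B{length - 1}s", data, offset + 1)
        yield length, dtype, payload
        offset += 1 + length


for length, dtype, payload in ble_ad_structs_struct(b"\x03\x01\x02\x03" + b"\x05\xff\x01\x02\x03\x04"):
    print(f"Length: {length}, Type: {dtype}, Payload: {payload.hex()}")
```

It prints the same two lines as the PackMan version. A truncated structure raises struct.error from unpack_from.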

With PackMan, parsing complex binary protocols becomes a straightforward task. Whether you're working with network protocols, file formats, or, as in this case, IoT device communications, PackMan provides the tools to make your code both powerful and readable.