I made a better struct alternative in Python
Introducing PackMan, a type-safe struct alternative for binary packing/unpacking for Python
Find this project on GitHub
As a Python developer, you've likely encountered the need to work with binary data at some point in your career. Handling binary data is a crucial skill, whether you're parsing network protocols, working with file formats, or interfacing with hardware devices. For years, Python's built-in struct module has been the go-to solution for this task. But as our codebases grow more complex and our need for robustness increases, is it time for something better?
The Good Old struct Module
Let's start with a trip down memory lane. Python's struct module has been a faithful companion to developers for years. It's straightforward, built into the standard library, and gets the job done. Here's a simple example:
import struct
data = b"\x01\x02"
a, b = struct.unpack("BB", data)
print(a, b) # Output: 1 2
In this example, we're unpacking two unsigned bytes from our binary data. Simple, right? For basic use cases, struct works wonderfully. It's fast, it's reliable, and it's been battle-tested over the years.
The Growing Pains
As Python has evolved, so have our expectations for code quality and maintainability. With the introduction of type hints in Python 3.5, we gained powerful tools for catching errors early and improving code readability. However, struct predates this era, and its API doesn't take advantage of these modern Python features.
Consider this slightly more complex example:
As you can see in the screenshot, the inferred types of length, text, and text_str are all Any. This means the actual types can only be determined at runtime unless you manually type-hint each variable.
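For readers without the screenshot, here is a minimal sketch of the kind of code being described. It assumes a one-byte length prefix followed by UTF-8 text; the sample bytes are made up for illustration, and only the variable names come from the text above.

```python
import struct

# Hypothetical sample: a one-byte length prefix (5) followed by the text.
data = b"\x05hello"

# struct.unpack_from returns a tuple typed as tuple[Any, ...], so a static
# type checker infers Any for every name bound below.
(length,) = struct.unpack_from("B", data, 0)
(text,) = struct.unpack_from(f"{length}s", data, 1)
text_str = text.decode("utf-8")

print(length, text, text_str)
```

At runtime the values are an int, a bytes object, and a str, but none of that is visible to the editor or the type checker.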
The problem with the Any type is that you won't get any editor completion when trying to access a method like decode, and you won't get any error message if you use decode incorrectly.
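A small sketch of what that looks like in practice, using a deliberately wrong argument chosen for illustration: a type checker has nothing to complain about because the value is Any, so the mistake only surfaces when the line runs.

```python
import struct

(text,) = struct.unpack("5s", b"hello")  # `text` is inferred as Any

runtime_error = None
try:
    # decode() expects an encoding name (a str), not an int. Because `text`
    # is Any, a static checker accepts this call without complaint.
    text.decode(123)
except TypeError as exc:
    runtime_error = exc
    print(f"Fails only at runtime: {exc}")
```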
Introducing PackMan
Born out of frustration with struct's limitations, PackMan aims to bring binary data handling in Python into the modern era. Let's look at how PackMan handles our previous example:
With PackMan, all unpacked values have static types. This means better code completion, and static type checker errors if there's a type mismatch.
PackMan also supports a very expressive syntax to define binary formats:
In this example, we defined our format in a way that clearly expresses the structure of our data. We're saying "there's an unsigned 8-bit integer, followed by a number of bytes determined by that integer." This is not only more readable but also provides crucial type information.
The statement is very readable, as you can directly infer what data will be unpacked from the names: u8 (int) and nbytes (bytes).
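To make the idea of statically typed formats concrete, here is a standard-library sketch of how a format object can carry its result type. This is not PackMan's implementation or API: Format, u8_sketch, and nbytes_sketch are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Format(Generic[T]):
    """Hypothetical typed format: parses one T and returns the unread rest."""

    parse: Callable[[bytes], tuple[T, bytes]]

    def unpack(self, data: bytes) -> tuple[T, bytes]:
        return self.parse(data)


def _parse_u8(data: bytes) -> tuple[int, bytes]:
    if len(data) < 1:
        raise ValueError("need at least 1 byte")
    return data[0], data[1:]


def nbytes_sketch(n: int) -> Format[bytes]:
    """Build a format that reads exactly n raw bytes."""

    def parse(data: bytes) -> tuple[bytes, bytes]:
        if len(data) < n:
            raise ValueError(f"need {n} bytes, got {len(data)}")
        return data[:n], data[n:]

    return Format(parse)


u8_sketch: Format[int] = Format(_parse_u8)

# A type checker can infer int and bytes here instead of Any.
length, rest = u8_sketch.unpack(b"\x05hello")
text, rest = nbytes_sketch(length).unpack(rest)
text_str = text.decode("utf-8")
print(length, text_str, rest)
```

Because the result type is a type parameter of the format, the checker can follow it through unpack, which is the property the article attributes to PackMan's formats.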
A Real-World Example: Parsing BLE Advertisement Structures
To really appreciate the power and elegance of PackMan, let's look at a real-world example: parsing Bluetooth Low Energy (BLE) advertisement structures. BLE devices broadcast small packets of data to announce their presence and capabilities. These packets have a specific format that we can parse using PackMan.
Here's the code:
from collections.abc import Iterator

from packman import u8
from packman.formats import nbytes


def ble_ad_structs(data: bytes) -> Iterator[tuple[int, int, bytes]]:
    while data:
        # Read the one-byte length field; the last item is the unread remainder.
        length, data = u8.unpack(data).flatten()
        if length == 0:
            break
        # The length counts the type byte plus the payload, hence length - 1.
        dtype, payload, data = (u8 + nbytes(length - 1)).unpack(data).flatten()
        yield length, dtype, payload


for length, dtype, payload in ble_ad_structs(b"\x03\x01\x02\x03" + b"\x05\xff\x01\x02\x03\x04"):
    print(f"Length: {length}, Type: {dtype}, Payload: {payload.hex()}")
Let's break this down:
- We define a function ble_ad_structs that takes a bytes object as input and returns an iterator of tuples.
- Inside the function, we use a while loop to process the data until it's empty (this may not always be the case; we will talk about exceptions later).
- For each iteration:
  - We unpack a single byte (u8) to get the length of the current structure.
  - If the length is 0, we break the loop (BLE advertisements often contain trailing zeros).
  - We then unpack the type (another u8) and the payload (using nbytes(length - 1)).
  - We yield a tuple of (length, type, payload) for each structure.
- In the main code, we iterate over the structures returned by ble_ad_structs, printing out the details of each.
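As a cross-check on these steps, the same loop can be sketched with plain slicing from the standard library. The function name ble_ad_structs_plain and the explicit truncation check are additions for illustration; they are not part of PackMan, and how PackMan itself reports short input is not shown here.

```python
from collections.abc import Iterator


def ble_ad_structs_plain(data: bytes) -> Iterator[tuple[int, int, bytes]]:
    """Stdlib-only version of the same walk, for comparison."""
    offset = 0
    while offset < len(data):
        length = data[offset]  # one-byte length field
        if length == 0:  # zero length marks trailing padding
            break
        end = offset + 1 + length  # length covers the type byte and payload
        if end > len(data):
            raise ValueError("truncated advertisement structure")
        dtype = data[offset + 1]
        payload = data[offset + 2 : end]
        yield length, dtype, payload
        offset = end


sample = b"\x03\x01\x02\x03" + b"\x05\xff\x01\x02\x03\x04"
structs = list(ble_ad_structs_plain(sample))
print(structs)
```

The manual offset bookkeeping is exactly what the format expression u8 + nbytes(length - 1) replaces in the PackMan version.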
The sample data b"\x03\x01\x02\x03" + b"\x05\xff\x01\x02\x03\x04" represents two BLE advertisement structures:
- The first has a length of 3, type 1, and payload \x02\x03.
- The second has a length of 5, type 255, and payload \x01\x02\x03\x04.
When we run this code, we get:
Length: 3, Type: 1, Payload: 0203
Length: 5, Type: 255, Payload: 01020304
This example showcases several strengths of PackMan:
- Type Safety: The function signature clearly indicates what types we're working with.
- Readability: The parsing logic is clear and concise.
- Flexibility: We can easily handle variable-length data structures.
- Performance: The generator yields one structure at a time, so the parsed results never need to be held in memory all at once.
With PackMan, parsing complex binary protocols becomes a straightforward task. Whether you're working with network protocols, file formats, or, as in this case, IoT device communications, PackMan provides the tools to make your code both powerful and readable.