In set theory there is a convenient notation for defining new sets, called set comprehension. For example when we have a set we can define a collection of singleton sets with elements from the set as . Sometimes vertical bar is used instead of the colon, and in Isabelle/ZF a single dot is used (something like parses successfully in Isabelle/ZF). In some programming languages a similar notation is used for lists. For example in Python one can write

[ [x] for x in [1,2,3] ]

to get a list of singleton lists and in Haskell

[ [x] | x <- [1,2,3] ]

gives the same.

Erlang also has a syntax for list comprehension, very similar to Haskell’s:

[ [X] || X <- [1,2,3] ]

The part `X <- [1,2,3]`

above is called *the generator* expression.

Erlang has also something unique (as far as I know): binary comprehensions. This is a concept similar to the list comprehensions, but the dummy variable bound by the notation (x in the examples above) can range over binaries rather than lists. I found this very convenient when I was implementing Erlang interface to the KDB+ IPC protocol.

The documentation for Erlang binary comprehensions is rather sparse. There is a chapter on it in the *Learn you some Erlang* book, but it’s rather short. The author sends the reader to a white paper by Per Gustafsson and Konstantinos Sagonas for details. The problem is that the white paper describes a proposal for syntax rather than what was actually implemented. As a result none of the examples of the “Examples of binary comprehensions” section in that paper are parsed by the Erlang version I am using (` erlang:system_info(version)`

returns “6.1” for me). This post aims at providing some examples of usage of binary comprehensions, in addition to the ones provided in the *Learn you some Erlang* book.

## Lists from binaries

The simplest example is

1> [ X || <<X>> <= <<1,2,3>> ]. [1,2,3]

This is the same as what

binary:bin_to_list(<<1,2,3>>).

does.

The ` <<X>> `

syntax always goes together with the ` <= `

, so if you use the ` <= `

token to indicate you want to range over a binary, you always need to surround your dummy variable (or a pattern, see below) with ` <<>> `

.

Now suppose we want to interpret a binary as representing a list of 4-byte integers, written in big endian. Then, instead of the variable `X`

we write a pattern for matching a single integer.

2> [ X || <<X:32/big-signed-integer>> <= <<0,0,0,1,0,0,0,2>>]. [1,2]

If the (bit) length the of binary is not a multiple of the length of the pattern, the largest possible multiple is taken

3> [ X || <<X:32/big-signed-integer>> <= <<0,0,0,1,0,0,0,2,0,0>>]. [1,2]

An interesting thing about the bit syntax in Erlang is that the size in the bit segment specification does not have to be a literal constant – it can be a variable or an expression. So for example one can write a function for interpreting a binary as a sequence of integers of different lengths. Below we parse the same binary as sequence of 32-bit integers or as a sequence of 64-bit integers using the same function.

4> F = fun(Length,Bin) -> [ X || <<X:Length/big-signed-integer>> <= Bin ] end. #Fun<erl_eval.12.90072148> 5> F(32,<<0,0,0,0,0,0,0,5>>). [0,5] 6> F(64,<<0,0,0,0,0,0,0,5>>). [5]

It seems that none of the other elements of the bit segment specification can treated this way. For example we can not specify endianess in a variable

7> G = fun(Endianess,Bin) -> [ X || <<X:32/(Endianess)-signed-integer>> <= Bin ] end. * 1: syntax error before: '('

We can use any segment with known size to specify the dummy variable(s) in the generator. For example suppose we have a binary that we want to interpret as a sequence of 5-byte structures. Each structure starts with a byte we don’t care about and the rest is a 4-byte integer. We can write the redundant byte in the pattern as an underscore:

7> [ X || <<_,X:32/big-signed-integer>> <= <<5,0,0,0,1,6,0,0,0,2>> ]. [1,2]

If we do need that byte we can bind a variable to it and return a list of pairs rather than discard those additional bytes:

8> [ {X,C} || <<C,X:32/big-signed-integer>> <= <<5,0,0,0,1,6,0,0,0,2>> ]. [{1,5},{2,6}]

## Binaries from lists

The simplest example of this case is

9> << <<X>> || X <- [1,2,3] >>. <<1,2,3>>

This is the same as calling `binary:list_to_bin([1,2,3])`

. Note that if we use the binary delimiters (<<>>) to indicate a comprehension that results in a binary, then we have to use those delimiters around the `X`

variable (or the expression we put there) as well. This is similar to the situation where we have to surround `<<X>>`

whenever we indicate with `<=`

that we want our variable to range over a binary. To me this syntax seems a bit noisy since we have to repeat the same information twice.

Suppose we want to convert a list of integers to a binary that contains each of them in the 4-byte, little endian format. We can write this as follows:

10> << <<X:32/little-signed-integer>> || X <- [1,2,3] >>. <<1,0,0,0,2,0,0,0,3,0,0,0>>

One can use this type of binary comprehensions to concatenate a list of binaries:

11> << <<X/binary>> || X <- [<<1,2>>,<<3>>] >>. <<1,2,3>>

We can use pattern matching in the generator (just as in the list comprehensions) to define the dummy variables. For example if we have a list of pairs of integers and we want to encode this list by encoding each pair as 8-byte binary with 4-byte encodings of the elements of the pair, then we can do this:

12> << <<A:32/big-signed-integer,B:32/big-signed-integer>> || {A,B} <- [{1,2},{3,4}] >>. <<0,0,0,1,0,0,0,2,0,0,0,3,0,0,0,4>>

Now for a little bit more complicated example suppose we have a list of binaries representing some packets. We want to discard packets that start from the 0 byte and swap the first two bytes in the remaining ones, then concatenate the results and get one binary. Using binary comprehensions we can do this as follows:

13> Packets = [<<1,2,3,4>>,<<0,1,2>>,<<5,6,7>>]. [<<1,2,3,4>>,<<0,1,2>>,<<5,6,7>>] 14> << <<B1,B0,Rest/binary>> || <<B0,B1,Rest/binary>> <- Packets, B0 =/= 0>>. <<2,1,3,4,6,5,7>>

Here the `<<B0,B1,Rest/binary>>`

part defines variables `B0,B1,Rest`

(integer,integer,binary, resp.) and pattern matches to the first byte, second byte and the rest for each packet. The expression `B0 =/= 0`

acts as a filter discarding the packets with the first byte zero. The `<<B1,B0,Rest/binary>>`

expression does the swapping.

Another way to do the same thing is to write

15> << <<B1/binary,B0/binary,Rest/binary>> || <<B0:1/binary,B1:1/binary,Rest/binary>> <- Packets, B0 =/= <<0>> >>. <<2,1,3,4,6,5,7>>

Here the variables `B0`

and `B1`

are bound to 1-byte binaries rather than integers, so (maybe) we avoid converting bytes to integers. I am curious if this is faster.

## Binaries from binaries

The last case is where the generator ranges over a binary and the result is also concatenated into a binary. The simplest example is just the identity:

16> << <<X>> || <<X>> <= <<1,2,3>> >>. <<1,2,3>>

Here is how we can convert encoding of 4-byte little endian integers into 8-byte big endian integers

17> << <<X:64/big-signed-integer>> || <<X:32/little-signed-integer>> <= <<1,0,0,0,2,0,0,0>> >>. <<0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2>>

Tags: Erlang

## Leave a Reply