Wednesday, June 8, 2022

Zigzag bytes

I was playing around with a goofy idea for which I needed zigzag encoding for bytes. Zigzag encoding is often used in combination with variable length encoding in things like Avro, Thrift and Protobuf.

In zigzag encoded integers, the least significant bit is used for sign. To convert from regular encoding (2-complement) to zigzag (and back) you can use the following Scala code:

def i32ToZigZag(n: Int): Int = (n << 1) ^ (n >> 31) def zigZagToI32(n: Int): Int = (n >>> 1) ^ - (n & 1) def i64ToZigZag(n: Long): Long = (n << 1) ^ (n >> 63) def zigZagToI64(n: Long): Long = (n >>> 1) ^ - (n & 1)

Translate this to Java and the expressions after the = look exactly the same.

Using these bit shifting tricks for bytes is a whole lot more difficult. The problem is that Scala (like Java) does not support bit operations on Bytes. They always convert them to an Int first.

After a lot of fiddling, I settled on the following:

private def b(i: Int): Byte = (i & 0xff).toByte def i8ToZigZag(n: Byte): Byte = (b(n << 1) ^ (n >> 7)).toByte def zigZagToI8(n: Byte): Byte = b(((n & 0xff) >>> 1) ^ (256 - (n & 1)))

Translated to Java it should look like this (not tested!):

private byte b(int i) { return (byte)(i & 0xff); } public byte i8ToZigZag(byte n) { return (byte)(b(n << 1) ^ (n >> 7)); } public byte zigZagToI8(byte n) { return b(((n & 0xff) >>> 1) ^ (256 - (n & 1))); }

Is there a better way to do this?