Shrink your data into bitfields (and out again)

Published: 2020-04-29


Some applications want to save every byte. Every last byte! Maybe they have memory constraints, maybe they’re sending data over the network, maybe they just like saving bytes.

One way to help save bytes is to use efficient data structures. One popular memory efficient data structure is a bitfield. A bitfield allows complex, predefined data to be stored in a compact series of bits.

We’ll take some data from this:

  "priority": "urgent",
  "location": "Room 71",
  "status": "empty",
  "candy": "peppermint patties"

To this! The same data!


(That base-2 number is 7288 in base-10).

We’ll do that in a series of steps and using less data with each one.

  "priority": "urgent",
  "location": "Room 71",
  "status": "empty",
  "candy": "peppermint patties"

How? Incrementally!

Starting with the data

Let’s say we have some data

field value
priority urgent
location Room 71
status empty
candy peppermint patties

That’s some good data. Now let’s say we need to represent that data in code. A hash should do nicely.

data = {
  "priority" => "urgent",
  "location" => "Room 71",
  "status" => "empty",
  "candy" => "peppermint patties"

That’s some nice data! But if we serialize it to JSON we have 88 bytes!

data.to_json.size # => 88

What can we do to use less bytes?

Give up the keys

If we agree on the keys ahead of time we can stop sending those around. That means we need to fix the data into an order and coordinate that among all systems using that data. If we’re ok with that then that’s big step!

smaller_data = [
  "Room 71",
  "peppermint patties",
smaller_data.to_json.size # => 49

Wooo, 49 bytes! We’ve already gotten it down to 55% of the original size. We gave up some flexibility because we now we have a fixed set of keys in a fixed order but we saved a lot of space!

We can go even further if we’re willing to give up still more flexibility.

Limit the possible values

If we can give up having arbitrary values then we can much more efficently represent those values.

Take the “candy” for example. Right now it could be any string, any string at all!

Candy Bytes to represent
peppermint patties 18
m&ms 4
reese’s pieces 14
butterfingers 13
cookies 7

But if we know we have a limited set of possible candies then we don’t need the ability to have any string. For example, if we have ten or fewer possible candies we can use a single integer digit to represent them all.

Candy Number Bytes
peppermint patties 0 1
m&ms 1 1
reese’s pieces 2 1
butterfingers 3 1
cookies 4 1

That means we if replace “peppermint patties” in our data with 0 (our arbitrary agreed upon value to represent that candy) then we can cut our byte usage down a bit.

even_smaller_data = [
  "Room 71",

even_smaller_data.to_json.size # => 30

Now we’re down to 34% of the original size with that optimization alone!

Let’s squish down the other data by limiting their options.

priority = %w(low medium high urgent)

location = (1..100) # i.e. Room 1 to Room 100

status = ["not empty", "empty"]

candy = [
  "peppermint patties",
  "reese's pieces",

With those contraints our original data becomes much smaller still.

even_smaller_still_data = [3, 71, 1, 0]
even_smaller_still_data.to_json.size # => 10

10 bytes! We’re down to around 11% of the size of the original data!

At this point we need code that has explicit knowledge of the data structure to serialize the data down and parse it back out.

module IntegerArray
  PRIORITY = %w(low medium high urgent)

  LOCATION = (1..100)

  STATUS = ["not empty", "empty"]

  CANDY = [
    "peppermint patties",
    "reese's pieces",

  def self.serialize(data)

  def self.parse(array)

  class Serialize
    def initialize(data)
      @data = data

    def priority

    def location

    def status

    def candy

    def to_a
      [priority, location, status, candy]

  class Parse
    def initialize(array)
      @array = array

    def priority

    def location
      room_number = @array[1]
      "Room #{room_number}"

    def status

    def candy

    def to_hash
        "priority" => priority,
        "location" => location,
        "status" => status,
        "candy" => candy,
data = {
  "location"=>"Room 71",
  "candy"=>"peppermint patties"

serialized = IntegerArray.serialize(data)
# => [3, 71, 1, 0]

parsed = IntegerArray.parse(serialized)
# => {
# =>   "priority"=>"urgent",
# =>   "location"=>"Room 71",
# =>   "status"=>"empty",
# =>   "candy"=>"peppermint patties"
# => }
other_data = {
  "location"=>"Room 23",
  "status"=>"not empty",

serialized = IntegerArray.serialize(data)
# => [0, 23, 0, 1]

parsed = IntegerArray.parse(serialized)
# => {
# =>   "priority"=>"low",
# =>   "location"=>"Room 23",
# =>   "status"=>"not empty",
# =>   "candy"=>"m&ms"
# => }

That’s pretty efficient, but since we’re already paying the cost to have custom code for serializing and parsing the data we can go even further! Our data still has some lurking flexibility that we could trade for a decrease in size. The data values themselves!

Push all the values into a bitfield

Consider that status field. We know it can only have two possible values: empty or not empty. But we’re using a data value that could hold many many more numbers than a fixed set of two options. If we know there can be only two then we can limit ourselves to only using 0 or 1 and then we have information that could be stored entirely in a single bit. But how can we store the priorities or all the types of candy or all the possible room locations as bits?

The answer is to use the fewest number of bits that can hold the range of possible options.

Let’s illuminate that with the “priority” field. The priority field has four options.


We can’t represent those options with a single 0 or 1 bit. But we can represent them all with TWO bits.

Priority Integer Bits
low 0 00
medium 1 01
high 2 10
urgent 3 11

It’s the same data, but restricted down to the absolutely smallest and least flexible space.

The number of bits you need for a field is enough to represent the power of two large enough to count all the values.

Let’s say we want a bitfield to hold days of the month. There are 31 possible values from 1 to 31. Let’s see how many bits we’d need to hold that data.

Number of Bits Available Values Range
11 22 010\ldots1
22 44 030\ldots3
33 88 070\ldots7
44 1616 0150\ldots15
55 3232 0310\ldots31
66 6464 0630\ldots63
77 128128 01270\ldots127
88 256256 02550\ldots255

Hey there we go! Using 5 bits gives us a bucket of 32 spaces which is just a little more than we need to hold any of the 31 possible days in a month.

Let’s take a look at the bits the rest of our data fields would require.

Data Field Possible Values Bits Required
priority 4 2
location 100 7
status 2 1
candy 5 3

Great! What would it look like to represent our sample data in these bits?

module BitsArray
  def self.serialize(data)
    IntegerArray.serialize(data).map do |value|

  def self.parse(array)
    integer_array = { |v| v.to_i(2) }
data = {
  "location"=>"Room 71",
  "candy"=>"peppermint patties"

serialized = BitsArray.serialize(data)
# => ["11", "1000111", "1", "0"]

parsed = BitsArray.parse(serialized)
# => {
# =>   "priority"=>"urgent",
# =>   "location"=>"Room 71",
# =>   "status"=>"empty",
# =>   "candy"=>"peppermint patties"
# => }

Hey it works! In our example code we can see that Ruby has really great support for changing number bases. That’s a fundamental part of what we’re doing, but we’re not done yet.

Right now you may be thinking to yourself, “Self, serializing something like 10 to JSON would be twice the data needed to serialize the single digit 2. How can that be a savings?” And you are right on! If we were to represent our bits with one digit per bit in JSON that would be much too large! Instead we’ll push all of our data together into a bitfield and represent that as JSON.

Currently we’ve gotten our data to an array of base-2 numbers instead of array of base-10 numbers. So yeah that’s not a size reduction! Not at all!

serialized_base10 = [3, 71, 1, 0]
serialized_base2 = ["11", "1000111", "1", "0"]

serialized_base10.to_json.size # => 10
serialized_base2.to_json.size # => 24

How is this going to save us any space? We’re gonna squish those bits! Instead of keeping all of those bits apart we’ll push them right next to each other and treat that as a single number.

Ultimately for this specific example we want to get this binary number:


Made up of these components

Priority Location Status Candy
1111 10001111000111 11 000000

Would that save us space? Yes! Because we can represent that long string of bits as an integer of whatever base we like!

0b1110001111000.to_s(2)  # => "1110001111000"
0b1110001111000.to_s(8)  # => "16170"
0b1110001111000.to_s(10) # => "7288"
0b1110001111000.to_s(12) # => "4274"
0b1110001111000.to_s(16) # => "1c78"

Sticking with base-10 for simplicity and all of our original data would be represented as 7288. That’s just 4 bytes as JSON! 4.5% of the original 88 bytes!

Constructing a bitfield

It might seem like we could simply take our individual integer values we’ve derived, convert them each to base-2, and then join them together into a string that will be our base-2 bitfield.

As a quick review we’ve gotten our data values into base-10 integers:

data = {
  "location"=>"Room 71",
  "candy"=>"peppermint patties"

serialized = IntegerArray.serialize(data)
# => [3, 71, 1, 0]

Let’s convert them individually to base-2

Field Base-10 Base-2
Priority 33 1111
Location 7171 10001111000111
Status 11 11
Candy 00 00

Here you might spot the problem with a simple conversion of each integer to base-2 and joining them into a string. But if you don’t you’re in good company because the problem is not obvious nor intuitive.

Check out that last 0. Our candy field is supposed to have three bits of data. It’s only shown as a single 0 in our results so far because 0b0 is the same as 0b00 and 0b000 just like 000 is 00 and 0: they’re all simply zero! That was fine when our bits were separated into an array, but it’s not going to work at all when our bits are all going to be part of the same number.

Let’s see that problem directly:

oops = ["11", "1000111", "1", "0"].join
p oops # => "11100011110"

# convert to base-10
p oops.to_i(2) # => 1822

bitfield = ["11", "1000111", "1", "000"].join
p bitfield # => "1110001111000"

# convert to base-10
p bitfield.to_i(2) # => 7288

7288 is NOT 1822! Let’s compare what happens when we convert those back to base-2

1822.to_s(2) # => "11100011110"
7288.to_s(2) # => "1110001111000"

That might look like no big deal, it’s just the right side showing up as 0 instead of 000 right? Well yes, but that’s a huge deal! Like the number 23 is very different than the number 2300: those zeros on the right are very important!

Could we solve the problem by simply padding our number to 13 bits? It might seem that way.

# the bitfield is supposed to be 13 bits
#   2 for priority
# + 7 for location
# + 1 for status
# + 3 for candy
1822.to_s(2).ljust(13, '0') # => "1110001111000"
7288.to_s(2).ljust(13, '0') # => "1110001111000"

But no, we can’t simply solve the issue by padding the entire string. Padding the number with zeros only works in this case because the 0 value happens in the last field. If the 0 were in multi-bit field in the middle of the bitfield then no amount of left or right padding would fix the problem.

We could pad each individual integer to its known bit length and then join the results into a string that we could convert into base-2. But that would be hacking our way around the problem.

Instead of hacking our way around with padding, we must properly construct our bitfield. That means putting in a little more thought than simply converting each individual field to base-2 and joining them.

Fair warning, I’m going to ramble around the following explanations in various ways. If you feel like you’ve got it, skim along. If you don’t then hopefully one of the explanations helps get you there.

What is a bitfield?

A bitfield is, ultimately, a number. It’s not special, not magical. We make it special by assigning significance to specific divisors that make up the number. In a bitfield’s case we assign significance to the powers of 2 that make up the number.

Why powers of 2? Because that’s what computers are good at. Ultimately every bit is represented as either a 1 or 0. A bit is the smallest component of memory we can manipulate. So if we want our data to be as small as possible that means using the smallest number of bits that can represent the individual pieces of data we want to reference.

Since we want to represent more data than a single bit can hold and since we want to keep all of our data together that means we have to place all of our bits into a single number. That means every single bit and its position in the number is very important. As a series of numbers 101110101110 might look similar to 10111001011100 but it is quite literally the difference between 4646 and 9292.

As a quick level set on number “places” in base-10 vs base-2 let’s look at what the number 1000 represents in base-2 vs base-10.

1 0 0 0
Base-10 Place 1000 100 10 1
Base-2 Place 8 4 2 1

In other words, each new digit represents another power of the base.

N 10N10^N 2N2^N
0 1 1
1 10 2
2 100 4
3 1000 8
4 10000 16

As we add data to our bitfield we’ll be adding more and more powers of base-2.

bitfield length possible values in base-10
1 010\ldots1
2 030\ldots3
3 070\ldots7
4 0150\ldots15

Range of base-10 values: 0...31

Number of possible values: 32

Example bitfield values

10100 = 20
11001 = 25
00110 = 6

Constructing a bitfield by hand

Ok, so let’s construct OUR bitfield.

Field Bits
priority 2
location 7
status 1
candy 3

We have four fields: priority, location, status, candy. And we know the number of bits we need for each. Using that we can figure out how to turn each of our field values into their position in the bitfield.

To convert each value to its part of our bitfield number we need to determine what units of two it represents. That is: what’s the smallest “place” of its part of the bitfield? It’s a weird concept to explain so let’s do it instead and then circle back to try and put it all together.

We’ll build our bitfield from smallest to largest numbers. That is we’ll create it right to left: starting with the smallest number places of the bitfield and finishing with the largest number places.

First we have candy. Since it’s the first field its unit is 1. But our value (peppermint patties) is also 0 so we end up with a value of 0 going into the bitfield. It’s three bits of zero, but that’s not information we can encode in the zero itself.

0(1)=00(1) = 0

Next we have status. Our value (empty) is represented as 1 in our design. BUT since candy takes three bits to hold its data our status value of 1 needs to use 8 for its units. The 1s, 2s, and 4s places are all reserved for candy data.

0(1)+1(8)=80(1) + 1(8) = 8

After status is location. Our value (Room 71) will be represented as itself 71 in the data. But instead of being the value 71 directly it will be 71 times the power of 2 where its field belongs. The location data starts off right next to status since status only takes one bit for its data. That means all of the location data will be in units of 16.

0(1)+1(8)+71(16)=11440(1) + 1(8) + 71(16) = 1144

Finally we have priority. Since location took a whopping seven bits of data priority is pushed way up into units of 2048. Our data is “urgent” which we’ve represented as the value 3.

0(1)+1(8)+71(16)+3(2048)=72880(1) + 1(8) + 71(16) + 3(2048) = 7288

So our bitfield holding our sample data is the value 72887288. Is that what we wanted? Well let’s see!

7288.to_s(2) # => 1110001111000

Let’s break that down left to right (priority, locaiton, status, candy)

  • 11 for priority
  • 1000111 for location
  • 1 for status
  • 000 for candy

In base-10 they’re respectively the values 33, 7171, 11, and 00. That checks out! (Don’t worry we’ll parse bitfields for real later on.)

Constructing a bitfield by adding powers of two

That worked, but we need a generalized programmable solution to putting our data into a bitfield. We need an algorithm! The result of our calculation should be the base-10 integer number 7288 in whatever base we find convenient to work with (probably base-10).

Our algorithm will need two sets of data:

  • The bit lengths of each field
  • The actual values to place in each field

Using the bit lengths of each field we’ll calculate the unit power of two for each field going from smallest to largest.

That is, we’ll figure out what powers of two each piece of the bitfield will be expressed in.

Candy, the first field, starts things off in units of 20=12^0 = 1

Next is the status field. Since the candy field took three bits status will be in units of 20+3=23=82^{0 + 3} = 2^3 = 8

Then the location field. Just one bit was taken for status so we have 20+3+1=24=162^{0 + 3 + 1} = 2^4 = 16

Finally the priority field coming in after the seven bits for location will be in units of 20482048

20+3+1+7=211=20482^{0 + 3 + 1 + 7} = 2^{11} = 2048

Given an ordered set of fields and their bit lengths we could arbitrarily map that data into the bit unit for each field. Great! That means we can construct that data in advance and it’s a simple matter of multiplication and addition to arrive at our bitfield integer.

The process would be:

  1. Precalculate the base-2 units for each field
  2. Receive incoming data to place into the bitfield
  3. Map each data value to its integer representation
  4. Multiply each of those values by its field’s base-2 unit
  5. Add all of those calculated values to arrive at the bitfield integer

SIDE NOTE that’s jumping ahead: We could also use bitwise OR instead adding the values. Since each value is in its own space on the bitfield the end result is the same. If you don’t know what I’m talking about then don’t worry we talk about bitwise operations when we get into how we parse a bitfield.

Here’s a code example of a generic bitfield generator that has no internal knowledge of our fields. That means we have to give it the number values we want to go into the bitfield.

class BitfieldGenerator
  attr_reader :fields

  def initialize
    @fields = []
    @_current_exponent = 0

  def add_field(name, bits)
    field_exponent = @_current_exponent
    @_current_exponent += bits
    @fields << {
      name: name,
      bits: bits,
      base: 2 ** field_exponent

  # operations kept distinct to aide readability
  def serialize(data_values)
    calculated = field_values(data_values)
    # => [0, 8, 1136, 6144]

    # => 7288

  def field_values(values) do |value, index|
      value * @fields[index][:base]
bf =
bf.add_field('candy', 3)
bf.add_field('status', 1)
bf.add_field('location', 7)
bf.add_field('priority', 2)

After adding the fields our @fields data represents all of our bitfield knowledge.


We can then generate a bitfield by passing a data values array ordered from smallest to largest field.

bf.serialize([0, 1, 71, 3]) # => 7288

That’s a number!

Constructing a bitfield using bitwise left shift

We have another option available to us when we’re constructing our bitfield: bitwise left shift. That operator allows us to push a value an arbitrary number of bits to the left. Essentially a shorthand for multiplying it by a power of 2. It can make for bitfield generation code that might be easier to follow since we can directly abstract each value’s position in the bitfield.

Bitwise left shift Base-2 Base-10
101\ll0 1 1
111\ll1 10 2
121\ll2 100 4
131\ll3 1000 8
141\ll4 10000 16
151\ll5 100000 32
161\ll6 1000000 64
171\ll7 10000000 128

That’s awesome! We can take the number we know we want and then directly say where we want it to be in the resulting bitfield number. Let’s build this bitfield up iteratively. Of course there are also many ways we could nicely wrap it up into a class. For example our earlier class keeping track of the base for each field could instead keep track of the position directly. It’s a bit more intuitive to think about the position of each value vs its starting power of two.

bitfield = 0

candy_value = 0
candy_bits = 3
candy_position = 0
bitfield += (candy_value << candy_position)

status_value = 1
status_bits = 1
status_position = candy_position + candy_bits
bitfield += (status_value << status_position)

location_value = 71
location_bits = 7
location_position = status_position + status_bits
bitfield += (room_value << room_position)

priority_value = 3
priority_bits = 2
priority_position = location_position + location_bits
bitfield += (priority_value << priority_position)

bitfield # => 7288

And, again, we could certainly use bitwise OR instead of adding and we’d get the same result because each value has a separate place in the number.

Let’s call it done for creating a bitfield. Hooray!

Bitfield creation: recap

  1. Have some data
  2. Put the data into a consistent order (e.g. our priority, location, status, candy)
  3. Limit the data to a set of predefined values (e.g. our candy can only be one of a predefined list of candies)
  4. Determine how many bits is required to hold each set of data

At that point you’re ready to make a bitfield!

  1. Convert each piece of data into its predefined value (e.g. peppermint patties converts to 0)
  2. Incorporate it into the bitfield using one of the transformation techniques (e.g. bitshifting each value to its correct position and adding up the results)
  3. Represent the resulting bitfield number in whatever base works well. Base-10 is probably easiest.

Great! Now that we have a bitfield. How do we get the data back out?

Parsing a bitfield

Now we have the opposite problem. We want to take a squished down opaque data representation and expand it back out into consumable data.

Or do we? Next up I’ll show one way of extracting each piece of data out of the bitfield. But then I show how you can use the bitfield as the data source directly. If you don’t need to represent the entire set of data at once then you could do less work by getting the fields directly from the bitfield as needed.

Extracting fields from a bitfield using bitwise right shift

This is the opposite of the bitwise left shift we used to assemble the bitfield. A bitwise right shift slides the binary number to the right: essentially dividing the number by powers of two.

(55).to_s(2)      # => "110111"

(55 >> 0).to_s(2) # => "110111"
(55 >> 1).to_s(2) # =>  "11011"
(55 >> 2).to_s(2) # =>   "1101"
(55 >> 3).to_s(2) # =>    "110"
(55 >> 4).to_s(2) #=>      "11"
(55 >> 5).to_s(2) #=>       "1"
(55 >> 6).to_s(2) #=>       "0"

# from there on out there's no more to shift
# any further right shifts result in 0

We can use this to extract the fields from the bitfield.

bitfield = 7288

candy_value = bitfield[0,3] # first three bits
bitfield = bitfield >> 3 # discard the first three bits

status_value = bitfield[0,1] # next bit is status
bitfield = bitfield >> 1 # shift away the status bit

# get the location then discard the bits
location_value = bitfield[0,7]
bitfield = bitfield >> 7

# now all that's left in the bitfield is the priority
# but we'll complete the pattern
priority_value = bitfield[0,2]
bitfield = bitfield >> 2

p [
] # => [0, 1, 71, 3]

There we go! Then it’s only a matter of mapping those integer fields back to their actual values.

That process essentially splits up the bitfield by field bit lengths:

priority location status candy
11 1000111 1 000

Parsing a bitfield using bitwise AND

Extracting all the fields from a bitfield is fine, but that’s not the only way to use a bitfield. We can also “query” it for values directly! The operator that makes that easy is bitwise AND. Let’s look at how it works.

Sidetrack into bitwise AND

Let’s say we have a perfectly interesting binary number.


Now let’s say that number is a bitfield like so:

Field 1 Field 2 Field 3
0001 10 11

Very cool. And now let’s say we want to check on Field 2. We don’t need to parse out Field 1 or Field 3. We just need the bits from field 2.

    ∨∨ these two bits right here
∧∧∧∧ not these four bits
      ∧∧ or these two bits

We can extract those bits out using bitwise AND.

Check it out!

"00011011".to_i(2) & "1100".to_i(2)
=> 8


Ok that’s maybe not as exciting as it seems. Let’s step through it. I’m going to ramble around this explanation in the hopes of rounding up everyone to the same page.

The first thing we need to do to use bitwise AND to get information from our bitfield is to represent the bits we want as a number. That is, we need to make a binary number that has 1 where we want information and 0 where we don’t want information.

00011011 - the bitfield
00001100 - the comparison with 1s where we want data

Ruby and most langauges want to work with base-10 numbers and want to show us base-10 numbers. So let’s look at this via base-10.

"00011011".to_i(2) # => 27
"00001100".to_i(2) # => 12

So our bitfield is 27 in base-10. And the position of the bits we’re interested in is 12.

The operation we perform is then 27 bitwise AND 12. In Ruby the bitwise AND operator is &

The result is represented in base-10.

27 & 12 # => 8
(27 & 12).to_s(2) # => "1000"

That means 8 is the base-10 number we’d get if we represented the extracted bits from our field in base-10.

Let’s look at that all in base-2 AND base-10 to make it more clear what’s happening.

| Operation    | Base-2   | Base-10 |
| BITFIELD     | 00011011 | 27      |
| BITWISE AND  | 00001100 | 12      |
| RESULT       | 00001000 |  8      |

Zooming in on the field we care about. We know in our bitfield that it’s 10. We hit that 10 with an extraction hammer of 11 to knock out whatever the bits are at that position in our bitfield which is 10. But Bitwise AND can’t simply return “10” because the value of the number is significiant. So we end up with a number according to the following rules.

Bitfield Comparison Result
1 1 1
0 1 0
1 0 0
0 0 0

We can think of the comparison number as a hammer of 1 bits that will knock out either the 1 or 0 in its corresponding position in the bitfield and then the result will have 0s everywhere else.

Let’s see this in action by viewing every permutation of our “Field 2”. I’ll leave the rest of the bitfield as it is in our example to make it clear those numbers don’t matter at all. The bitwise AND gives us only the bits we care about.

  00011111 bitfield
& 00001100 comparision
  00001100 result (12 in base-10)

  00011011 bitfield
& 00001100 comparision
  00001000 result (8 in base-10)

  00010111 bitfield
& 00001100 comparision
  00000100 result (4 in base-10)

  00010011 bitfield
& 00001100 comparision
  00000000 result (0 in base-10)

What does this mean? It means that since we, as the bitfield designers, know that Field 2 has two bits and four possible settings we can inspect them using bitwise AND. A result of 12 will mean that we have 11 in the field, 8 means we have 10, 4 means we have 01, and 0 means we have 00.

The position of the field relative to its neighbors IS significant. To demonstrate why let’s consider Field 3.

  00011011 bitfield
& 00000011 comparision
  00000011 result (3 in base-10)

  00011010 bitfield
& 00000011 comparision
  00000010 result (2 in base-10)

  00011001 bitfield
& 00000011 comparision
  00000001 result (1 in base-10)

  00011000 bitfield
& 00000011 comparision
  00000000 result (0 in base-10)

All different numbers except for the 0 case because of the position of the field within the larger bitfield.

The key is then that we have to be aware of the possible results per bitfield by both bit length and bit position.

Let’s see how this works in practice by going back to our status messages bitfield.

Back to our candy messages example

To extract individual fields from our bitfield we need four different comparison numbers. Determining each number is simple: fill in a 1 for the bits of the field and leave everything else 0.

value purpose
1110001111000 the bitfield in question
1100000000000 bits to extract the priority
0011111110000 bits to extract the location
0000000001000 bits to extract the status
0000000000111 bit to extract candy

Let’s look at status and see how its bitwise AND comparison value can extract the actual value of the field while ignoring anything else in the bitfield.

not_empty         = 0b1010101011101
empty             = 0b1010101010101
status_comparison = 0b0000000001000

empty & status_comparison
# => 8
(empty & status_comparison).to_s(2)
# => "1000"

not_empty & status_comparison
# => 0
(not_empty & status_comparison).to_s(2)
# => "0"

It might not be super obvious there, but those are good results! The 1000 means that there was a 1 in the status field position so it’s included and every other digit in the bitfield is returned as a zero. Any leading zeros aren’t represented just like 000100 is simply 100. The 0 result means the status field was a zero so there’s nothing left in the result other than zero so it’s all represented as zero.

0b0000000001000 == 0b1000
# => true

0b0000000000000 == 0b0
# => true

That’s success! We can precisely extract only the bits that we need to know about using bitwise AND!

To interpret the result we have two options:

  • Understand the bitwise AND results directly
  • Bitwise right shift the results to line up with the original values

Looking at that status result for bitwise AND. We could either have our interpreting code understand that the value 8 means the status is “empty” or we can right shift the result the appropriate number of places to extract the value directly.

(empty & status_comparison) >> 3
# => 1

(not_empty & status_comparison) >> 3
# => 0

Let’s see the four possible settings of priority as a second example. This time without converting to base-2.

urgent   = 0b1110101010101 # => 7509
high     = 0b1010101010101 # => 5461
medium   = 0b0110101010101 # => 3143
low      = 0b0010101010101 # => 1365
priority = 0b1100000000000 # => 6144

# use these values directly
urgent & priority # => 6144
high   & priority # => 4096
medium & priority # => 2048
low    & priority # => 0

# or right shift out of the bitfield
(urgent & priority) >> 11 # => 3
(high   & priority) >> 11 # => 2
(medium & priority) >> 11 # => 1
(low    & priority) >> 11 # => 0

Bitfield interpretation: recap

  1. Have a bitfield
  2. Know the bitfields data fields and their bit positions
  3. Know the predefined values for each field and how they map to the possible bit values per field
  4. Use bitwise AND to extract specific fields from the bitfield and then either work with those results directly or bitwise RIGHT SHIFT the results back to the original values based on the fields position in the bitfield

Yay bitfields!

Bitfields are a somewhat obscure data structure: but they’re out there! Famously the Reddit r/place April Fool’s event was created by using a Redis bitfield.

We used a bitfield of 1 million 4 bit integers. Each 4 bit integer was able to encode a 4 bit color, and the x,y coordinates were determined by the offset (offset = x + 1000y) within the bitfield. We could read the entire board state by reading the entire bitfield. We were able to update individual tiles by updating the value of the bitfield at a specific offset (no need for locking or read/modify/write).

— How we built r/place

If you ever come across an integer that seems to be weirdly treated as data: there’s a good chance that’s a bitfield! If you ever need to squish a predefined and rigorously scoped set of data into the smallest possible space: that could be a bitfield!

Further reading