Reading the Bandit source code: Part 1
- Author: Stephen Ball
- Published:
- Permalink: /blog/reading-the-bandit-source-code
Bandit is a new HTTP server that is not only written in Elixir, but written to have readable source code. Let's see what we can learn!
From the Bandit README
Bandit has been built from the ground up for use with Plug applications; this focus pays dividends in both performance and also in the approachability of the code base.
Bandit exists to demystify the lower layers of infrastructure code. In a world where The New Thing is nearly always adding abstraction on top of abstraction, it’s important to have foundational work that is approachable & understandable by users above it in the stack.
Prioritize (in order): correctness, clarity, performance. Seek to remove the mystery of infrastructure code by being approachable and easy to understand
Sounds great! Let’s see what we can learn from a blind reading of the code.
This is a blog post, not a video, so you aren’t going to get a live reaction. But I will do my best to type out my thoughts as we explore the code. Let’s go!
First off, we need the code.
I store all of my code from GitHub in a $HOME/github directory with subdirectories per user/org.
e.g.
$HOME/github/livebook-dev/livebook
$HOME/github/sdball/ex_duck
# etc.
So let’s start there.
$ cd $HOME/github
$ mkdir mtrudel && cd mtrudel
$ git clone git@github.com:mtrudel/bandit.git
$ cd bandit
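If you clone into this layout often, the steps above can be wrapped in a couple of shell functions. This is only a sketch: `github_dir` and `github_clone` are hypothetical names, not existing tools, and the clone step assumes SSH access to GitHub is already configured.

```shell
# Hypothetical helper: print the local directory for an "org/repo" slug,
# following the $HOME/github/<org>/<repo> layout described above.
github_dir() {
  printf '%s/github/%s\n' "$HOME" "$1"
}

# Hypothetical helper: clone org/repo over SSH into that layout,
# creating the org directory first if it does not exist yet.
github_clone() {
  slug=$1
  dest=$(github_dir "$slug")
  mkdir -p "$(dirname "$dest")" &&
    git clone "git@github.com:${slug}.git" "$dest"
}
```

With those defined, `github_clone mtrudel/bandit && cd "$(github_dir mtrudel/bandit)"` would cover the four commands above.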
Now let’s explore!
$ ls -lah
Permissions Size User Date Modified Name
.rw-r--r-- 117 sdball 28 Dec 22:02 .ackrc
drwxr-xr-x - sdball 28 Dec 22:07 .elixir_ls
.rw-r--r-- 97 sdball 28 Dec 22:02 .formatter.exs
drwxr-xr-x - sdball 29 Dec 10:18 .git
drwxr-xr-x - sdball 28 Dec 22:02 .github
.rw-r--r-- 615 sdball 28 Dec 22:02 .gitignore
drwxr-xr-x - sdball 28 Dec 22:07 _build
.rw-r--r-- 5.2k sdball 28 Dec 22:02 CODE_OF_CONDUCT.md
drwxr-xr-x - sdball 28 Dec 22:06 deps
drwxr-xr-x - sdball 28 Dec 22:02 lib
.rw-r--r-- 1.1k sdball 28 Dec 22:02 LICENSE
.rw-r--r-- 1.5k sdball 28 Dec 22:02 mix.exs
.rw-r--r-- 7.0k sdball 28 Dec 22:02 mix.lock
.rw-r--r-- 8.0k sdball 28 Dec 22:02 README.md
.rw-r--r-- 336 sdball 28 Dec 22:02 SECURITY.md
drwxr-xr-x - sdball 28 Dec 22:02 test
All the usual friends for an Elixir application.
Let’s see what lib tells us.
lib
├── bandit
│   ├── application.ex
│   ├── clock.ex
│   ├── delegating_handler.ex
│   ├── exceptions.ex
│   ├── headers.ex
│   ├── http1
│   │   ├── adapter.ex
│   │   └── handler.ex
│   ├── http2
│   │   ├── adapter.ex
│   │   ├── connection.ex
│   │   ├── errors.ex
│   │   ├── flow_control.ex
│   │   ├── frame
│   │   │   ├── continuation.ex
│   │   │   ├── data.ex
│   │   │   ├── goaway.ex
│   │   │   ├── headers.ex
│   │   │   ├── ping.ex
│   │   │   ├── priority.ex
│   │   │   ├── push_promise.ex
│   │   │   ├── rst_stream.ex
│   │   │   ├── settings.ex
│   │   │   ├── unknown.ex
│   │   │   └── window_update.ex
│   │   ├── frame.ex
│   │   ├── handler.ex
│   │   ├── README.md
│   │   ├── settings.ex
│   │   ├── stream.ex
│   │   ├── stream_collection.ex
│   │   └── stream_task.ex
│   ├── initial_handler.ex
│   ├── phoenix_adapter.ex
│   ├── pipeline.ex
│   └── websocket
│       ├── connection.ex
│       ├── frame
│       │   ├── binary.ex
│       │   ├── connection_close.ex
│       │   ├── continuation.ex
│       │   ├── ping.ex
│       │   ├── pong.ex
│       │   └── text.ex
│       ├── frame.ex
│       ├── handler.ex
│       ├── handshake.ex
│       ├── permessage_deflate.ex
│       └── socket.ex
└── bandit.ex
A few things look interesting right away.
- There’s an application here.
  - That means Bandit runs some processes. That’s not unexpected since something has to be running and ready to answer requests. What is that something and what are its peers? I have no idea!
- Bandit has an internal concept of a Clock.
  - That’s a good sign because applications where time is an important part of the domain should have their own clock, if only to name and abstract lower-level calls to fit the domain.
- http1 looks like it’s a much simpler spec than http2.
  - http1 only has two files instead of http2’s many files.
- Both http protocols have an adapter and a handler.
  - That’s also a good sign indicating that the project has landed on a consistent design abstraction.
- Bandit explicitly knows about Phoenix enough to have a named adapter.
  - That’s a little surprising. Would Bandit need to know about all web frameworks that want to use it? I assume this is to support an easy Phoenix configuration. Maybe the alternative was Phoenix having to implicitly know about Bandit, which also doesn’t sound great. I think the ideal would be for any http server that wants to work with Phoenix (or any other Plug-based web framework) to be expected to implement a common interface.
- There’s an initial handler at the top level, presumably to choose a specific protocol.
- There’s a pipeline concept. I assume it’s a pipeline of Plugs but maybe it’s some kind of protocol detail.
- websocket exists as a parallel protocol to http1/http2.
  - I know that websocket connections are http connections that are upgraded to a websocket. I wonder if both http1 and http2 protocols support that upgrade.
Ok we have some idea of what we have in this repo.
Now since Bandit is implementing defined protocols we can probably find some RFC references.
# count the files that contain "RFC"
$ rg -l RFC | wc -l
36
Yes, 36 files mention “RFC”.
Let’s see what they are!
# count the lines that contain "RFC"
$ rg "RFC" | wc -l
155
155 lines, cool cool.
Scrolling through the results of rg "RFC", it looks like some RFC mentions have a space, like RFC 7692, and some do not, like RFC6455§7.1.2.
Let’s get a count of the specific RFCs and see what matters to Bandit.
First we’ll extract all the RFC references
$ rg -o "RFC[[:space:]]?\d+§?[.0-9]*" --no-filename --no-line-number | head
RFC7540§6.5.2
RFC7540§6
RFC7540§8.1.2.3
RFC7540§8.1.2.3
RFC3986
RFC7540§8.1.2.2
RFC7540§8.1.2.1
RFC7540§8.1.2.3
RFC7540§8.1.2
RFC7540§8.1.2.2
Then we’ll count them.
$ rg -o "RFC[[:space:]]?\d+§?[.0-9]*" --no-filename --no-line-number | sort | uniq -c | sort -n
1 RFC 7540
1 RFC 7540.
1 RFC 7692
1 RFC2616§13.5.1
1 RFC2616§4.
1 RFC3986
1 RFC6455
1 RFC6455§4.2
1 RFC6455§4.2.1
1 RFC6455§4.2.2
1 RFC6455§5.5.1
1 RFC6455§7.4.1
1 RFC7230§4.1
1 RFC7540§11
1 RFC7540§3.5
1 RFC7540§5.1.
1 RFC7540§5.1.1
1 RFC7540§6
1 RFC7540§6.5.2
1 RFC7540§6.9.
1 RFC7540§6.9.1
1 RFC7540§8.1
1 RFC7540§8.1.2
1 RFC7540§8.1.2.2.
1 RFC7540§8.1.2.6
1 RFC7692
1 RFC7692§7
1 RFC9110§5.6.7.
1 RFC9110§8.6
1 RFC9112§3.2
1 RFC9112§3.2.1
1 RFC9112§3.2.3
1 RFC9112§6.3.3
2 RFC6455§5.2
2 RFC7540§4.2
2 RFC7540§5.3.1
2 RFC7540§8.1.2.1
2 RFC7540§8.2
2 RFC9112§3.2.4
2 RFC9112§6.3.5
3 RFC6455§8.1
3 RFC7540§8.1.2.2
3 RFC7540§8.1.2.5
3 RFC9112§3.2.2
4 RFC6455§5.5.3
4 RFC7540§6.1
4 RFC7540§6.10
4 RFC7540§6.3
4 RFC7540§6.4
4 RFC7540§6.7
4 RFC7540§6.8
5 RFC6455§5.5
5 RFC6455§5.5.2
5 RFC7540§6.2
5 RFC7540§8.1.2.3
6 RFC7540§6.9
8 RFC7692§6.1
9 RFC6455§5.4
10 RFC6455§7.1.2
16 RFC7540§6.5
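A few entries in that list, such as `RFC7540§5.1.` and `RFC 7540.`, end in a period because `[.0-9]*` also swallows the full stop at the end of a sentence, so the same reference is counted in two buckets. A tighter pattern only accepts dots that sit between digits. The sketch below reads from stdin and uses `grep -oE` rather than `rg` so it runs without ripgrep installed; the function names are hypothetical.

```shell
# Extract RFC references from stdin, one per line.
# A section is "§" followed by dot-separated numbers (e.g. §8.1.2.3),
# so a sentence-ending period is never part of the match.
extract_rfc_refs() {
  grep -oE 'RFC[[:space:]]?[0-9]+(§[0-9]+(\.[0-9]+)*)?'
}

# Count each distinct reference, least frequent first.
# The sed step drops the optional space so "RFC 7540" and "RFC7540" merge.
count_rfc_refs() {
  extract_rfc_refs \
    | sed -e 's/^RFC[[:space:]]/RFC/' \
    | sort | uniq -c | sort -n
}
```

For example, `rg "RFC" --no-filename --no-line-number | count_rfc_refs` would feed the matching lines from the repo through it.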
Gosh that’s a lot of sections. Let’s skip those and count the RFCs themselves. Let’s also eliminate any space between “RFC” and the RFC number so we count consistently.
$ rg -o "RFC[[:space:]]?\d+" --no-filename --no-line-number | sed -e 's/[[:space:]]//' | sort | uniq -c | sort -n
1 RFC3986
1 RFC7230
2 RFC2616
2 RFC9110
11 RFC7692
11 RFC9112
44 RFC6455
84 RFC7540
There. Clearly RFC7540, the HTTP/2 specification, is super important to Bandit.
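Bare RFC numbers are hard to read at a glance, so it can help to put a name next to each count. The sketch below is a hypothetical helper, not something from the Bandit repo: `rfc_title` hard-codes abbreviated titles for just the eight RFCs in the list above, and `annotate_rfc_counts` appends them to `uniq -c` style lines read from stdin.

```shell
# Hypothetical helper: map an RFC identifier such as "RFC7540" to a short title.
# Titles are abbreviated; only the RFCs counted above are covered.
rfc_title() {
  case "$1" in
    RFC2616) echo "HTTP/1.1 (1999, obsolete)" ;;
    RFC3986) echo "URI Generic Syntax" ;;
    RFC6455) echo "The WebSocket Protocol" ;;
    RFC7230) echo "HTTP/1.1 Message Syntax and Routing (obsolete)" ;;
    RFC7540) echo "HTTP/2" ;;
    RFC7692) echo "Compression Extensions for WebSocket" ;;
    RFC9110) echo "HTTP Semantics" ;;
    RFC9112) echo "HTTP/1.1" ;;
    *)       echo "(unknown)" ;;
  esac
}

# Read "count RFCnnnn" lines from stdin and append the title to each one.
annotate_rfc_counts() {
  while read -r count rfc || [ -n "$count" ]; do
    printf '%4s %s  %s\n' "$count" "$rfc" "$(rfc_title "$rfc")"
  done
}
```

Piping the counting command above into `annotate_rfc_counts` shows that the two largest buckets are HTTP/2 and WebSocket, which lines up with the size of the http2 and websocket directories.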
But maybe the tests are skewing the results, say with a bunch of tests mentioning specific RFC details. Let’s see what’s called out in lib alone.
$ rg -o "RFC[[:space:]]?\d+" --no-filename --no-line-number lib | sed -e 's/[[:space:]]//' | sort | uniq -c | sort -n
1 RFC3986
1 RFC7230
2 RFC2616
2 RFC9110
4 RFC9112
6 RFC7692
17 RFC6455
55 RFC7540
Ok, a similar ratio of results: lib accounts for 88 of the 156 mentions. This nicely demonstrates that the tests do, in fact, directly mention RFCs a fair number of times. That’s a good sign for the readability of the tests.
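To check the tests directly rather than by subtraction, the same pipeline can be pointed at any directory. A minimal sketch, assuming a hypothetical `rfc_counts` function and using `grep -r` in place of `rg` (so, unlike ripgrep, it does not skip files listed in .gitignore):

```shell
# Hypothetical helper: count RFC mentions per RFC number under the given paths.
# -r recurses, -h hides filenames, -o prints only the match, -E enables ERE.
rfc_counts() {
  grep -rhoE 'RFC[[:space:]]?[0-9]+' "$@" \
    | sed -e 's/[[:space:]]//' \
    | sort | uniq -c | sort -n
}
```

Running `rfc_counts lib` and then `rfc_counts test` from the repo root would give the two tables side by side.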
Let’s take a look at the repo’s Git history and see what we’re dealing with as far as authors, commits, and style.
Note: git r calls a pretty git log script.
Looks like Bandit is either a stable, approachable project or a lot of work is being overly squashed into commits.
Let’s see what’s been most affected by, say, the last three months of commits using a custom git command: git churn
$ git churn --since="3 months ago" | tail
6 test/bandit/http2/plug_test.exs
6 test/bandit/http2/protocol_test.exs
6 test/bandit/websocket/http1_handshake_test.exs
6 test/support/simple_websocket_client.ex
10 lib/bandit.ex
10 mix.lock
14 test/bandit/http1/request_test.exs
15 mix.exs
16 README.md
16 lib/bandit/http1/adapter.ex
That count is the number of commits affecting each file. Most of the work is going on in the http1 adapter, but it also looks like the README is being kept up to date. Awesome. The tests are getting about as much attention as the code, too. Great!
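git churn is not built into Git, and the exact script used here isn't shown. A minimal sketch of the idea, assuming it simply counts the commits that touched each path, could look like the function below. The name `git_churn` is made up for illustration and the real script may differ.

```shell
# A sketch of a churn command: count how many commits touched each file.
# --name-only lists the paths changed by each commit and the empty
# --format suppresses the commit header, leaving only paths and blank lines.
# Extra arguments (e.g. --since="3 months ago") are passed on to git log.
git_churn() {
  git log --name-only --format='format:' "$@" \
    | grep -v '^$' \
    | sort | uniq -c | sort -n
}
```

Called as `git_churn --since="3 months ago" | tail`, it would print the most frequently changed files last, in the same shape as the output above.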