Lock your rebar dependencies or die

When developing in Erlang and using libraries outside of OTP you’ve most probably come across rebar. Rebar is a build tool that helps with handling dependencies, doing releases, running eunit, etc. In short, rebar is a pretty awesome tool.

What’s not awesome is when rebar deps, either your own or your dependencies’ deps, break their API and in turn render your app broken. Luckily there’s a simple fix for this. Let me explain.

This is a pretty common sight in a typical rebar.config:

{deps, [{oauth2, ".*", {git, "git://github.com/kivra/oauth2.git", "master"}}]}.

Well, maybe not using oauth2 as a dep, but specifying "master" or "HEAD" as the branch to track. This means that any commit to oauth2 that breaks the API will render your app broken.

When using rebar for dependency handling it accepts three forms of dependency specification:

{deps, [app]}.
{deps, [{app, "1.0.*"}]}.
{deps, [{app, "1.0.*", {VCS, "URI", REV}}]}.

where VCS is one of git, hg, bzr, svn or rsync, URI tells the VCS where to find the dep, e.g. "git://github.com/kivra/oauth2.git", and REV is the dependency’s revision. When using git (and only git) you have a plethora of ways to specify revisions. Here I’ll list the possible ways:

{git, "git://github.com/kivra/oauth2.git"}
{git, "git://github.com/kivra/oauth2.git", ""}
{git, "git://github.com/kivra/oauth2.git", "HEAD"}

These all get translated to "git checkout -q origin/HEAD", the equivalent of doing:

{git, "git://github.com/kivra/oauth2.git", {branch, "HEAD"}}

The last two methods are:

{git, "git://github.com/kivra/oauth2.git", {tag, "TAG"}}
{git, "git://github.com/kivra/oauth2.git", {branch, "BRANCH"}}

These get translated to "git checkout -q TAG" and "git checkout -q origin/BRANCH" respectively.

This means that it’s pretty simple to lock the versions your app depends on. A good strategy is to pin to a descriptive tag if one exists, such as "v1.0". What we end up doing a lot, when using a lib that isn’t tagged, is to use the git hash, such as:

{git, "git://github.com/kivra/oauth2.git", {tag, "4cf6d7e686"}}

The git hash can be a prefix; git supports abbreviated hashes as long as they are unique. E.g. in the above case we could have just written {tag, "4cf6"}.

But even if you’re a good citizen and correctly lock your deps you might still get hit by your deps’ deps not being locked, or your deps’ deps’ deps. The rabbit hole goes deep.
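To make this concrete, a fully locked rebar.config might look something like this (a sketch; some_lib, its URL and the tag/hash values are made up for illustration):

{deps, [
    %% pinned to a release tag (illustrative tag name)
    {oauth2, ".*", {git, "git://github.com/kivra/oauth2.git", {tag, "v1.0"}}},
    %% hypothetical untagged dep, pinned to a commit hash instead
    {some_lib, ".*", {git, "git://github.com/example/some_lib.git", {tag, "4cf6d7e686"}}}
]}.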

A comparison of web servers for running Chicago Boss REDUX

Some time ago we did a number of benchmarks measuring the current state of Chicago Boss and its web servers. Since then we have made some small changes to improve performance and reduce potential bottlenecks. To begin with, Chicago Boss is a very nicely designed system with clear separation of concerns, interfaces, etc. Chicago Boss adheres strictly to OTP, which is both good and bad.

One of the first things we noticed after running fprof and eprof on a system under load was that too much time was spent waiting on messages sent and queued via gen_server:call/2. We then saw that many gen_server instances never updated their state after startup, but still needed to keep some information about the adapter, connection, etc., so we decided to convert these to regular modules backed by a common ets table with {read_concurrency, true}. Through these changes we reduced the wait time in gen_server modules under high load. On average we see ~40% better throughput with no errors (using the session cache backend).
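The pattern is roughly the following (a minimal sketch, not the actual Chicago Boss code; the module name and keys are made up):

-module(adapter_info).
-export([init/1, get/1]).

%% Called once at startup: store the static adapter/connection info
%% in a public ets table instead of in gen_server state.
init(Props) ->
    ets:new(?MODULE, [named_table, public, {read_concurrency, true}]),
    ets:insert(?MODULE, Props),
    ok.

%% Readers hit the ets table directly: no gen_server round trip,
%% no message queueing under load.
get(Key) ->
    case ets:lookup(?MODULE, Key) of
        [{Key, Value}] -> Value;
        [] -> undefined
    end.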

We also made changes to how Misultin gets initiated, with the result that we got rid of the many 503 errors we saw in the previous tests. More specifically, we increased max_connections from 4096 to 8192.

A lot of time is spent creating and fetching sessions for each request, and reducing the number of session gets yielded drastic improvements in performance. Using the session mock backend the number of sessions will grow and ultimately hamper session lookup times, as is seen from the graphs below. Using the session cache backend (which uses memcached) we saw a bit higher latency due to memcached, but over time we saw better overall performance due to the O(1) nature of memcached.

Using the cache backend for sessions had another positive side effect in that no errors were generated. So the session cache backend can be a good idea even for a single-server setup, as it provides a very predictable experience.
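Switching backends is just configuration; in boss.config it looks something like this (a sketch; verify the option name and values against your Chicago Boss version):

[{boss, [
    %% use the memcached-backed cache adapter for sessions
    {session_adapter, cache}
]}].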

All the changes have been pushed to GitHub.

Below are the new tests (# 2) with the old test as reference:

A comparison of web servers for running Chicago Boss

Chicago Boss is a web development framework much like Ruby on Rails but written in Erlang. At Kivra we use Erlang extensively and continuously evaluate and test new technology. We contribute to a lot of Open Source projects and our main technology interest is massive data at massive scale.

Yaws is currently not supported by Chicago Boss, but since we like Yaws and would like to see how well it works with Chicago Boss compared with the other web servers in this test, we’ve done some patching to Chicago Boss to support it. Apparently there’s more to be done for this integration to be perfect, as is evidenced by the graphs below.

This comparison is purely academic, to better understand where in the chain certain things break and where we should focus our efforts to achieve even better performance, uptime and scalability. When we choose a web server for our projects we tend to look at a number of factors, of which speed is but one.

Also, bear in mind that the numbers in this test are for the web server being run through simple_bridge and fed into Chicago Boss. Depending on the quality of the simple_bridge implementation for that web server it might yield significantly different results, as is evidenced by the raw misultin vs. Chicago Boss+misultin graph below.

Tech:
To be able to run an HTTP load test of somewhat massive scale there are a bunch of things to tweak to achieve maximum performance. For this test we decided on the bare minimum: standard kernel, scheduling, etc., with just changes to the number of open file descriptors and minor changes to the IP stack.

Each connection we make from the client machine requires a file descriptor, and by default this is limited to 1024. To avoid the "Too many open files" problem you’ll need to modify the ulimit for your shell. We also ended up tweaking the client machine’s "ipv4/ip_local_port_range" to be able to open as many ports as needed. This can be changed in the current shell on the client machine:

# echo "32768 65535" >/proc/sys/net/ipv4/ip_local_port_range
# cat /proc/sys/net/ipv4/ip_local_port_range
32768 65535
# ulimit -n 65535
# ulimit -n
65535

We’re running httperf, which is a tried and proven HTTP load testing tool, and we tweak some things before compilation so it can handle the load.
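For reference, each load step was driven by an httperf invocation along these lines (the host, URI and rates here are illustrative, not our exact test parameters):

# httperf --server 10.0.0.1 --port 8001 --uri /test \
          --num-conns 5000 --rate 1000 --timeout 5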

The server is a 16GB 8CPU machine running Erlang R15B with kernel-poll enabled:

Erlang R15B (erts-5.9) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:true]
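Kernel-poll is not enabled by default; the VM is started with the +K flag to get the [kernel-poll:true] banner above:

# erl +K true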

We also tweak some parameters on the server machine:

# ulimit -n 65535
# ulimit -n
65535
# sysctl -p
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
net.core.somaxconn = 65535

We’ve used Roberto Ostinelli’s excellent "A comparison between Misultin, Mochiweb, Cowboy, NodeJS and Tornadoweb" as a model to be able to correlate somewhat. There is of course a lot of difference between raw web servers like the ones in Ostinelli’s test and running ChicagoBoss+ErlyDTL+Simple_Bridge+<web server>. So don’t draw too many conclusions when comparing, since there are also other factors that come into play.

We’ve used a simple controller which generates a list of random length (1-20) and passes it to a view for rendering. This way we get to test a normal MVC-type pattern with ErlyDTL parsing.

Controller:

-module(loadtest_test_controller, [Req]).
-compile(export_all).

%% Seed the RNG, build a list of random length (1-20) and hand it
%% to the view for ErlyDTL rendering.
index('GET', []) ->
    {A,B,C} = erlang:now(),
    random:seed(A,B,C),
    R = random:uniform(20),
    List = getList(R),
    {ok, [{list, List}]}.

%% Build [[{val, N}], ...] proplists so the template can use l.val.
getList(R) ->
    lists:foldl(fun(N, A) -> [[{val, N}] | A] end, [], lists:seq(1, R)).

View:

<html>
<head>
<title>{% block title %}Loadtest{% endblock %}</title>
</head>
<body>
{% block body %}
{% if list %}
  {% for l in list %}
    -{{ l.val }}<br/>
  {% endfor %}
{% else %}
  Nope
{% endif %}
{% endblock %}
</body>
</html>

For the standalone misultin test we used this:

-module(load).
-export([start/1, stop/0]).

% start misultin http server
start(Port) ->
    misultin:start_link([{port, Port}, {loop, fun(Req) -> handle_http(Req) end}]).

% stop misultin
stop() ->
    misultin:stop().

% callback on request received: build a random-length list and
% render the page by hand, with no templating involved
handle_http(Req) ->
    {A,B,C} = erlang:now(),
    random:seed(A,B,C),
    R = random:uniform(20),
    List = getList(R),
    Req:ok(io_lib:format("<html>~n<head>~n<title>Loadtest</title>~n</head>~n<body>~s</body>~n</html>",
                         [printList(List, [])])).

getList(R) ->
    lists:foldl(fun(N, A) -> [N | A] end, [], lists:seq(1, R)).

printList([], A) ->
    A;
printList([H|T], A) ->
    V = lists:flatten(io_lib:format("~p", [H])),
    NewAcc = V ++ A,
    printList(T, NewAcc).
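Starting the standalone server from the Erlang shell is then just a matter of (port 8001 being an arbitrary choice):

1> load:start(8001).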

Here’s the actual results from the tests:

The ideal graph would be linear from 1000 responses/sec (start) to 12000 responses/sec (end), which is basically what we get when running misultin standalone.

Here we see that all web servers top out around 2000 res/sec. There’s clearly something strange with Yaws, since it dies and starts generating errors, as seen in the graphs below. Probably something in the simple_bridge API bridging Chicago Boss <> Yaws, since we know from experience that Yaws standalone is a rock-solid web server.

This is an aggregated view of all the errors that occurred. In the case of misultin we started getting "connection reset" errors after a while. Yaws gave us "connection timeout", which means it doesn’t respond within the given five seconds. Mochiweb is pushing along nicely, with two spikes that account for any errors given.

This is the average response time in ms for a request. The Y scale is logarithmic. It shows that there’s a certain overhead (such as setting up sessions, etc.) when serving a request, but that this overhead doesn’t grow. This is good; you want O(log n) or O(1) behaviour here.

This graph is basically the reverse of the error graph, showing at how many req/sec each web server starts to crumble under the load and generate errors.

It’s been great fun performing these tests, and now comes the really fun part of using these insights to try to create something that performs even better. Misultin standalone is really the textbook example of how a web server should perform. Since we don’t get those numbers when running with Chicago Boss, there’s clearly a great deal of potential here.

We used misultin standalone as a reference and could just as easily have used Yaws or Mochiweb; they would probably have performed equally well. But this test wasn’t about which web server performs best, but about how well they perform under Chicago Boss, which as we see from the tests is a whole other thing.

Hope you enjoyed this!