WebSockets are a mess, there are better options

About 5 minutes to read. Posted on April 8, 2021 08:00

Since the first day I saw an AJAX request update a site in "real time", I was fascinated with the idea of live websites. They turned the web from a document store into a user interface that can interact with almost anything on the globe. There is just one problem: AJAX and its cousins (like fetch) are all triggered by the client. That makes them great for things like saving data, loading the next page of a list, or adding an upvote. What they can't do (well) is notify the client when a third party changes something in the data source.

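That's where WebSockets (WS) come in. If you don't know them, WebSockets are, loosely speaking, TCP via HTTP. A connection starts as an ordinary HTTP request that asks the server to upgrade it; once the server agrees, the same TCP connection carries WebSocket messages instead of HTTP. Because it starts out like any other web request, it usually gets through firewalls and proxies. Just like a TCP connection, you can use it to send any kind of data in both directions. That means your server can now send you updates when a third party changes something. Rejoice!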
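Roughly, the upgrade works like this: the browser sends a GET request with the headers Upgrade: websocket, Connection: Upgrade, Sec-WebSocket-Version: 13 and a random Sec-WebSocket-Key, and the server answers with 101 Switching Protocols and a Sec-WebSocket-Accept header derived from that key. The sketch below shows only that derivation in Node.js; acceptKey is a hypothetical helper name, and the sample key and its expected result are the example values from RFC 6455.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455; every WebSocket server appends it to the key.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// The server proves it understood the upgrade request by hashing the
// client's Sec-WebSocket-Key together with the GUID (SHA-1, then base64).
function acceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// Example key from RFC 6455, section 1.3
console.log(acceptKey('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the 101 response, neither side speaks HTTP on that connection anymore.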

Sadly, there are some downsides. Here is the summary:

Key insights

WebSockets are amazing when you use the same library on the server and the client

My first experience with WebSockets was socket.io. As the heading suggests, using this library felt magical. With just a few lines of code I was able to create a chat application; it really was that easy! A minimal echo server looked something like this on the server:

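const { Server } = require('socket.io');

// Start a socket.io server on port 3000
const io = new Server(3000);

io.on('connection', (socket) => {
  // Send every incoming 'message' event back out to all connected clients
  socket.on('message', (msg) => io.emit('message', msg));
});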

And on the client:

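import { io } from 'socket.io-client';

// Connect to the server above and print every 'message' event
const socket = io('http://localhost:3000');
socket.on('message', (msg) => console.log(msg));
socket.emit('message', 'Hello!');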

Just like that, you've got an EventBus between your client and the server. You can even send complex objects, as long as they survive JSON serialization (so no functions, class instances, or the like). If this is all you need and you control both ends of the connection, the implementation is very easy.
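To see which values survive being sent as JSON, here is a small sketch of a JSON round trip. That is roughly what happens to plain (non-binary) event arguments on their way through the socket, assuming socket.io's default parser. The User class and the field names are made up for illustration.

```javascript
// A made-up class, standing in for any object with a prototype and methods.
class User {
  constructor(name) { this.name = name; }
  greet() { return `Hi, ${this.name}`; }
}

const sent = {
  user: new User('Test user'),
  joined: new Date(0),
  groups: new Map([['g1', 'Group 1']]),
  onClick() {},
};

// Serialize and parse again, as the other end of the socket would.
const received = JSON.parse(JSON.stringify(sent));

console.log(received.joined);              // an ISO string, no longer a Date
console.log(received.groups);              // {} because Map entries are not serialized
console.log('onClick' in received);        // false: functions are dropped
console.log(received.user instanceof User); // false: only the plain data arrives
```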

I've even heard people who saw this claim that WebSockets should simply replace REST or GraphQL APIs, because those only work in one direction. What these people (including me, originally) often don't realise is that socket.io, like many similar libraries, is very opinionated in order to make things as easy as possible.

There are different WebSocket implementations that are incompatible with each other

As mentioned in the introduction, the WebSocket protocol is basically TCP. That means you can send JSON objects over the wire, but you can also send any other binary data, for example audio or video streams, or the binary frames of your multiplayer game. If that's what you need, you can stop reading here: WebSockets are the way to go.
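On the wire, every WebSocket message travels in frames, and a small opcode in each frame header only says whether the payload is text (UTF-8) or binary; nothing in the protocol says what the payload means. The following Node.js sketch encodes a single, unfragmented server-to-client frame following RFC 6455. encodeFrame is a hypothetical helper, and it skips masking, which the RFC requires for frames sent by the client.

```javascript
const OPCODE_TEXT = 0x1;
const OPCODE_BINARY = 0x2;

// Encode one unmasked frame with the FIN bit set and no extensions.
function encodeFrame(payload, opcode) {
  const len = payload.length;
  let header;
  if (len < 126) {
    // Length fits into the 7-bit field of the second byte.
    header = Buffer.alloc(2);
    header[1] = len;
  } else if (len < 65536) {
    // 126 means: the real length follows as a 16-bit integer.
    header = Buffer.alloc(4);
    header[1] = 126;
    header.writeUInt16BE(len, 2);
  } else {
    // 127 means: the real length follows as a 64-bit integer.
    header = Buffer.alloc(10);
    header[1] = 127;
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  header[0] = 0x80 | opcode; // FIN bit plus opcode
  return Buffer.concat([header, payload]);
}

const text = encodeFrame(Buffer.from(JSON.stringify({ hello: 'world' })), OPCODE_TEXT);
const binary = encodeFrame(Buffer.from([1, 2, 3]), OPCODE_BINARY);
console.log(text.subarray(0, 2)); // <Buffer 81 11>
console.log(binary);              // <Buffer 82 03 01 02 03>
```

Both frames look the same to the protocol; deciding that the first one is JSON is entirely up to the application.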

But if you're like me and all you really need is JSON, you might be surprised to learn that there are different conventions for sending JSON to the client. The reason is simple: if the connection can carry anything, how can the client know what to do with the data? socket.io answers this with a number of assumptions, for example that every message is a JSON array whose first element is the event name and whose remaining elements are the event's arguments.

A socket.io event packet might look something like this:

// Server side code:
io.emit('register-user', user, groups);

// JSON payload
[
    "register-user",
    {
        "name": "Test user"
    },
    [
        {
            "name": "Group 1"
        },
        {
            "name": "Group 2"
        }
    ]
]

It is possible to implement custom parsers that extend or change the format so you can send any kind of data via socket.io, but if that's what you're doing, you already know most of what I've written in this article.

All these assumptions are what make socket.io work like magic with its frontend client. But they also mean that you can't simply hook up a different WebSocket server and assume it will just work, because a different WS framework might not follow the same assumptions.
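To make these assumptions concrete, here is a hypothetical dispatcher that only understands the same [eventName, ...args] array layout as the payload above. It is not socket.io's implementation: on the wire, socket.io additionally prefixes each packet with type digits (an event usually arrives as 42[...]), which this sketch ignores. createDispatcher, on and dispatch are made-up names.

```javascript
// Hypothetical dispatcher mimicking socket.io's payload layout:
// a JSON array of [eventName, ...args].
function createDispatcher() {
  const handlers = new Map();
  return {
    on(event, handler) {
      handlers.set(event, handler);
    },
    dispatch(raw) {
      const packet = JSON.parse(raw);
      // Any server that doesn't follow the same convention breaks here.
      if (!Array.isArray(packet) || typeof packet[0] !== 'string') {
        throw new Error('Not an [event, ...args] payload: ' + raw);
      }
      const [event, ...args] = packet;
      const handler = handlers.get(event);
      return handler ? handler(...args) : undefined;
    },
  };
}

const dispatcher = createDispatcher();
dispatcher.on('register-user', (user, groups) =>
  `${user.name} joined ${groups.length} groups`);

console.log(dispatcher.dispatch(
  '["register-user", {"name": "Test user"}, [{"name": "Group 1"}, {"name": "Group 2"}]]'
)); // Test user joined 2 groups
```

A server that sends {"event": "register-user", ...} instead would be perfectly valid WebSocket traffic, and this client would still reject it.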

For this reason socket.io offers a range of different server implementations:

All this to say: if your favorite WS library doesn't offer a server-side implementation in your chosen language, it's going to be a headache. If you're using the socket.io client, for example, you'll need to implement its features and assumptions on the server yourself before anything works, and vice versa.

That makes WebSockets a bad replacement for HTTP APIs

The whole previous section leads up to this point. The behavior of each WebSocket implementation is a little different. After browsing a few implementations while researching this article, I also got the impression that the topic is difficult to understand without knowing what's happening on the wire. I feel that, after reading the RFC (RFC 6455) and the code repositories for websockets/ws and socketio/socket.io, I now have a good understanding of what's going on behind the scenes.

The difference between your typical REST or GraphQL API and WebSockets is that you already know how HTTP works. Using a plain WebSocket endpoint means learning an additional protocol (ws://, or wss:// over TLS). Also, once the connection has been upgraded we are no longer in the context of HTTP, so things like authentication need custom solutions. The browser does send its cookies with the initial handshake, but the WebSocket API can't add custom headers such as Authorization, and the individual messages carry no cookies or headers at all. In order to re-use authentication from an HTTP session, we need to translate it into a separate WS session.
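A common way to bridge the two sessions in Node.js is to check the cookies of the handshake request in the HTTP server's 'upgrade' event, before any WebSocket frames are exchanged. The sketch below assumes a session cookie called sid and an in-memory sessions map; both, like the helper names, are hypothetical, and handing the socket to a WebSocket library is left to the onAuthenticated callback.

```javascript
// Hypothetical session store (session id -> user); a real app would
// share this with its HTTP login code.
const sessions = new Map([['abc123', { name: 'Test user' }]]);

// Parse a Cookie header such as "theme=dark; sid=abc123" into an object.
function parseCookies(header = '') {
  const cookies = {};
  for (const part of header.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;
    cookies[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return cookies;
}

// Find the user behind an upgrade request via the assumed "sid" cookie.
function userForUpgrade(req) {
  const { sid } = parseCookies(req.headers.cookie);
  return (sid && sessions.get(sid)) || null;
}

// Authenticate on Node's 'upgrade' event (e.g. of http.createServer()).
// onAuthenticated receives the user and is expected to complete the
// WebSocket handshake, for example through a WebSocket library.
function attachUpgradeAuth(server, onAuthenticated) {
  server.on('upgrade', (req, socket, head) => {
    const user = userForUpgrade(req);
    if (!user) {
      // Reject with a plain HTTP response; no WebSocket is ever opened.
      socket.end('HTTP/1.1 401 Unauthorized\r\n\r\n');
      return;
    }
    onAuthenticated(req, socket, head, user);
  });
}
```

Because browsers attach cookies to cross-site handshakes as well, a real server would also check the Origin header before accepting the connection.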

If it's not essential for your application to have a bidirectional, binary socket, I humbly suggest that you try an HTTP-based real-time solution.

What most websites need is Server-Sent Events

Do you remember how long polling works? It's a way to use AJAX for server updates. Basically, the server holds the request open until there is a new event, responds with it, and closes the connection; the client then immediately sends the next request. Server-Sent Events (SSE) improve on a number of that technique's downsides. SSE also keeps an HTTP request open, but the connection isn't closed once an event has been sent. Instead, multiple events can be delivered over the same connection.
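For comparison, here is a minimal long-polling sketch in Node.js. Every pending request is parked until a publish() call answers it once and ends the response, so the client has to send a fresh request after each event. handlePoll, publish and poll are illustrative names, and the client loop (which relies on the global fetch of browsers or Node.js 18+) has no error handling or back-off.

```javascript
// Server side: responses waiting for the next event.
const waiting = new Set();

function handlePoll(req, res) {
  waiting.add(res);
  // Forget clients that give up before an event arrives.
  res.on('close', () => waiting.delete(res));
}

// Answer every parked request with the event, then close each response.
function publish(event) {
  const body = JSON.stringify(event);
  for (const res of waiting) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(body);
  }
  waiting.clear();
}

// Client side: one request per event, re-sent immediately after each answer.
async function poll(url, onEvent) {
  for (;;) {
    const response = await fetch(url); // waits until publish() answers
    onEvent(await response.json());
  }
}
```

Every event costs a full request/response cycle, and events published while the client is reconnecting can slip through the gap; SSE avoids both by keeping one response open.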

SSE is unidirectional, meaning that the browser can't send data back over the same connection. But combining AJAX and SSE gives you a bidirectional channel between browser and server. What makes this combination great is that modern browsers have simple built-in APIs for both.
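As a sketch of that combination, assuming hypothetical routes GET /events for the stream and POST /messages for sending, a Node.js server only needs to keep the open SSE responses around and write to them whenever a message is POSTed. The route names, clients, broadcast and handleRequest are all illustrative.

```javascript
const http = require('http');

// Every open SSE response, so a POSTed message can be fanned out to all of them.
const clients = new Set();

function broadcast(event, data) {
  const chunk = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
  for (const res of clients) res.write(chunk);
}

function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/events') {
    // Server -> browser: keep this response open as an event stream.
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
    });
    clients.add(res);
    res.on('close', () => clients.delete(res)); // browser went away
  } else if (req.method === 'POST' && req.url === '/messages') {
    // Browser -> server: an ordinary request, e.g. sent with fetch.
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      let message;
      try {
        message = JSON.parse(body);
      } catch {
        res.writeHead(400);
        res.end();
        return;
      }
      broadcast('message', message);
      res.writeHead(204);
      res.end();
    });
  } else {
    res.writeHead(404);
    res.end();
  }
}

// Start it with server.listen(8080); left out so the sketch has no side effects.
const server = http.createServer(handleRequest);
```

In the browser, new EventSource('/events') receives the messages and fetch('/messages', { method: 'POST', body: JSON.stringify({ text: 'hi' }) }) sends them. Keeping the clients in memory only works for a single server process; with several instances you would need a shared message bus.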

The standard for AJAX requests (in vanilla JS) is currently fetch:

function getHomepage() {
    return fetch('https://ma.ttias.ch');
}

For SSE it's EventSource:

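// Connect to the example server below, which listens on port 8080
const es = new EventSource('http://localhost:8080');
es.addEventListener('ping', (ev) => {
    console.log(ev.data); // the counter sent with each ping
});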

On the server side, the implementation is also simple. Here is an example server in Node.js:

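const http = require('http');

http.createServer((req, res) => {
    // Allow EventSource connections from any origin (fine for a demo)
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    let pingNr = 0;
    // Send a "ping" event with an increasing counter every second
    const timer = setInterval(() => {
        res.write('event: ping\ndata: ' + pingNr + '\n\n');
        pingNr++;
    }, 1000);
    // Stop the timer once the client disconnects
    res.on('close', () => clearInterval(timer));
}).listen(8080);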
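The example above only sends single-line data. A data value containing line breaks has to be split into several data: lines, which the browser joins back together with newlines. The sketch below is a hypothetical formatSseMessage helper that also sets the optional id: field; when the server sends IDs, EventSource reconnects automatically after a dropped connection and includes the last ID it saw in a Last-Event-ID request header, so the server can resend what was missed.

```javascript
// Serialize one Server-Sent Events message (hypothetical helper).
function formatSseMessage({ event, id, data }) {
  let out = '';
  // Messages with an event name go to addEventListener(name, ...);
  // without one, the browser delivers them as 'message' events.
  if (event) out += `event: ${event}\n`;
  if (id !== undefined) out += `id: ${id}\n`;
  // Each line of the payload needs its own "data:" prefix.
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    out += `data: ${line}\n`;
  }
  return out + '\n'; // a blank line ends the message
}

process.stdout.write(formatSseMessage({ event: 'ping', id: 3, data: 'line one\nline two' }));
```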

Because it's still HTTP, the browser sends your cookies along with the EventSource request, so the server can read them and any other request headers it needs.

Of course, this is only so easy because it stands on the shoulders of giants. The built-in EventSource API and the predefined format (the event and data fields in the response) make it easy to use, and they also mean that everyone is using the same standard.

Conclusion

WebSockets have an important role in web development. But because they mostly function outside of the HTTP context, they can be very complicated to work with, especially if you don't control both the client and the server.

Server-Sent Events offer an easy alternative for non-binary data and work great as an addition to applications with an existing HTTP API. As an added bonus, they're super simple to implement on both the server and the client.

Thank you for reading

I hope you enjoyed the article and maybe even learned something. If you would like to stay in contact I have a mailing list or you can reach out to me via social media.