Terminal sessions you can bookmark: Building Zellij's web client

Zellij is a terminal workspace and multiplexer. One of the unique traits of terminal multiplexers is their ability to keep sessions alive in the background without a terminal attached to them. In a recent Zellij version we released a built-in web client, which lets users attach to these sessions from the browser and essentially makes a dedicated terminal application optional.

In this post we’re going to take a look at how we built the Zellij Web Terminal: which technologies we used, how we architected the solution and some challenges we faced along the way.

What we have

Keeping sessions alive in the background involves a client/server architecture. The client runs in the user’s terminal like any other program and communicates over IPC with a server holding the state of the terminal session (open programs, pane and tab layout, etc.).

When Zellij first starts, the client spawns a new server process and daemonizes it so that it keeps running independently in the background. The client passes user input (keystrokes and mouse events) to the server, and the server passes render instructions back to the client. When the client detaches, the server stays alive in the background until a client connects to it again.
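To make this division of labor concrete, here is a minimal, Unix-only sketch of the input/render loop, with an in-process socket pair standing in for Zellij's real IPC channel. The newline-delimited messages, the `render:` prefix and the names `run_server` and `client_session` are all hypothetical; Zellij's actual protocol and session state are far richer.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Hypothetical server: owns the session state and answers each input
/// message with a render instruction describing the new state.
fn run_server(conn: UnixStream) -> io::Result<()> {
    let mut to_client = conn.try_clone()?;
    // Stands in for open programs, panes, tabs and so on.
    let mut typed_so_far = String::new();
    for line in BufReader::new(conn).lines() {
        typed_so_far.push_str(&line?);
        writeln!(to_client, "render:{}", typed_so_far)?;
    }
    // EOF: the client detached. A real server would keep its state alive
    // in the background and wait for the next client to connect.
    Ok(())
}

/// Hypothetical client: forwards keystrokes and collects the renders it receives.
fn client_session(keys: &[&str]) -> io::Result<Vec<String>> {
    let (client, server) = UnixStream::pair()?;
    let server_thread = thread::spawn(move || run_server(server));

    let mut to_server = client.try_clone()?;
    let mut from_server = BufReader::new(client);
    let mut renders = Vec::new();
    for key in keys {
        writeln!(to_server, "{}", key)?;
        let mut render = String::new();
        from_server.read_line(&mut render)?;
        renders.push(render.trim_end().to_string());
    }

    // Detach: closing both handles lets the server loop see EOF.
    drop(to_server);
    drop(from_server);
    server_thread.join().expect("server thread panicked")?;
    Ok(renders)
}
```

Sending the keys `l` and `s` yields the renders `render:l` and `render:ls`: the client never holds state of its own, it only relays input and displays whatever the server tells it to draw.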

The goal

To allow the browser to act as a Zellij client, we would need:

A terminal emulator able to run inside the browser
A web-server able to both serve this emulator and interact with the Zellij server

Architecting the solution

architecture diagram - showing the many-to-many relationship between the Zellij web server and terminal sessions

We elected to run a single web-server per machine, able to serve multiple sessions to multiple clients. For each connection, the web-server reuses the Zellij client code as a translation layer between the browser’s websockets and the Zellij server’s IPC channels. Having only one web-server per machine makes administration easier, while still allowing each browser client to work with multiple sessions.

Since browser clients connect (through the web-server) to the same IPC channels as terminal clients, they appear as regular users inside a terminal session, blurring the distinction between the two user interfaces.
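A minimal sketch of the bookkeeping this many-to-many arrangement implies, assuming each browser connection gets a numeric id and is attached to exactly one session at a time. `Connections`, `attach`, `detach` and `viewers` are hypothetical names for illustration, not Zellij's own types.

```rust
use std::collections::BTreeMap;

/// Hypothetical id for one browser websocket connection.
type ConnectionId = u64;

/// One web-server, many sessions, many browser clients.
#[derive(Default)]
struct Connections {
    /// Which session each browser connection is currently attached to.
    attached: BTreeMap<ConnectionId, String>,
}

impl Connections {
    /// Attach (or switch) a connection to `session`, returning the session it left, if any.
    fn attach(&mut self, conn: ConnectionId, session: &str) -> Option<String> {
        self.attached.insert(conn, session.to_string())
    }

    /// Forget a connection when its websockets close; the session itself keeps running.
    fn detach(&mut self, conn: ConnectionId) -> Option<String> {
        self.attached.remove(&conn)
    }

    /// All browser connections currently attached to `session`, in id order.
    fn viewers(&self, session: &str) -> Vec<ConnectionId> {
        self.attached
            .iter()
            .filter(|(_, name)| name.as_str() == session)
            .map(|(conn, _)| *conn)
            .collect()
    }
}
```

Because detaching only removes the connection's entry, closing a browser tab leaves the Zellij session running, just as detaching a terminal client does.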

Building the web server

web-server diagram - demonstrating the relationship between the browser and the Zellij web server

URL scheme

Zellij sessions are namespaced: each session name is unique on the machine it runs on. Integrating with web technologies gives us an opportunity to tie this uniqueness to our URL scheme, so that a session called “backend-code” is always accessible through https://127.0.0.1/backend-code. Here’s how it works:

If we enter a session’s URL in our browser and the session is running, we attach to it - viewing and interacting with its running processes.
If the session is not currently running but has existed in the past (e.g. before a reboot), we resurrect it: Zellij re-creates the session for us from its serialized metadata so that we can keep working where we left off.
If this is a completely new session, Zellij creates a session with this name and drops us into it.
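The three cases above can be sketched as a small decision function. This is an illustration under stated assumptions: `running` and `resurrectable` are hypothetical stand-ins for Zellij's own session bookkeeping, and URL percent-decoding of the path is left out.

```rust
/// What the web server should do with a request for `/{session}`.
#[derive(Debug, PartialEq)]
enum SessionAction {
    /// The session is running: attach to it.
    Attach(String),
    /// The session existed before (e.g. before a reboot): re-create it from serialized metadata.
    Resurrect(String),
    /// Never seen before: create a new session with this name.
    Create(String),
}

/// Map a URL path to a session action.
fn resolve_session(path: &str, running: &[&str], resurrectable: &[&str]) -> Option<SessionAction> {
    let name = path.strip_prefix('/')?;
    if name.is_empty() || name.contains('/') {
        // `/` itself (or a nested path) does not name a session.
        return None;
    }
    let name = name.to_string();
    Some(if running.contains(&name.as_str()) {
        SessionAction::Attach(name)
    } else if resurrectable.contains(&name.as_str()) {
        SessionAction::Resurrect(name)
    } else {
        SessionAction::Create(name)
    })
}
```

The order of the checks matters: a running session always wins over a serialized copy of the same name, so a bookmark never resurrects a duplicate of a session that is already alive.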

This essentially gives our terminal a URL bar: we can bookmark entire sessions (e.g. prod, frontend, backend), get a one-click peek into a running logfile, or even get more creative and bookmark a session that automatically opens our $EDITOR pointed at a specific file (maybe shopping-list.md or TODO).

Connection and bi-directional communication

The initial handshake between the web client and the server is performed over HTTP(S). During this handshake, the client authenticates itself with a special token (generated manually by the user through Zellij; more on the security measures below). The handshake also establishes which session the client initially connects to, using the URL path as explained above.

Once the client is authenticated, two websocket channels are opened: a terminal channel and a control subchannel.

The terminal channel is used by the server to send STDOUT bytes to the client (essentially ANSI instructions representing renders) and by the client to send STDIN bytes to the server (keypresses and mouse events). The control subchannel is used by the client side to send window resizes to the server and by the server side to send configuration changes, log messages and switch-session instructions to the client.

We separated the two for two reasons: first, so that they don’t block each other; second, for performance, so that the potentially high-throughput STDOUT stream does not have to be serialized and deserialized just to distinguish it from control messages. We initially thought the separation would also let us apply backpressure from the browser all the way to the terminal application, batching up messages until they are fully processed, but this turned out to be unnecessary: the Zellij application already handles this well enough before the bytes reach the web server.
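The post does not describe the control channel's wire format, so the sketch below assumes a simple line-oriented text encoding purely for illustration; Zellij's real messages (including the configuration changes mentioned above, omitted here) will differ. `ControlMessage` and its variants are hypothetical names.

```rust
/// Hypothetical control-channel messages.
#[derive(Debug, PartialEq)]
enum ControlMessage {
    /// client -> server: the browser's terminal viewport changed size
    Resize { cols: u16, rows: u16 },
    /// server -> client: switch to another session
    SwitchSession { name: String },
    /// server -> client: a log line for the browser console
    Log { line: String },
}

impl ControlMessage {
    /// Encode as a single text frame: a keyword followed by space-separated fields.
    fn encode(&self) -> String {
        match self {
            ControlMessage::Resize { cols, rows } => format!("resize {} {}", cols, rows),
            ControlMessage::SwitchSession { name } => format!("switch_session {}", name),
            ControlMessage::Log { line } => format!("log {}", line),
        }
    }

    /// Decode a text frame; `None` for unknown or malformed messages.
    fn decode(frame: &str) -> Option<ControlMessage> {
        let (kind, rest) = frame.split_once(' ')?;
        match kind {
            "resize" => {
                let (cols, rows) = rest.split_once(' ')?;
                Some(ControlMessage::Resize {
                    cols: cols.parse().ok()?,
                    rows: rows.parse().ok()?,
                })
            }
            "switch_session" => Some(ControlMessage::SwitchSession { name: rest.to_string() }),
            "log" => Some(ControlMessage::Log { line: rest.to_string() }),
            _ => None,
        }
    }
}
```

Terminal-channel frames never pass through `decode`: STDOUT bytes are forwarded to xterm.js as they are, which is exactly the parsing cost the split avoids.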

Security model

Since terminals are a sensitive interface to any machine, we made a point of building security measures into the solution. To log in, users need to generate a login token from within a Zellij session (either from the command line or from the Zellij UI).

We took special care that this token is never stored in clear text anywhere. On the server side, it is hashed and kept in a local SQLite database, where it cannot be retrieved, only revoked. On the client side, it is sent once, as a POST parameter of the handshake, and then exchanged for a temporary session token. This session token is stored as an HttpOnly cookie so that client-side code cannot access it; the server reads it from the request’s HTTP headers.
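A minimal sketch of the cookie half of this flow (the hashing of the login token is not shown). The cookie name `session_token`, the `SameSite=Strict` and `Max-Age` attributes, and both function names are assumptions made for the example, not details confirmed by the post.

```rust
/// Hypothetical cookie name.
const SESSION_COOKIE: &str = "session_token";

/// Build a `Set-Cookie` header value for a freshly issued session token.
/// `HttpOnly` hides the cookie from page JavaScript; `Secure` is only added
/// when serving over HTTPS, since plain HTTP is allowed on localhost.
fn session_cookie(token: &str, https: bool, max_age_secs: u64) -> String {
    let mut cookie = format!(
        "{}={}; HttpOnly; SameSite=Strict; Path=/; Max-Age={}",
        SESSION_COOKIE, token, max_age_secs
    );
    if https {
        cookie.push_str("; Secure");
    }
    cookie
}

/// Server side: extract the session token from a request's `Cookie` header.
fn session_token(cookie_header: &str) -> Option<&str> {
    cookie_header
        .split(';')
        .map(str::trim)
        .find_map(|pair| pair.strip_prefix(SESSION_COOKIE)?.strip_prefix('='))
}
```

The browser sends the cookie back automatically, including on the websocket upgrade requests, so the page's own JavaScript never has to touch the token.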

The Zellij web-server enforces the use of HTTPS with a user-supplied certificate when listening on external interfaces; plain HTTP is only allowed when listening on localhost. We believe this is a reasonable compromise between security and user-friendliness: users can try the web capabilities locally without first creating an HTTPS certificate, but one is required whenever terminal traffic passes over the wire.
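The rule described above fits in a few lines. This sketch assumes the listen address has already been parsed into an `IpAddr` and that a supplied certificate is always used; `listen_scheme` is a hypothetical name.

```rust
use std::net::IpAddr;

/// HTTPS whenever a certificate is supplied; plain HTTP only on loopback.
fn listen_scheme(ip: IpAddr, has_certificate: bool) -> Result<&'static str, String> {
    if has_certificate {
        Ok("https")
    } else if ip.is_loopback() {
        Ok("http")
    } else {
        // 0.0.0.0 and :: (all interfaces) are not loopback, so they need a certificate too.
        Err(format!(
            "refusing to serve plain HTTP on {}: a TLS certificate is required off localhost",
            ip
        ))
    }
}
```

Note that the "all interfaces" addresses fall on the strict side of the rule, which is the safe default: binding to them exposes the server to the network.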

Authenticated users are considered trusted, since by definition they have the same permissions and access as the web-server itself.

Server-side technologies

Since Zellij is written in Rust, we naturally decided to use Rust to develop the web-server as well.

We chose axum as our web server because we liked its mix-and-match approach, which lets us plug in custom implementations and middleware (e.g. the tower ecosystem for cookie and CORS handling) or use its built-in options when possible. We liked that it integrates seamlessly with tokio (our async runtime) and that it provides a “native” websocket implementation (using tokio-tungstenite under the hood). We also liked its declarative, rather than macro-based, routing system, though we admit this last point is mostly aesthetic.

With axum, our web routes look like this (simplified):

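use axum::{middleware, routing::{any, get}, Router};

let app = Router::new()
    // Websocket routes: these create, attach to and talk with Zellij sessions
    .route("/ws/control", any(ws_handler_control))
    .route("/ws/terminal", any(ws_handler_terminal))
    // `route_layer` only wraps the routes registered before it
    .route_layer(middleware::from_fn(auth_middleware))
    // Public routes: the HTML page (with or without a session name) and static assets
    .route("/", get(serve_html))
    .route("/{session}", get(serve_html))
    .route("/assets/{*path}", get(get_static_asset))
    .with_state(state);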

This lets us protect the websocket routes (the ones that actually create, attach to and communicate with Zellij sessions) with an authentication middleware, while the HTML page and static assets are served on routes that require no authentication.

We chose rustls rather than OpenSSL for serving the client over HTTPS. Elsewhere in the app we made the opposite choice, and found that it creates a great deal of complexity when packaging Zellij for third parties (e.g. Linux distributions and crates.io). We hope to use this decision as a stepping stone to gradually migrate the rest of the app to do the same, rather than doubling down on a decision that ended up causing us trouble. The trade-off is mostly binary size, which we believe is not an issue in the vast majority of environments today.

For static assets, we use the include_dir! macro (from the include_dir crate) to package them into the executable. This is a must for us as a distributable application, as opposed to a managed web server.
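To keep this sketch dependency-free, a static table stands in for the directory that include_dir! embeds at compile time; the file names, their contents, the extension-to-type mapping and `lookup_asset` are hypothetical. It shows the shape of what an asset handler does once the files live inside the binary.

```rust
/// Stand-in for the embedded assets directory: (relative path, file contents).
static ASSETS: &[(&str, &[u8])] = &[
    ("index.html", "<!doctype html><title>Zellij</title>".as_bytes()),
    ("style.css", "body { margin: 0 }".as_bytes()),
    ("app.js", "console.log('hi');".as_bytes()),
];

/// Pick a Content-Type from the file extension.
fn content_type(path: &str) -> &'static str {
    match path.rsplit_once('.').map(|(_, ext)| ext) {
        Some("html") => "text/html; charset=utf-8",
        Some("css") => "text/css; charset=utf-8",
        Some("js") => "text/javascript; charset=utf-8",
        _ => "application/octet-stream",
    }
}

/// Look a requested path up among the embedded files; no filesystem access at runtime.
fn lookup_asset(path: &str) -> Option<(&'static str, &'static [u8])> {
    ASSETS
        .iter()
        .find(|(name, _)| *name == path)
        .map(|&(name, body)| (content_type(name), body))
}
```

A side benefit of serving from memory is that a request such as `../../etc/passwd` simply finds no matching entry, since there is no filesystem path to traverse.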

Handling daemonization

Given our architecture, there are a few occasions on which we need to daemonize the web-server, since it runs as a separate process that serves all sessions on the machine. We do this with the daemonize crate, which provides a split interface: one branch for the child (the web-server instance in our case) and one for the parent (the process invoking the web-server, for example the Zellij command line or another Zellij process). One challenge with this approach is error reporting: if the web-server fails to start for whatever reason, we need to let the invoking process know so that it can display a proper error to the user (e.g. Permission denied, Address in use).

To do this, we use the privileged_action method of the daemonize crate, which lets us run code in the child process after the fork but before it detaches itself from the parent process’s environment. In this method we perform as many of the error-prone operations as possible, creating the resources the child needs (runtime, TLS config, TCP listener) if they succeed. We then use two Unix pipes to tell the parent process the result of these operations (as an exit code), along with a textual message that carries the error if they failed.

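use std::io::{BufRead, BufReader, Read, Write};
use daemonize::Outcome;

fn daemonize_web_server() -> (/* server_resources */) {
    // Create pipes for parent-child communication; `pipe()` returns (write end, read end)
    let (mut exit_message_tx, exit_message_rx) = pipe().unwrap();
    let (mut exit_status_tx, mut exit_status_rx) = pipe().unwrap();

    let daemonization_outcome = daemonize::Daemonize::new()
        .privileged_action(move || -> Result</* server_resources */, String> {
            // Perform error-prone operations before full daemonization:
            // - Create async runtime
            // - Load TLS certificates
            // - Bind TCP listener
            // Convert all errors to strings for IPC
        })
        .execute();

    match daemonization_outcome {
        Outcome::Parent(Ok(_parent)) => {
            // Close our copies of the write ends, so a child that dies
            // without reporting shows up as EOF rather than a hang
            drop(exit_status_tx);
            drop(exit_message_tx);

            // Parent waits for child's initialization result (one status byte)
            let mut buf = [0; 1];
            if exit_status_rx.read_exact(&mut buf).is_err() {
                eprintln!("Web server exited before reporting its status");
                std::process::exit(2);
            }
            let exit_status = buf[0] as i32;

            // Read the accompanying message (success notice or error text)
            let mut message = String::new();
            let mut reader = BufReader::new(exit_message_rx);
            let _ = reader.read_line(&mut message);

            // Display result and exit with child's status
            if exit_status == 0 {
                println!("{}", message.trim());
            } else {
                eprintln!("{}", message.trim());
            }
            std::process::exit(exit_status);
        },
        Outcome::Parent(Err(error)) => {
            // Forking itself failed: no web server was started
            eprintln!("Failed to start web server: {}", error);
            std::process::exit(2);
        },
        Outcome::Child(Ok(child)) => match child.privileged_action_result {
            Ok(server_resources) => {
                // Success: notify parent and continue as daemon
                let _ = exit_status_tx.write_all(&[0]);
                let _ = writeln!(exit_message_tx, "Web Server started");
                server_resources
            },
            Err(error) => {
                // Failure: send error to parent and exit
                let _ = exit_status_tx.write_all(&[2]);
                let _ = writeln!(exit_message_tx, "{}", error);
                std::process::exit(2);
            },
        },
        Outcome::Child(Err(error)) => {
            // Daemonization failed in the child: still report back to the waiting parent
            let _ = exit_status_tx.write_all(&[2]);
            let _ = writeln!(exit_message_tx, "Failed to daemonize web server: {}", error);
            std::process::exit(2);
        },
    }
}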
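The two-pipe handshake can be exercised without forking by pulling its protocol into a pair of helpers that work over any `Write`/`Read` implementation. The helper names are hypothetical, and the sketch assumes, like the code above, a one-byte status followed by a single newline-terminated message, so a multi-line error would be cut at its first line.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Child side: one status byte on the first pipe, one line of text on the second.
fn send_outcome(
    status_tx: &mut impl Write,
    message_tx: &mut impl Write,
    exit_code: u8,
    message: &str,
) -> io::Result<()> {
    status_tx.write_all(&[exit_code])?;
    writeln!(message_tx, "{}", message)?;
    message_tx.flush()
}

/// Parent side: block until the status byte arrives, then read the message line.
fn receive_outcome(status_rx: &mut impl Read, message_rx: impl Read) -> io::Result<(i32, String)> {
    let mut status = [0u8; 1];
    // Fails with UnexpectedEof if the child closed the pipe without reporting.
    status_rx.read_exact(&mut status)?;
    let mut message = String::new();
    BufReader::new(message_rx).read_line(&mut message)?;
    Ok((i32::from(status[0]), message.trim_end().to_string()))
}
```

In tests, a `Vec<u8>` can play the write end and an `io::Cursor` over it the read end, which makes error paths such as "Address in use" easy to check without binding real ports.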

Client-side technologies

The client side includes a fully featured terminal emulator: the excellent xterm.js. xterm.js has been around for many years and is used, among other places, as the terminal emulator in VS Code. It is battle-tested and feature-rich. xterm.js does most of the work on the client side (terminal rendering and user input), but we had to write some custom integrations to fit it to our use case.

A prime example: at the time of writing, xterm.js does not support mouse AnyEvent tracking, which Zellij relies on heavily. In this mouse mode, mouse motion is reported to the application along with other events (e.g. button presses and releases). To work around this, we listen to the browser’s “mousemove” event, translate it into an ANSI instruction and send it to the server. Since mouse reports are expressed in columns and rows rather than pixel coordinates, we use an internal xterm.js function to do the translation. Not ideal, but it works in a pinch.

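let prev_col, prev_row; // last reported cell, so we only report cell changes

terminal_element.addEventListener("mousemove", function (event) {
    if (event.buttons === 0) {
        // no mouse buttons are pressed: this is just a mouse movement
        // (internal xterm.js API: translates pixel coordinates into 0-based cells)
        const coords = term._core._mouseService.getMouseReportCoords(
            event,
            terminal_element
        );
        if (!coords) {
            return; // pointer is outside the terminal grid
        }
        const { col, row } = coords;
        if (prev_col !== col || prev_row !== row) {
            // SGR mouse report: 35 = motion (32) + no button (3), 1-based cells
            sendFunction(`\x1b[<35;${col + 1};${row + 1}M`);
        }
        prev_col = col;
        prev_row = row;
    }
});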
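For reference, the escape sequence built in the snippet is an SGR-encoded (mode 1006) mouse report. Here is the same logic sketched in Rust, with hypothetical names, including the "only report when the cell changes" check that the snippet performs with prev_col and prev_row.

```rust
/// SGR mouse report for motion with no button held.
/// `col`/`row` are 0-based cells; the protocol is 1-based.
fn sgr_motion_report(col: u32, row: u32) -> String {
    // Button code: 32 flags "motion", 3 means "no button pressed".
    const MOTION_NO_BUTTON: u32 = 32 + 3;
    format!("\x1b[<{};{};{}M", MOTION_NO_BUTTON, col + 1, row + 1)
}

/// Only report when the pointer enters a new cell.
#[derive(Default)]
struct MotionTracker {
    prev: Option<(u32, u32)>,
}

impl MotionTracker {
    fn on_move(&mut self, col: u32, row: u32) -> Option<String> {
        if self.prev == Some((col, row)) {
            return None;
        }
        self.prev = Some((col, row));
        Some(sgr_motion_report(col, row))
    }
}
```

Deduplicating per cell matters because the browser fires "mousemove" for every pixel of movement, while the terminal application only cares when the pointer crosses a cell boundary.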

Another example involves window titles. In a standard, dedicated terminal application, the window title often reflects the current working directory or the program name. This is done with an ANSI instruction called “OSC 0”. You can try it yourself: echo -e "\033]0;my awesome title\007" (you might want to append && sleep 10, because some shell prompts change the title back immediately). Zellij uses this instruction to show both the session name and the title of the currently focused pane in the terminal’s title. We would of course like the same in the browser, and more crucially, in the browser tab. To make this work, we intercept this instruction in the rendered bytes with a regex and set document.title to match:

wsTerminal.onmessage = function (event) {
    // ...
    let data = event.data;
    const titleRegex = /\x1b\]0;([^\x07\x1b]*?)(?:\x07|\x1b\\)/g;
    let match;
    while ((match = titleRegex.exec(data)) !== null) {
        document.title = match[1];
    }
    // ...
};
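The same extraction can be done without a regex engine by scanning for the OSC 0 prefix and accepting either terminator, BEL or ST (ESC followed by a backslash), just as the pattern above does. The function name is hypothetical; like the loop above, it keeps the last title found in the chunk.

```rust
/// Return the last window title set via OSC 0 in `data`, if any.
fn last_osc0_title(data: &str) -> Option<&str> {
    const START: &str = "\x1b]0;";
    let mut title = None;
    let mut rest = data;
    while let Some(start) = rest.find(START) {
        let after = &rest[start + START.len()..];
        // The title runs up to the first BEL or ESC, mirroring `[^\x07\x1b]*?`.
        let end = match after.find(|c: char| c == '\x07' || c == '\x1b') {
            Some(end) => end,
            None => break, // unterminated sequence
        };
        let terminator = &after[end..];
        if terminator.starts_with('\x07') || terminator.starts_with("\x1b\\") {
            title = Some(&after[..end]);
        }
        // Continue scanning from the terminator (an ESC may start another sequence).
        rest = terminator;
    }
    title
}
```

Like the regex, this only sees sequences contained in a single websocket message; a title sequence split across two messages would be missed by both.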

Why not TypeScript?

While our client-side implementation leans heavily toward simple, dependency-free solutions, coming from a strongly typed language like Rust we appreciate the value of being able to define data types for our interfaces. TypeScript could offer significant benefits for our use case, especially with regard to the various websocket message types.

On the other hand, TypeScript comes with the heavy tax of an extra build step, a tax that is particularly significant in our case because our client-side code otherwise has no build step at all.

While we could have implemented the build step as part of the Zellij build system (using the excellent cargo xtask), we were wary of the extra build complexity (the usual “Are my changes failing because of issues in my code or in the build system?”).

We ultimately decided against it, reasoning that since our client-side code is relatively small, introducing another build step would be more error-prone than not using types at all. Time will tell if we made the right choice, and we might revisit it as our client-side code grows.

What’s next?

The Zellij web client is a small step forward in the larger scheme of things. It is a first-class interface into another medium.

In the upcoming Zellij versions I plan to expand the web interface to include native rendering of UI components. I plan to allow blending multiple Zellij sessions with full-fledged read/write permissions, blurring the distinction between different machines on the UI level. I plan to follow up with hosted solutions for multiplayer terminal sessions.

If you are interested, the best way to follow along is to become a Zellij user and help build upon this infrastructure. We’re just getting started.

Zellij is developed and maintained as a labor of love, but love does not pay the bills.

Zellij will always be free and open-source. Zellij will never contain ads or collect your data.

If you are able, consider sponsoring the Zellij creator and lead developer with a recurring monthly donation. There are Zellij stickers in it for you!
