
Offload Files

To avoid saturating the bandwidth of the server running soup, there is built-in support for offloading file downloads to dedicated webservers.

Requests for files are still sent to the soup instance, but the instance only resolves which blob to serve and answers with a 307 Temporary Redirect that links to a webserver selected at random from the /api/cdn:list endpoint.

Since this offloading relies on standard HTTP semantics, it works out of the box with third-party clients, such as plain curl (with -L so that it follows the redirect).

Offloading Workflow
flowchart TD
    A[Client requests blob] --> B{Blob exists in DB?}
    B -->|No| C[404 Not Found]
    B -->|Yes| D{Redirects enabled?}
    D -->|No| E[200 Direct response]
    D -->|Yes| F{Online CDN servers?}
    F -->|None| E
    F -->|Yes| G[307 Redirect to CDN]
    G --> H[Client downloads from CDN]
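
The decision flow above can be sketched as a small pure function. This is only an illustration, not soup's actual implementation: the names `Response`, `resolve_blob_request`, `known_hashes` and the example hostnames are hypothetical, and the random pick stands in for whatever selection soup really performs over the online servers.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class Response:
    """Hypothetical response model: HTTP status plus optional Location header."""
    status: int
    location: Optional[str] = None


def resolve_blob_request(
    blob_hash: str,
    known_hashes: Set[str],
    redirects_enabled: bool,
    online_cdn_hosts: Sequence[str],
    rng=random,
) -> Response:
    # Blob is unknown to the database: 404 Not Found.
    if blob_hash not in known_hashes:
        return Response(404)
    # Redirects disabled, or no CDN server is online: serve directly.
    if not redirects_enabled or not online_cdn_hosts:
        return Response(200)
    # Otherwise pick one online server at random and redirect to it.
    host = rng.choice(list(online_cdn_hosts))
    return Response(307, f"https://{host}/{blob_hash}")
```

Falling back to a direct 200 response when no server is online means a CDN outage degrades bandwidth, not availability.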

Monitoring

Soup only selects a mirror that it considers online. Servers report their status by sending a PATCH request to the /api/cdn/{id} endpoint. This endpoint accepts a status parameter, which can be used to report errors of a CDN server.

The following values are accepted:

  • 0: offline
  • 1: ok - normal operation
  • 2: warning
  • 3: error

Warning

You should never set the status to 0 yourself. This can lead to undefined behaviour.
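
As a rough sketch of what a status report could look like from a CDN server, the snippet below builds (but does not send) a PATCH request. Several details are assumptions that the text above does not specify: that the status parameter is sent as a JSON body field named `status`, the base URL `https://soup.example`, the server id `42`, and the absence of any authentication. The helper name `build_status_request` is hypothetical.

```python
import json
import urllib.request
from enum import IntEnum


class CdnStatus(IntEnum):
    OFFLINE = 0  # set by soup itself; never report this manually
    OK = 1       # normal operation
    WARNING = 2
    ERROR = 3


def build_status_request(base_url: str, server_id: str, status: CdnStatus) -> urllib.request.Request:
    """Build a PATCH request for /api/cdn/{id} without sending it."""
    if status == CdnStatus.OFFLINE:
        raise ValueError("status 0 is reserved for soup; do not set it yourself")
    # Assumption: the status travels as a JSON body; the real API may differ.
    body = json.dumps({"status": int(status)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/cdn/{server_id}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual check-in.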

Every time a server sends a PATCH request to this endpoint, its last_seen time is bumped to the timestamp of that check-in.

When a server has not bumped its last_seen value for 40s, soup considers that server offline and sets its status to 0, effectively removing it from the pool of servers that files are served from.

Note

Keeping the interval close to 40s may lead to short bursts of offline status. It is recommended to refresh every 30s, which leaves 10s of buffer for slow networks or high load on the underlying database.
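
The timing rule can be illustrated with a few lines of arithmetic. This is a sketch under assumptions, not soup's code: the function and constant names are hypothetical, timestamps are plain seconds, and the behaviour at exactly 40s is not specified in the text, so the strict comparison used here is a guess.

```python
OFFLINE_AFTER_SECONDS = 40.0         # timeout after which soup marks a server offline
RECOMMENDED_INTERVAL_SECONDS = 30.0  # suggested heartbeat interval


def is_considered_online(last_seen: float, now: float,
                         timeout: float = OFFLINE_AFTER_SECONDS) -> bool:
    """True while the last check-in is more recent than the timeout."""
    return (now - last_seen) < timeout


def buffer_seconds(interval: float,
                   timeout: float = OFFLINE_AFTER_SECONDS) -> float:
    """Slack left for network or database delays before a heartbeat is late."""
    return timeout - interval
```

With the recommended interval, `buffer_seconds(RECOMMENDED_INTERVAL_SECONDS)` gives the 10s of slack mentioned above, while an interval of 39s would leave only 1s.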

Webserver Config

The URLs soup redirects to are simply https://$fqdn/$hash. Mapping such a URL to the actual layout on the file server is the job of the underlying webserver.

If you want a system that just-works™, we provide a Docker container at docker.io/ctxsystems/soup-cdn:latest-nginx.
This image is an nginx webserver that is already preconfigured with the necessary scripts to handle feedback.

If you want to adapt the config to your own needs or use a different webserver altogether, the configs below can serve as a starting point:

nginx

nginx.conf
server {
    listen 80;
    server_name _;
    autoindex off;

    location ~ "^/([0-9a-f]{3})([0-9a-f]{61})$" { # (1)!
        alias /data/blobs/$1/$1$2;
        add_header Cache-Control "public, max-age=31536000, immutable";  # (2)!
        add_header X-Content-Type-Options "nosniff";
        default_type application/octet-stream;
    }

    location = /healthz {
        access_log off;
        return 200 "ok\n";
        default_type text/plain;  # sets the Content-Type of the inline response
    }

    location / {
        return 404;
    }
}
  1. Even though the link soup redirects to is /${hash}, it is recommended to store the blobs on disk in subfolders named after the first 3 characters of the hash, with the corresponding files placed inside. This requires regex matching on the webserver, but since we expect high-bandwidth traffic (downloading files) rather than a high request rate (lots of requests), the overhead is negligible.

  2. Since the content behind a hash never changes, we can tell clients and any proxies in between that they can cache the response aggressively.
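
To make the on-disk layout from annotation 1 concrete, here is a small sketch that mirrors the regex and alias of the nginx config above: a 64-character lowercase hex hash is stored under a folder named after its first 3 characters. The helper name `blob_path` is hypothetical, and `/data/blobs` is simply the root used in the example config.

```python
import re
from pathlib import PurePosixPath

# Same shape the webserver regex accepts: 3 + 61 lowercase hex characters.
HASH_RE = re.compile(r"[0-9a-f]{64}")


def blob_path(blob_hash: str, root: str = "/data/blobs") -> PurePosixPath:
    """Return where a blob would live on disk for the layout shown above."""
    if not HASH_RE.fullmatch(blob_hash):
        raise ValueError("expected 64 lowercase hex characters")
    # /data/blobs/<first 3 chars>/<full hash>
    return PurePosixPath(root) / blob_hash[:3] / blob_hash
```

A script that copies blobs onto a CDN server could use the same mapping so that files end up where the webserver expects them.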

apache

The config below requires the following apache modules to be enabled:

  • mod_alias
  • mod_headers
  • mod_setenvif
apache.conf
<VirtualHost *:80>
    DocumentRoot "/nonexistent"
    Options -Indexes

    AliasMatch "^/([0-9a-f]{3})([0-9a-f]{61})$" "/data/blobs/$1/$1$2"

    <Directory "/data/blobs">
        Require all granted
        Options -Indexes
    </Directory>

    <LocationMatch "^/[0-9a-f]{64}$"> # (1)!
        Header set Cache-Control "public, max-age=31536000, immutable" 
        Header set X-Content-Type-Options "nosniff"
        ForceType application/octet-stream
    </LocationMatch>

    # healthz route # (2)!
    Alias "/healthz" "/srv/healthz/ok" 

    <Directory "/srv/healthz">
        Require all granted
    </Directory>

    <Location "/healthz">
        Header set Content-Type "text/plain"
    </Location>

    # Suppress access logging for /healthz (equivalent to access_log off)
    SetEnvIf Request_URI "^/healthz$" no_log
    CustomLog ${APACHE_LOG_DIR}/access.log combined env=!no_log
</VirtualHost>
  1. Since the content behind a hash never changes, we can tell clients and any proxies in between that they can cache the response aggressively.

  2. This config does not return an inline string the way the nginx example does, so we make do with a static file containing "ok" that apache serves for /healthz (here /srv/healthz/ok).

IIS (todo)

Note

Add instructions for IIS here