Offload Files
To avoid saturating the bandwidth of the server running soup, there is built-in support for offloading file downloads to dedicated webservers.
Requests for files are still sent to the soup instance, but the instance only resolves
which blob to serve and answers with a 307 Temporary Redirect pointing
to a randomly selected webserver from the /api/cdn:list endpoint.
Since this offloading relies on standard HTTP semantics, it works out of the box with third-party clients, such as plain curl (using -L to follow the redirect).
Offloading Workflow
flowchart TD
A[Client requests blob] --> B{Blob exists in DB?}
B -->|No| C[404 Not Found]
B -->|Yes| D{Redirects enabled?}
D -->|No| E[200 Direct response]
D -->|Yes| F{Online CDN servers?}
F -->|None| E
F -->|Yes| G[307 Redirect to CDN]
G --> H[Client downloads from CDN]
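The decision flow above can be sketched in a few lines of Python. This is an illustrative sketch, not soup's actual implementation: the function name `resolve_blob_request`, its parameters, and the tuple it returns are assumptions made for the example. Only the status codes, the random selection, and the `https://$fqdn/$hash` URL shape are taken from this page.

```python
import random
from typing import Optional, Sequence, Tuple


def resolve_blob_request(
    blob_hash: str,
    blob_exists: bool,
    redirects_enabled: bool,
    online_cdn_hosts: Sequence[str],
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target or None) for a blob request.

    Hypothetical helper that mirrors the flowchart; all names are illustrative.
    """
    if not blob_exists:
        return 404, None  # blob is not in the database
    if not redirects_enabled or not online_cdn_hosts:
        return 200, None  # soup answers the request directly
    host = random.choice(online_cdn_hosts)  # randomly selected online server
    return 307, f"https://{host}/{blob_hash}"
```

Note that an empty pool of online servers falls back to a direct response, so clients keep working even when every CDN server is offline.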
Monitoring
Soup only selects a mirror that it considers online. Servers report their status by sending a PATCH request to the
/api/cdn/{id} endpoint.
This endpoint supports a status parameter, which can be used to report the state of a CDN server.
The following values are accepted:
- 0: offline
- 1: ok (normal operation)
- 2: warning
- 3: error
Warning
You should never set the status to 0 yourself. Doing so can lead to undefined behaviour.
Every time a server sends a PATCH request to this endpoint, its last_seen time is bumped to the timestamp of that check-in.
When a server has not bumped its last_seen value for 40s, soup considers that server offline and sets its status to 0, effectively removing it from the pool of servers that files are served from.
Note
Keeping the check-in interval close to 40s may lead to short bursts of offline status. It is recommended to check in every 30s, which leaves 10s of buffer for slow networks or high load on the underlying database.
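The check-in and timeout rules above can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference client: the helper names are hypothetical, and sending the status as a JSON body is an assumption, since this page only says that the endpoint supports a status parameter. Authentication is left out entirely.

```python
import json
import urllib.request

OFFLINE_AFTER_SECONDS = 40       # from this page: no check-in for 40s means offline
HEARTBEAT_INTERVAL_SECONDS = 30  # recommended interval, leaves 10s of buffer

STATUS_OK = 1
STATUS_WARNING = 2
STATUS_ERROR = 3


def is_considered_online(last_seen: float, now: float) -> bool:
    """True while the last check-in is less than 40s old.

    The behaviour at exactly 40s is an assumption of this sketch.
    """
    return (now - last_seen) < OFFLINE_AFTER_SECONDS


def build_heartbeat_request(
    base_url: str, server_id: str, status: int
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request for one check-in.

    ASSUMPTION: the status travels as a JSON body. Adjust this if your
    soup instance expects a query or form parameter instead.
    """
    if status not in (STATUS_OK, STATUS_WARNING, STATUS_ERROR):
        # 0 (offline) is set by soup itself and must not be reported.
        raise ValueError("servers should only report status 1, 2 or 3")
    body = json.dumps({"status": status}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/cdn/{server_id}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

A CDN server would pass the built request to `urllib.request.urlopen` once every `HEARTBEAT_INTERVAL_SECONDS`, for example from a small loop or a systemd timer.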
Webserver Config
The URLs soup redirects to are just https://$fqdn/$hash.
How that URL maps to the layout on the actual file server is the job of the underlying webserver.
If you want a system that just-works™, we provide a Docker image at:
docker.io/ctxsystems/soup-cdn:latest-nginx
This image is an nginx webserver that is already preconfigured with the necessary scripts to handle
feedback.
If you want to adapt the config to your own needs or use a different webserver altogether, example configs are provided below:
nginx
server {
    listen 80;
    server_name _;
    autoindex off;
    location ~ "^/([0-9a-f]{3})([0-9a-f]{61})$" { # (1)!
        alias /data/blobs/$1/$1$2;
        add_header Cache-Control "public, max-age=31536000, immutable"; # (2)!
        add_header X-Content-Type-Options "nosniff";
        default_type application/octet-stream;
    }
    location = /healthz {
        access_log off;
        # default_type sets the Content-Type of the inline response
        default_type text/plain;
        return 200 "ok\n";
    }
    location / {
        return 404;
    }
}
1. Even though soup redirects to /${hash}, it is recommended to store the blobs on disk in subfolders named after the first 3 hex characters of each hash, and to put the corresponding files in there. This requires regex matching on the webserver, but since we expect high-bandwidth traffic (downloading files) rather than a high request rate, the overhead is negligible.
2. Since the content behind a hash never changes, we can tell clients and any proxies in between that they can cache it aggressively.
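To populate a file server with that layout, the mapping from hash to on-disk path can be sketched as below. The function name `blob_path` is hypothetical; the `/data/blobs` root and the 3 + 61 hex character split are taken from the config above.

```python
import re
from pathlib import PurePosixPath

BLOB_ROOT = PurePosixPath("/data/blobs")    # same root as in the config above
HASH_PATTERN = re.compile(r"[0-9a-f]{64}")  # 3 + 61 lowercase hex characters


def blob_path(blob_hash: str) -> PurePosixPath:
    """Map a blob hash to /data/blobs/<first 3 characters>/<full hash>."""
    if HASH_PATTERN.fullmatch(blob_hash) is None:
        raise ValueError("expected 64 lowercase hex characters")
    return BLOB_ROOT / blob_hash[:3] / blob_hash
```

Rejecting anything that is not exactly 64 lowercase hex characters mirrors the regex on the webserver side, so a file that is written with this helper is always reachable through the redirect URL.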
apache
The following config requires these Apache modules to be enabled:
- mod_alias
- mod_headers
- mod_setenvif
<VirtualHost *:80>
    DocumentRoot "/nonexistent"
    Options -Indexes
    AliasMatch "^/([0-9a-f]{3})([0-9a-f]{61})$" "/data/blobs/$1/$1$2"
    <Directory "/data/blobs">
        Require all granted
        Options -Indexes
    </Directory>
    # (1)!
    <LocationMatch "^/[0-9a-f]{64}$">
        Header set Cache-Control "public, max-age=31536000, immutable"
        Header set X-Content-Type-Options "nosniff"
        ForceType application/octet-stream
    </LocationMatch>
    # healthz route # (2)!
    Alias "/healthz" "/srv/healthz/ok"
    <Directory "/srv/healthz">
        Require all granted
    </Directory>
    <Location "/healthz">
        Header set Content-Type "text/plain"
    </Location>
    # Suppress access logging for /healthz (equivalent to access_log off)
    SetEnvIf Request_URI "^/healthz$" no_log
    CustomLog ${APACHE_LOG_DIR}/access.log combined env=!no_log
</VirtualHost>
1. Since the content behind a hash never changes, we can tell clients and any proxies in between that they can cache it aggressively.
2. Unlike nginx, Apache cannot return inline strings, so we have to make do with a static file containing "ok" somewhere on disk (/srv/healthz/ok in the config above).
IIS (todo)
Note
Add instructions for IIS here