Use nginx and its Lua module to archive websites remotely with ArchiveBox.
Install the nginx web server and its Lua module.
$ sudo apt install nginx libnginx-mod-http-lua
Disable the default configuration.
$ sudo unlink /etc/nginx/sites-enabled/default
Create the /etc/nginx/sites-available/archivebox configuration file.
This is a very simple and naive solution: it accepts a POST request at the /archive_url location, compares the token parameter with the secret token secret_token, and archives the address passed in the url parameter. Remember to create and configure an SSL certificate.
server {
  listen 80;
  server_name _;

  root /srv/archivebox/output/;
  index index.html;

  location / {
    try_files $uri $uri/ =404;
  }

  location /archive/ {
    autoindex on;
  }

  location /archive_url {
    default_type text/plain;
    charset utf8;

    content_by_lua_block {
      local method = ngx.var.request_method
      if method == "POST" then
        ngx.req.read_body()
        local args = ngx.req.get_post_args()
        if args["token"] == "secret_token" and args["url"] ~= nil then
          -- Replace whitespace, then single-quote the value so the shell treats it as data.
          local url = string.gsub(args["url"], "%s+", '%%20')
          local quoted = "'" .. string.gsub(url, "'", "'\\''") .. "'"
          local exec = assert(io.popen("cd /srv/archivebox/; export $(grep -v '^#' etc/ArchiveBox.conf | xargs); echo " .. quoted .. " | /srv/archivebox/archive", 'r'))
          local output = assert(exec:read('*a'))
          exec:close()
          ngx.log(ngx.INFO, args["token"], output)
          ngx.say(output)
        end
      else
        -- Anything other than POST is rejected.
        ngx.status = 404
        ngx.exit(404)
      end
    }
  }
}
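The literal secret_token in the configuration is only a placeholder. One way to produce a less guessable value is sketched here; generate_token is a hypothetical helper name, not part of nginx or ArchiveBox, and the sketch assumes that /dev/urandom, head -c, od and tr are available on the system.

```shell
# Hypothetical helper (not part of nginx or ArchiveBox):
# print 32 random bytes as 64 lowercase hexadecimal characters.
generate_token() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}
```

A hexadecimal token contains no characters that need escaping in the Lua string or in the POST body. Put the generated value in place of "secret_token" in the Lua block and send the same value as the token parameter.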
Enable this specific configuration.
$ sudo ln -s /etc/nginx/sites-available/archivebox /etc/nginx/sites-enabled/
Reload the nginx service.
$ sudo systemctl reload nginx
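A syntax error in the new file is easier to spot before reloading. The following is a minimal sketch; reload_nginx_checked is a hypothetical wrapper name, and nginx -t only parses and tests the configuration without applying it.

```shell
# Hypothetical wrapper: reload nginx only if the configuration test passes.
reload_nginx_checked() {
    sudo nginx -t && sudo systemctl reload nginx
}
```

If nginx -t reports an error, the reload step is skipped and the running server keeps its current configuration.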
Use curl to archive a specific URL.
$ curl -X POST http://archivebox.example.org/archive_url -d 'token=secret_token&url=https://www.debian.org/'
[*] [2019-06-16 22:17:45] Parsing new links from output/sources/stdin-1560723465.txt...
    > Adding 1 new links to index (parsed import as Plain Text)
[*] [2019-06-16 22:17:45] Saving main index files...
    √ output/index.json
    √ output/index.html
[▶] [2019-06-16 22:17:45] Updating content for 1 pages in archive...
[+] [2019-06-16 22:17:45] "https://www.debian.org/"
    https://www.debian.org/
    > output/archive/1560723465
      > title
      > favicon
      > wget
      > pdf
      > screenshot
      > dom
      > archive_org
[√] [2019-06-16 22:17:53] Update of 1 pages complete (7.35 sec)
    - 0 links skipped
    - 1 links updated
    - 0 links had errors
    To view your archive, open: output/index.html
[*] [2019-06-16 22:17:53] Saving main index files...
    √ output/index.json
    √ output/index.html
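The -d option sends the body exactly as typed, so a target URL that itself contains & would be split into separate form fields before it reaches the Lua handler. A small wrapper that lets curl percent-encode each value could look like this; the function name archive_url and its argument order are assumptions made for illustration.

```shell
# Hypothetical wrapper: archive_url ENDPOINT TOKEN URL
# --data-urlencode makes curl send a POST and percent-encode each value,
# so "&", "=" or spaces inside the target URL stay part of the url field.
archive_url() {
    curl --silent --show-error --fail \
        --data-urlencode "token=$2" \
        --data-urlencode "url=$3" \
        "$1"
}
```

For example, archive_url http://archivebox.example.org/archive_url secret_token 'https://www.debian.org/' sends the same request as the curl command shown earlier. With --fail, curl exits with a non-zero status when the server answers with an HTTP error such as the 404 returned for non-POST requests.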
This is really cool. You can easily extend this configuration to support other ArchiveBox actions.