ArchiveBox
Create your own archival backups of web pages and internet media.
Overview
ArchiveBox lets you create your own archival backups of web pages and internet media.
Try out an ArchiveBox demo.
- Archive URLs one at a time or in batches into redundant formats.
- Send URLs to ArchiveBox through a bookmarklet or a browser extension.
- Schedule periodic backups and optionally share them with archive.org.
Media
Screenshots
ArchiveBox v0.7.3
Setup & Configuration
We need to install the service through Portainer and configure any necessary settings.
Preparation
There are a few things we need to do before installing this service.
Block ads in your archival copies by installing AdGuard Home.
Volumes
Persistent Data
This is where the service stores its application data. Keeping the data outside the container lets us quickly update the service image without losing it.
Ensure your user (the account set by PUID and PGID below) has permission to read and write this folder.
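The permission step can be sketched as a small POSIX shell function. It assumes the /srv/archivebox path and the PUID/PGID values from the Compose file in this guide; the function name prepare_data_dir is hypothetical, and stat -c assumes the GNU coreutils that Debian ships.

```shell
# Hypothetical helper: prepare the ArchiveBox persistent data folder.
# Arguments: folder path, numeric user ID (PUID), numeric group ID (PGID).
# Run it as root when the folder lives under /srv.
# The body runs in a subshell, so 'exit' only ends this function.
prepare_data_dir() (
  dir=$1 uid=$2 gid=$3
  mkdir -p "$dir" || exit 1           # create the folder and any missing parents
  chown "$uid:$gid" "$dir" || exit 1  # hand ownership to the PUID/PGID account
  chmod u+rwx "$dir" || exit 1        # let the owner read, write and enter it
  stat -c '%u:%g' "$dir"              # print the owner (GNU stat) to compare with PUID:PGID
)
```

For the values used in this guide, the equivalent one-off commands are sudo mkdir -p /srv/archivebox followed by sudo chown 1000:1000 /srv/archivebox, after which the folder's owner should read 1000:1000.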
Environment
PUID
This is the numeric ID of the user account on Debian. If you are unsure, open a terminal and run:
id -u
PGID
This is the numeric ID of the user account's group on Debian. If you are unsure, open a terminal and run:
id -g
ADMIN_USERNAME
This is the username used to create the admin account on the first run.
This should be removed from the Compose configuration after the first run.
ALLOWED_HOSTS
This restricts which domain names ArchiveBox will respond to, such as archivebox.example.com. Multiple domains can be separated with commas.
Setting this to '*' allows requests for any domain. Use this value if ArchiveBox is connected to the internet using SWAG.
Recommended: *
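To illustrate how the restriction behaves, here is a simplified POSIX shell sketch of hostname matching. ArchiveBox is built on Django, and this sketch assumes it splits ALLOWED_HOSTS on commas and applies Django's ALLOWED_HOSTS rules: '*' matches anything, an entry starting with a dot matches that domain and its subdomains, and any other entry must match exactly, ignoring case and port. The host_allowed function is hypothetical and does not handle IPv6 addresses.

```shell
# Hypothetical sketch of Django-style ALLOWED_HOSTS matching.
# Usage: host_allowed HOST LIST   (exit status 0 = allowed, 1 = rejected)
# The body runs in a subshell, so IFS/set -f changes and 'exit' stay local.
host_allowed() (
  host=${1%:*}                                      # drop a :port suffix (no IPv6 support)
  host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
  list=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  IFS=,                                             # split the list on commas
  set -f                                            # keep '*' from expanding to file names
  for pattern in $list; do
    case $pattern in
      '*') exit 0 ;;                                # '*' accepts every hostname
      .*)
        [ "$host" = "${pattern#.}" ] && exit 0      # '.example.com' accepts example.com
        case $host in *"$pattern") exit 0 ;; esac   # ...and any subdomain of it
        ;;
      *) [ "$host" = "$pattern" ] && exit 0 ;;      # otherwise require an exact match
    esac
  done
  exit 1                                            # nothing matched
)
```

For example, host_allowed archivebox.example.com:8820 'archivebox.example.com' succeeds, while host_allowed other.example.com 'archivebox.example.com' fails. In this sketch an empty list rejects every hostname.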
PUBLIC_INDEX
Configures whether your ArchiveBox index of websites can be anonymously accessed by the public.
Recommended: False
PUBLIC_SNAPSHOTS
Configures whether your ArchiveBox snapshots can be anonymously accessed by the public.
Recommended: False
PUBLIC_ADD_VIEW
Configures whether the public can anonymously request new archival backups through your ArchiveBox.
Recommended: False
MEDIA_MAX_SIZE
This is the maximum size allowed for each individual media file, such as a downloaded video or audio track.
Recommended: 750m
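The size suffix can be made concrete with a short worked conversion. This sketch assumes ArchiveBox passes MEDIA_MAX_SIZE to yt-dlp's --max-filesize option and that k, m and g are powers of 1024, so 750m works out to 750 × 1,048,576 = 786,432,000 bytes (about 750 MiB). The size_to_bytes function is hypothetical and accepts whole numbers only.

```shell
# Hypothetical sketch: convert a size such as 750m into bytes,
# assuming 1024-based k/m/g suffixes.
# The body runs in a subshell, so 'exit' only ends this function.
size_to_bytes() (
  value=$1
  number=${value%[kKmMgG]}          # strip one trailing k/m/g suffix, if any
  unit=${value#"$number"}           # the stripped suffix (empty if there was none)
  case $number in
    # whole numbers only, no leading zeros (shell arithmetic would read them as octal)
    ''|*[!0-9]*|0?*) exit 1 ;;
  esac
  case $unit in
    '') power=1 ;;
    k|K) power=1024 ;;
    m|M) power=1048576 ;;
    g|G) power=1073741824 ;;
  esac
  echo $((number * power))
)
```

Running size_to_bytes 750m prints 786432000, and size_to_bytes 1g prints 1073741824.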
TIMEOUT
This is the maximum time, in seconds, that ArchiveBox will spend on each archiving method for a URL. Increase it if you encounter frequent timeout errors.
Recommended: 60
CHECK_SSL_VALIDITY
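This verifies each website's SSL certificate before archiving it over a secure connection. This helps confirm that a website is who it claims to be, but it does not protect against malicious content. Websites with invalid or self-signed certificates may fail to archive while this is enabled.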
Recommended: True
SAVE_ARCHIVE_DOT_ORG
This controls whether ArchiveBox asks archive.org to back up each URL by default.
Recommended: True
Passwords
Keep these stored securely in a password manager, such as Vaultwarden.
ADMIN_PASSWORD
This is the password for the administrator account that will be used within the web interface.
It is important to use a secure passphrase that is easy to remember.
Installation
The service can be installed through the Portainer web interface.
Learn about creating a new stack.
Docker Compose
Use the following code to install the service:
---
services:
  archivebox:
    image: archivebox/archivebox:latest
    ports:
      - 8820:8000
    network_mode: bridge
    restart: unless-stopped
    volumes:
      # Persistent Data
      - /srv/archivebox:/data
    environment:
      - ADMIN_USERNAME=
      - ADMIN_PASSWORD=
      - ALLOWED_HOSTS=*
      - PUBLIC_INDEX=False
      - PUBLIC_SNAPSHOTS=False
      - PUBLIC_ADD_VIEW=False
      - PUID=1000
      - PGID=1000
      - MEDIA_MAX_SIZE=750m
      - TIMEOUT=60
      - CHECK_SSL_VALIDITY=True
      - SAVE_ARCHIVE_DOT_ORG=True
Updating
Re-Deploy the Stack
This service runs in Docker and keeps its data in the persistent volume.
This allows you to re-deploy the stack through Portainer to download the latest updates without losing your archive.
User Manual
Development
This software is released under the MIT license.
You can learn more about how to contribute to ArchiveBox through their GitHub.
The development team also accepts donations.