» How this site runs
This site has had 3 different homes over the last 2 years. I’ll detail here what the different infrastructures looked like, and why I moved from one to another.
A Kubernetes at home
At the beginning, this website lived in a
k3s cluster that I hosted at home. In order to practice and develop my Kubernetes skills outside of work, I bought a little NUC-style mini-PC (a CHUWI CoreBox i5 with 8GB of memory and 256GB of storage) running Ubuntu 20.04. It was plugged directly into my Internet box, with some NAT rules in place so that the cluster was reachable from the Internet.
I deployed an ArgoCD instance and started doing GitOps for my own self-hosted apps. It was really cool to set up, and I learned a lot of things along the way. I used a monorepo approach, with each folder containing both the code of an app and its deployment manifests.
With some GitHub Actions, I was able to automate the deployments on every push, without having to connect to the server or interfere with the deployment itself.
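As a rough sketch of what this setup could look like (the repo URL, folder names and namespaces below are made up for illustration, not my actual layout), an ArgoCD Application pointing at one folder of the monorepo would be declared like this:

```yaml
# Hypothetical ArgoCD Application for one app of a monorepo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-blog
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab-monorepo.git
    targetRevision: main
    path: my-blog/deploy        # each app folder holds code + manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: my-blog
  syncPolicy:
    automated:                  # sync on every push, no manual step
      prune: true
      selfHeal: true
```

With automated sync enabled, ArgoCD applies whatever the GitHub Actions pipeline pushes to the repo, which is what makes the hands-off deployment possible.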
This was a really cool setup, but I was concerned about a few things. You especially have to be really confident in what you’re doing when you are hosting something Internet-facing at home: if someone can break out of your applications in some way and enter your LAN, it is the beginning of the end.
I also noticed that when I disconnected all my devices from my local network (i.e. when I went on holiday), my ISP rebooted my Internet box to update its firmware. Each time, the box got stuck at the last reboot step, and all my services went down. Working as a site reliability engineer, and despite being an adept of Marginalia’s law, having an availability lower than 95% for single-page applications hurt my feelings.
So after almost 2 years of frequent service interruptions, I decided to move to the Cloud.
A Kubernetes in the Cloud
I decided to stick with Kubernetes for various reasons, the main one being that my whole stack (ArgoCD, Kubernetes manifests, etc.) was ready to go. Those of you who work with Cloud services probably already know why I did not keep this solution, but we’ll get to that in a minute.
I chose the smallest VM (
PLAY2-PICO), with a single instance in order to minimize costs. I then ran
terraform apply on my files and started migrating my applications.
It took me approximately 2 to 3 hours to migrate everything (ArgoCD, Grafana, …) and to get all my main applications up & running. A pretty successful maneuver. Until the end of the month.
I know that using the Cloud (especially Kubernetes and managed solutions) can be expensive, which is why I selected the smallest instance with only one replica, thinking it would be cheap and would match Scaleway’s price calculator estimate. It did not: it cost me almost 3x what I anticipated (nearly 2€ a day), almost entirely in compute costs. In the end the bill was not that high (~45€ for the month, so 540€ a year), but it is way more than I’m ready to spend on a few single-page applications and a Grafana dashboard.
It was a pretty fun experience, but it was also time to find something cheaper.
Back to the basics
Finally, I spun up a single
PLAY2-PICO instance on Scaleway, with the most minimal configuration (1 core, 2GB of RAM and 10GB of storage), and started writing a Docker Compose file to handle my apps.
But the problem is that managing Let’s Encrypt certificates with NGINX in Docker has always been a pain in the arse for me, even with
certbot. I do not know why, but it has always been a chore, and I hate it. So it was time to try another reverse-proxy solution: Traefik.
Traefik had been on my “stuff I want to play with” list for a long time, and I was not disappointed. After a quick look on the Internet to learn the basics, I stumbled upon this really good tutorial by Guillaume Sainthillier. It basically contained everything I needed to get started.
To be honest, it was awesome. Managing TLS certificates is no longer an issue (thankfully, in 2022!), and I love the use of labels for configuration and service auto-discovery. As Traefik states on their website:
Attach labels to your containers and let Traefik do the rest!
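To give an idea (the hostname, service name and certificate resolver name below are illustrative, not my actual configuration), exposing a container behind Traefik boils down to a few labels in the Compose file:

```yaml
# Hypothetical service exposed through Traefik using labels only.
services:
  whoami:
    image: traefik/whoami
    networks:
      - web
    labels:
      - "traefik.enable=true"   # opt in explicitly (exposedByDefault = false)
      - "traefik.http.routers.whoami.rule=Host(`whoami.example.com`)"
      - "traefik.http.routers.whoami.entrypoints=websecure"
      - "traefik.http.routers.whoami.tls.certresolver=letsencrypt"

networks:
  web:
    external: true
```

Traefik watches the Docker API, picks up the labels, and wires up the route and the TLS certificate on its own; no NGINX vhost files, no certbot cron jobs.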
But while I was following the tutorial above, one thing caught my eye:
Indeed, accessing the Docker API without any restriction can be a serious security issue: if the Traefik container gets pwned, an attacker can easily reach the underlying host and basically escape from the container. A few solutions exist to mitigate this problem, such as using SSH or TLS instead of a UNIX socket to talk to the Docker API. But for now, I decided to use another container,
socket-proxy, which is in charge of proxying requests to the Docker socket.
Having this allows us to restrict which operations can be done through the Docker API, and avoids having to mount the socket in Traefik. There are two things to modify. First, edit the
docker-compose.yaml to create the new container and remove the socket mount from Traefik:
# - /var/run/docker.sock:/var/run/docker.sock
# snip ...
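For reference, here is roughly what the new service could look like (I use tecnativa/docker-socket-proxy here; the image and network names are my assumptions and may differ from the tutorial’s):

```yaml
# Hypothetical socket-proxy service: the only container that mounts the
# Docker socket. Traefik talks to it over TCP instead of mounting the
# socket itself.
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    restart: unless-stopped
    environment:
      CONTAINERS: 1             # allow read-only access to /containers/*
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - socket-proxy

networks:
  socket-proxy:
    internal: true              # not reachable from outside the host
```

Keeping the proxy on an internal-only network means that even the proxied, restricted API is never exposed beyond the containers that need it.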
Then, point Traefik’s Docker provider endpoint at the proxy:
[providers.docker]
  endpoint = "tcp://socket-proxy:2375"
  watch = true
  exposedByDefault = false
  network = "web"
The CONTAINERS: 1 environment variable indicates to the proxy that GET requests to
/containers/* are allowed. It is a good starting point, but I’ll try to find time to dig deeper into how to protect the Docker daemon socket.
Well, I guess I’ve said it all. Even if I like Kubernetes and what it brings to infrastructure management and reliability, it does not suit every project. In the end, having a “simple”
docker-compose.yaml is more than enough for what I want to do. Even a small instance is enough to handle the load my apps get (which is close to none).
I might have to add more storage though.
In the future, I want to learn in more detail how Traefik works (how to fully leverage middlewares and observability), and maybe harden the Docker API socket security. I’ll also keep an eye on my Scaleway bill and, depending on the amount, switch to a cheaper Cloud provider.
See ya o/