
building a hipster-aware pi home server


The end of the year is getting closer, fast, so I figured it was a perfect time to talk about my side project from last year. In this post I want to walk you through setting up a Raspberry Pi as a home server with all of your services running in docker containers with valid Let’s Encrypt certificates without exposing the Pi to the Internet. We will be setting up a myriad of services such as Pihole, SmokePing and Grafana as well as WireGuard for remote access, so buckle up!

table of contents

This is a rather large post, so here is a ToC to try and navigate it.

In part 1, I’ll cover the base setup.

In part 2, I’ll cover customisations I have made.

pi hardware & software

I don’t want to write yet another Pi hardware / software setup how-to. If you need help with that, for reals, Google has plenty of resources. If you are looking for posts where you can use small embedded hardware like a Pi as an attack toolkit, have a look at Rogan’s blog posts here and here.

What you will read in this post was all built on the Pi 4 Model B with 4GB of RAM. Software-wise, the Pi was running Pi OS (formerly called Raspbian). I just dd’d the OS image to an SD card and booted the Pi. Realistically nothing here is tied to a Pi; you can do all of this in a plain Linux VM too!

Ironically it wasn’t long after I built this that I broke the OS badly enough that I had to reflash the SD card. The beauty of the setup you are about to read about is that should that happen to you, you can basically just git clone a repo and be up and running again in a few moments.

base software

I wanted as little as possible installed on the Pi’s host OS. If something can run in a docker container, that’s where I will try to run it. Of course some host software installation is inevitable, such as docker itself. I configured all services using docker-compose files, so that needs installing as well. In short, apt install docker.io docker-compose and you should be good.

service architecture

Before diving into the details, let’s talk about the target architecture for a moment. Many of the services I wanted to use have web interfaces, i.e. Smokeping, Grafana etc. Naturally these will be served from the Pi, but I don’t want them to be served under sub-directories on a web server. Instead, I’d like something like service.domain.com for each service I am hosting (aka virtual hosting). I also wanted these services to have valid TLS certificates (even though they were internal only), so I opted to use an external domain I own and add a subdomain called internal. This will require some DNS tweaking, but we will get to that in a bit. Yes, I could absolutely use something like nginx for this, but c’mon, it’s 2020.

arm docker images

When setting up services on a Pi, you may quickly notice that not all docker images on Dockerhub have ARM architecture support. This is not always a problem and sometimes finding one that supports ARM is just a little bit of googling away. But when you do run into it, it’s good to know that you can use docker-compose to automatically build a local image based on a remote Dockerfile. For example, instead of specifying the image as image: leonjza/trauth, which will pull the image from Dockerhub, you may also do this to build it locally:

trauth:
  build:
    context: https://github.com/leonjza/trauth.git
    dockerfile: Dockerfile
  image: trauth:local

services setup

I chose to use docker-compose files as they allow me to easily tie related services together. They also make it super simple to have always-on services with the restart: key. If your Pi loses power or reboots for some reason, the compose stacks will all start up again without manual intervention. My general workflow is to have a separate folder for each set of related services inside of a larger parent docker-apps folder.
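To illustrate the restart: key, here is a minimal sketch of a compose file. The stack name, service name and image are placeholders rather than anything from this setup, and unless-stopped is just one of the restart policies docker-compose accepts.

```yaml
# docker-apps/example/docker-compose.yml (hypothetical stack, illustration only)
version: "3"

services:
  example:                        # placeholder service name
    image: example/image:latest   # placeholder image
    # Start the container again whenever the docker daemon comes back up
    # (for example after a reboot), unless it was stopped by hand.
    restart: unless-stopped
```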

smokeping

Before doing anything fancy, let’s start with a super simple service we may want to configure, like smokeping. With docker we can deploy apps with almost zero knowledge of the app’s internals. The Dockerhub page for smokeping has a sample docker-compose schema, so just copy that into a file, say docker-apps/smokeping/docker-compose.yml. Next, run docker-compose up -d in that folder and you should have smokeping available on the IP address of your Pi. Of course, you may want to configure smokeping with some Targets, but let’s just focus on the web service for now.
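For reference, the resulting file could look roughly like the sketch below. It assumes the linuxserver/smokeping image (the exact image is not named above), and the PUID, PGID, TZ values and local folders are placeholders to adjust; treat the sample on the image's Dockerhub page as the authority.

```yaml
# docker-apps/smokeping/docker-compose.yml
# Sketch only: assumes the linuxserver/smokeping image.
version: "3"

services:
  smokeping:
    image: linuxserver/smokeping
    container_name: smokeping
    environment:
      - PUID=1000          # placeholder user id
      - PGID=1000          # placeholder group id
      - TZ=Etc/UTC         # placeholder timezone
    volumes:
      - ./config:/config   # smokeping configuration (Targets etc.)
      - ./data:/data       # collected data
    ports:
      - "80:80"            # web interface bound to port 80 on the host
    restart: unless-stopped
```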

smokeping service running with port 80 bound to the host
plain HTTP smokeping service

netdata

Getting smokeping up and running was really easy. But now imagine you wanted to get another service running. Say something like netdata. Again, the Dockerhub page has a sample docker-compose schema we can use, so copy and paste that into a file say at docker-apps/netdata/docker-compose.yml. Finally, docker-compose up -d and netdata should be available on port 19999 on the IP address of the Pi.

netdata service running, bound to port 19999 on the host
plain HTTP netdata service on port 19999

Again, hardly any work or knowledge of the internals of these apps but we could get them up and running in no time. So far though, we have one app taking up port 80, and another taking up 19999. They don’t clash, yet.

pihole

The third service we will be getting up and running is pihole. Just like smokeping and netdata, the pihole Dockerhub page also has a sample docker-compose schema. Like before, dump it to a file at say docker-apps/pihole/docker-compose.yml. The first thing that should catch your eye is the ports tag in the compose file.

ports needed for pihole.

As far as ports 53, 67 and 443 go, they are not in use yet, so assigning those to the pihole service won’t be a problem. But port 80 is already in use by smokeping, so what are our options here? Well, we could:

  • Make the pihole web interface available on another port, say port 81 with a port mapping such as "81:80", meaning the host will forward port 81 traffic to port 80 on the container. (eew)
  • Add a web server such as nginx on the host OS to reverse proxy to the relevant docker services. This could be done either using subdirectories or maybe using virtual hosting to reach the relevant backend service. In this config you will have both the docker services ports and nginx open on the host. (also eew)
  • Add a dockerized web server to perform the same as the above, but instead of opening ports on the host for each service (like we have done so far) you can reach containers by their service name if they are in the same docker network as the nginx container (we have not configured this yet). This way you will only have to open port 80 on the host for the nginx service (less eew, but still eew)
  • Or…. use Traefik and docker networks (not eew, no, in fact, yum! very yum!)

To wrap up the pihole setup for now (we’ll get back to this), just comment out the port 80 mapping so that the line looks like this:

ports:
  - "53:53/tcp"
  - "53:53/udp"
  - "67:67/udp"
  #- "80:80/tcp"
  - "443:443/tcp"

The last thing before you do a docker-compose up will be to disable and stop the systemd-resolved daemon with systemctl disable --now systemd-resolved. This way your pihole won’t fail to start up because 53/udp is already in use.

pihole port mappings and a DNS lookup using it.

traefik

full disclosure: i am a massive traefik fan.

In case you have not heard of it, Traefik is a cloud native reverse proxy with some pretty neat features which we will be using extensively. Not only does it support HTTP, but TCP and UDP too. Once Traefik is configured, we can at any time add a new service with some labels which would result in Traefik automatically discovering the new service, configuring a web route to it and (optionally) setting up the TLS certificate for it. So in a few moments from issuing a docker-compose up command, we will be able to browse to a new-service.internal.domain.com domain and have it present a valid certificate!

Written in Golang, Traefik comes in many forms which include a standalone binary as well as a docker image. We are going to use the docker image here.

traefik – intro

Traefik is seriously powerful, and it shows in the myriad of ways you can configure it. The documentation site has a whole section dedicated to configuration discovery. The part we are going to be interested in is the docker configuration discovery feature (I want to encourage you to have a look at the many others that exist). In a nutshell, what docker configuration discovery means is that by attaching labels to our docker-compose services, we can instruct Traefik how to reach the web interfaces available in our service, both from an external route and internal service perspective. This works once some once-off, Traefik-specific configuration is done to enable docker configuration discovery. If none of that makes sense yet, don’t worry, we are going to look at numerous examples and then some.

traefik – http concepts

There are various ways in which HTTP services can be configured with Traefik. Since we’re focusing on Docker, the following image from the Traefik documentation aims to visualise what some configuration setups may look like.

Traefik docker configuration example

Essentially, you need to know about two concepts: routing and services. A route connects to a service. A route can match on a full hostname, a wildcard, a path, both, a request with a specific header, a request with a specific query string or a complex regular expression, and many more combinations are possible. Each route needs to be told which Docker service (or services in a load balanced scenario) a matching request should be routed to. For more detailed examples of routing rules, see the documentation here.

For our purposes, we are just going to use simple host-based routes like smokeping.internal.domain.com. This in turn will be mapped directly to the target service which is smokeping in this case.

traefik – configuration

We have established that we want to use the Docker configuration discovery feature and that we are going to be running the Traefik container. So, get the example docker-compose schema from the Traefik website here and save it in a file say at docker-apps/traefik/docker-compose.yml. Notice how the Traefik service gets access to the Docker socket; this is how it knows what’s going on.

With the example compose schema saved, we can now make some tweaks. To test, we will first just get the example compose file running. In our case, the smokeping service is already binding to port 80 (kinda what led us down this rabbit hole in the first place, right), so for now just stop that service with docker-compose down in the docker-apps/smokeping folder.

Next we will edit the example Traefik compose file, updating the example whoami service with a new hostname. This is the first place where you will see the route -> service binding in action. By default, the whoami service will instruct Traefik (using labels) that requests for whoami.localhost should be routed to the whoami service. It also tells Traefik that it will accept requests from the web entrypoint which is defined in the traefik service’s command stanza.

For our testing purposes, this service will not be on localhost but instead on our pi, so edit the Host() rule to whoami.local. Once done, docker-compose up -d !
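At this point the file should look roughly like the sketch below. It is based on the basic docker-compose example from the Traefik v2 documentation with only the Host() rule changed; the image tags are assumptions, so go with whatever the current example ships with.

```yaml
# docker-apps/traefik/docker-compose.yml
version: "3.3"

services:
  traefik:
    image: "traefik:v2.3"          # assumed tag
    container_name: "traefik"
    command:
      - "--api.insecure=true"                        # dashboard on port 8080
      - "--providers.docker=true"                    # docker configuration discovery
      - "--providers.docker.exposedbydefault=false"  # only route services with traefik.enable=true
      - "--entrypoints.web.address=:80"              # the "web" entrypoint
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      # read-only access to the Docker socket, used for discovery
      - "/var/run/docker.sock:/var/run/docker.sock:ro"

  whoami:
    image: "traefik/whoami"
    container_name: "simple-service"
    labels:
      - "traefik.enable=true"
      # the only edit: whoami.localhost became whoami.local
      - "traefik.http.routers.whoami.rule=Host(`whoami.local`)"
      - "traefik.http.routers.whoami.entrypoints=web"
```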

If you are having trouble with DNS at this stage, stop pihole and start systemd-resolved again, for now.

example traefik docker-compose.yml with an updated whoami hostname.

Next, add a quick hack to your computer’s local /etc/hosts file, pointing whoami.local to the IP address of your Pi. When that is done (and the Traefik stack is up), you should be able to browse to http://whoami.local/.

docker-compose ps output for the new traefik & whoami services
request for whoami.local, reverse proxied using Traefik

What you should see here is that the whoami service is fronted using Traefik (note how simple-service has no external port binding). In terms of Traefik basics, that’s it! Let’s go and glue together the services we already set up in the previous steps to Traefik. You can safely remove the example whoami service from the docker-compose.yml file and clean up the temp /etc/hosts hack.

traefik – routing smokeping

We have seen how the example whoami service was glued together, so let’s do the same thing for smokeping. We will talk about DNS a little bit later in this guide, so for now add a new hack to your local /etc/hosts file for smokeping.internal.local, pointing to your Pi’s address.

The next thing we need to talk about is a little bit of docker networking. In order for Traefik to be able to reverse proxy to services, it needs to have network access to them. We have been creating services using separate docker-compose.yml files in different folders. When you start those services up without a networks tag, docker-compose will go and create a new docker network specifically for that stack. Usually, it’s the folder name with the _default suffix. In the following image you will see the netdata and traefik default networks.

docker networks created for service stacks

The problem with this configuration is that because these two services are using different docker networks, they are effectively “isolated” from each other and as a result, Traefik won’t be able to reach them. When we configured the whoami service earlier, it was in the same network as the traefik service (because they were in the same compose file) so Traefik had no problem reaching the appropriate internal service. To solve this, we have two options:

  • Merge all of our services into one, large docker-compose.yml file. (eew for me)
  • Use an external docker network to glue the appropriate services together (yum for me!)

I think this will in part be a personal preference thing. I chose separate compose files for a few reasons, one being that not all services I use have a web UI (or TCP/UDP service for that matter), and those have no business being reachable from Traefik. When you create a docker network and connect services to it, it means that you can reach other services by name. Say we have a network called bar, and services called foo and baz. They will be able to reach each other using the hostnames foo.bar and baz.bar. Pretty cool heh. Gluing Traefik to smokeping using this concept is pretty easy. We need to create the network, then specify the appropriate network tags for the services that should be in it.

So, create the docker network that you want to use for any service that Traefik should be able to reach with docker network create web. Here I called the docker network web.

creating a new docker network called web

Next, edit the Traefik stack, adding it to the web docker network.

traefik service stack attached to the web docker network
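In compose terms, attaching the stack to the network comes down to something like the fragment below. This is a sketch: the rest of the traefik service is left out, and the only names used are the traefik service and the web network created above.

```yaml
# docker-apps/traefik/docker-compose.yml (fragment)
services:
  traefik:
    # ... image, command, ports and volumes stay as they were ...
    networks:
      - web              # attach Traefik to the shared network

networks:
  web:
    external: true       # created beforehand with: docker network create web
```

Marking the network as external tells docker-compose to use the existing web network instead of creating a stack-specific one.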

With the Traefik stack configured we can move on to the smokeping configuration again. Edit the stack’s docker-compose.yml file and make the following changes:

  • Remove the ports tag. This does not have to be exposed to the world anymore. Traefik will take care of this.
  • Add the networks config (both the networks tag next to services as well as the networks tag inside the smokeping service definition).
  • Add the labels which will instruct Traefik to route requests coming in the web entrypoint (port 80) for smokeping.internal.local to the smokeping service. Traefik will resolve the service host for smokeping as smokeping.web automatically. So, those labels are:
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.smokeping.rule=Host(`smokeping.internal.local`)"
  - "traefik.http.routers.smokeping.entrypoints=web"

Your configuration should look something like this now.

docker-compose for the smokeping service with Traefik labels
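As a rough sketch, the three changes combined could look like this. The image name and volume paths are assumptions (the linuxserver/smokeping image is used here for illustration); the networks and labels sections are the parts that matter.

```yaml
# docker-apps/smokeping/docker-compose.yml
version: "3"

services:
  smokeping:
    image: linuxserver/smokeping   # assumed image
    container_name: smokeping
    volumes:
      - ./config:/config           # placeholder paths
      - ./data:/data
    # no ports: section any more; Traefik is the only way in
    networks:
      - web
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.smokeping.rule=Host(`smokeping.internal.local`)"
      - "traefik.http.routers.smokeping.entrypoints=web"
    restart: unless-stopped

networks:
  web:
    external: true                 # the network created with docker network create web
```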

That’s it. Run docker-compose down and then docker-compose up -d, give everything a few seconds and browse to http://smokeping.internal.local.

smokeping available via traefik

Cool! We have removed the port 80 binding from the smokeping container and given it to Traefik. Now, let’s add the remainder of the services we configured previously.

traefik – routing netdata and pihole

In this section we will be doing pretty much exactly what we did for smokeping. Update our local hosts file temporarily with new hostnames; remove the service specific web port bindings (except for DNS for pihole, for now :P); add the appropriate labels to route our services and finally connect them to the web docker network.

For hostnames I went with netdata.internal.local and pihole.internal.local.

temporary /etc/hosts entries

For netdata, the new labels will be:

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.netdata.rule=Host(`netdata.internal.local`)"
  - "traefik.http.routers.netdata.entrypoints=web"

For pihole we need to add one extra configuration label to specify the port where the web server lives (remember it also has DNS-related port mappings). Its labels will be:

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.pihole.rule=Host(`pihole.internal.local`)"
  - "traefik.http.services.pihole.loadbalancer.server.port=80"
  - "traefik.http.routers.pihole.entrypoints=web"

All three web services should now be available, without anyone fighting over port 80. It also means that services listening on ports like 19999 are now reachable through port 80.

new services (netdata here) fronted using Traefik

There is one question to be asked here though. Both smokeping and pihole served their web interfaces using port 80, but netdata served it on port 19999. Yet, the configuration for netdata did not need the loadBalancer specification for the port like pihole did! That is because Traefik is smart enough to sniff the EXPOSE directive set for a Docker image and use that as the server port for a service. For example, netdata sets it here. It often happens that you have a service that does not do that, or has multiple ports (and Traefik gets it wrong). In those cases you will have to specify the target port using the loadBalancer label described in the docs here.

Also, notice how you have not touched your Traefik stack while configuring the others…

traefik – dashboard

Traefik has a web based dashboard that can give you a high-level overview of all of the active configuration. This includes entrypoints, routes and services. The default configuration enables this with the "--api.insecure=true" command entry making it accessible on port 8080 on the Pi. You can remove the 8080 port mapping and enable the dashboard using a few labels on the traefik service instead.

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.api.rule=Host(`traefik.internal.local`)"
  - "traefik.http.routers.api.service=api@internal"
  - "traefik.http.routers.api.entrypoints=web"

With the labels added, docker-compose down and docker-compose up -d and browse to the new service url.
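Put together, the traefik service might look something like the fragment below. One deviation to flag: it swaps --api.insecure=true for --api.dashboard=true, which is an assumption on top of what is described above (the labels themselves are unchanged, and they should work with either flag).

```yaml
# docker-apps/traefik/docker-compose.yml (fragment)
services:
  traefik:
    # ... image, volumes and networks as before ...
    command:
      - "--api.dashboard=true"       # assumption: replaces --api.insecure=true
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"                      # the 8080 mapping is gone
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`traefik.internal.local`)"
      - "traefik.http.routers.api.service=api@internal"
      - "traefik.http.routers.api.entrypoints=web"
```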

traefik dashboard

Keep in mind, this dashboard has a lot of information about the services it’s fronting, so keep it secured! The way I did that was with a custom forwardauth implementation discussed later in this post.

networking

There are lots of improvements that can be made to our setup. Hacking away at our /etc/hosts file is not great, especially in the long run. Also, we are only serving our services using plaintext HTTP; another no-no. Finally, I wanted to be able to reach services on my Pi from the Internet, without placing it on the Internet, so there is another opportunity for improvement.

DNS

It’s always DNS. Well, in this case maybe not, but like I said, /etc/hosts is not a great solution for our services. In fact, almost anything else will be better. In my case, I own an external domain and decided to use a subdomain of it for internal stuff. If you don’t have one, you can get some really, really cheap ones off sites like Namecheap. I also really like Cloudflare’s DNS management portal (which is free to use btw!), so I usually update the NS records for the domain to point to Cloudflare. Using Cloudflare is really a personal preference thing, and everything that we are going to do with Cloudflare here is possible on Namecheap’s DNS management console as well (and many others).

Anyways, the reason I mention this is because of a feature called wildcard DNS. Basically, having an entry such as *.internal.domain.com (using Cloudflare in my case) pointing to your Pi’s internal IP means you can spin up a service on that subdomain and it will resolve to your Pi. Think smokeping.internal.domain.com, netdata.internal.domain.com etc. All of them will resolve to the Pi where Traefik is running. Pretty cool. You could of course use other solutions such as unbound, dnsmasq or whatever. The point is just that wildcard DNS to your Pi is what you want in the end.

wildcard DNS entry for the internal subdomain, pointing to my pi

With a wildcard DNS configuration on a subdomain, you can remove those entries in your /etc/hosts file and update the Host() labels in your services to match the domain you have.

TLS

If you don’t pay close attention to this, this part could be indistinguishable from magic. Using Traefik, we can get automated TLS configurations using Let’s Encrypt. That means valid TLS certificates for web interfaces, on the internal network! What’s more, this can happen even though the Pi is on a home network behind a NAT. Yes, you read right.

A common (if not the most common) method for obtaining Let’s Encrypt TLS certificates is using what is known as the HTTP-01 challenge type. Basically, your ACME client receives a special token from Let’s Encrypt, and you serve it from a web server so that Let’s Encrypt can come and check it from the Internet’s perspective. If it’s valid, you get a cert. As you can imagine this is a problematic challenge type given that our Pi is behind a NAT / firewall. Thankfully, another challenge type exists called DNS-01. This challenge type works on the premise that the special token is instead configured as a DNS TXT record that Let’s Encrypt checks. It also has the added bonus that it supports wildcard certificates (which is great!).

Here comes the kicker: Traefik has the ability to use many providers’ APIs (such as Cloudflare’s) to programmatically configure DNS records, and by extension, can configure the DNS TXT record Let’s Encrypt will come and check. That is how we can get valid certs internally.

Configuration will depend on your provider, but for Cloudflare you will need to provide Traefik with some API credentials to use via environment variables. Since we have only been serving our web interfaces using HTTP so far, we also need to add some configuration to listen on TCP port 443. The changes we will make will be:

  • Create a new entryPoint for :443
  • Configure a certificate resolver to use the DNS challenge type. We will also be making use of a wildcard certificate; I don’t want actual hostnames popping up in certificate transparency logs!
  • Create an acme.json file, where certificate private keys will be stored, and lock down its permissions
  • Update service entrypoints to use the new web-secure entrypoint that we will create as well as the certificate resolver we added.

As far as the new entrypoint goes, that is pretty simple. Add a new command array item that says --entryPoints.web-secure.address=:443. Next, add a new entry to the ports array so that Traefik actually listens on it like this: "- 443:443". With that done we have both a web and web-secure entrypoint.

As for the certificate resolver (the term Traefik uses to know how to get certs), we need to add a few lines to the command array. Most of them are self explanatory, but if you need to, feel free to refer to the docs for more verbose explanations. The lines for the command array to add a new certificate resolver called primary would be:

command:
  ...  
  - --certificatesResolvers.primary.acme.email=you@domain.com
  - --certificatesResolvers.primary.acme.storage=acme.json
  - --certificatesResolvers.primary.acme.dnsChallenge=true
  - --certificatesResolvers.primary.acme.dnsChallenge.provider=cloudflare
  - --certificatesResolvers.primary.acme.dnsChallenge.delayBeforeCheck=0

For wildcard certificates, we need to add some labels to the traefik service telling it what the base domain should be as well as adding a SAN with *. Those configuration labels are:

labels:
  ...
  # wildcard certs
  - "traefik.http.routers.api.tls.domains[0].main=internal.domain.com"
  - "traefik.http.routers.api.tls.domains[0].sans=*.internal.domain.com"

The acme.json file is a special file that needs to be secured as it contains the private keys for all of the certificates that get generated. Just like SSH with a private key, Traefik will fail if the permissions for acme.json are not set to 600 (-rw-------). It is a good idea to make this file available regardless of container restarts, so we will mount it as a volume. First, create the file called acme.json and change its permissions:

leonjza@ubuntu:~/docker-apps/traefik$ touch acme.json
leonjza@ubuntu:~/docker-apps/traefik$ chmod 600 acme.json

Next we will mount that new acme.json file into the Traefik container with another entry to the volumes array that reads: "./acme.json:/acme.json". Once that is done, do the docker-compose down and docker-compose up -d sequence again, and watch the services’ logs for any errors.
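To tie the entrypoint, resolver and storage changes together, here is a sketch of what the relevant parts of the traefik service could look like. The flags and the volume mount are the ones described above; the environment variable is an assumption that you authenticate with a scoped Cloudflare API token (the Cloudflare provider also accepts CF_API_EMAIL together with CF_API_KEY), and the token value is a placeholder.

```yaml
# docker-apps/traefik/docker-compose.yml (fragment)
services:
  traefik:
    # ... image, networks and labels as before ...
    command:
      # ... existing flags ...
      - "--entrypoints.web.address=:80"
      - "--entryPoints.web-secure.address=:443"
      - "--certificatesResolvers.primary.acme.email=you@domain.com"
      - "--certificatesResolvers.primary.acme.storage=acme.json"
      - "--certificatesResolvers.primary.acme.dnsChallenge=true"
      - "--certificatesResolvers.primary.acme.dnsChallenge.provider=cloudflare"
      - "--certificatesResolvers.primary.acme.dnsChallenge.delayBeforeCheck=0"
    environment:
      # Assumption: token-based auth; the value is a placeholder.
      - "CF_DNS_API_TOKEN=<your-cloudflare-api-token>"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./acme.json:/acme.json"   # certificate storage, chmod 600 on the host
```

Since this folder ends up in git later on, it is probably worth keeping the API credentials out of the compose file itself, for example in an ignored .env file.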

Finally, we will update our services to make use of the new HTTPS configuration. With Traefik configured, this is all you will need to configure in the future for your services. The only changes needed will be to update the entrypoint from web to web-secure and to add the certResolver configuration. Once that’s done, restart the service stack and wait a bit for the TLS configuration magic to happen!
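For smokeping, as an example, the updated labels would look something like the sketch below. It assumes the router is still called smokeping, that the resolver is the one named primary, and that internal.domain.com stands in for your own domain.

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.smokeping.rule=Host(`smokeping.internal.domain.com`)"
  # listen on the :443 entrypoint instead of web
  - "traefik.http.routers.smokeping.entrypoints=web-secure"
  # get the certificate via the "primary" certificate resolver
  - "traefik.http.routers.smokeping.tls.certresolver=primary"
```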

labels after enabling TLS configuration
pihole, running in docker, proxied via traefik with a valid wildcard tls certificate on my home network

tcp & udp support

All of our web services are now fronted with Traefik with fancy domain names and valid TLS certificates, but Traefik also supports TCP and UDP. In the case of our Pihole installation that listens on UDP and TCP port 53 to serve DNS lookups, this means we can move that to Traefik and remove the host port mapping the service uses. This makes things nice and neat from a configuration perspective as we will only be exposing ports from Traefik now, and not from arbitrary services. The setup is pretty simple too.

  • Create port mappings for TCP and UDP 53 in the traefik service
  • Create new traefik entrypoints using the command line configuration
  • Create TCP and UDP service labels to bind incoming connections to pihole

There are some subtle differences in how TCP and UDP connections are handled. It is possible to use SNI (implying the use of TLS and by extension valid certs :P) to route to different TCP services using the same entrypoint in Traefik. Pretty much just how HTTP routes work. However, UDP services do not support this. Instead, an entry point (aka, a UDP port) can only be mapped to a single service. For us this is not a problem; pihole will be the only service using UDP and TCP port 53. As far as labels go, these are the ones you will use on the pihole service after configuring the necessary entrypoints on the traefik service:

labels:
  ...
  # 53/udp
  - "traefik.udp.routers.53-udp.entrypoints=53-udp"
  - "traefik.udp.routers.53-udp.service=pihole-53-udp"
  - "traefik.udp.services.pihole-53-udp.loadbalancer.server.port=53"
  # 53/tcp
  - "traefik.tcp.routers.53-tcp.rule=HostSNI(`*`)"
  - "traefik.tcp.routers.53-tcp.entrypoints=53-tcp"
  - "traefik.tcp.routers.53-tcp.service=pihole-53-tcp"
  - "traefik.tcp.services.pihole-53-tcp.loadbalancer.server.port=53"

In these labels we are specifying new traefik.udp and traefik.tcp routers and services. The entrypoints match the entrypoint names we give in the traefik service. Take note of the TCP router having a HostSNI() rule with a wildcard. This means that any connection coming in on port 53/tcp will match this rule, routing it towards pihole.
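On the traefik side, the matching entrypoints and port mappings could be sketched as follows. The entrypoint names 53-udp and 53-tcp are taken from the labels above; the rest of the service is elided.

```yaml
# docker-apps/traefik/docker-compose.yml (fragment)
services:
  traefik:
    # ... rest of the service as before ...
    command:
      # ... existing flags ...
      - "--entryPoints.53-udp.address=:53/udp"   # UDP entrypoint for DNS
      - "--entryPoints.53-tcp.address=:53/tcp"   # TCP entrypoint for DNS
    ports:
      # ... existing 80 and 443 mappings ...
      - "53:53/udp"
      - "53:53/tcp"
```

With this in place, the 53/tcp and 53/udp mappings in the pihole stack's ports section need to go, otherwise both containers will try to bind port 53 on the host.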

With this configuration applied (restarting the relevant services), you should see some new TCP and UDP services pop up in your Traefik dashboard.

traefik dashboard with new udp and tcp services
details for the new udp router, forwarding 53/udp to the pihole service

configuration backups

The entirety of our setup lives in the docker-apps folder we have been using. The majority of the setup is also just a bunch of docker-compose.yml files. Depending on the services you use, some will also have external configuration mapped into containers.

So, for configuration backups I use a private git repository hosted on GitHub to store everything. In each folder I use a .gitignore file to exclude stuff from git (such as databases and other stuff I don’t want to store in git), but essentially all of the changes I make get pushed to git. This way, should disaster strike (which it has), I can clone and up the services again with minimal work.

configuration backup in git

base setup summary

Wow, you read all the way up to here? Nice :P These are really just the basic building blocks for how I built the setup. To recap, here are some of the important takeaways I think you should get from this:

  • Docker (and by extension docker-compose) is an incredibly powerful way to orchestrate small “stacks” of services.
  • Traefik makes it easy to perform configuration discovery of your Docker services having only applied some labels.
  • We can “front” / “forward” traffic from the outside world into the Docker world via Traefik, making it the only container that exposes ports to the host. Everything else can be inter-container communications.
  • Having valid TLS certificates, even on an internal network, really should be the norm. With wildcard support you don’t have to worry about internal hostnames popping up in Certificate Transparency logs either.
  • Git is great for config backups!

customisations

From here onwards I am going to show some of the extra customised configurations, setup and code I wrote. Most of these are specific to my use case, but hopefully you can get some new ideas, or learn a trick or two. Service wise, everything will follow the same pattern with Traefik fronting it where it makes sense.

forwardauth

Many services that have web interfaces show statistics and other things I don’t really like having open without authentication. The Traefik dashboard and netdata are great examples of this. I wanted to secure these somehow, and Traefik makes this easy. Traefik has the ability to apply “middleware” to routes that make it possible to do a whole bunch of crazy things like modifying requests, mirroring requests, applying compression to responses and many many more. From a security perspective, there is the BasicAuth middleware where we can with a single label apply an auth requirement for an HTTP route. At first glance this may sound perfect, but when you are dealing with many, many services, this quickly becomes a messy copy/paste job adding the label where you want the auth. Depending on your browser, the handling of basic auth may also not be as predictable, meaning you will often need to re-authenticate. An alternative middleware exists and it’s called ForwardAuth.

ForwardAuth is pretty cool. Basically, once configured, incoming requests will first check in with an AuthServer to see if the request should continue or not. The AuthServer that ForwardAuth checks in with can be anything, as long as it returns a 2XX response such as an HTTP 200 to indicate to Traefik that the request should continue. Any other response will be bubbled up back to the client. The Traefik ForwardAuth docs have a great diagram illustrating this flow.

traefik forwardauth documentation diagram

There are a few existing ForwardAuth compatible AuthServer projects on GitHub, but almost all of them implement some sort of OAuth flow with identity providers such as Google. I did not want that. Instead, I liked the idea of just having some basic auth in front of my web services, and decided to build my own.

This resulted in me building trauth, a simple basic authentication ForwardAuth compatible AuthServer. Michael also built something similar that handles authentication using a web form, so go check that project out if you prefer that over a basic auth prompt!

The cool thing with something like trauth is that if you configure it to protect multiple web services, you only need to authenticate to one service and you will be authenticated to all. This is because a cookie scoped to your domain gets set in the browser as part of the basic auth response. So, when trauth sees this cookie on a later request and it is valid, it will respond with an HTTP 200. In my experience, once you have authenticated you mostly forget it exists, unless your browser changes or the authentication cookie expires.
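If you are wondering how a cookie like that can be trusted, one common approach is to sign it. The sketch below is an assumption about how such a session cookie could work, not a description of trauth’s actual cookie format: the SECRET key, the "user|expiry|signature" layout and both function names are hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical signing key, known only to the AuthServer


def issue_cookie(user, ttl_seconds, now=None):
    """Build a "user|expiry|signature" cookie value."""
    now = int(time.time()) if now is None else int(now)
    payload = f"{user}|{now + ttl_seconds}"
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def cookie_is_valid(cookie, now=None):
    """Accept the cookie only if the signature matches and it has not expired."""
    now = int(time.time()) if now is None else int(now)
    try:
        user, expiry, signature = cookie.rsplit("|", 2)
        expiry = int(expiry)
    except ValueError:
        return False  # wrong shape or non-numeric expiry
    expected = hmac.new(
        SECRET, f"{user}|{expiry}".encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(signature.encode(), expected.encode()) and now < expiry
```

Because the browser sends a domain-scoped cookie to every subdomain, any service behind the same middleware can be let through on the strength of that one signed value.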

As far as setup goes, I added a trauth service to my existing traefik setup like this (after preparing the htpass file).

trauth added to the traefik docker-compose file

Notice how the service has a build tag. I mentioned this earlier in the post, but that is because I don’t have an ARM image on Dockerhub for it. With the build tag, docker-compose will build the image from the Github repository for you.

The next part is to add a label to the traefik service, telling it to set up a new middleware called trauth of type forwardauth, and specifying where the service lives.

trauth forwardauth middleware setup

Here my AuthServer is http://trauth.web:8080/ because trauth is the service name connected to the web docker network. The other cool thing about this setup is that you don’t have to expose the AuthServer to the world (with labels), but instead can leave it isolated to the docker network. Traefik will be the only service calling it, forwarding responses as needed.

Finally, to have services use this new middleware (effectively protecting them with basic auth), add the following label "traefik.http.routers.pihole.middlewares=trauth" and restart the stack.

basic auth prompt because of a missing/invalid trauth cookie
trauth cookie set post basic auth, meaning other services won’t prompt for basic auth again

I use the trauth middleware for all of the exposed web services, regardless of whether the service itself has its own authentication too.

telegraf, influxdb & grafana

I don’t know about you, but I love graphs. A lot. Smokeping was the very first service I configured so that I could see what my Internet connection’s quality was like over time.

take a guess when I switched ISPs, and why :P

I had come across Grafana at clients before, but never really understood what made up those amazing graphs. So, in an effort to better understand those (and get some graphs of my own), I went down a deep rabbit hole of figuring it out. I don’t think I should dive into too much detail, but rather quickly go over the parts that make up what is called the TIG stack, and then dive into the setup.

The TIG stack is the collection of Telegraf, InfluxDB and Grafana. Telegraf is a data collection agent (with an impressive amount of support for various things like SNMP, ICMP etc.) that saves results into InfluxDB, which is a time series database (with a really simple protocol!). Finally, Grafana is the graphing UI that can query many databases, including InfluxDB. From this data, Grafana can visualise data in some incredible ways. Anything from system performance metrics, to custom stuff you cook up yourself is possible!
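That “really simple protocol” is InfluxDB’s line protocol: a measurement name, optional tags, one or more fields and an optional timestamp, all on one line. As a rough sketch of what a collector ends up writing, here is a small Python formatter. The function names and the sample measurement are made up for illustration, and the escaping only covers the common cases (commas, spaces, equals signs and quotes), so treat it as a teaching aid rather than a complete client.

```python
def _escape(value, chars):
    """Backslash-escape each of the given special characters."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, it is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Tags are indexed metadata (which host, which UPS) while fields hold the actual measured values, which is why at least one field is mandatory and tags are not.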

To no one’s surprise, all of the software that makes up the TIG stack can run in Docker containers, and by extension can be glued together with a docker-compose file. Each image requires its own set of configuration, e.g. InfluxDB wants administrator credentials and so does Grafana. Telegraf, on the other hand, needs a configuration file which you can generate once-off and then mount into the container. I am not going to go into the details of all of those as they are pretty well documented on each project’s Dockerhub page.

What we will dive into a bit is the networking setup for the TIG stack. From the outside, only Grafana needs to be exposed to the world. Using Traefik and the setup described earlier in this post, that is exactly how I did it. What is different though is that I opted to add another docker network called stats in addition to the existing web network. The idea is that any of the services inside of docker that need to talk to the TIG stack (either to fetch or push stats via Telegraf or directly to InfluxDB) can do so via the stats network. Web related stuff that needs to be accessed from Traefik can do so via the web network. This made it quite easy for me to distinguish the different services, and at a glance gives you an idea of where they fit in.
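The rule of thumb behind this split is that two containers can only talk to each other directly if they share a Docker network. A tiny Python model of that idea is sketched below. The service-to-network mapping is an assumption loosely based on the setup in this post (for example, that grafana and telegraf sit on stats), not a dump of the real compose files.

```python
# Hypothetical mapping of services to the networks they are attached to.
SERVICE_NETWORKS = {
    "traefik": {"web", "stats"},
    "grafana": {"web", "stats"},
    "influxdb": {"stats"},
    "telegraf": {"stats"},
    "pihole": {"web"},
}


def can_reach(source, target, networks=SERVICE_NETWORKS):
    """Two containers can talk directly only if they share a network."""
    return bool(networks[source] & networks[target])


def reachable_from(source, networks=SERVICE_NETWORKS):
    """List every other service that shares at least one network with source."""
    return sorted(
        name
        for name in networks
        if name != source and can_reach(source, name, networks)
    )
```

In this model reachable_from("influxdb") only lists services attached to stats, which is exactly the segregation the extra network is meant to give you.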

example influxdb service configuration in my TIG stack

For example, the influxdb service does not have to be exposed to the outside world via Traefik, so it is not part of the web network. However, if something wanted to POST data to the InfluxDB HTTP API, that service will have to also be attached to the stats network to reach InfluxDB. Anyways, with the stack up, I can reach the Grafana instance running on the Pi from the outside.

grafanaaaaaaa

Out of the box Grafana is not really useful, so let’s change that!

tig – traefik metrics

Traefik can send usage metrics to an InfluxDB database. The configuration is quite well documented as well. You basically just have to enable metrics submissions, tell Traefik where the InfluxDB instance lives, give it some credentials and then restart the stack. In the command array of the traefik service, I had to add these entries (we’ll dive into the details in a moment):

  - --metrics.influxdb=true
  - --metrics.influxdb.address=http://influxdb.stats:8086
  - --metrics.influxdb.protocol=http
  - --metrics.influxdb.pushInterval=30s
  - --metrics.influxdb.database=traefik
  - --metrics.influxdb.username=traefik
  - --metrics.influxdb.password=******

Notice how the location for InfluxDB is at influxdb.stats. That is because it’s available in the stats network only, which also means I had to attach the traefik service to the stats network.

networks:
  - web
  - stats

The next part I want to touch on is the credentials I provided. Those credentials are *not* the admin credentials I specified in the influxdb service, but rather a fresh set of credentials for a fresh database. I created those using an InfluxDB “iql” script. If you noticed in the previous section where I showed the influxdb service’s schema, there was a volume mounted with ./iql-influxdb:/docker-entrypoint-initdb.d/. With this mount, when InfluxDB starts up it will read and process any .iql scripts that are in the folder. I have an init.iql script that configures databases & credentials for me, similar to this (stored in git):

CREATE DATABASE "traefik";
CREATE USER "traefik" WITH PASSWORD '*';
GRANT ALL ON "traefik" TO "traefik";

Alright, so now you know how Traefik can talk to InfluxDB. Restart all the relevant stacks and you should see a POST coming from Traefik in the influxdb service’s log.

traefik metrics write event

With measurements in the InfluxDB database, the next part is the fun part. Visualising it! Before we get to that though, we need to connect InfluxDB and Grafana so that Grafana can actually query the data. In the Grafana UI you can do this by going to Settings -> Data Sources -> Add Data Source -> Choose InfluxDB -> give it a name & creds and save.

influxdb datasource configured in grafana

Grafana has many pre-built and community contributed dashboards that you can search for and import from here. For Traefik, I used this dashboard. To import an existing dashboard, you can just go to Manage Dashboards -> Import. You will be asked which datasource should be used, so select the one we just set up for Traefik. And voila!

traefik metrics visualised in grafana, backed by influxdb

In your Traefik dashboard, you should also see that metrics are now enabled.

traefik metrics configured to use influxdb

tig – ups metrics & alerts

Here in South Africa, we are plagued by what is known as “LoadShedding”. I’ll leave the politics of it for another day, but what it means for us is that depending on how much strain the local power utility provider is taking, we will get scheduled, rolling power outages for a few hours (depending on where you live in the country). So, having a backup power source to continue working is pretty important.

I wanted the ability to monitor statistics provided by my UPS and optionally, get alerts when the power goes out and comes back on again. Grafana has the ability to alert based on thresholds in a graph (we’ll get to that in a moment), so that solves a part of the problem. The next piece of the puzzle was to get statistics from my UPS into InfluxDB. More specifically, I wanted the results one could get from Network UPS Tools (NUT) populated into InfluxDB.

upsc battery metrics for a ups called eaton

I couldn’t find something that did this exactly how I wanted, so I went and built it instead. I found a Golang NUT client called go.nut and effectively wrote a NUT to InfluxDB translator called nutstat. Once configured, nutstat will poll NUT for data and POST the results to an InfluxDB database using the HTTP API. The readme file for nutstat contains an example docker-compose.yml file you can use to get the utility up and running here. Just like for the Traefik metrics, I added a fresh database and creds for nutstat in the InfluxDB .iql script.
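To give a feel for what such a translator has to do, here is a rough Python sketch of the two steps: parsing upsc-style "key: value" output and turning the numeric values into a write request for the InfluxDB 1.x HTTP API. This is not nutstat’s code. The function names, the "ups" measurement, the "name" tag and the nutstat database name are assumptions for illustration, credentials are left out, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def parse_upsc(text):
    """Turn "key: value" lines into a dict; numeric values become floats."""
    values = {}
    for line in text.splitlines():
        key, sep, raw = line.partition(":")
        if not sep:
            continue  # skip anything that is not a key/value line
        raw = raw.strip()
        try:
            values[key.strip()] = float(raw)
        except ValueError:
            values[key.strip()] = raw  # keep things like "OL" as strings
    return values


def build_write_request(base_url, database, ups_name, values):
    """Build (but do not send) a POST to the InfluxDB 1.x /write endpoint."""
    numeric = {k: v for k, v in values.items() if isinstance(v, float)}
    if not numeric:
        raise ValueError("no numeric values to write")
    # Assumes simple names without spaces or commas, so no escaping is done.
    fields = ",".join(f"{key}={value}" for key, value in sorted(numeric.items()))
    body = f"ups,name={ups_name} {fields}"
    url = f"{base_url}/write?" + urllib.parse.urlencode({"db": database})
    return urllib.request.Request(url, data=body.encode(), method="POST")
```

A polling loop would call these two functions on an interval and pass the resulting request to urllib.request.urlopen, with the container attached to the stats network so that influxdb.stats resolves.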

As for Network UPS Tools itself, this was installed on the host OS, mostly because I was having trouble getting it running in Docker with a USB connected UPS.

Before we can set up some alerting we have to build a dashboard. Given that I populated a fresh DB with custom data, I also had to manually build the Grafana dashboard. There are thousands of guides online on how to do this, so I am not sure I can add much more. However, the dashboard I have now looks like this:

ups monitoring dashboard. notice the last time I had loadshedding :(

Grafana can alert based on thresholds that you set in your graphs. Alerts can be sent via many providers, including email, Slack etc.

grafana notification channel support

I quite like using Telegram given its support for bots, and with Grafana supporting Telegram it was perfect for me. After following the Telegram guide to configure a bot, you can set up the Telegram notification channel using the resultant Bot API key that you get.

Next, to create an alert you need to edit the panel you want to generate alerts on and browse to the alerts tab.

edit panel option from my ups dashboard

If you have not created an alert before you’ll get a large “Create Alert” button to press, otherwise you will get the edit alert options.

ups alert configuration to send an alert if the input voltage is below 150 for two minutes
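The idea behind a rule like “below 150 for two minutes” can be sketched in a few lines of Python. This only illustrates the threshold-plus-duration logic and is not Grafana’s implementation. The function name, the state names and the sample readings are made up.

```python
def alert_states(samples, threshold=150.0, hold_seconds=120):
    """Yield (timestamp, state) for each (timestamp, voltage) sample.

    The state is "pending" as soon as the value drops below the threshold,
    and only becomes "alerting" once it has stayed there for hold_seconds.
    """
    below_since = None
    for ts, voltage in samples:
        if voltage < threshold:
            if below_since is None:
                below_since = ts  # remember when the dip started
            state = "alerting" if ts - below_since >= hold_seconds else "pending"
        else:
            below_since = None  # back to normal, reset the timer
            state = "ok"
        yield ts, state


# Timestamps are in seconds, voltages in volts (made-up readings).
readings = [(0, 230), (30, 0), (90, 0), (150, 0), (180, 228)]
states = [state for _, state in alert_states(readings)]
```

The hold period is what stops a single bad reading from waking you up: the dip at 30 seconds only turns into an alert once it has lasted the full two minutes.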

Now, when the alert conditions are triggered (input voltage below 150 for 2 minutes for me), or when the state returns to “normal” (input voltage above 150), I receive Telegram messages.

trusty ‘ol scrolley bot telling me that the input voltage is below threshold

What you will notice is a really cool image of the graph being monitored as well. This was added in Grafana 7, but required a custom renderer to be configured. I added the following service to my TIG stack to support the external renderer feature, built specifically for a Pi.

grafana-renderer:
  image: adejong/grafana-image-renderer-pi:1.0.8-beta2
  restart: unless-stopped
  networks:
    - stats

Next, I just had to tell Grafana where the renderer service was via environment variables.

  # Render information
  - GF_RENDERING_SERVER_URL=http://grafana-renderer:8081/render
  - GF_RENDERING_CALLBACK_URL=http://grafana:3000/

remote access

I did not want to expose the Pi or any of its services directly on the Internet. But, I figured it would be great if I could reach it from the Internet to check out some of those graphs, or if I needed shell access on the Pi itself. So, I opted to configure some WireGuard tunnels to a cloud VPS allowing me to connect to the VPS from anywhere and reach the services on the Pi. Obviously this implies that a tunnel from the Pi to the VPS also had to be configured.

The basic idea is that the tunnel from the Pi to the cloud VPS would allow access to my home IP range over the tunnel. So when my phone or laptop connects via WireGuard to the VPS, it can also reach the internal network at home. With the wildcard DNS configuration sitting at Cloudflare, DNS resolution should work fine too!

network diagram on how to reach services on the pi
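What makes this work is that each WireGuard peer has a list of allowed IP ranges, and a packet is sent to the peer whose range best matches the destination. Below is a small Python model of that lookup from the VPS’s point of view. All of the peer names and addresses are hypothetical (I am assuming a 10.10.0.0/24 tunnel range and a 192.168.1.0/24 home range for the example), and real WireGuard does this in the kernel rather than with a loop.

```python
import ipaddress

# Hypothetical peers as seen from the VPS: the Pi also "owns" the home range.
PEERS = {
    "pi": ["10.10.0.2/32", "192.168.1.0/24"],
    "phone": ["10.10.0.3/32"],
    "laptop": ["10.10.0.4/32"],
}


def pick_peer(destination, peers=PEERS):
    """Return the peer whose allowed range most specifically covers destination."""
    dest = ipaddress.ip_address(destination)
    best_name, best_prefix = None, -1
    for name, allowed in peers.items():
        for cidr in allowed:
            network = ipaddress.ip_network(cidr)
            if dest in network and network.prefixlen > best_prefix:
                best_name, best_prefix = name, network.prefixlen
    return best_name  # None means no peer accepts this destination
```

So a request from the phone to a home address such as 192.168.1.50 lands on the VPS, matches the Pi’s range and gets forwarded down the Pi’s tunnel, provided IP forwarding is enabled on the VPS.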

WireGuard was also on the host OS and not in Docker. It is definitely possible to run it in a container, but I was more comfortable running this on the host OS.

static configurations

As if using dynamic configurations from Docker wasn’t enough, you can also define “static” config for Traefik. I used this to also front my home router’s web interface with Traefik. Silly, but why not? To enable a static config, all you need to do is tell Traefik which file it should parse with --providers.file.filename=/static.yml in the command array, and then mount in a file from the outside to /static.yml. Next, add your configuration! I just use it to front my router, so it’s really simple.

http:
  services:
    gateway-service:
      loadBalancer:
        passHostHeader: false
        servers:
        - url: "http://<router ip>/"
  routers:
    gateway:
      tls:
        certResolver: primary
      rule: "Host(`gateway.internal.domain.com`)"
      service: gateway-service

automatic updates

So we have configured a boatload of services, but how do we keep them up to date? I sure as heck don’t want to log in every so often and do it manually, so instead I use something called pyouroboros. Using this image from Dockerhub, I have a docker-compose file that runs pyouroboros, which keeps containers up to date. It supports Telegram integration, meaning I get a notification every time a container has been updated. It also supports InfluxDB, so upgrade stats are stored there too!

With the docker-compose file keeping pyouroboros up, the last bit to do is to label containers you want to form part of the automatic update with "com.ouroboros.enable=true".

conclusion

That… was a lot to read. I had a lot of fun building this setup, and having run it for almost a year now I can also attest to its ease of maintainability. Overall I feel like the setup is pretty clean. The only service exposing ports externally is Traefik, and internally from a Docker perspective there is a small bit of segregation too! Having configuration changes stored in Git also goes a long way towards figuring out what you were thinking over time. It is also a nice reference for future implementations.

I’m sure there are better ways to do some of the things here. For example, having separate compose files is not really necessary as you can separate stacks with ----. However, I have found the service specific restarts in docker-compose to be rather unpredictable, and when I do a docker-compose down, I don’t really want to bring everything down. YMMV. Another consideration here may be to stuff all of this on a small Kubernetes cluster, like k3s. You know, for the learning experience and extra hipster points.

I really hope you managed to pick up a trick or two here, and feel inspired to play with this stuff too. The possibilities are endless!