nix-config/docs/troubleshooting.md
2026-03-23 15:17:10 -05:00

Troubleshooting Guide

Common issues and solutions for this NixOS configuration.

Build Failures

nixos-rebuild switch fails

  1. Syntax error — the error message includes the file and line number. Common causes: missing ;, unmatched {, wrong type passed to an option.

  2. Evaluation error — read the full error trace. Often caused by a module option receiving the wrong type, or a missing cfg.enable guard.

  3. Fetch failure — a flake input or package source can't be downloaded. Check network connectivity, or try:

    nix flake update <input-name>              # Nix 2.19 and later
    nix flake lock --update-input <input-name> # older Nix releases
    
  4. Disk space — build sandbox fills up. Free space:

    sudo nix-collect-garbage -d   # -d also deletes old generations, so they can no longer be booted
    df -h /nix
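
Before collecting garbage, it can help to confirm that /nix really is the filesystem that is full. The following is a minimal sketch, assuming POSIX-format `df -P` output; the helper name `usage_pct_from_df` and the 90% threshold are illustrative and not part of this repository:

```shell
# Hypothetical helper: read `df -P` output on stdin and print the use
# percentage (without the % sign) of the last filesystem listed.
usage_pct_from_df() {
  awk 'NR > 1 { gsub(/%/, "", $5); pct = $5 } END { print pct }'
}

# Example (assumed 90% threshold):
#   pct=$(df -P /nix | usage_pct_from_df)
#   [ "$pct" -ge 90 ] && echo "/nix is ${pct}% full"
```

The helper only parses text, so it can be tried on saved `df` output from another machine as well.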
    

Assertion failures

If you see assertion failed, read the message field. For example:

error: assertion failed at …/nebula/sops.nix
  mjallen.services.nebula.secretsPrefix must be set

Set the required option in the system configuration.

Boot Issues

System won't boot after a config change

  1. At the boot menu, select a previous generation.
  2. Once booted, revert the change:
    cd /etc/nixos
    git revert HEAD
    sudo nixos-rebuild switch --flake .#$(hostname)
    

Booting from installation media to recover

# Mount the system (adjust device paths as needed)
sudo mount /dev/disk/by-label/nixos /mnt
sudo mount /dev/disk/by-label/boot /mnt/boot

# Chroot in (the remaining commands run inside the chroot shell)
sudo nixos-enter --root /mnt
cd /etc/nixos

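# Revert and rebuild (replace "hostname" with this machine's flake output name;
# $(hostname) would return the installer's name here)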
git revert HEAD
nixos-rebuild switch --flake .#hostname --install-bootloader

Lanzaboote / Secure Boot issues

If Secure Boot enrolment fails or the system won't verify:

# Check enrolled keys
sbctl status

# Re-enrol if needed (run as root)
sbctl enroll-keys --microsoft

# Sign bootloader files manually
sbctl sign -s /boot/EFI/systemd/systemd-bootx64.efi

SOPS / Secrets Issues

secret not found or permission denied at boot

  1. Verify the secret key path matches what's declared in the module's sops.nix.
  2. Check the secret exists in the SOPS file:
    sops --decrypt secrets/nas-secrets.yaml | grep "the-key"
    
  3. Check the owner/group set on the secret matches the service user.
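
Step 3 can be checked directly on the decrypted file. This is a minimal sketch, assuming GNU `stat` and that secrets are materialised under /run/secrets as elsewhere in this guide; the helper name `secret_perms` and the example path are placeholders:

```shell
# Hypothetical helper: print "owner:group mode" for a path (GNU stat).
secret_perms() {
  stat -c '%U:%G %a' "$1"
}

# Example (placeholder path):
#   secret_perms /run/secrets/<secret-name>
```

Compare the printed owner and group with the user the service runs as, for example the `User=` line shown by `systemctl cat <service>`.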

Can't decrypt — wrong age key

The machine's age key is derived from /etc/ssh/ssh_host_ed25519_key. If the host key was regenerated, the age key changed and existing secrets can no longer be decrypted.

To fix: re-encrypt the secrets file with the new public key:

# Get the new public key
nix-shell -p ssh-to-age --run 'ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub'

# Update .sops.yaml with the new key, then:
sops updatekeys secrets/nas-secrets.yaml
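
To confirm that a stale key is the problem before re-encrypting, you can check whether the host's current age public key is listed as a recipient. This is a minimal sketch; the helper name `age_key_listed` is hypothetical, and it assumes the recipients are written literally in .sops.yaml:

```shell
# Hypothetical helper: succeed if the given age public key appears in a
# sops config file (default: .sops.yaml in the current directory).
age_key_listed() {
  [ -n "$1" ] && grep -qF -- "$1" "${2:-.sops.yaml}"
}

# Example, using the ssh-to-age command shown above:
#   key=$(nix-shell -p ssh-to-age --run 'ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub')
#   age_key_listed "$key" || echo "host key is not a recipient in .sops.yaml"
```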

Adding a new secret to an existing file

sops secrets/nas-secrets.yaml
# Editor opens with decrypted YAML — add your key, save, sops re-encrypts

Nebula VPN Issues

Peers can't connect

  1. Verify the lighthouse is reachable on its public address:
    nc -zvu mjallen.dev 4242   # UDP: "succeeded" only means no ICMP rejection came back
    
  2. Check the nebula service on both hosts:
    systemctl status nebula@jallen-nebula
    journalctl -u nebula@jallen-nebula -n 50
    
  3. Confirm the CA cert, host cert, and host key are all present and owned by the nebula-jallen-nebula user:
    ls -la /run/secrets/pi5/nebula/
    
  4. Verify the host cert was signed by the same CA as the other nodes:
    nebula-cert verify -ca ca.crt -crt host.crt
    

Certificate expired

Re-sign the host certificate:

nebula-cert sign -name "hostname" -ip "10.1.1.x/24" \
  -ca-crt ca.crt -ca-key ca.key \
  -out-crt host.crt -out-key host.key
# Update SOPS, rebuild
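
To see how long a certificate has left, `nebula-cert print -path host.crt` shows its validity period. The sketch below turns an expiry timestamp into a day count; it assumes GNU `date`, and the helper name `days_between` is hypothetical:

```shell
# Hypothetical helper: whole days from timestamp $1 to timestamp $2
# (GNU date; the result is negative if $2 is earlier than $1).
days_between() {
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 86400 ))
}

# Example: paste the expiry timestamp reported by nebula-cert print
#   days_between "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "<not-after timestamp>"
```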

Impermanence Issues

Service fails because its data directory is missing after reboot

If a service stores state in a path that isn't in the persistence list, it will be wiped on reboot. Add it to impermanence.extraDirectories:

mjallen.impermanence.extraDirectories = [
  { directory = "/var/lib/my-service"; user = "my-service"; group = "my-service"; mode = "0750"; }
];

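Then copy the existing data into the persistent location before rebooting, if needed (-T avoids nesting the directory when the target already exists):

cp -aT /var/lib/my-service /persist/var/lib/my-service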
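
After rebuilding, you can check whether the directory is actually backed by persistent storage. This sketch assumes the persistence module bind-mounts each listed directory, which is how the upstream impermanence module handles directories but has not been verified for this repository's wrapper; the helper name `is_mounted` is hypothetical:

```shell
# Hypothetical helper: succeed if the path is itself a mount point
# (util-linux `mountpoint`), which is how a bind-mounted persistent
# directory would appear.
is_mounted() {
  mountpoint -q -- "$1"
}

# Example:
#   is_mounted /var/lib/my-service || echo "not persisted (or not mounted yet)"
```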

Flake Input Issues

Input update breaks a build

Roll back flake.lock to the previous commit (this reverts every input that changed in that commit, not just one):

git checkout HEAD^ -- flake.lock

Or pin the input to a specific revision in flake.nix:

nixpkgs-unstable.url = "github:NixOS/nixpkgs/abc123def";
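
To find a known-good revision to pin, you can read the locked revision of an input from an earlier copy of flake.lock. This is a minimal sketch, assuming `jq` is installed and the input is declared directly in this flake; the helper name `locked_rev` is hypothetical:

```shell
# Hypothetical helper: print the locked revision of a top-level flake input.
# Looks up the node name via the root node, since flake.lock may rename
# nodes (for example nixpkgs_2).
locked_rev() {
  jq -r --arg name "$1" \
    '(.nodes.root.inputs[$name] | strings) as $node
     | .nodes[$node].locked.rev // empty' \
    "${2:-flake.lock}"
}

# Example: revision that was locked in the previous commit
#   git show HEAD^:flake.lock > /tmp/old.lock
#   locked_rev nixpkgs-unstable /tmp/old.lock
```

Inputs that use `follows` are skipped by this sketch, because their entry in the root node is a path rather than a node name.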

Service Issues

Service won't start

systemctl status <service>
journalctl -u <service> -n 100 --no-pager

Caddy reverse proxy not routing

  1. Check that reverseProxy.enable = true is set on the service.
  2. Verify the subdomain matches: reverseProxy.subdomain = "myapp" maps to myapp.mjallen.dev.
  3. Check Caddy logs:
    journalctl -u caddy -n 50
    

PostgreSQL database missing for a service

If configureDb = true is set, the database is created automatically. If it's missing:

sudo -u postgres createdb my-service
# Names containing a hyphen must be double-quoted in SQL
sudo -u postgres psql -c 'GRANT ALL ON DATABASE "my-service" TO "my-service";'

Network Issues

Firewall blocking a service

Check which ports are open:

sudo nft list ruleset | grep accept

Add ports in the system config:

mjallen.network.firewall.allowedTCPPorts = [ 8080 ];
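
If the port is allowed but the service is still unreachable, it may be that nothing is listening on it. The sketch below checks `ss -tln` output for a listening TCP port; the helper name `port_in_ss_output` is hypothetical and 8080 is the example port from above:

```shell
# Hypothetical helper: read `ss -tln` output on stdin and succeed if any
# local address (column 4) ends with the given port.
port_in_ss_output() {
  awk -v p=":$1" 'NR > 1 && $4 ~ (p "$") { found = 1 } END { exit !found }'
}

# Example:
#   ss -tln | port_in_ss_output 8080 || echo "nothing is listening on 8080"
```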

Or, if the service is built with mkModule, make sure openFirewall has not been set to false (it defaults to true).

Getting Help