fixes and docs

This commit is contained in:
mjallen18
2026-03-23 15:17:10 -05:00
parent 2c0b26ced0
commit 23f29b6ca1
25 changed files with 1590 additions and 795 deletions

View File

@@ -1,213 +1,217 @@
# Troubleshooting Guide
This guide provides solutions for common issues that may arise when using this NixOS configuration.
Common issues and solutions for this NixOS configuration.
## System Issues
## Build Failures
### Failed System Build
### `nixos-rebuild switch` fails
**Problem**: `nixos-rebuild switch` fails with an error.
1. **Syntax error** — the error message includes the file and line number. Common causes: missing `;`, unmatched `{`, wrong type passed to an option.
**Solutions**:
2. **Evaluation error** — read the full error trace. Often caused by a module option receiving the wrong type, or a missing `cfg.enable` guard.
1. **Syntax Errors**:
- Check the error message for file and line number information
- Verify the syntax in the mentioned file
- Common issues include missing semicolons, curly braces, or mismatched quotes
2. **Missing Dependencies**:
- If the error mentions a missing package or dependency:
```
git pull # Update to the latest version
nix flake update # Update the flake inputs
```
3. **Conflicting Modules**:
- Look for modules that might be configuring the same options incompatibly
- Disable one of the conflicting modules or adjust their configurations
4. **Disk Space Issues**:
- Check available disk space with `df -h`
- Clear old generations: `sudo nix-collect-garbage -d`
### Boot Issues
**Problem**: System fails to boot after a configuration change.
**Solutions**:
1. **Boot into a Previous Generation**:
- At the boot menu, select an older generation
- Once booted, revert the problematic change:
```
cd /etc/nixos
git revert HEAD # Or edit the files directly
sudo nixos-rebuild switch
```
2. **Boot from Installation Media**:
- Boot from a NixOS installation media
- Mount your system:
```
sudo mount /dev/disk/by-label/nixos /mnt
sudo mount /dev/disk/by-label/boot /mnt/boot # If separate boot partition
```
- Chroot into your system:
```
sudo nixos-enter --root /mnt
cd /etc/nixos
git revert HEAD # Or edit the files directly
nixos-rebuild switch --install-bootloader
```
## Home Assistant Issues
### Home Assistant Fails to Start
**Problem**: Home Assistant service fails to start.
**Solutions**:
1. **Check Service Status**:
```
systemctl status home-assistant
journalctl -u home-assistant -n 100
3. **Fetch failure** — a flake input or package source can't be downloaded. Check network connectivity, or try:
```bash
nix flake update --update-input <input-name>
```
2. **Database Issues**:
- Check PostgreSQL is running: `systemctl status postgresql`
- Verify database connection settings in Home Assistant configuration
3. **Permission Issues**:
- Check ownership and permissions on config directory:
```
ls -la /var/lib/homeassistant
sudo chown -R hass:hass /var/lib/homeassistant
sudo chmod -R 750 /var/lib/homeassistant
```
4. **Custom Component Issues**:
- Try disabling custom components to isolate the issue:
- Edit `modules/nixos/homeassistant/services/homeassistant/default.nix`
- Comment out the `customComponents` section
- Rebuild: `sudo nixos-rebuild switch`
### Zigbee Device Connection Issues
**Problem**: Zigbee devices fail to connect or are unstable.
**Solutions**:
1. **Verify Device Path**:
- Check the Zigbee coordinator is properly detected:
```
ls -la /dev/ttyUSB*
```
- Update the device path if needed:
- Edit your system configuration
- Set `mjallen.services.home-assistant.zigbeeDevicePath` to the correct path
- Rebuild: `sudo nixos-rebuild switch`
2. **Interference Issues**:
- Move the Zigbee coordinator away from other wireless devices
- Try a USB extension cable to improve positioning
- Change Zigbee channel in Zigbee2MQTT configuration
3. **Reset Zigbee2MQTT**:
```
systemctl restart zigbee2mqtt
4. **Disk space** — build sandbox fills up. Free space:
```bash
sudo nix-collect-garbage -d
df -h /nix
```
### Automation Issues
### Assertion failures
**Problem**: Automations don't run as expected.
If you see `assertion failed`, read the `message` field. For example:
```
error: assertion failed at …/nebula/sops.nix
mjallen.services.nebula.secretsPrefix must be set
```
Set the required option in the system configuration.
**Solutions**:
## Boot Issues
1. **Check Automation Status**:
- In Home Assistant UI, verify the automation is enabled
- Check Home Assistant logs for automation execution errors
### System won't boot after a config change
2. **Entity Issues**:
- Verify entity IDs are correct
- Check if entities are available/connected
- Test direct service calls to verify entity control works
1. At the boot menu, select a previous generation.
2. Once booted, revert the change:
```bash
cd /etc/nixos
git revert HEAD
sudo nixos-rebuild switch --flake .#$(hostname)
```
3. **Trigger Issues**:
- Test the automation manually via Developer Tools > Services
- Use `automation.trigger` service with the automation's entity_id
### Booting from installation media to recover
## Flake Issues
```bash
# Mount the system (adjust device paths as needed)
sudo mount /dev/disk/by-label/nixos /mnt
sudo mount /dev/disk/by-label/boot /mnt/boot
### Flake Input Update Errors
# Chroot in
sudo nixos-enter --root /mnt
cd /etc/nixos
**Problem**: `nix flake update` fails or causes issues.
# Revert and rebuild
git revert HEAD
nixos-rebuild switch --flake .#hostname --install-bootloader
```
**Solutions**:
### Lanzaboote / Secure Boot issues
1. **Selective Updates**:
- Update specific inputs instead of all at once:
```
nix flake lock --update-input nixpkgs
```
If Secure Boot enrolment fails or the system won't verify:
2. **Rollback Flake Lock**:
- If an update causes issues, revert to previous flake.lock:
```
git checkout HEAD^ -- flake.lock
```
```bash
# Check enrolled keys
sbctl status
3. **Pin to Specific Revisions**:
- In `flake.nix`, pin problematic inputs to specific revisions:
```nix
nixpkgs-stable.url = "github:NixOS/nixpkgs/5233fd2ba76a3accb05f88b08917450363be8899";
```
# Re-enrol if needed (run as root)
sbctl enrol-keys --microsoft
## Secret Management Issues
# Sign bootloader files manually
sbctl sign -s /boot/EFI/systemd/systemd-bootx64.efi
```
### Sops Decryption Errors
## SOPS / Secrets Issues
**Problem**: Sops fails to decrypt secrets.
### `secret not found` or permission denied at boot
**Solutions**:
1. Verify the secret key path matches what's declared in the module's `sops.nix`.
2. Check the secret exists in the SOPS file:
```bash
sops --decrypt secrets/nas-secrets.yaml | grep "the-key"
```
3. Check the `owner`/`group` set on the secret matches the service user.
1. **Key Issues**:
- Verify your GPG key is available and unlocked
- Check `.sops.yaml` includes your key fingerprint
### Can't decrypt — wrong age key
2. **Permission Issues**:
- Check file permissions on secret files
- Make sure the user running `nixos-rebuild` has access to the GPG key
The machine's age key is derived from `/etc/ssh/ssh_host_ed25519_key`. If the host key was regenerated, the age key changed and existing secrets can no longer be decrypted.
To fix: re-encrypt the secrets file with the new public key:
```bash
# Get the new public key
nix-shell -p ssh-to-age --run 'ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub'
# Update .sops.yaml with the new key, then:
sops updatekeys secrets/nas-secrets.yaml
```
### Adding a new secret to an existing file
```bash
sops secrets/nas-secrets.yaml
# Editor opens with decrypted YAML — add your key, save, sops re-encrypts
```
## Nebula VPN Issues
### Peers can't connect
1. Verify the lighthouse is reachable on its public address:
```bash
nc -zvu mjallen.dev 4242
```
2. Check the nebula service on both hosts:
```bash
systemctl status nebula@jallen-nebula
journalctl -u nebula@jallen-nebula -n 50
```
3. Confirm the CA cert, host cert, and host key are all present and owned by the `nebula-jallen-nebula` user:
```bash
ls -la /run/secrets/pi5/nebula/
```
4. Verify the host cert was signed by the same CA as the other nodes:
```bash
nebula-cert verify -ca ca.crt -crt host.crt
```
### Certificate expired
Re-sign the host certificate:
```bash
nebula-cert sign -name "hostname" -ip "10.1.1.x/24" \
-ca-crt ca.crt -ca-key ca.key \
-out-crt host.crt -out-key host.key
# Update SOPS, rebuild
```
## Impermanence Issues
### Service fails because its data directory is missing after reboot
If a service stores state in a path that isn't in the persistence list, it will be wiped on reboot. Add it to `impermanence.extraDirectories`:
```nix
mjallen.impermanence.extraDirectories = [
{ directory = "/var/lib/my-service"; user = "my-service"; group = "my-service"; mode = "0750"; }
];
```
Then move the existing data if needed:
```bash
cp -a /var/lib/my-service /persist/var/lib/my-service
```
## Flake Input Issues
### Input update breaks a build
Roll back the specific input:
```bash
git checkout HEAD^ -- flake.lock
```
Or pin the input to a specific revision in `flake.nix`:
```nix
nixpkgs-unstable.url = "github:NixOS/nixpkgs/abc123def";
```
## Service Issues
### Service won't start
```bash
systemctl status <service>
journalctl -u <service> -n 100 --no-pager
```
### Caddy reverse proxy not routing
1. Check that `reverseProxy.enable = true` is set on the service.
2. Verify the subdomain matches: `reverseProxy.subdomain = "myapp"` → `myapp.mjallen.dev`.
3. Check Caddy logs:
```bash
journalctl -u caddy -n 50
```
### PostgreSQL database missing for a service
If `configureDb = true` is set, the database is created automatically. If it's missing:
```bash
sudo -u postgres createdb my-service
sudo -u postgres psql -c "GRANT ALL ON DATABASE my-service TO my-service;"
```
## Network Issues
### Firewall Blocks Services
### Firewall blocking a service
**Problem**: Services are not accessible due to firewall rules.
Check which ports are open:
```bash
sudo nft list ruleset | grep accept
```
**Solutions**:
Add ports in the system config:
```nix
mjallen.network.firewall.allowedTCPPorts = [ 8080 ];
```
1. **Check Firewall Status**:
```
sudo nix-shell -p iptables --run "iptables -L"
```
2. **Verify Firewall Configuration**:
- Check if ports are properly allowed in the configuration
- Add missing ports if necessary
3. **Temporary Disable Firewall** (for testing only):
```
sudo systemctl stop firewall
# After testing
sudo systemctl start firewall
```
Or if using `mkModule`, set `openFirewall = true` (it's the default).
## Getting Help
If you encounter an issue not covered in this guide:
1. Check the NixOS Wiki: https://nixos.wiki/
2. Search the NixOS Discourse forum: https://discourse.nixos.org/
3. Join the NixOS Matrix/Discord community for real-time help
4. File an issue in the repository if you believe you've found a bug
- NixOS manual: `nixos-help` or https://nixos.org/manual/nixos/stable/
- NixOS Wiki: https://nixos.wiki/
- NixOS Discourse: https://discourse.nixos.org/
- Nix package search: https://search.nixos.org/packages