fixes and docs
@@ -1,213 +1,217 @@
|
||||
# Troubleshooting Guide
|
||||
|
||||
This guide provides solutions for common issues that may arise when using this NixOS configuration.
|
||||
Common issues and solutions for this NixOS configuration.
|
||||
|
||||
## System Issues
|
||||
## Build Failures
|
||||
|
||||
### Failed System Build
|
||||
### `nixos-rebuild switch` fails
|
||||
|
||||
**Problem**: `nixos-rebuild switch` fails with an error.
|
||||
1. **Syntax error** — the error message includes the file and line number. Common causes: missing `;`, unmatched `{`, wrong type passed to an option.
|
||||
|
||||
**Solutions**:
|
||||
2. **Evaluation error** — read the full error trace. Often caused by a module option receiving the wrong type, or a missing `cfg.enable` guard.
|
||||
|
||||
1. **Syntax Errors**:
|
||||
- Check the error message for file and line number information
|
||||
- Verify the syntax in the mentioned file
|
||||
- Common issues include missing semicolons, curly braces, or mismatched quotes
|
||||
|
||||
2. **Missing Dependencies**:
|
||||
- If the error mentions a missing package or dependency:
|
||||
```
|
||||
git pull # Update to the latest version
|
||||
nix flake update # Update the flake inputs
|
||||
```
|
||||
|
||||
3. **Conflicting Modules**:
|
||||
- Look for modules that might be configuring the same options incompatibly
|
||||
- Disable one of the conflicting modules or adjust their configurations
|
||||
|
||||
4. **Disk Space Issues**:
|
||||
- Check available disk space with `df -h`
|
||||
- Clear old generations: `sudo nix-collect-garbage -d`
|
||||
|
||||
### Boot Issues
|
||||
|
||||
**Problem**: System fails to boot after a configuration change.
|
||||
|
||||
**Solutions**:
|
||||
|
||||
1. **Boot into a Previous Generation**:
|
||||
- At the boot menu, select an older generation
|
||||
- Once booted, revert the problematic change:
|
||||
```
|
||||
cd /etc/nixos
|
||||
git revert HEAD # Or edit the files directly
|
||||
sudo nixos-rebuild switch
|
||||
```
|
||||
|
||||
2. **Boot from Installation Media**:
|
||||
- Boot from a NixOS installation media
|
||||
- Mount your system:
|
||||
```
|
||||
sudo mount /dev/disk/by-label/nixos /mnt
|
||||
sudo mount /dev/disk/by-label/boot /mnt/boot # If separate boot partition
|
||||
```
|
||||
- Chroot into your system:
|
||||
```
|
||||
sudo nixos-enter --root /mnt
|
||||
cd /etc/nixos
|
||||
git revert HEAD # Or edit the files directly
|
||||
nixos-rebuild switch --install-bootloader
|
||||
```
|
||||
|
||||
## Home Assistant Issues
|
||||
|
||||
### Home Assistant Fails to Start
|
||||
|
||||
**Problem**: Home Assistant service fails to start.
|
||||
|
||||
**Solutions**:
|
||||
|
||||
1. **Check Service Status**:
|
||||
```
|
||||
systemctl status home-assistant
|
||||
journalctl -u home-assistant -n 100
|
||||
3. **Fetch failure** — a flake input or package source can't be downloaded. Check network connectivity, or try:
|
||||
```bash
|
||||
nix flake update <input-name>  # Nix before 2.19: nix flake lock --update-input <input-name>
|
||||
```
|
||||
|
||||
2. **Database Issues**:
|
||||
- Check PostgreSQL is running: `systemctl status postgresql`
|
||||
- Verify database connection settings in Home Assistant configuration
|
||||
|
||||
3. **Permission Issues**:
|
||||
- Check ownership and permissions on config directory:
|
||||
```
|
||||
ls -la /var/lib/homeassistant
|
||||
sudo chown -R hass:hass /var/lib/homeassistant
|
||||
sudo chmod -R 750 /var/lib/homeassistant
|
||||
```
|
||||
|
||||
4. **Custom Component Issues**:
|
||||
- Try disabling custom components to isolate the issue:
|
||||
- Edit `modules/nixos/homeassistant/services/homeassistant/default.nix`
|
||||
- Comment out the `customComponents` section
|
||||
- Rebuild: `sudo nixos-rebuild switch`
|
||||
|
||||
### Zigbee Device Connection Issues
|
||||
|
||||
**Problem**: Zigbee devices fail to connect or are unstable.
|
||||
|
||||
**Solutions**:
|
||||
|
||||
1. **Verify Device Path**:
|
||||
- Check the Zigbee coordinator is properly detected:
|
||||
```
|
||||
ls -la /dev/ttyUSB*
|
||||
```
|
||||
- Update the device path if needed:
|
||||
- Edit your system configuration
|
||||
- Set `mjallen.services.home-assistant.zigbeeDevicePath` to the correct path
|
||||
- Rebuild: `sudo nixos-rebuild switch`
|
||||
|
||||
2. **Interference Issues**:
|
||||
- Move the Zigbee coordinator away from other wireless devices
|
||||
- Try a USB extension cable to improve positioning
|
||||
- Change Zigbee channel in Zigbee2MQTT configuration
|
||||
|
||||
3. **Reset Zigbee2MQTT**:
|
||||
```
|
||||
systemctl restart zigbee2mqtt
|
||||
4. **Disk space** — build sandbox fills up. Free space:
|
||||
```bash
|
||||
sudo nix-collect-garbage -d
|
||||
df -h /nix
|
||||
```
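As a rough sketch of how the disk-space check could be scripted before deciding to collect garbage: the helper names `disk_used_pct` and `needs_gc` and the 90% threshold are assumptions for illustration, not part of this configuration.

```bash
# Print the percentage of used space on the filesystem holding PATH (default: /nix).
disk_used_pct() {
  df -P "${1:-/nix}" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Succeed when usage is at or above THRESHOLD percent (default: 90, an arbitrary choice).
needs_gc() {
  local used
  used=$(disk_used_pct "$1")
  [ -n "$used" ] || return 2   # df produced no usable output
  [ "$used" -ge "${2:-90}" ]
}

# Usage:
#   needs_gc /nix 90 && sudo nix-collect-garbage -d
```

Note that `nix-collect-garbage -d` also deletes old generations, so it removes the option of booting into them afterwards.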
|
||||
|
||||
### Automation Issues
|
||||
### Assertion failures
|
||||
|
||||
**Problem**: Automations don't run as expected.
|
||||
If you see `assertion failed`, read the `message` field. For example:
|
||||
```
|
||||
error: assertion failed at …/nebula/sops.nix
|
||||
mjallen.services.nebula.secretsPrefix must be set
|
||||
```
|
||||
Set the required option in the system configuration.
|
||||
|
||||
**Solutions**:
|
||||
## Boot Issues
|
||||
|
||||
1. **Check Automation Status**:
|
||||
- In Home Assistant UI, verify the automation is enabled
|
||||
- Check Home Assistant logs for automation execution errors
|
||||
### System won't boot after a config change
|
||||
|
||||
2. **Entity Issues**:
|
||||
- Verify entity IDs are correct
|
||||
- Check if entities are available/connected
|
||||
- Test direct service calls to verify entity control works
|
||||
1. At the boot menu, select a previous generation.
|
||||
2. Once booted, revert the change:
|
||||
```bash
|
||||
cd /etc/nixos
|
||||
git revert HEAD
|
||||
sudo nixos-rebuild switch --flake .#$(hostname)
|
||||
```
|
||||
|
||||
3. **Trigger Issues**:
|
||||
- Test the automation manually via Developer Tools > Services
|
||||
- Use `automation.trigger` service with the automation's entity_id
|
||||
### Booting from installation media to recover
|
||||
|
||||
## Flake Issues
|
||||
```bash
|
||||
# Mount the system (adjust device paths as needed)
|
||||
sudo mount /dev/disk/by-label/nixos /mnt
|
||||
sudo mount /dev/disk/by-label/boot /mnt/boot
|
||||
|
||||
### Flake Input Update Errors
|
||||
# Chroot in
|
||||
sudo nixos-enter --root /mnt
|
||||
cd /etc/nixos
|
||||
|
||||
**Problem**: `nix flake update` fails or causes issues.
|
||||
# Revert and rebuild
|
||||
git revert HEAD
|
||||
nixos-rebuild switch --flake .#<hostname> --install-bootloader
|
||||
```
|
||||
|
||||
**Solutions**:
|
||||
### Lanzaboote / Secure Boot issues
|
||||
|
||||
1. **Selective Updates**:
|
||||
- Update specific inputs instead of all at once:
|
||||
```
|
||||
nix flake lock --update-input nixpkgs
|
||||
```
|
||||
If Secure Boot enrolment fails or the system won't verify:
|
||||
|
||||
2. **Rollback Flake Lock**:
|
||||
- If an update causes issues, revert to previous flake.lock:
|
||||
```
|
||||
git checkout HEAD^ -- flake.lock
|
||||
```
|
||||
```bash
|
||||
# Check enrolled keys
|
||||
sbctl status
|
||||
|
||||
3. **Pin to Specific Revisions**:
|
||||
- In `flake.nix`, pin problematic inputs to specific revisions:
|
||||
```nix
|
||||
nixpkgs-stable.url = "github:NixOS/nixpkgs/5233fd2ba76a3accb05f88b08917450363be8899";
|
||||
```
|
||||
# Re-enrol if needed (run as root)
|
||||
sbctl enroll-keys --microsoft
|
||||
|
||||
## Secret Management Issues
|
||||
# Sign bootloader files manually
|
||||
sbctl sign -s /boot/EFI/systemd/systemd-bootx64.efi
|
||||
```
|
||||
|
||||
### Sops Decryption Errors
|
||||
## SOPS / Secrets Issues
|
||||
|
||||
**Problem**: Sops fails to decrypt secrets.
|
||||
### `secret not found` or permission denied at boot
|
||||
|
||||
**Solutions**:
|
||||
1. Verify the secret key path matches what's declared in the module's `sops.nix`.
|
||||
2. Check the secret exists in the SOPS file:
|
||||
```bash
|
||||
sops --decrypt secrets/nas-secrets.yaml | grep "the-key"
|
||||
```
|
||||
3. Check the `owner`/`group` set on the secret matches the service user.
|
||||
|
||||
1. **Key Issues**:
|
||||
- Verify your GPG key is available and unlocked
|
||||
- Check `.sops.yaml` includes your key fingerprint
|
||||
### Can't decrypt — wrong age key
|
||||
|
||||
2. **Permission Issues**:
|
||||
- Check file permissions on secret files
|
||||
- Make sure the user running `nixos-rebuild` has access to the GPG key
|
||||
The machine's age key is derived from `/etc/ssh/ssh_host_ed25519_key`. If the host key was regenerated, the age key changed and existing secrets can no longer be decrypted.
|
||||
|
||||
To fix: re-encrypt the secrets file with the new public key:
|
||||
```bash
|
||||
# Get the new public key
|
||||
nix-shell -p ssh-to-age --run 'ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub'
|
||||
|
||||
# Update .sops.yaml with the new key, then:
|
||||
sops updatekeys secrets/nas-secrets.yaml
|
||||
```
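Before re-encrypting, it can help to confirm whether the host's current age public key is already listed in `.sops.yaml`. The following is a minimal sketch; the helper name `age_key_listed` is hypothetical, and it assumes the key appears verbatim in the file.

```bash
# Succeed if the age public key KEY appears verbatim in the SOPS config FILE.
age_key_listed() {
  local key=$1 file=${2:-.sops.yaml}
  grep -qF -- "$key" "$file"
}

# Usage on the affected host (ssh-to-age invoked as in the command above):
#   key=$(nix-shell -p ssh-to-age --run 'ssh-to-age < /etc/ssh/ssh_host_ed25519_key.pub')
#   age_key_listed "$key" .sops.yaml || echo "host key is not a recipient; update .sops.yaml"
```

If the key is already listed and decryption still fails, the cause is more likely elsewhere, for example a secrets file that was never re-encrypted with `sops updatekeys`.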
|
||||
|
||||
### Adding a new secret to an existing file
|
||||
|
||||
```bash
|
||||
sops secrets/nas-secrets.yaml
|
||||
# Editor opens with decrypted YAML — add your key, save, sops re-encrypts
|
||||
```
|
||||
|
||||
## Nebula VPN Issues
|
||||
|
||||
### Peers can't connect
|
||||
|
||||
1. Verify the lighthouse is reachable on its public address:
|
||||
```bash
|
||||
nc -zvu mjallen.dev 4242
|
||||
```
|
||||
2. Check the nebula service on both hosts:
|
||||
```bash
|
||||
systemctl status nebula@jallen-nebula
|
||||
journalctl -u nebula@jallen-nebula -n 50
|
||||
```
|
||||
3. Confirm the CA cert, host cert, and host key are all present and owned by the `nebula-jallen-nebula` user:
|
||||
```bash
|
||||
ls -la /run/secrets/pi5/nebula/
|
||||
```
|
||||
4. Verify the host cert was signed by the same CA as the other nodes:
|
||||
```bash
|
||||
nebula-cert verify -ca ca.crt -crt host.crt
|
||||
```
|
||||
|
||||
### Certificate expired
|
||||
|
||||
Re-sign the host certificate:
|
||||
```bash
|
||||
nebula-cert sign -name "hostname" -ip "10.1.1.x/24" \
|
||||
-ca-crt ca.crt -ca-key ca.key \
|
||||
-out-crt host.crt -out-key host.key
|
||||
# Update SOPS, rebuild
|
||||
```
|
||||
|
||||
## Impermanence Issues
|
||||
|
||||
### Service fails because its data directory is missing after reboot
|
||||
|
||||
If a service stores state in a path that isn't in the persistence list, that path is wiped on every reboot. Add it to `mjallen.impermanence.extraDirectories`:
|
||||
|
||||
```nix
|
||||
mjallen.impermanence.extraDirectories = [
|
||||
{ directory = "/var/lib/my-service"; user = "my-service"; group = "my-service"; mode = "0750"; }
|
||||
];
|
||||
```
|
||||
|
||||
Then move the existing data if needed:
|
||||
```bash
|
||||
cp -a /var/lib/my-service /persist/var/lib/my-service
|
||||
```
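One caveat with a plain `cp -a`: if the destination directory already exists, the source is copied inside it rather than over it. A small guard, sketched here with a hypothetical helper name `persist_copy` and assuming an absolute source path, avoids that:

```bash
# Copy SRC (an absolute path) under the persistent root (default: /persist),
# refusing to run if the destination already exists.
persist_copy() {
  local src=$1 root=${2:-/persist}
  local dest="$root$src"
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")" && cp -a "$src" "$dest"
}

# Usage (run as root so ownership is preserved):
#   persist_copy /var/lib/my-service
```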
|
||||
|
||||
## Flake Input Issues
|
||||
|
||||
### Input update breaks a build
|
||||
|
||||
Roll back `flake.lock` to the previous commit (this reverts every input that commit updated, not just one):
|
||||
```bash
|
||||
git checkout HEAD^ -- flake.lock
|
||||
```
|
||||
|
||||
Or pin the input to a specific revision in `flake.nix`:
|
||||
```nix
|
||||
nixpkgs-unstable.url = "github:NixOS/nixpkgs/abc123def";
|
||||
```
|
||||
|
||||
## Service Issues
|
||||
|
||||
### Service won't start
|
||||
|
||||
```bash
|
||||
systemctl status <service>
|
||||
journalctl -u <service> -n 100 --no-pager
|
||||
```
|
||||
|
||||
### Caddy reverse proxy not routing
|
||||
|
||||
1. Check that `reverseProxy.enable = true` is set on the service.
|
||||
2. Verify the subdomain matches: `reverseProxy.subdomain = "myapp"` → `myapp.mjallen.dev`.
|
||||
3. Check Caddy logs:
|
||||
```bash
|
||||
journalctl -u caddy -n 50
|
||||
```
|
||||
|
||||
### PostgreSQL database missing for a service
|
||||
|
||||
If `configureDb = true` is set, the database is created automatically. If it's missing:
|
||||
```bash
|
||||
sudo -u postgres createdb my-service
|
||||
sudo -u postgres psql -c 'GRANT ALL ON DATABASE "my-service" TO "my-service";'  # hyphenated names must be double-quoted in SQL
|
||||
```
|
||||
|
||||
## Network Issues
|
||||
|
||||
### Firewall Blocks Services
|
||||
### Firewall blocking a service
|
||||
|
||||
**Problem**: Services are not accessible due to firewall rules.
|
||||
Check which ports are open:
|
||||
```bash
|
||||
sudo nft list ruleset | grep accept
|
||||
```
|
||||
|
||||
**Solutions**:
|
||||
Add ports in the system config:
|
||||
```nix
|
||||
mjallen.network.firewall.allowedTCPPorts = [ 8080 ];
|
||||
```
|
||||
|
||||
1. **Check Firewall Status**:
|
||||
```
|
||||
sudo nix-shell -p iptables --run "iptables -L"
|
||||
```
|
||||
|
||||
2. **Verify Firewall Configuration**:
|
||||
- Check if ports are properly allowed in the configuration
|
||||
- Add missing ports if necessary
|
||||
|
||||
3. **Temporary Disable Firewall** (for testing only):
|
||||
```
|
||||
sudo systemctl stop firewall
|
||||
# After testing
|
||||
sudo systemctl start firewall
|
||||
```
|
||||
Or, if the service is defined with `mkModule`, check that `openFirewall` has not been set to `false` (it defaults to `true`).
|
||||
|
||||
## Getting Help
|
||||
|
||||
If you encounter an issue not covered in this guide:
|
||||
|
||||
1. Check the NixOS Wiki: https://nixos.wiki/
|
||||
2. Search the NixOS Discourse forum: https://discourse.nixos.org/
|
||||
3. Join the NixOS Matrix/Discord community for real-time help
|
||||
4. File an issue in the repository if you believe you've found a bug
|
||||
- NixOS manual: `nixos-help` or https://nixos.org/manual/nixos/stable/
|
||||
- NixOS Wiki: https://nixos.wiki/
|
||||
- NixOS Discourse: https://discourse.nixos.org/
|
||||
- Nix package search: https://search.nixos.org/packages
|
||||
|
||||