Tech Log Entry — Homelab Infrastructure Validation

(or, Automated Backups Confirmed, WHEA Hardware Issue Resolved, & Log Rotation Configured)

Background and Context

This entry documents the validation phase following implementation of a three-machine automated backup infrastructure (documented in the previous entry) for my homelab LAN. It also covers the investigation and resolution of a hardware error pattern on the Windows desktop, a discussion of Linux full-disk encryption, and the setup of log rotation on both Linux laptops.

All three machines — one Windows 11 desktop ([desktop]) and two Linux Mint laptops ([laptop #1] and [laptop #2]) — are connected via a dedicated gigabit LAN switch on an isolated local subnet, separate from the home Wi-Fi network.


Part 1: First Automated Backup Runs — Confirmed Successful

FreeFileSync on [desktop] (Windows 11):

  • Task ran at 2:33 AM as scheduled

  • FreeFileSync log showed: Completed successfully, 1 minute 31 seconds, 3.05 MB/sec

  • 1 warning: database file not available for one folder pair — FFS defaulted to configured sync direction and proceeded correctly; not a recurring concern

  • 2 files moved to Recycle Bin: correct behavior — originals had been deleted by the user; FFS mirrored the deletion to the backup destination, using Recycle Bin rather than hard-delete, preserving a recovery window

  • Task Scheduler showed result code 0x41306 (SCHED_S_TASK_TERMINATED, i.e. the last run was reported as terminated); confirmed as a known false negative from wake-from-sleep interaction. The FFS log is the authoritative source of truth

rsync on [laptop #1] (Linux Mint):

  • Cron job fired at 8:00 AM after user opened and unlocked laptop at ~7:20 AM

  • rsync-backup.log confirmed: sent 1.3 GB, speedup 19.23x — correct incremental behavior

  • Session files (.xsession-errors, .Xauthority) created at login time (~7:20 AM) were picked up and transferred at 8:00 AM — expected and correct

  • sshfs mount icon confirmed present on desktop at login
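A note on reading the speedup figure: rsync reports speedup as the total size of the file set divided by the bytes sent plus received, so 1.3 GB sent at 19.23x implies roughly 25 GB under backup. The sketch below reproduces that arithmetic; it ignores the received byte count (usually small), and the function name is only illustrative.

```shell
# Estimate the total size of the backed-up file set from the rsync summary.
# rsync's "speedup" is total size / (bytes sent + bytes received); the
# received count is ignored here as an approximation.
implied_total_gb() {
    # $1 = GB sent, $2 = reported speedup
    awk -v sent="$1" -v speedup="$2" 'BEGIN { printf "%.1f\n", sent * speedup }'
}

# Example with the figures from the log above:
#   implied_total_gb 1.3 19.23
```

A result far from the known size of the home directory would be a hint that the run was not the incremental pass it appeared to be.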

rsync on [laptop #2] (Linux Mint):

  • Cron job fired at 8:00 AM

  • rsync-backup.log confirmed run; session files transferred

  • Two new symlink errors identified, for .Private and .ecryptfs: artifacts of Linux Mint's optional home encryption feature, which is not in use on [laptop #2] but is still referenced by symlinks on the system

  • Fix: added --exclude='.Private' and --exclude='.ecryptfs' to [laptop #2]'s backup script
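For reference, the fix might look roughly like this in the backup script. This is a sketch, not the actual script (which is in the previous entry): the SRC, DEST, and LOG paths, the function name, and the -a and --stats options are assumptions made for illustration. Only the two --exclude options come from this entry.

```shell
# Hypothetical sketch of the backup script after the fix. SRC, DEST, and LOG
# are placeholders; the real script and its rsync options are not shown here.
SRC="${SRC:-$HOME/}"
DEST="${DEST:-$HOME/backup-mount/}"      # assumed sshfs mount point
LOG="${LOG:-$HOME/rsync-backup.log}"

run_backup() {
    # -a is archive mode (permissions, times, symlinks); --stats prints a
    # transfer summary. Extra arguments (for example --dry-run) pass through.
    rsync -a --stats \
        --exclude='.Private' \
        --exclude='.ecryptfs' \
        "$@" "$SRC" "$DEST" >> "$LOG" 2>&1
}
```

Calling run_backup --dry-run first is a low-risk way to confirm the symlink errors are gone before the next scheduled run.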

Key monitoring lesson established:

  • Primary check: rsync-backup.log on each Linux machine — look for sent/received/total size summary lines

  • Secondary check: FreeFileSync HTML log on [desktop] — look for "Completed successfully"

  • Task Scheduler result code is unreliable in wake-from-sleep scenarios; do not use as primary indicator
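The primary check on the Linux side can be scripted. The sketch below looks for the two rsync summary lines near the end of the log. It assumes rsync's usual summary wording ("sent ... bytes  received ... bytes" and "total size is ..."), which depends on the options the backup script passes; the function name and status strings are made up for this example.

```shell
# Report whether the tail of an rsync log contains the two summary lines.
# The patterns assume rsync's usual summary wording; adjust them if the
# backup script's options produce different output.
check_rsync_log() {
    log="$1"
    if [ ! -s "$log" ]; then
        echo "EMPTY_OR_MISSING"
        return 1
    fi
    recent=$(tail -n 20 "$log")
    if printf '%s\n' "$recent" | grep -Eq '^sent .+ bytes +received .+ bytes' &&
       printf '%s\n' "$recent" | grep -Eq '^total size is '; then
        echo "OK"
    else
        echo "NO_SUMMARY"
        return 1
    fi
}

# Usage:
#   check_rsync_log "$HOME/rsync-backup.log"
```

An empty log is reported separately from a log with no summary, since a freshly rotated log is empty without anything being wrong.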

CompTIA Network+ study note: Automated backup verification, log analysis, and distinguishing between transport-layer success (data transferred) and application-layer reporting (exit codes) are relevant to Network Operations — monitoring tools, log management, and performance baselines.


Part 2: WHEA Hardware Error Investigation and Resolution

During routine Event Viewer monitoring, a pattern of WHEA-Logger Warning events (Event ID 17) was identified on [desktop]. Investigation and resolution spanned several days.

Discovery:

  • Filtered Event Viewer → Windows Logs → System for WHEA-Logger warnings

  • Found 8,660 entries spanning approximately 3 months, with an accelerating rate:

    • April 24: 142 entries

    • April 25: 174 entries

    • April 26: 354 entries — clear upward trend
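Per-day counts like these can also be produced outside Event Viewer if the filtered view is saved as CSV and copied to one of the Linux laptops. The sketch below assumes the export has the columns Level, Date and Time, Source, Event ID in that order, and that the date appears as M/D/YYYY followed by a time. Both are assumptions about the export and vary with locale; the function name is illustrative.

```shell
# Count WHEA-Logger events per calendar day from an Event Viewer CSV export.
# Assumed column layout: Level,Date and Time,Source,Event ID,...
# Assumed date format: M/D/YYYY followed by a time (locale dependent).
whea_per_day() {
    awk -F',' '
        $3 ~ /WHEA-Logger/ {
            split($2, stamp, " ")      # stamp[1] is the date part
            count[stamp[1]]++
        }
        END { for (day in count) print day, count[day] }
    ' "$1" | sort
}

# Usage:
#   whea_per_day whea-export.csv
```

Each output line is a date followed by its event count, which makes a day-over-day increase easy to spot.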

What WHEA-Logger Event ID 17 means:

  • WHEA = Windows Hardware Error Architecture

  • "Corrected hardware error" on PCI Express Root Port

  • Hardware detected and self-corrected PCIe bus errors — no crash or data loss, but an elevated and increasing rate is a warning sign

Root cause identified:

  • Several other possible causes were considered and ruled out after investigation

  • Cause: GPU (NVIDIA RTX 3060) installed mid-January, 2026

  • NVIDIA driver version 591.86 installed January 20, 2026

  • WHEA errors began January 26, 2026 — six days after driver installation

  • Strong correlation between driver update and error onset

  • Research showed driver 591.86 has known issues with PCIe link state management on RTX 30-series cards, generating correctable errors during idle power state transitions

Contributing factor:

  • PCIe Link State Power Management was enabled in Windows power settings — Windows was aggressively switching the PCIe link to low-power states during idle, triggering the correctable errors

Resolution — two actions taken:

  1. Updated NVIDIA driver from 591.86 to 596.21 (GeForce Game Ready Driver, April 16, 2026, WHQL certified)

  2. Disabled PCIe Link State Power Management: Settings → Edit Power Plan → Change Advanced Power Settings → PCI Express → Link State Power Management → Off

Results after fix:

  • Day 1 after fix: 4 WHEA errors — all timestamped before the fix was applied; zero new errors

  • Day 2: 0 errors in 24 hours, including during 2 hours of GPU-intensive gaming

  • Day 3: 0 errors in 24 hours

  • 7-day count dropping steadily as historical entries age out of the window

Diagnosis tools used:

  • Windows Event Viewer (filter by source: WHEA-Logger, level: Warning)

  • GPU-Z (GPU temperature, PCIe link width/speed, driver version)

  • HWiNFO64 (hardware sensor monitoring)

Additional context — power history:

  • Machine had experienced 5-10 power spikes and drops before the APC UPS was installed 4 months earlier, so power-related damage was initially considered as a cause

  • No physical hardware damage found — issue was entirely driver and power management related

  • Physical GPU reseating (prepared as a contingency) was not required

CompTIA Network+ study note: Hardware error investigation using system logs, correlating software changes (driver updates) with hardware behavior, and PCIe as a hardware interface are covered under Network Troubleshooting — troubleshooting methodology, using appropriate tools, and identifying root cause vs symptoms. The structured troubleshooting approach used here — discover, correlate, hypothesize, fix, verify — directly mirrors the CompTIA troubleshooting methodology.


Part 3: Linux Full-Disk Encryption Discussion

Current state:

  • [laptop #1] (portable academic/personal machine): LUKS full-disk encryption enabled at installation — decryption password required at every boot before the user login screen

  • [laptop #2] (stationary academic/professional machine): no encryption; not planned
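The current state of each machine can be confirmed from a terminal. This is a minimal sketch: it relies on lsblk reporting the filesystem type crypto_LUKS for LUKS containers, and the function name is made up for this example.

```shell
# Succeeds if any block device is a LUKS container.
# lsblk flags: -r raw output, -n no header line, -o FSTYPE only that column.
has_luks() {
    lsblk -rno FSTYPE | grep -qx 'crypto_LUKS'
}

# Usage:
#   if has_luks; then echo "LUKS in use"; else echo "no LUKS volumes"; fi
```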

Question:

  • Should [laptop #2] also be set up with LUKS full-disk encryption?

LUKS encryption — advantages and disadvantages:

Advantages:

  • Full protection if device is lost or stolen — data is unreadable without the decryption key

  • Transparent to the filesystem and all software above it; applications, including Claude Code, are unaware of it

  • Particularly valuable for portable machines that leave the home network

Disadvantages and considerations:

  • Physical presence required at every boot to enter decryption password — automated remote boot is not possible without additional configuration

  • If decryption password is lost, all data is permanently unrecoverable — no bypass exists by design

  • Minor performance overhead — negligible on modern hardware with AES hardware acceleration

  • Slightly longer boot time

Decision on [laptop #2]: Encryption not implemented. Reasoning: [laptop #2] is designated as a stationary at-home machine; [laptop #1] covers portable use cases. The current security posture for [laptop #2] (isolated LAN access, no persistent shared folders, restricted SSH service account, ClamAV, isolation policy for Claude Code) is appropriate for a non-portable machine. Adding encryption retroactively would, in practice, mean backing up and doing a full reinstall; if travel use cases develop in future, this decision can be revisited at that time.

Impact on Claude Code: None. LUKS encryption operates below the filesystem layer. Once the machine is booted and decrypted, Claude Code and all other applications function identically to an unencrypted system.

CompTIA Network+ study note: Encryption at rest, the distinction between encryption in transit (covered by SSH/sshfs) and encryption at rest (LUKS), and security trade-offs in system design are covered under Network Security — data security concepts, encryption standards, and security policies.


Part 4: Log Rotation — Both Linux Machines

Purpose: The rsync backup script appends its output to ~/rsync-backup.log on each run. Without rotation, this file grows indefinitely. Logrotate automates periodic rotation, compression, and deletion of old logs.

Tool: logrotate 3.21.0 (pre-installed on Linux Mint)

Configuration file created on each machine at /etc/logrotate.d/rsync-backup:

/home/[username]/rsync-backup.log {

    monthly

    rotate 6

    compress

    missingok

    notifempty

    copytruncate

}


Option explanations:

  • monthly — rotate once per month

  • rotate 6 — retain 6 months of compressed archives before deletion

  • compress — compress rotated logs with gzip (text logs compress ~10:1)

  • missingok — no error if log file doesn't exist yet

  • notifempty — skip rotation if log is empty

  • copytruncate: copy the log contents, then truncate the original in place, rather than moving the file; chosen so that a backup run still in progress at rotation time keeps writing to the live log path instead of to a renamed file through its open handle
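To make the copytruncate behavior concrete, the sketch below does the same two steps by hand on a test file. It imitates the idea and is not logrotate's actual implementation; the function name is illustrative.

```shell
# Mimic copytruncate by hand: copy the log aside, then empty the original
# in place so its inode (and any writer's open handle) stays valid.
copytruncate_demo() {
    log="$1"
    cp -p "$log" "$log.1"   # copy current contents to the rotated name
    : > "$log"              # truncate the original without replacing it
}

# Usage (on a scratch file, not the real log):
#   printf 'line1\nline2\n' > /tmp/demo.log
#   copytruncate_demo /tmp/demo.log
```

The known trade-off is a brief window between the copy and the truncate in which newly written lines can be lost. That is acceptable for a backup log written once a day.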

Tested via:

sudo logrotate --debug /etc/logrotate.d/rsync-backup


Debug output confirmed: configuration read correctly, log file found, rotation not needed (file created today — first rotation will occur June 1, 2026; calendar entry set up to check on this).
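Because --debug only simulates, one way to see a real rotation before June is to force one against a scratch copy of the log. The sketch below writes an equivalent config pointing at a temporary directory. The directory layout, file names, and function name are assumptions for illustration, and the real /etc/logrotate.d/rsync-backup is left untouched.

```shell
# Write a scratch copy of the logrotate config that points at a test log
# inside the given directory, so --force can be tried without touching the
# real ~/rsync-backup.log. Directory and file names here are illustrative.
make_scratch_config() {
    dir="$1"
    cat > "$dir/test.conf" <<EOF
$dir/rsync-backup.log {
    monthly
    rotate 6
    compress
    missingok
    notifempty
    copytruncate
}
EOF
    chmod 644 "$dir/test.conf"   # logrotate can refuse configs with looser modes
}

# Usage (run as your normal user):
#   d=$(mktemp -d)
#   echo "sample entry" > "$d/rsync-backup.log"
#   make_scratch_config "$d"
#   logrotate --force --state "$d/state" "$d/test.conf"
#   ls -l "$d"
```

After the forced run, the directory should contain an emptied rsync-backup.log alongside a compressed rsync-backup.log.1.gz. The separate --state file keeps the experiment from affecting logrotate's real rotation schedule.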

Logrotate runs automatically once a day through the system's existing scheduling (a cron.daily job or the logrotate.timer systemd unit, depending on the release), so no additional scheduling is required.

CompTIA Network+ study note: Log management, retention policies, and storage considerations for monitoring data are covered under Network Operations — documentation, policies, and network monitoring. Understanding why logs are rotated (storage management, audit trail maintenance) and how retention periods are chosen reflects real-world network operations practice.


Daily Monitoring Routine Established

As a result of this validation phase, a morning checklist was established:

  • FreeFileSync log ([desktop]): confirm "Completed successfully" in the most recent HTML log file

  • rsync-backup.log ([laptop #1] and [laptop #2]): confirm sent/received/total size summary lines present

  • sshfs mount icon: confirm present on both Linux laptop desktops at login

  • Task Scheduler ([desktop]): note Last Run Time; disregard 0x41306 result code

  • Event Viewer WHEA-Logger ([desktop]): confirm Last Hour and 24 Hours counts remain at zero or near-zero

  • Coffee Canister: periodic spot-check of folder modification timestamps
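The sshfs item on this checklist can also be checked from a terminal instead of by looking for the desktop icon. The sketch below assumes sshfs mounts show up in the mount table with the filesystem type fuse.sshfs, which is the usual case; the function name is made up for this example.

```shell
# Succeeds if any sshfs mount appears in the mount table.
# Reads /proc/mounts by default; fields are: device, mount point, fs type, ...
# sshfs mounts normally show the type "fuse.sshfs".
sshfs_mounted() {
    awk '$3 == "fuse.sshfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Usage:
#   if sshfs_mounted; then echo "sshfs mounted"; else echo "sshfs NOT mounted"; fi
```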

Calendar reminders set at decreasing rates for:

  • Daily checks (current phase)

  • Weekly checks (after 3 consecutive clean daily results)

  • Monthly checks (long term steady state)

  • Biweekly Linux laptop reboots


Watch Out For (Future)

  • rsync-backup.log will be rotated monthly — after rotation, the active log restarts from empty; this is correct behavior, not a sign of failure

  • Compressed old logs will appear as rsync-backup.log.1.gz, rsync-backup.log.2.gz, etc. — readable with zcat or zless

  • WHEA-Logger 7-day count will continue to decline over the next week as pre-fix historical entries age out of the window — this is expected and not a sign of new errors

  • If WHEA errors resume correlating with GPU-intensive activity, physical GPU reseating is the next diagnostic step

  • [laptop #2]'s .Private and .ecryptfs exclusions added to backup script — if Linux Mint adds other encryption-related symlinks in future updates, similar errors may appear and require similar exclusions


Lessons Learned

  • Validate automated systems by checking their own logs, not just the scheduler that launched them. Task Scheduler's result code and FreeFileSync's own completion log can disagree; the application log is more reliable. Where possible, directly inspecting the expected artifacts and results is the most reliable check of all.

  • Hardware error investigation benefits from timeline correlation. The six-day gap between the driver install and the onset of WHEA errors was only visible because Event Viewer filters were used to find the earliest occurrence. Without knowing when the errors started, correlating them with the driver install would have been harder and more time-consuming.

  • Establish a monitoring routine before considering infrastructure complete. Knowing what "normal" looks like (typical transfer sizes, expected log entries, baseline error rates) makes anomalies visible when they occur. Relax monitoring gradually as trust builds, but revisit the setup in depth every 6-12 months, ideally.

  • Log rotation is infrastructure, not housekeeping. An unrotated log that grows for months can cause disk space issues silently. Setting up rotation at the same time as the log-generating process is the right practice.

  • Security decisions should be proportional to actual risk. Full-disk encryption on a portable machine is high-value; on a stationary at-home machine with other security controls in place, it is lower priority. Matching security controls to actual threat models is more effective than applying maximum security everywhere.


Next Steps / To-Do

  • Install Bitwarden Firefox extensions on [laptop #1] and [laptop #2]

  • Set up VPN on [laptop #1] and [laptop #2] (WireGuard-based; travel deadline in ~3 weeks)

  • Configure Samba file sharing on LAN for interactive file transfers between machines

  • Begin Claude Code introductory projects on [laptop #2]

  • Evaluate and uninstall CUDA toolkit on [desktop]; update FreeFileSync exclusions

  • Address C: drive storage on [desktop] (88% full)

  • Complete reused-password cleanup in Bitwarden

