Tech Log Entry — Homelab Infrastructure Validation
(or, Automated Backups Confirmed, WHEA Hardware Issue Resolved, & Log Rotation Configured)
Background and Context
This entry documents the validation phase following implementation of a three-machine automated backup infrastructure (documented in the previous entry) for my homelab LAN. It also covers the investigation and resolution of a hardware error pattern on the Windows desktop, a discussion of Linux full-disk encryption, and the setup of log rotation on both Linux laptops.
All three machines — one Windows 11 desktop ([desktop]) and two Linux Mint laptops ([laptop #1] and [laptop #2]) — are connected via a dedicated gigabit LAN switch on an isolated local subnet, separate from the home Wi-Fi network.
Part 1: First Automated Backup Runs — Confirmed Successful
FreeFileSync on [desktop] (Windows 11):
Task ran at 2:33 AM as scheduled
FreeFileSync log showed: Completed successfully, 1 minute 31 seconds, 3.05 MB/sec
1 warning: database file not available for one folder pair — FFS defaulted to configured sync direction and proceeded correctly; not a recurring concern
2 files moved to Recycle Bin: correct behavior — originals had been deleted by the user; FFS mirrored the deletion to the backup destination, using Recycle Bin rather than hard-delete, preserving a recovery window
Task Scheduler showed result code 0x41306 (SCHED_S_TASK_TERMINATED, nominally "the last run of the task was terminated by the user"); confirmed as a known false negative from wake-from-sleep interaction, so the FFS log is the authoritative source of truth
rsync on [laptop #1] (Linux Mint):
Cron job fired at 8:00 AM after user opened and unlocked laptop at ~7:20 AM
rsync-backup.log confirmed: sent 1.3 GB, speedup 19.23x — correct incremental behavior
Session files (.xsession-errors, .Xauthority) created at login time (~7:20 AM) were picked up and transferred at 8:00 AM — expected and correct
sshfs mount icon confirmed present on desktop at login
rsync on [laptop #2] (Linux Mint):
Cron job fired at 8:00 AM
rsync-backup.log confirmed run; session files transferred
Two new symlink errors identified: .Private and .ecryptfs. These are artifacts of Linux Mint's optional home-directory encryption feature; encryption is not enabled on [laptop #2], but the symlinks that reference it still exist on the system
Fix: added --exclude='.Private' and --exclude='.ecryptfs' to [laptop #2]'s backup script
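As a rough illustration of the fix above, here is a minimal Python sketch of how the two exclusions end up on the rsync command line. The backup script itself is not shown in this entry, so the `-a` flag, the source and destination paths, and the `build_rsync_command` helper are all assumptions made for the example; only the two `--exclude` patterns come from the actual fix.

```python
def build_rsync_command(source, destination, excludes):
    """Return an rsync argument list with one --exclude flag per pattern."""
    command = ["rsync", "-a"]  # -a (archive mode) is an assumed flag
    for pattern in excludes:
        command.append(f"--exclude={pattern}")
    command += [source, destination]
    return command

# Hypothetical paths; the real script's source and destination are not shown here.
cmd = build_rsync_command(
    "/home/user/", "/mnt/backup/laptop2/", [".Private", ".ecryptfs"]
)
print(" ".join(cmd))
```

Keeping the patterns in a list makes it easy to add further exclusions later without touching the rest of the command.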
Key monitoring lesson established:
Primary check: rsync-backup.log on each Linux machine — look for sent/received/total size summary lines
Secondary check: FreeFileSync HTML log on [desktop] — look for "Completed successfully"
Task Scheduler result code is unreliable in wake-from-sleep scenarios; do not use as primary indicator
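The primary check could be partly automated. The sketch below assumes the log ends with rsync's standard closing summary (a `sent ... received ...` line followed by a `total size is ... speedup is ...` line) and scans only the tail of the log, so that older runs are not mistaken for the latest one. The function name, the 20-line tail, and the sample text are illustrative assumptions, not copied from the real log.

```python
import re

# rsync's closing summary lines; numbers may contain commas, or unit suffixes with -h.
SENT_RE = re.compile(r"^sent [\d,.]+[KMGT]? bytes\s+received [\d,.]+[KMGT]? bytes")
TOTAL_RE = re.compile(r"^total size is [\d,.]+[KMGT]?\s+speedup is [\d,.]+")

def check_rsync_log(log_text, tail_lines=20):
    """Look for rsync's summary lines and error lines in the tail of a log."""
    tail = [line.strip() for line in log_text.splitlines()[-tail_lines:]]
    summary_found = (any(SENT_RE.match(line) for line in tail)
                     and any(TOTAL_RE.match(line) for line in tail))
    errors = [line for line in tail if line.startswith(("rsync error:", "rsync:"))]
    return {"summary_found": summary_found, "errors": errors}

# Made-up samples in rsync's usual output shape (illustration only).
sample_ok = """sending incremental file list
.Xauthority
.xsession-errors

sent 1,302,114 bytes  received 1,045 bytes  868,772.67 bytes/sec
total size is 25,059,748  speedup is 19.23
"""
sample_bad = "rsync error: some files/attrs were not transferred (see previous errors) (code 23)\n"

print(check_rsync_log(sample_ok))
print(check_rsync_log(sample_bad))
```

A run that reports errors can still print its summary lines, so a result with both `summary_found` set and a non-empty `errors` list deserves a closer look.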
CompTIA Network+ study note: Automated backup verification, log analysis, and distinguishing between transport-layer success (data transferred) and application-layer reporting (exit codes) are relevant to Network Operations — monitoring tools, log management, and performance baselines.
Part 2: WHEA Hardware Error Investigation and Resolution
During routine Event Viewer monitoring, a pattern of WHEA-Logger Warning events (Event ID 17) was identified on [desktop]. Investigation and resolution spanned several days.
Discovery:
Filtered Event Viewer → Windows Logs → System for WHEA-Logger warnings
Found 8,660 entries spanning approximately 3 months, with an accelerating rate:
April 24: 142 entries
April 25: 174 entries
April 26: 354 entries — clear upward trend
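To make the trend concrete, the day-over-day change can be computed from the three daily counts above. This is only a small arithmetic sketch; the function name is made up for the example.

```python
daily_counts = {"April 24": 142, "April 25": 174, "April 26": 354}

def day_over_day_growth(counts):
    """Percentage change between consecutive daily counts, rounded to 0.1."""
    values = list(counts.values())
    return [round((later - earlier) / earlier * 100, 1)
            for earlier, later in zip(values, values[1:])]

growth = day_over_day_growth(daily_counts)
print(growth)  # [22.5, 103.4]
```

The rate roughly doubled between April 25 and April 26, which is what made the pattern worth investigating rather than ignoring.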
What WHEA-Logger Event ID 17 means:
WHEA = Windows Hardware Error Architecture
"Corrected hardware error" on PCI Express Root Port
Hardware detected and self-corrected PCIe bus errors — no crash or data loss, but an elevated and increasing rate is a warning sign
Root cause identified:
Several other possibilities were considered and rejected after investigation
Cause: GPU (NVIDIA RTX 3060) installed mid-January 2026
NVIDIA driver version 591.86 installed January 20, 2026
WHEA errors began January 26, 2026 — six days after driver installation
Strong correlation between driver update and error onset
Research showed driver 591.86 has known issues with PCIe link state management on RTX 30-series cards, generating correctable errors during idle power state transitions
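The timeline correlation above comes down to simple date arithmetic. A minimal sketch, using only the two dates given in this entry (the helper name is made up for the example):

```python
from datetime import date

def days_between(change_date, onset_date):
    """Number of days from a system change to the first observed error."""
    return (onset_date - change_date).days

driver_installed = date(2026, 1, 20)   # NVIDIA driver 591.86 installed
first_whea_error = date(2026, 1, 26)   # earliest WHEA-Logger Event ID 17

gap = days_between(driver_installed, first_whea_error)
print(f"Errors began {gap} days after the driver install")
```

A short gap like this suggests a link but does not prove one; the later disappearance of errors after the fix is what supports the diagnosis.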
Contributing factor:
PCIe Link State Power Management was enabled in Windows power settings — Windows was aggressively switching the PCIe link to low-power states during idle, triggering the correctable errors
Resolution — two actions taken:
Updated NVIDIA driver from 591.86 to 596.21 (GeForce Game Ready Driver, April 16, 2026, WHQL certified)
Disabled PCIe Link State Power Management: Settings → Edit Power Plan → Change Advanced Power Settings → PCI Express → Link State Power Management → Off
Results after fix:
Day 1 after fix: 4 WHEA errors — all timestamped before the fix was applied; zero new errors
Day 2: 0 errors in 24 hours, including during 2 hours of GPU-intensive gaming
Day 3: 0 errors in 24 hours
7-day count dropping steadily as historical entries age out of the window
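The falling 7-day count is a property of the trailing window, not of new activity. The sketch below illustrates this with the three daily totals from the Discovery list. Placing each day's events at noon, assuming no events after April 26, and the May evaluation dates are all simplifications for illustration, since the entry gives only daily totals and does not state the fix date.

```python
from datetime import datetime, timedelta

def count_in_window(event_times, now, days=7):
    """Count events with timestamps in the trailing window (now - days, now]."""
    cutoff = now - timedelta(days=days)
    return sum(1 for stamp in event_times if cutoff < stamp <= now)

# Daily totals from the log; the noon timestamps are an assumption.
events = ([datetime(2026, 4, 24, 12)] * 142
          + [datetime(2026, 4, 25, 12)] * 174
          + [datetime(2026, 4, 26, 12)] * 354)

for day in (1, 2, 3, 4):
    now = datetime(2026, 5, day)
    print(now.date(), count_in_window(events, now))
```

With no new events, the count steps down each day as the oldest day's entries leave the window, then reaches zero.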
Diagnosis tools used:
Windows Event Viewer (filter by source: WHEA-Logger, level: Warning)
GPU-Z (GPU temperature, PCIe link width/speed, driver version)
HWiNFO64 (hardware sensor monitoring)
Additional context — power history:
Machine had experienced 5-10 power spikes and drops before the APC UPS was installed 4 months earlier, so power-related damage was initially considered as a possible cause
No physical hardware damage found — issue was entirely driver and power management related
Physical GPU reseating (prepared as a contingency) was not required
CompTIA Network+ study note: Hardware error investigation using system logs, correlating software changes (driver updates) with hardware behavior, and PCIe as a hardware interface are covered under Network Troubleshooting — troubleshooting methodology, using appropriate tools, and identifying root cause vs symptoms. The structured troubleshooting approach used here — discover, correlate, hypothesize, fix, verify — directly mirrors the CompTIA troubleshooting methodology.
Part 3: Linux Full-Disk Encryption Discussion
Current state:
[laptop #1] (portable academic/personal machine): LUKS full-disk encryption enabled at installation — decryption password required at every boot before the user login screen
[laptop #2] (stationary academic/professional machine): no encryption; not planned
Question:
Should [laptop #2] also use LUKS full-disk encryption?
LUKS encryption — advantages and disadvantages:
Advantages:
Full protection if device is lost or stolen — data is unreadable without the decryption key
Transparent to all software above the filesystem layer — applications including Claude Code are unaware of it
Particularly valuable for portable machines that leave the home network
Disadvantages and considerations:
Physical presence required at every boot to enter decryption password — automated remote boot is not possible without additional configuration
If decryption password is lost, all data is permanently unrecoverable — no bypass exists by design
Minor performance overhead — negligible on modern hardware with AES hardware acceleration
Slightly longer boot time
Decision on [laptop #2]: Encryption not implemented. Reasoning: [laptop #2] is designated as a stationary at-home machine; [laptop #1] covers portable use cases. The current security posture for [laptop #2] (isolated LAN access, no persistent shared folders, restricted SSH service account, ClamAV, isolation policy for Claude Code) is appropriate for a non-portable machine. Adding encryption retroactively would in practice mean a full reinstall; if travel use cases develop in future, this decision can be revisited at that time.
Impact on Claude Code: None. LUKS encryption operates below the filesystem layer. Once the machine is booted and decrypted, Claude Code and all other applications function identically to an unencrypted system.
CompTIA Network+ study note: Encryption at rest, the distinction between encryption in transit (covered by SSH/sshfs) and encryption at rest (LUKS), and security trade-offs in system design are covered under Network Security — data security concepts, encryption standards, and security policies.
Part 4: Log Rotation — Both Linux Machines
Purpose: The rsync backup script appends its output to ~/rsync-backup.log on each run. Without rotation, this file grows indefinitely. Logrotate automates periodic rotation, compression, and deletion of old logs.
Tool: logrotate 3.21.0 (pre-installed on Linux Mint)
Configuration file created on each machine at /etc/logrotate.d/rsync-backup:
/home/[username]/rsync-backup.log {
    monthly
    rotate 6
    compress
    missingok
    notifempty
    copytruncate
}
Option explanations:
monthly: rotate once per month
rotate 6: retain 6 months of compressed archives before deletion
compress: compress rotated logs with gzip (text logs compress ~10:1)
missingok: no error if log file doesn't exist yet
notifempty: skip rotation if log is empty
copytruncate: copy log contents, then truncate the original in place rather than moving the file; chosen so that a backup run that has the log open for appending when rotation happens keeps writing to the same file
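The copytruncate behavior can be sketched in a few lines of Python. This is not logrotate's implementation, only an illustration of the idea: the writer keeps its open handle, the contents are copied aside, and the original file is emptied in place. The helper name and the temporary file paths are made up for the example.

```python
import os
import shutil
import tempfile

def copytruncate(log_path, rotated_path):
    """Copy the log to rotated_path, then truncate the original in place."""
    shutil.copyfile(log_path, rotated_path)
    os.truncate(log_path, 0)

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "rsync-backup.log")
    rotated_path = log_path + ".1"

    writer = open(log_path, "a")  # stands in for a process holding the log open
    writer.write("run 1\n")
    writer.flush()

    copytruncate(log_path, rotated_path)

    writer.write("run 2\n")       # the same handle still works after truncation
    writer.flush()
    writer.close()

    with open(rotated_path) as handle:
        rotated = handle.read()
    with open(log_path) as handle:
        active = handle.read()

print(repr(rotated), repr(active))
```

The trade-off is a small window between the copy and the truncate in which newly written lines can be lost; for a once-daily backup log that risk is minor.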
Tested via:
sudo logrotate --debug /etc/logrotate.d/rsync-backup
Debug output confirmed: configuration read correctly, log file found, rotation not needed (file created today — first rotation will occur June 1, 2026; calendar entry set up to check on this).
Logrotate runs automatically through the system's existing daily scheduling (a cron.daily job or a systemd timer, depending on the release); no additional scheduling is required.
CompTIA Network+ study note: Log management, retention policies, and storage considerations for monitoring data are covered under Network Operations — documentation, policies, and network monitoring. Understanding why logs are rotated (storage management, audit trail maintenance) and how retention periods are chosen reflects real-world network operations practice.
Daily Monitoring Routine Established
As a result of this validation phase, a morning checklist was established:
FreeFileSync log ([desktop]): confirm "Completed successfully" in the most recent HTML log file
rsync-backup.log ([laptop #1] and [laptop #2]): confirm sent/received/total size summary lines present
sshfs mount icon: confirm present on both Linux laptop desktops at login
Task Scheduler ([desktop]): note Last Run Time; disregard 0x41306 result code
Event Viewer WHEA-Logger ([desktop]): confirm Last Hour and 24 Hours counts remain at zero or near-zero
Coffee Canister: periodic spot-check of folder modification timestamps
Calendar reminders set at decreasing rates for:
Daily checks (current phase)
Weekly checks (after 3 consecutive clean daily results)
Monthly checks (long term steady state)
Biweekly Linux laptop reboots
Watch Out For (Future)
rsync-backup.log will be rotated monthly — after rotation, the active log restarts from empty; this is correct behavior, not a sign of failure
Compressed old logs will appear as rsync-backup.log.1.gz, rsync-backup.log.2.gz, etc. — readable with zcat or zless
WHEA-Logger 7-day count will continue to decline over the next week as pre-fix historical entries age out of the window — this is expected and not a sign of new errors
If WHEA errors resume correlating with GPU-intensive activity, physical GPU reseating is the next diagnostic step
[laptop #2]'s .Private and .ecryptfs exclusions added to backup script — if Linux Mint adds other encryption-related symlinks in future updates, similar errors may appear and require similar exclusions
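For scripted checks, the compressed archives mentioned above can also be read from Python with the standard `gzip` module, much as zcat would. A minimal sketch; the helper name and the stand-in archive contents are assumptions for the example.

```python
import gzip
import os
import tempfile

def read_rotated_log(path):
    """Return the text of a gzip-compressed rotated log (similar to zcat)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()

# Self-contained demo: write a small stand-in archive, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "rsync-backup.log.1.gz")
    with gzip.open(archive, "wt", encoding="utf-8") as handle:
        handle.write("total size is 100  speedup is 1.00\n")
    text = read_rotated_log(archive)

print(text, end="")
```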
Lessons Learned
Validate automated systems by checking their own logs, not just the scheduler that launched them. Task Scheduler's result code and FreeFileSync's own completion log can disagree; the application log is more reliable. Ultimately, directly checking for the expected artifacts and results (when possible) is the most reliable test.
Hardware error investigation benefits from timeline correlation. The six-day gap between a driver install and the onset of WHEA errors was only visible because Event Viewer filters were used to find the earliest occurrence. Without knowing when the errors started, identifying the driver correlation would have been harder and more time-consuming.
Establish a monitoring routine before considering infrastructure complete. Knowing what "normal" looks like (typical transfer sizes, expected log entries, baseline error rates) makes anomalies visible when they occur. Relax monitoring gradually as trust builds, but ideally revisit it in depth every 6-12 months.
Log rotation is infrastructure, not housekeeping. An unrotated log that grows for months can cause disk space issues silently. Setting up rotation at the same time as the log-generating process is the right practice.
Security decisions should be proportional to actual risk. Full-disk encryption on a portable machine is high-value; on a stationary at-home machine with other security controls in place, it is lower priority. Matching security controls to actual threat models is more effective than applying maximum security everywhere.
Next Steps / To-Do
Install Bitwarden Firefox extensions on [laptop #1] and [laptop #2]
Set up VPN on [laptop #1] and [laptop #2] (WireGuard-based; travel deadline in ~3 weeks)
Configure Samba file sharing on LAN for interactive file transfers between machines
Begin Claude Code introductory projects on [laptop #2]
Evaluate and uninstall CUDA toolkit on [desktop]; update FreeFileSync exclusions
Address C: drive storage on [desktop] (88% full)
Complete reused-password cleanup in Bitwarden