I have eight machines on my home network. Two are dedicated to local LLM inference, three are shared gaming PCs my kids use, one is a Proxmox node, one is a Raspberry Pi 5 running Pi-hole and Home Assistant, and one is a NAS. I know them all intimately. I also have no idea what any of them are doing right now without SSH-ing in to check.

That’s the homelab problem in miniature: the machines you care about most are the ones you understand least in aggregate. You built them, you know their configs, and yet you have no single view of what’s running, what’s straining, or what happened last night while you were asleep.

The Prometheus Trap

The standard advice is to set up Prometheus and Grafana. And Prometheus is genuinely excellent software; I use it at work. But for a homelab with 5–10 machines, the overhead is real:

- An exporter (node_exporter, or windows_exporter on Windows boxes) installed and kept updated on every machine
- A Prometheus server to run, with scrape configs and data retention to manage
- Grafana dashboards to build before any of the data is actually readable
- Alert rules to write, one for every condition you want to hear about

I’m not saying the stack isn’t worth it. For a production environment or a serious homelab with 20+ nodes it absolutely is. But if what you actually want is “tell me what’s happening across my machines right now,” Prometheus is a 10-hour project for a 10-second answer.

What SSH Already Gives You

Here’s the thing: you already have SSH access to every Linux and macOS machine in your lab. Windows has had a native OpenSSH server since 2019. Every metric you could want — CPU load, RAM pressure, GPU utilization, disk usage, running processes, logged-in users, temperatures — is available via a command over SSH.

The question is whether you want to run those commands manually every time, or whether you want something that does it continuously, correlates the results, and surfaces them when something interesting happens.

The SSH Monitoring Model

One monitoring process holds a persistent SSH connection to each machine. Every minute or so, it runs a set of OS-appropriate commands, parses the output, and maintains state. No agents installed on target machines. No open ports beyond SSH. The data never leaves your network.

This is the architecture I built into Leassh’s fleet monitoring plugin, and it’s fundamentally different from the exporter-based model: the intelligence is on the monitoring side, not the monitored side. Your machines don’t know they’re being monitored. They’re just answering SSH commands.

What It Looks Like in Practice

Once configured, you get two things. First, a natural language interface via OpenClaw — your AI agent can answer questions about your fleet in real time:

you > how's the fleet?
 
gpu-primary ONLINE CPU 12% GPU 8% RAM 18GB/24GB disk 847GB free
gpu-secondary ONLINE CPU 6% GPU 0% RAM 6GB/16GB disk 1.2TB free
nas ONLINE CPU 3% disk WARNING: 89% full
proxmox ONLINE CPU 31% RAM 28GB/64GB 4 VMs active
pi5 ONLINE CPU 4% RAM 2.1GB/8GB
gaming-1 IDLE (no users, 43min)
gaming-2 IDLE (no users, 2h 11min)
gaming-3 ONLINE user: felix, GPU 91% (Fortnite)
 
# nas disk warning flagged — you've got ~3 days at current write rate

That’s one query, no dashboard to build, no alert to configure beforehand. The NAS disk warning surfaces because the system is tracking disk usage over time and doing rate-of-change math — “time to full based on current write rate” is more useful than “89% used.”

Second, there’s a live dashboard served at /fleet — all nodes, status bars, GPU VRAM, disk, active users, last seen. It auto-refreshes. For the big-picture view it’s faster than any terminal.

The Configuration Is One File

The entire setup is a single fleet.yaml:

license_key: "your-license-key"

nodes:
  - name: gpu-primary
    host: 192.168.1.100
    ssh: carl@192.168.1.100
    os: linux
  - name: nas
    host: 192.168.1.105
    ssh: admin@192.168.1.105
    os: linux
  - name: proxmox
    host: 192.168.1.110
    ssh: root@192.168.1.110
    os: linux
  - name: gaming-3
    host: 192.168.1.131
    ssh: carl@192.168.1.131
    os: windows

probes:
  health_interval: 60     # seconds
  metrics_interval: 120
  idle_threshold: 30      # minutes before marking IDLE
  load_thresholds:
    low: 30
    high: 70

Add a node, restart the binary, it starts showing up. Remove a node, it disappears. No service discovery config, no scrape rules, no relabeling.

Cross-Platform Without the Pain

The part that took the most work to build is the part you don’t see: getting the same logical metric from machines that answer completely different commands.

Metric        | Linux                    | macOS              | Windows
--------------|--------------------------|--------------------|------------------
CPU usage     | /proc/stat               | top -l 1           | Get-CimInstance
RAM           | /proc/meminfo            | vm_stat            | Get-CimInstance
Disk          | df -h                    | df -h              | Get-CimInstance
GPU (NVIDIA)  | nvidia-smi               | N/A                | nvidia-smi
Idle time     | xprintidle / input mtime | ioreg HIDIdleTime  | GetLastInputInfo
Processes     | ps aux                   | ps aux             | Get-Process
Screenshots   | scrot / grim             | screencapture      | PowerShell task¹

¹ Windows SSH runs in Session 0 (no desktop). Screenshots require creating a scheduled task in the interactive user session — the binary handles this automatically.
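The normalization work is mostly output parsing. A sketch of one logical metric, "RAM in use", computed from /proc/meminfo on Linux and vm_stat on macOS (function names are mine, and the macOS figure is a rough approximation of active plus wired pages, ignoring compressed memory):

```python
import re

def mem_used_linux(meminfo: str) -> int:
    """Bytes in use per /proc/meminfo: MemTotal minus MemAvailable."""
    fields = dict(re.findall(r"^(\w+):\s+(\d+) kB", meminfo, re.M))
    return (int(fields["MemTotal"]) - int(fields["MemAvailable"])) * 1024

def mem_used_macos(vmstat: str) -> int:
    """Bytes in use per `vm_stat`: (active + wired) pages times page size.
    The page size is read from vm_stat's own header line."""
    page_size = int(re.search(r"page size of (\d+) bytes", vmstat).group(1))
    pages = dict(re.findall(r"^Pages ([\w ]+?):\s+(\d+)\.", vmstat, re.M))
    return (int(pages["active"]) + int(pages["wired down"])) * page_size
```

Two different commands, two different parsers, one number the dashboard can treat identically.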

All of these commands live in a JSON registry, not hardcoded in the binary. If a command isn’t available on a given machine, that metric is skipped gracefully. Add a new command variant without recompiling.

Alerts That Don’t Require Pre-Configuration

Traditional monitoring is reactive: you define a threshold, something crosses it, you get an alert. That’s fine when you know what to watch for. But homelab failures are creative — it’s rarely the alert you configured that fires, it’s the one you forgot to set up.

SSH fleet monitoring has an advantage here because the monitoring system understands behavior, not just thresholds. When a previously-idle machine suddenly has a high-GPU process running, that’s surfaced without you pre-defining “alert if GPU > 80%.” The agent sees the state change and tells you about it.
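A toy version of that state-change check, with an invented snapshot schema (node name mapped to status and GPU percentage):

```python
def detect_changes(prev: dict, curr: dict) -> list[str]:
    """Compare two fleet snapshots and describe notable transitions.
    The snapshot format here is illustrative, not the plugin's state model."""
    events = []
    for node, now in curr.items():
        before = prev.get(node, {})
        # Idle machine suddenly doing heavy GPU work: worth surfacing
        # even though no explicit threshold alert was configured.
        if before.get("status") == "IDLE" and now.get("gpu", 0) > 50:
            events.append(f"{node}: was idle, now running a GPU-heavy "
                          f"process ({now['gpu']}%)")
        if before.get("status") == "ONLINE" and now.get("status") == "OFFLINE":
            events.append(f"{node}: went offline")
    return events
```

The thresholds here are placeholders; the point is that the event is defined by the transition between snapshots, not by a number you configured in advance.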

Concrete examples of what gets flagged automatically:

- A node that answered the last probe stops responding to SSH
- A disk whose projected time-to-full drops below a few days, like the NAS warning above
- A machine that has been idle for hours suddenly running a GPU-heavy process
- A user logged in on a machine that is normally unattended

Where It Fits in the Monitoring Stack

I still run Prometheus at work and wouldn’t replace it for production systems. For my homelab, SSH fleet monitoring does what I actually need:

“Tell me what my machines are doing right now, flag anything unusual, and let me ask questions without opening five terminal tabs.”

If you’re running fewer than 20 nodes and spending more time maintaining your monitoring stack than actually using it, that’s a signal to simplify. SSH is already there. The commands already work. The only question is whether you’re running them manually every time or whether something runs them for you.

The fleet plugin is free for OpenClaw users — unlimited nodes, MIT licensed. The full Leassh product adds rules automation (“kill this process on idle machines”), screen time enforcement, and AI behavioral reports for family machines. But for pure fleet monitoring, the free tier is the whole thing.

Related reading: If you’re also dealing with kids gaming on your GPU machines, see How to Stop Your Kids From Hijacking Your Homelab GPU for the automated enforcement side of the same setup.