It happened on a Tuesday afternoon. I was running a long Ollama inference job — the kind that normally cruises along at 40–50 tokens per second on my RTX 3090. Then, without warning, it dropped to three.
I checked nvidia-smi. There it was: FortniteClient-Win64-Shipping.exe, consuming 94% of VRAM. My son had come home from school, found the machine idle, and launched Fortnite. From his perspective, the computer was free. From mine, it was doing work — just work that didn’t look like work from his side of the screen.
This is a specific problem that almost no tooling addresses: a single powerful machine serving double duty as a homelab AI inference box and a family gaming PC. And if you’re in the r/LocalLLaMA crowd, you probably know exactly what I’m talking about.
Why This Problem Is Hard to Solve
Your first instinct might be to reach for cgroups, Windows Job Objects, or GPU time-slicing. These work at the kernel level and are technically correct. They’re also a maintenance burden that requires you to know in advance exactly which workloads will compete.
The real problem is that GPU contention isn’t deterministic. You don’t always know when you’ll be running inference. You can’t pre-configure limits for every possible scenario. And you don’t want to permanently throttle gaming if the machine is genuinely idle.
What you actually want is conditional enforcement: kill the game only when it’s competing with real work, and leave it alone when the machine is free. That requires observability first — you need to know what’s running, on which machine, and whether it’s actually contending for resources.
What Leassh Does Differently
I built openclaw-fleet, the open-source plugin for Leassh, specifically for this use case. It turns your fleet of SSH-accessible machines into an observable, controllable system — and it plugs directly into OpenClaw so your AI assistant becomes aware of what’s happening across your nodes.
The core idea is simple: instead of hard-coding rules about which processes to kill, you teach the system which processes are “gaming” and let it decide based on context. When a process is on the blacklist and the machine is idle, it gets killed. When the machine is doing real work and a gaming process appears, you get an alert and can act.
Setting It Up in Five Minutes
Start by describing your fleet in a fleet.yaml file. This is the single configuration file that Leassh reads:
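To make that concrete, here is a sketch of what a minimal fleet.yaml could look like. The field names below (nodes, host, user, gpu) are my illustration of the idea, not the plugin's documented schema; check the openclaw-fleet README for the real format.

```yaml
# Hypothetical fleet.yaml sketch -- field names are illustrative,
# not the authoritative openclaw-fleet schema.
nodes:
  inference-box:
    host: 192.168.1.42   # reachable over SSH with an existing key
    user: homelab
    gpu: true            # poll nvidia-smi on this node
  media-server:
    host: 192.168.1.50
    user: homelab
    gpu: false
```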
That’s it for configuration. No daemon setup, no agent installation on the monitored machines (unless you want push-mode alerts). You can now query any node in your fleet over SSH, using keys you almost certainly already have in place.
The OpenClaw Integration
Once you install the openclaw-fleet plugin, your AI assistant gains direct awareness of your fleet. You can ask in natural language:
- "Kill Fortnite on inference-box" → fleet_exec terminates the process and confirms
- "What’s inference-box doing?" → node_detail shows active processes, GPU %, VRAM usage, and the logged-in user
The real power is the unknown_heavy_process event. When Leassh detects a process consuming significant GPU or CPU that it doesn’t recognize, it asks your AI assistant to classify it:
- Games — Fortnite, Rocket League, Steam, Epic Games Launcher → recommend blacklisting
- Legitimate AI workloads — Ollama, llama.cpp, ComfyUI, AUTOMATIC1111 → whitelist and leave alone
- System processes — Windows Update, Defender, backup tools → whitelist
- Unknown — surface to you for a decision
Over time, the blacklist builds itself. You approve the first Fortnite kill manually. After that, it’s automatic.
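The classification step described above boils down to a small lookup against the learned lists. A minimal sketch of that idea in TypeScript, where the function and list names are mine for illustration, not the openclaw-fleet API:

```typescript
// Sketch of the classification step, assuming name lists persisted
// from earlier decisions. Names are illustrative, not the plugin's API.
type Verdict = "kill-if-contending" | "leave-alone" | "ask-assistant";

const blacklist = new Set(["fortniteclient-win64-shipping.exe", "rocketleague.exe"]);
const whitelist = new Set(["ollama.exe", "comfyui", "llama-server"]); // AI + system workloads

function classifyProcess(name: string): Verdict {
  const key = name.toLowerCase();
  if (blacklist.has(key)) return "kill-if-contending";
  if (whitelist.has(key)) return "leave-alone";
  // Unknown heavy process: escalate to the AI assistant (and ultimately you).
  return "ask-assistant";
}
```

The first Fortnite kill you approve adds its process name to the blacklist set, which is why later enforcement needs no prompt.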
Automatic Enforcement: The Policy That Actually Works
The enforcement model I settled on after six months of running this in my own homelab:
- Machine idle, game launches: let it run. The machine isn’t doing anything important. No action taken.
- Inference running, game launches: alert fires. You get a notification: “inference-box: GPU at 98%, FortniteClient running. Kill it?” One tap to approve.
- GPU load already high when the game launched: enforce immediately. No prompt needed.
- Job finishes, machine goes idle: no restrictions. Kids can use the machine normally. The cycle repeats.
This is fundamentally different from a blanket time limit. The machine is available for gaming when it’s actually available — not on an arbitrary schedule that doesn’t account for what you’re doing.
Screenshots: When You Need to See, Not Just Know
One feature that surprised me with how useful it turned out to be: capture_screenshot. I originally added it for the family safety use case, but I use it constantly for my homelab now.
When inference-box is at 95% GPU and I’m not sure why, I can ask OpenClaw to show me the screen. Instead of SSH-ing in and running nvidia-smi to work out which process owns the load, I just see exactly what’s happening. Half the time it’s my own job finishing. The other half it’s whatever my kids have been up to.
Do You Even Need the Agent?
The agent isn’t even strictly necessary for most of this. SSH access is enough to query node state, get process lists, and execute kill commands. The agent adds push-mode alerts — so Leassh can notify you proactively rather than waiting for you to ask.
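To illustrate how little the SSH path needs, everything on the GPU side comes from one nvidia-smi invocation over SSH (--query-compute-apps and --format=csv,noheader are real nvidia-smi flags). The parsing helper below is my sketch, not part of openclaw-fleet:

```typescript
// Parse the CSV output of, e.g.:
//   ssh inference-box "nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader"
// The helper is illustrative, not part of openclaw-fleet.
interface GpuProc {
  pid: number;
  name: string;
  vramMiB: number;
}

function parseComputeApps(csv: string): GpuProc[] {
  return csv
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [pid, name, mem] = line.split(",").map((f) => f.trim());
      // mem arrives as a string like "22816 MiB"; parseInt drops the unit.
      return { pid: Number(pid), name, vramMiB: parseInt(mem, 10) };
    });
}
```

Run that against each node on a polling interval and you have the observability half of the system with nothing installed remotely.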
Where This Gets Interesting for Larger Fleets
If you’re running multiple GPU nodes — a common setup for serious local AI inference or distributed training — the fleet view becomes genuinely useful. You can see at a glance which nodes are loaded, which are idle, and whether any are being consumed by non-inference workloads.
I’ve heard from a few people running private GPU clusters for their team who use this to ensure that personal gaming on shared workstations doesn’t cannibalize inference capacity during work hours. The same blacklist mechanism works; the enforcement just applies across more nodes.
Getting Started
The openclaw-fleet plugin is free and open source. You need:
- SSH access to the machines you want to monitor (you almost certainly already have this)
- OpenClaw running locally
- Five minutes to write a fleet.yaml
That’s genuinely it. No accounts, no cloud setup, no agents to install unless you want push notifications.
openclaw-fleet on GitHub — free, MIT licensed, runs anywhere you can run Node.js 20+.
If you also want the family monitoring side — weekly behavioral reports, screen time context, content safety — that’s the paid Leassh plan. But for the GPU reclamation problem specifically, the free plugin handles it completely.
After six months of running this, I haven’t had a single interrupted inference job. My son still plays Fortnite. He just does it when the machine is actually available.