It happened on a Tuesday afternoon. I was running a long Ollama inference job — the kind that normally cruises along at 40–50 tokens per second on my RTX 3090. Then, without warning, it dropped to three.

I checked nvidia-smi. There it was: FortniteClient-Win64-Shipping.exe, consuming 94% of VRAM. My son had come home from school, found the machine idle, and launched Fortnite. From his perspective, the computer was free. From mine, it was doing work — just work that didn’t look like work from his side of the screen.

This is a specific problem that almost no tooling addresses: a single powerful machine serving double duty as a homelab AI inference box and a family gaming PC. And if you’re in the r/LocalLLaMA crowd, you probably know exactly what I’m talking about.

Why This Problem Is Hard to Solve

Your first instinct might be to reach for cgroups, Windows Job Objects, or GPU time-slicing. These work at the kernel level and are technically correct. They’re also a maintenance burden that requires you to know in advance exactly which workloads will compete.

The real problem is that GPU contention isn’t deterministic. You don’t always know when you’ll be running inference. You can’t pre-configure limits for every possible scenario. And you don’t want to permanently throttle gaming if the machine is genuinely idle.

What you actually want is conditional enforcement: kill the game only when it’s competing with real work, and leave it alone when the machine is free. That requires observability first — you need to know what’s running, on which machine, and whether it’s actually contending for resources.

What Leassh Does Differently

I built openclaw-fleet, the open-source plugin for Leassh, specifically for this use case. It turns your fleet of SSH-accessible machines into an observable, controllable system — and it plugs directly into OpenClaw so your AI assistant becomes aware of what’s happening across your nodes.

The core idea is simple: instead of hard-coding rules about which processes to kill, you teach the system which processes are “gaming” and let it decide based on context. When a process is on the blacklist and the machine is idle, it gets killed. When the machine is doing real work and a gaming process appears, you get an alert and can act.

No agents required for basic fleet visibility. Leassh uses SSH — if the machine is SSH-accessible, you can query it. For push-mode alerts and automatic enforcement, a tiny agent binary is available (4MB, no dependencies).
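To make the agentless model concrete, here is a minimal sketch of what pull-mode polling over SSH could look like. This is not Leassh's implementation — the helper names are mine, and it assumes key-based SSH auth is already in place, as described above:

```python
import subprocess

def build_ssh_command(user: str, host: str, command: str) -> list[str]:
    # BatchMode=yes fails fast instead of prompting for a password,
    # which is what you want for unattended polling.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", command]

def query_node(user: str, host: str, command: str, timeout: int = 10) -> str:
    """Run a command on a remote node over SSH and return its stdout."""
    result = subprocess.run(build_ssh_command(user, host, command),
                            capture_output=True, text=True, timeout=timeout)
    result.check_returncode()
    return result.stdout

# e.g. ask the inference box what's on the GPU:
# query_node("carl", "192.168.1.50",
#     "nvidia-smi --query-compute-apps=pid,process_name,used_memory "
#     "--format=csv,noheader")
```

Anything you can run over SSH — `nvidia-smi`, `ps`, `tasklist` on Windows — becomes a fleet query this way.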

Setting It Up in Five Minutes

Start by describing your fleet in a fleet.yaml file. This is the single configuration file that Leassh reads:

# fleet.yaml
nodes:
  - name: inference-box
    host: 192.168.1.50
    user: carl
    key: ~/.ssh/id_ed25519
    tags: [gpu, ai]
    blacklist:
      - FortniteClient-Win64-Shipping.exe
      - RocketLeague.exe
      - Minecraft.exe
      - steam.exe
  - name: kids-desktop
    host: 192.168.1.75
    user: oscar
    key: ~/.ssh/id_ed25519
    tags: [family]

That’s it for configuration. No daemon setup, no agent installation on the monitored machines (unless you want push-mode alerts). You can now query any node in your fleet using the SSH keys you already have in place.
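As a sanity check before pointing anything at your fleet, you can shape-check the parsed config. This is a hypothetical validator that mirrors the fields in the example above — Leassh's real schema may differ:

```python
def validate_fleet(fleet: dict) -> list[str]:
    """Return a list of problems with a parsed fleet.yaml-style dict."""
    errors = []
    for i, node in enumerate(fleet.get("nodes", [])):
        # Every node needs enough to open an SSH connection.
        for field in ("name", "host", "user", "key"):
            if field not in node:
                errors.append(f"node {i}: missing '{field}'")
        # blacklist and tags are optional, but must be lists if present.
        if not isinstance(node.get("blacklist", []), list):
            errors.append(f"node {i}: 'blacklist' must be a list")
    return errors

fleet = {"nodes": [{"name": "inference-box", "host": "192.168.1.50",
                    "user": "carl", "key": "~/.ssh/id_ed25519",
                    "tags": ["gpu", "ai"],
                    "blacklist": ["FortniteClient-Win64-Shipping.exe"]}]}
assert validate_fleet(fleet) == []
```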

The OpenClaw Integration

Once you install the openclaw-fleet plugin, your AI assistant gains direct awareness of your fleet. You can ask in natural language:

"Is anyone gaming right now?" → fleet_status, filter for blacklisted processes with high GPU usage

"Kill Fortnite on inference-box" → fleet_exec terminates the process, confirms

"What’s inference-box doing?" → node_detail shows active processes, GPU %, VRAM usage, logged-in user
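The "is anyone gaming" check boils down to intersecting a process list with the blacklist. A sketch of that filter — the process-record fields (`name`, `gpu_util`) are my assumptions, not Leassh's actual output format:

```python
def gaming_now(processes: list[dict], blacklist: set[str],
               gpu_threshold: float = 10.0) -> list[dict]:
    """Return blacklisted processes that are actually using the GPU."""
    return [p for p in processes
            if p["name"] in blacklist and p["gpu_util"] >= gpu_threshold]

procs = [
    {"name": "ollama", "gpu_util": 85.0},
    {"name": "FortniteClient-Win64-Shipping.exe", "gpu_util": 94.0},
]
hits = gaming_now(procs, {"FortniteClient-Win64-Shipping.exe"})
assert [p["name"] for p in hits] == ["FortniteClient-Win64-Shipping.exe"]
```

The GPU threshold matters: a launcher sitting idle in the system tray shouldn't trip the alarm.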

The real power is the unknown_heavy_process event. When Leassh detects a process consuming significant GPU or CPU that it doesn’t recognize, it asks your AI assistant to classify it.

Over time, the blacklist builds itself. You approve the first Fortnite kill manually. After that, it’s automatic.
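The self-building blacklist can be sketched as a classify-once loop — the function names here are illustrative, and the classifier is a stub standing in for your AI assistant:

```python
def handle_unknown_heavy(name: str, blacklist: set[str], classify) -> str:
    """Decide what to do with a heavy process; remember approvals."""
    if name in blacklist:
        return "kill"          # seen before: enforce automatically
    if classify(name) == "gaming":
        blacklist.add(name)    # remember the one-time approval
        return "kill"
    return "ignore"            # legitimate workload, leave it alone

blacklist: set = set()
stub = lambda n: "gaming" if "Fortnite" in n else "other"

# First sighting: classified, approved, and remembered.
assert handle_unknown_heavy("FortniteClient-Win64-Shipping.exe",
                            blacklist, stub) == "kill"
assert "FortniteClient-Win64-Shipping.exe" in blacklist
```

After the first approval, the classifier is never consulted again for that process name — the membership check short-circuits it.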

Automatic Enforcement: The Policy That Actually Works

The enforcement model I settled on after six months of running this in my own homelab:

1. Machine is idle, gaming process starts → let it run. The machine isn’t doing anything important. No action taken.

2. Inference job starts while the game is running → alert fires. You get a notification: “inference-box: GPU at 98%, FortniteClient running. Kill it?” One tap to approve.

3. Game starts while inference is already running → enforce immediately. If GPU load was already high when the game launched, the kill needs no prompt.

4. Inference finishes, machine is idle again → no restrictions. Kids can use the machine normally. The cycle repeats.
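The four scenarios above collapse into a single decision function. This is a sketch of the policy, not Leassh's implementation, and the argument names are mine:

```python
def decide(machine_busy: bool, game_running: bool,
           gpu_was_high_at_launch: bool = False) -> str:
    """Map the four enforcement scenarios to an action."""
    if not game_running:
        return "none"    # scenario 4: nothing to enforce
    if not machine_busy:
        return "allow"   # scenario 1: idle machine, let the game run
    if gpu_was_high_at_launch:
        return "kill"    # scenario 3: game launched into active inference
    return "alert"       # scenario 2: inference started under a running game

assert decide(machine_busy=False, game_running=True) == "allow"
assert decide(machine_busy=True, game_running=True) == "alert"
assert decide(machine_busy=True, game_running=True,
              gpu_was_high_at_launch=True) == "kill"
```

The asymmetry between "alert" and "kill" is the whole point: the game gets the benefit of the doubt only when it was there first.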

This is fundamentally different from a blanket time limit. The machine is available for gaming when it’s actually available — not on an arbitrary schedule that doesn’t account for what you’re doing.

Screenshots: When You Need to See, Not Just Know

One feature that surprised me with how useful it turned out to be: capture_screenshot. I originally added it for the family safety use case, but I use it constantly for my homelab now.

When inference-box is at 95% GPU and I’m not sure why, I can ask OpenClaw to show me the screen. Instead of SSH-ing in and running nvidia-smi, I just see exactly what’s happening. Half the time it’s my own job finishing. The other half it’s whatever my kids have been up to.

The Numbers, After Six Months

0 interrupted inference jobs since I set this up
~5 min setup time, including fleet.yaml and plugin install
4 MB agent binary, no runtime dependencies

The agent isn’t even strictly necessary for most of this. SSH access is enough to query node state, get process lists, and execute kill commands. The agent adds push-mode alerts — so Leassh can notify you proactively rather than waiting for you to ask.

Where This Gets Interesting for Larger Fleets

If you’re running multiple GPU nodes — a common setup for serious local AI inference or distributed training — the fleet view becomes genuinely useful. You can see at a glance which nodes are loaded, which are idle, and whether any are being consumed by non-inference workloads.

I’ve heard from a few people running private GPU clusters for their team who use this to ensure that personal gaming on shared workstations doesn’t cannibalize inference capacity during work hours. The same blacklist mechanism works; the enforcement just applies across more nodes.
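The at-a-glance fleet view amounts to bucketing nodes by state. A sketch of that roll-up — field names are assumptions, not Leassh's real node-record format:

```python
def fleet_summary(nodes: list[dict], blacklist: set[str],
                  busy_threshold: float = 20.0) -> dict:
    """Bucket nodes into busy, idle, or contended (blacklisted workload)."""
    summary = {"busy": [], "idle": [], "contended": []}
    for node in nodes:
        if any(p["name"] in blacklist for p in node["processes"]):
            summary["contended"].append(node["name"])
        elif node["gpu_util"] >= busy_threshold:
            summary["busy"].append(node["name"])
        else:
            summary["idle"].append(node["name"])
    return summary

nodes = [
    {"name": "gpu-01", "gpu_util": 97.0, "processes": [{"name": "ollama"}]},
    {"name": "gpu-02", "gpu_util": 2.0, "processes": []},
    {"name": "gpu-03", "gpu_util": 90.0,
     "processes": [{"name": "RocketLeague.exe"}]},
]
s = fleet_summary(nodes, {"RocketLeague.exe"})
assert s == {"busy": ["gpu-01"], "idle": ["gpu-02"], "contended": ["gpu-03"]}
```

Note that "contended" wins over "busy": a node at 90% GPU running Rocket League is a problem, not a healthy worker.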

Privacy note: Everything runs locally. Fleet data never leaves your network. The openclaw-fleet plugin communicates only between your OpenClaw instance and your own machines over SSH. No telemetry, no cloud dependency for fleet monitoring.

Getting Started

The openclaw-fleet plugin is free and open source. You need SSH key access to your machines, a Node.js 20+ runtime for the plugin itself, and an OpenClaw instance to plug it into.

That’s genuinely it. No accounts, no cloud setup, no agents to install unless you want push notifications.

openclaw-fleet on GitHub — free, MIT licensed, runs anywhere you can run Node.js 20+.

If you also want the family monitoring side — weekly behavioral reports, screen time context, content safety — that’s the paid Leassh plan. But for the GPU reclamation problem specifically, the free plugin handles it completely.

After six months of running this, I haven’t had a single interrupted inference job. My son still plays Fortnite. He just does it when the machine is actually available.