Deployable YAML configs for power users. Pick a use case, copy the template, go.
Reclaim your GPUs from gaming for Ollama inference workloads. Schedule-based priority switching.
```yaml
nodes:
  - name: gpu-workstation-1
    host: 192.168.1.50
    ssh: user@192.168.1.50
    gpu: nvidia
    shared: true
blacklist:
  global:
    - steam.exe
    - epicgameslauncher.exe
rules:
  - name: work_hours_gpu_priority
    when: "time >= 09:00 AND time < 18:00 AND day_of_week IN [Mon,Tue,Wed,Thu,Fri]"
    actions:
      - kill_blacklisted
      - ssh_command: "systemctl start ollama"
  - name: evening_free
    when: "time >= 18:00"
    actions:
      - ssh_command: "systemctl stop ollama"
```
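The two schedule rules split the day with a half-open window: 09:00 counts as work hours and 18:00 does not. Below is a minimal Python sketch of how such conditions could be evaluated. It assumes the engine compares local wall-clock time and weekday. `in_window` and `active_rule` are hypothetical helpers, not part of the tool.

```python
from datetime import datetime, time
from typing import Optional

# Index matches datetime.weekday(): Monday == 0 ... Sunday == 6.
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
WORK_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri")


def parse_hhmm(value: str) -> time:
    """Turn '09:00' into datetime.time(9, 0)."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def in_window(now: datetime, start: str, end: str, days=WORK_DAYS) -> bool:
    """Hypothetical check for: time >= start AND time < end AND day_of_week IN days."""
    in_hours = parse_hhmm(start) <= now.time() < parse_hhmm(end)
    return in_hours and DAY_NAMES[now.weekday()] in days


def active_rule(now: datetime) -> Optional[str]:
    """Which of the two template rules would match at `now` (None if neither)."""
    if in_window(now, "09:00", "18:00"):
        return "work_hours_gpu_priority"
    if now.time() >= parse_hhmm("18:00"):  # evening_free has no weekday filter
        return "evening_free"
    return None
```

Note what falls through. From 00:00 to 09:00 on any day, and during weekend daytime, neither rule matches. Ollama therefore stays in whatever state the last matching rule left it.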
Detect crashed services and restart them automatically with Telegram alerts.
```yaml
nodes:
  - name: homelab-server
    host: 192.168.1.10
    ssh: admin@192.168.1.10
rules:
  - name: plex_recovery
    when: "NOT process_running('plexmediaserver')"
    nodes: [homelab-server]
    actions:
      - ssh_command: "systemctl restart plexmediaserver"
      - notify: warning
  - name: nextcloud_recovery
    when: "NOT process_running('apache2')"
    nodes: [homelab-server]
    actions:
      - ssh_command: "systemctl restart apache2"
      - notify: warning
notifications:
  telegram:
    bot_token: "your-bot-token"
    chat_id: "your-chat-id"
```
Auto-kill heavy apps left running on shared workstations after idle timeout.
```yaml
nodes:
  - name: shared-workstation
    host: 192.168.1.30
    ssh: admin@192.168.1.30
    shared: true
rules:
  - name: idle_cleanup
    when: "idle_minutes > 30"
    nodes: [shared-workstation]
    actions:
      - ssh_command: "pkill -f 'blender|chrome|firefox'"
      - notify: info
probe:
  idle_threshold: 600
```
Self-monitoring mode: see where your time actually goes with AI classification.
```yaml
nodes:
  - name: my-workstation
    host: localhost
    ssh: me@localhost
    permissions:
      screenshot: true
screen_time:
  enabled: true
  capture_interval_minutes: 10
vision:
  ollama_url: http://localhost:11434
  model: llava
# No enforcement, no time limits.
# Just data and weekly reports.
# GET /fleet/api/insights/my-workstation/weekly
```
Watch container health, restart crashed containers, and alert on resource spikes.
```yaml
nodes:
  - name: docker-host
    host: 192.168.1.15
    ssh: admin@192.168.1.15
rules:
  - name: container_health
    when: "ssh_output('docker ps --filter health=unhealthy -q') != ''"
    nodes: [docker-host]
    actions:
      - ssh_command: "docker restart $(docker ps --filter health=unhealthy -q)"
      - notify: warning
  - name: container_down
    # Check the Docker daemon itself; containerd can keep running after dockerd dies.
    when: "NOT process_running('dockerd')"
    nodes: [docker-host]
    actions:
      - ssh_command: "systemctl restart docker"
      - notify: critical
```
Monitor build agents, detect overloaded nodes, and trigger a webhook so your CI can rebalance workloads.
```yaml
nodes:
  - name: build-agent-1
    host: 192.168.1.40
    ssh: ci@192.168.1.40
  - name: build-agent-2
    host: 192.168.1.41
    ssh: ci@192.168.1.41
  - name: build-agent-3
    host: 192.168.1.42
    ssh: ci@192.168.1.42
rules:
  - name: build_agent_overload
    when: "cpu_percent > 95 AND duration_minutes > 10"
    actions:
      - notify: warning
      - webhook: build_overload
  - name: build_agent_down
    when: "NOT reachable"
    actions:
      - notify: critical
```
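`duration_minutes` turns a point-in-time threshold into a sustained one, so a short compile spike does not page anyone. Here is a minimal Python sketch of that idea. It assumes `duration_minutes` means "how long the rest of the condition has been continuously true" and that any sample at or below the threshold resets the timer. `SustainedCondition` is a hypothetical name, not the tool's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedCondition:
    """Hypothetical tracker for 'metric > threshold AND duration_minutes > min_minutes'."""

    threshold: float
    min_minutes: float
    since: Optional[float] = None  # epoch seconds when the current breach started

    def update(self, value: float, now: float) -> bool:
        """Feed one sample; return True once the breach has lasted longer than min_minutes."""
        if value > self.threshold:
            if self.since is None:
                self.since = now  # breach just began
            return (now - self.since) / 60 > self.min_minutes
        self.since = None  # dropping to or below the threshold resets the timer
        return False
```

With `threshold=95, min_minutes=10`, a 97% sample at t=0 returns False and a 99% sample at t=660 s returns True. A single 50% sample in between would restart the clock.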
GPU temperature monitoring, hashrate-drop detection, and automatic miner shutdown on overheat.
```yaml
nodes:
  - name: rig-01
    host: 192.168.1.60
    ssh: miner@192.168.1.60
    gpu: nvidia
  - name: rig-02
    host: 192.168.1.61
    ssh: miner@192.168.1.61
    gpu: nvidia
rules:
  - name: gpu_overheat
    when: "gpu_temp > 85"
    actions:
      - ssh_command: "systemctl stop miner"
      - notify: critical
  - name: gpu_warm
    when: "gpu_temp > 75"
    actions:
      - notify: warning
  - name: hashrate_drop
    when: "gpu_utilization < 50 AND process_running('miner')"
    actions:
      - ssh_command: "systemctl restart miner"
      - notify: info
```
Detect when dev servers drift from expected config. Alert on unexpected changes.
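```yaml
nodes:
  - name: dev-server
    host: 192.168.1.25
    ssh: dev@192.168.1.25
rules:
  - name: config_drift
    # md5sum prints "<digest>  <path>", so cut keeps only the 32-character digest.
    # Replace expected_hash with the digest of your known-good nginx.conf.
    when: "ssh_output('md5sum /etc/nginx/nginx.conf | cut -c1-32') != 'expected_hash'"
    nodes: [dev-server]
    actions:
      - notify: warning
  - name: unexpected_service
    when: "process_running('apache2')"
    nodes: [dev-server]
    actions:
      - notify: info
```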
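`md5sum` prints the digest followed by two spaces and the filename, so compare only the first field against `expected_hash`. To get the value that replaces `expected_hash`, hash a known-good copy of `nginx.conf`. The Python sketch below computes the same digest as `md5sum` using only the standard library. `md5_of_file` and `md5sum_digest` are hypothetical helpers.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex MD5 digest of a file: the same value md5sum prints before the filename."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5sum_digest(md5sum_output: str) -> str:
    """Extract the digest from a raw md5sum line like '<digest>  /etc/nginx/nginx.conf'."""
    return md5sum_output.split()[0]
```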
Disk space, SMART status, and RAID rebuild progress monitoring.
```yaml
nodes:
  - name: nas-primary
    host: 192.168.1.5
    ssh: admin@192.168.1.5
rules:
  - name: disk_space_critical
    when: "disk_used_percent > 90"
    nodes: [nas-primary]
    actions:
      - notify: critical
  - name: disk_space_warning
    when: "disk_used_percent > 80"
    nodes: [nas-primary]
    actions:
      - notify: warning
  - name: smart_health
    when: "ssh_output('smartctl -H /dev/sda') contains 'FAILED'"
    nodes: [nas-primary]
    actions:
      - notify: critical
```
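The two disk rules overlap. At 95% usage both `disk_space_warning` and `disk_space_critical` match, and this template does not say whether the engine sends one alert or two. If you forward alerts through your own handler and only want the most severe one, here is a minimal Python sketch. `severity` is a hypothetical helper that uses strict `>` comparisons, as the rules do.

```python
from typing import List, Optional, Tuple


def severity(value: float, tiers: List[Tuple[float, str]]) -> Optional[str]:
    """Return the label of the highest threshold that `value` strictly exceeds, else None."""
    matched = [(threshold, label) for threshold, label in tiers if value > threshold]
    if not matched:
        return None
    return max(matched)[1]  # tuples compare by threshold first


DISK_TIERS = [(80, "warning"), (90, "critical")]
```

The same pattern applies to the `gpu_warm` (> 75) and `gpu_overheat` (> 85) pair in the mining template.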
Monitor employee workstations across branch offices. Detect offline machines and resource issues.
```yaml
nodes:
  - name: office-nyc-01
    host: 10.0.1.10
    ssh: admin@10.0.1.10
    tags: [nyc, workstation]
  - name: office-nyc-02
    host: 10.0.1.11
    ssh: admin@10.0.1.11
    tags: [nyc, workstation]
  - name: office-lon-01
    host: 10.0.2.10
    ssh: admin@10.0.2.10
    tags: [london, workstation]
rules:
  - name: workstation_offline
    when: "NOT reachable AND time >= 09:00 AND time < 18:00"
    actions:
      - notify: warning
  - name: disk_low
    when: "disk_used_percent > 85"
    actions:
      - notify: info
```
Track GPU clusters for ML training runs. Alert on job completion or hardware issues.
```yaml
nodes:
  - name: gpu-cluster-a
    host: 192.168.1.100
    ssh: ml@192.168.1.100
    gpu: nvidia
  - name: gpu-cluster-b
    host: 192.168.1.101
    ssh: ml@192.168.1.101
    gpu: nvidia
rules:
  - name: training_complete
    when: "gpu_utilization < 5 AND previous_gpu_utilization > 80"
    actions:
      - notify: info
      - webhook: training_done
  - name: gpu_error
    when: "gpu_temp > 90 OR ssh_output('nvidia-smi') contains 'ERR'"
    actions:
      - notify: critical
```
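`training_complete` is an edge trigger: it compares the current GPU utilization with the previous sample. The minimal Python sketch below shows which sample sequences would fire it. It assumes `previous_gpu_utilization` is the value from the immediately preceding poll; the `falling_edges` helper is hypothetical.

```python
from typing import Iterable, List, Optional


def falling_edges(samples: Iterable[float], busy: float = 80, idle: float = 5) -> List[int]:
    """Indices i where the previous sample was above `busy` and sample i is below `idle`."""
    edges: List[int] = []
    previous: Optional[float] = None
    for index, utilization in enumerate(samples):
        if previous is not None and previous > busy and utilization < idle:
            edges.append(index)
        previous = utilization
    return edges
```

For example, `falling_edges([95, 40, 2])` returns an empty list. A job whose utilization tapers off over more than one poll (95%, then 40%, then 2%) never satisfies the condition. If your training runs wind down gradually, lower the `> 80` bound.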
Monitor game servers (Minecraft, Valheim, Palworld, etc.) and auto-restart them on crash.
```yaml
nodes:
  - name: minecraft-server
    host: 192.168.1.70
    ssh: game@192.168.1.70
  - name: valheim-server
    host: 192.168.1.71
    ssh: game@192.168.1.71
rules:
  - name: minecraft_crash
    when: "NOT process_running('java') AND node.name == 'minecraft-server'"
    actions:
      - ssh_command: "cd /opt/minecraft && ./start.sh"
      - notify: warning
  - name: valheim_crash
    when: "NOT process_running('valheim_server')"
    nodes: [valheim-server]
    actions:
      - ssh_command: "cd /opt/valheim && ./start_server.sh"
      - notify: warning
  - name: high_memory
    when: "mem_used_percent > 90"
    actions:
      - notify: critical
```
Monitor Plex/Jellyfin transcoding load and disk space for media libraries.
```yaml
nodes:
  - name: media-server
    host: 192.168.1.12
    ssh: admin@192.168.1.12
rules:
  - name: plex_down
    when: "NOT process_running('Plex Media Server')"
    nodes: [media-server]
    actions:
      - ssh_command: "systemctl restart plexmediaserver"
      - notify: warning
  - name: media_disk_full
    when: "disk_used_percent > 85"
    nodes: [media-server]
    actions:
      - notify: warning
  - name: transcode_overload
    when: "cpu_percent > 90 AND duration_minutes > 5"
    nodes: [media-server]
    actions:
      - notify: info
```
Monitor the underlying nodes, not just pods. Detect hardware issues before they cascade.
```yaml
nodes:
  - name: k8s-node-1
    host: 192.168.1.80
    ssh: admin@192.168.1.80
  - name: k8s-node-2
    host: 192.168.1.81
    ssh: admin@192.168.1.81
  - name: k8s-node-3
    host: 192.168.1.82
    ssh: admin@192.168.1.82
rules:
  - name: kubelet_down
    when: "NOT process_running('kubelet')"
    actions:
      - ssh_command: "systemctl restart kubelet"
      - notify: critical
  - name: node_disk_pressure
    when: "disk_used_percent > 85"
    actions:
      - notify: warning
      - webhook: k8s_disk_pressure
  - name: node_memory_pressure
    when: "mem_used_percent > 90"
    actions:
      - notify: warning
```
Monitor print queues and server health across locations.
```yaml
nodes:
  - name: print-server-hq
    host: 192.168.1.90
    ssh: admin@192.168.1.90
  - name: print-server-branch
    host: 10.0.2.90
    ssh: admin@10.0.2.90
rules:
  - name: cups_down
    when: "NOT process_running('cupsd')"
    actions:
      - ssh_command: "systemctl restart cups"
      - notify: warning
  - name: print_queue_stuck
    when: "ssh_output('lpstat -o | wc -l') > 20"
    actions:
      - notify: warning
```
Manage and monitor computers in a school lab. Enforce policies during class hours.
```yaml
nodes:
  - name: lab-pc-01
    host: 192.168.10.1
    ssh: admin@192.168.10.1
    tags: [lab, classroom-a]
  # ... repeat for each PC
blacklist:
  global:
    - steam.exe
    - epicgameslauncher.exe
    - discord.exe
rules:
  - name: class_hours_enforcement
    when: "time >= 08:00 AND time < 15:00 AND day_of_week IN [Mon,Tue,Wed,Thu,Fri]"
    actions:
      - kill_blacklisted
  - name: end_of_day_shutdown
    when: "time >= 17:00"
    actions:
      - ssh_command: "shutdown -h now"
```
Blender/video rendering job tracking, GPU allocation, and completion alerts.
```yaml
nodes:
  - name: render-node-1
    host: 192.168.1.110
    ssh: render@192.168.1.110
    gpu: nvidia
  - name: render-node-2
    host: 192.168.1.111
    ssh: render@192.168.1.111
    gpu: nvidia
rules:
  - name: render_complete
    when: "NOT process_running('blender') AND previous_process_running('blender')"
    actions:
      - notify: info
      - webhook: render_done
  - name: render_gpu_overheat
    when: "gpu_temp > 88"
    actions:
      - ssh_command: "pkill blender"
      - notify: critical
  - name: render_stalled
    when: "gpu_utilization < 10 AND process_running('blender') AND duration_minutes > 30"
    actions:
      - notify: warning
```
Monitor OpenVPN/WireGuard servers, connection counts, and bandwidth usage.
```yaml
nodes:
  - name: vpn-server
    host: 192.168.1.20
    ssh: admin@192.168.1.20
rules:
  - name: vpn_service_check
    when: "NOT process_running('openvpn')"
    nodes: [vpn-server]
    actions:
      - ssh_command: "systemctl restart openvpn"
      - notify: critical
  - name: high_connection_count
    when: "openvpn_connections > 80"
    actions:
      - notify: warning
  - name: bandwidth_spike
    when: "network_out_mbps > 100"
    actions:
      - notify: info
```
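`network_out_mbps` is presumably outgoing throughput in megabits per second. That unit is an assumption, since the template does not define it. On Linux the raw counter is the transmit-bytes column of `/proc/net/dev`, and a rate comes from two readings. Here is a minimal Python sketch using the hypothetical helpers `tx_bytes` and `mbps`.

```python
def tx_bytes(proc_net_dev: str, iface: str) -> int:
    """Transmit-bytes counter for `iface` from the text of /proc/net/dev."""
    for line in proc_net_dev.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            # The 8 receive columns come first, so index 8 is transmit bytes.
            return int(rest.split()[8])
    raise KeyError(iface)


def mbps(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Average rate in megabits per second between two counter readings."""
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000
```

Sample the counter twice, for example 10 seconds apart, and pass both readings to `mbps`. For reference, 125,000,000 bytes in 10 seconds is exactly 100 Mbps, the threshold in `bandwidth_spike`.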
Track backup job completion, failures, and storage usage across multiple systems.
```yaml
nodes:
  - name: backup-server
    host: 192.168.1.30
    ssh: backup@192.168.1.30
  - name: nas-backup
    host: 192.168.1.31
    ssh: backup@192.168.1.31
rules:
  - name: backup_job_failed
    when: "file_exists('/tmp/backup.failed')"
    actions:
      - notify: critical
      - ssh_command: "rm /tmp/backup.failed"
  - name: backup_storage_full
    when: "disk_usage_percent > 90"
    actions:
      - notify: warning
  - name: backup_job_stalled
    when: "NOT file_modified('/var/log/backup.log', hours=25)"
    actions:
      - notify: warning
```
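`backup_job_stalled` treats the log's modification time as a heartbeat. A daily job that writes to `/var/log/backup.log` refreshes it every 24 hours, and the 25-hour window leaves an hour of slack for a slow run. Below is a minimal Python sketch of that freshness check. It assumes `file_modified(path, hours=N)` means "mtime within the last N hours" and that a missing file counts as not modified. `modified_within` is a hypothetical helper.

```python
import os
import time
from typing import Optional


def modified_within(path: str, *, days: float = 0, hours: float = 0, minutes: float = 0,
                    now: Optional[float] = None) -> bool:
    """True if `path` was modified within the given window; False if it is missing or older."""
    window_seconds = ((days * 24 + hours) * 60 + minutes) * 60
    current = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return False  # assumption: a missing heartbeat file counts as stale
    return current - mtime <= window_seconds
```

The same helper covers the `minutes=10` recording-gap check and the `days=7` blocklist check in the surveillance and DNS templates.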
Monitor system temperatures, power consumption, and fan speeds across your fleet.
```yaml
nodes:
  - name: workstation-1
    host: 192.168.1.40
    ssh: user@192.168.1.40
  - name: server-rack
    host: 192.168.1.41
    ssh: admin@192.168.1.41
rules:
  - name: cpu_overheat
    when: "cpu_temp > 85"
    actions:
      - ssh_command: "cpufreq-set -g powersave"
      - notify: critical
  - name: fan_failure
    when: "fan_rpm < 500"
    actions:
      - notify: critical
  - name: power_spike
    when: "power_draw_watts > 500"
    actions:
      - notify: warning
```
Monitor camera feeds, storage usage, and recording health for surveillance systems.
```yaml
nodes:
  - name: camera-nvr
    host: 192.168.1.50
    ssh: security@192.168.1.50
rules:
  - name: camera_offline
    when: "NOT process_running('frigate')"
    actions:
      - ssh_command: "systemctl restart frigate"
      - notify: critical
  - name: storage_cleanup
    when: "disk_usage_percent > 85"
    actions:
      - ssh_command: "find /recordings -mtime +30 -delete"
      - notify: info
  - name: recording_gap
    when: "NOT file_modified('/recordings/latest.mp4', minutes=10)"
    actions:
      - notify: warning
```
Monitor Raspberry Pi devices, sensors, and smart home controllers at scale.
```yaml
nodes:
  - name: pi-sensor-1
    host: 192.168.1.60
    ssh: pi@192.168.1.60
  - name: pi-sensor-2
    host: 192.168.1.61
    ssh: pi@192.168.1.61
  - name: home-assistant
    host: 192.168.1.62
    ssh: hass@192.168.1.62
rules:
  - name: iot_device_offline
    when: "ping_failed"
    actions:
      - notify: warning
  - name: sensor_data_stale
    when: "NOT file_modified('/tmp/sensor.json', minutes=15)"
    actions:
      - notify: warning
  - name: low_storage_iot
    when: "disk_usage_percent > 80"
    actions:
      - ssh_command: "journalctl --vacuum-time=7d"
      - notify: info
```
Track MySQL/PostgreSQL performance, connection counts, and query health.
```yaml
nodes:
  - name: database-server
    host: 192.168.1.70
    ssh: db@192.168.1.70
rules:
  - name: database_down
    when: "NOT process_running('mysqld')"
    actions:
      - ssh_command: "systemctl restart mysql"
      - notify: critical
  - name: too_many_connections
    # MySQL's default max_connections is 151; keep this threshold below your server's limit.
    when: "mysql_connections > 150"
    actions:
      - notify: warning
  - name: slow_queries
    when: "mysql_slow_queries > 50"
    actions:
      - notify: info
  - name: replication_lag
    when: "mysql_slave_lag > 60"
    actions:
      - notify: critical
```
Monitor multiple web servers, detect overload, and manage traffic distribution.
```yaml
nodes:
  - name: web-server-1
    host: 192.168.1.80
    ssh: www@192.168.1.80
  - name: web-server-2
    host: 192.168.1.81
    ssh: www@192.168.1.81
  - name: load-balancer
    host: 192.168.1.82
    ssh: nginx@192.168.1.82
rules:
  - name: web_server_overload
    when: "cpu_usage_percent > 90 AND duration_minutes > 5"
    nodes: [web-server-1, web-server-2]
    actions:
      - notify: warning
  - name: web_server_down
    when: "NOT process_running('apache2')"
    nodes: [web-server-1, web-server-2]  # the load balancer runs nginx, not apache2
    actions:
      - ssh_command: "systemctl restart apache2"
      - notify: critical
  - name: load_balancer_config_update
    when: "web_servers_available < 2"
    nodes: [load-balancer]
    actions:
      - ssh_command: "nginx -s reload"
      - notify: info
```
Monitor Syncthing, Nextcloud, or custom file sync processes for health and conflicts.
```yaml
nodes:
  - name: file-sync-server
    host: 192.168.1.90
    ssh: sync@192.168.1.90
rules:
  - name: syncthing_down
    when: "NOT process_running('syncthing')"
    actions:
      - ssh_command: "systemctl restart syncthing"
      - notify: warning
  - name: sync_conflicts
    when: "file_count('/sync/*.sync-conflict*') > 0"
    actions:
      - notify: info
  - name: sync_stalled
    when: "NOT file_modified('/sync/.stfolder', hours=2)"
    actions:
      - notify: warning
```
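Syncthing names conflict copies `<filename>.sync-conflict-<date>-<time>-<modifiedBy>.<ext>` and keeps each one next to its original file, which may sit in any subfolder. If `file_count` expands its pattern like a shell glob (an assumption; the template does not say), `/sync/*.sync-conflict*` only sees the top level. Here is a recursive count, sketched in Python with a hypothetical `count_sync_conflicts` helper.

```python
from pathlib import Path


def count_sync_conflicts(root: str, recursive: bool = True) -> int:
    """Count Syncthing conflict copies under `root`, optionally including subfolders."""
    base = Path(root)
    matches = base.rglob("*.sync-conflict*") if recursive else base.glob("*.sync-conflict*")
    return sum(1 for path in matches if path.is_file())
```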
Monitor Pi-hole, Unbound, or BIND DNS servers for performance and blocking.
```yaml
nodes:
  - name: dns-server
    host: 192.168.1.100
    ssh: dns@192.168.1.100
rules:
  - name: dns_service_down
    when: "NOT process_running('pihole-FTL')"
    actions:
      - ssh_command: "systemctl restart pihole-FTL"
      - notify: critical
  - name: dns_query_spike
    when: "dns_queries_per_second > 1000"
    actions:
      - notify: warning
  - name: blocklist_outdated
    when: "NOT file_modified('/etc/pihole/gravity.db', days=7)"
    actions:
      - ssh_command: "pihole -g"
      - notify: info
```
Monitor Redis, Memcached, or Varnish cache servers for performance and memory usage.
```yaml
nodes:
  - name: cache-server
    host: 192.168.1.110
    ssh: cache@192.168.1.110
rules:
  - name: redis_down
    when: "NOT process_running('redis-server')"
    actions:
      - ssh_command: "systemctl restart redis"
      - notify: critical
  - name: cache_memory_full
    when: "redis_memory_usage > 90"
    actions:
      # Caution: FLUSHDB deletes every key in the selected database.
      - ssh_command: "redis-cli FLUSHDB"
      - notify: warning
  - name: cache_hit_rate_low
    when: "redis_hit_rate < 80"
    actions:
      - notify: info
```
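If `redis_hit_rate` follows Redis's own counters (an assumption), it corresponds to `keyspace_hits / (keyspace_hits + keyspace_misses)` from the `INFO stats` section. Below is a minimal Python sketch that computes the percentage from raw `redis-cli INFO stats` output. `parse_info` and `hit_rate_percent` are hypothetical helpers.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse Redis INFO output ('key:value' lines, '# Section' headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def hit_rate_percent(info_text: str) -> Optional[float]:
    """Keyspace hit rate in percent, or None if there have been no lookups yet."""
    fields = parse_info(info_text)
    hits = int(fields.get("keyspace_hits", 0))
    misses = int(fields.get("keyspace_misses", 0))
    lookups = hits + misses
    return 100.0 * hits / lookups if lookups else None
```

These counters are cumulative since the server started (or since `CONFIG RESETSTAT`), so the result is a lifetime ratio. For a hit rate over a recent window, subtract two readings before dividing.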
Monitor Postfix and Dovecot mail servers for queue health and spam protection.
```yaml
nodes:
  - name: mail-server
    host: 192.168.1.120
    ssh: mail@192.168.1.120
rules:
  - name: postfix_down
    when: "NOT process_running('master')"
    actions:
      - ssh_command: "systemctl restart postfix"
      - notify: critical
  - name: mail_queue_backlog
    when: "mail_queue_size > 100"
    actions:
      - notify: warning
  - name: spam_detection_high
    when: "spam_score_average > 8"
    actions:
      - notify: info
```
Monitor CPU, RAM, disk usage across your entire fleet with intelligent thresholds.
```yaml
nodes:
  - name: all-servers
    host: "192.168.1.*"
    ssh: "admin@{host}"
rules:
  - name: high_cpu_sustained
    when: "cpu_usage_percent > 90 AND duration_minutes > 10"
    actions:
      - notify: warning
  - name: memory_exhaustion
    when: "memory_usage_percent > 95"
    actions:
      - notify: critical
  - name: disk_space_critical
    when: "disk_usage_percent > 95"
    actions:
      - notify: critical
  - name: load_average_high
    when: "load_average_15min > cpu_cores * 2"
    actions:
      - notify: warning
```
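`load_average_15min > cpu_cores * 2` scales the threshold with the machine, so one rule fits both a 4-core box and a 32-core server. Below is the same comparison in Python, using the standard library's `os.getloadavg()` (Unix only) and `os.cpu_count()`. `load_too_high` is a hypothetical helper, and the factor of 2 comes from the `load_average_high` rule.

```python
import os
from typing import Optional


def load_too_high(load_15min: Optional[float] = None, cores: Optional[int] = None,
                  factor: float = 2.0) -> bool:
    """True when the 15-minute load average exceeds factor times the logical CPU count."""
    if load_15min is None:
        load_15min = os.getloadavg()[2]  # (1 min, 5 min, 15 min); Unix only
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    return load_15min > cores * factor
```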
Monitor Asterisk or FreePBX systems for call quality, registrations, and trunks.
```yaml
nodes:
  - name: pbx-server
    host: 192.168.1.130
    ssh: asterisk@192.168.1.130
rules:
  - name: asterisk_down
    when: "NOT process_running('asterisk')"
    actions:
      - ssh_command: "systemctl restart asterisk"
      - notify: critical
  - name: sip_registrations_low
    when: "sip_registrations < 5"
    actions:
      - notify: warning
  - name: trunk_offline
    when: "trunk_status != 'OK'"
    actions:
      - notify: warning
```