
Fleet Telemetry

Monitor CPU, memory, disk, and network metrics across every device — individually and aggregated by group. Spot resource issues before they become incidents.

Kudu Cloud collects real-time resource telemetry from every managed device and surfaces it at both the per-device and fleet level. Metrics stream live over WebSockets and are retained as historical trends so you can identify patterns, investigate spikes, and forecast capacity issues.


Per-Device Telemetry

The Overview tab on any device detail page shows live system metrics:

  • CPU usage — current utilization, updated in real time
  • Memory usage — physical memory consumed vs. total
  • Disk usage — capacity used per volume, plus disk health (S.M.A.R.T.) status
  • Network — inbound and outbound throughput

All metrics update continuously over WebSocket without requiring a page refresh.

Telemetry is retained over time so you can view usage trends across hours or days. This lets you:

  • Correlate CPU or memory spikes with deployments, scheduled tasks, or user activity
  • Identify devices that are under sustained load vs. occasional spikes
  • Track disk growth over time and anticipate capacity issues

Disk Forecasting

Kudu Cloud uses historical disk usage trends to identify devices that are likely to run out of space. These devices are surfaced as at-risk in fleet views and trigger alerts before the disk fills.
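
The forecasting method Kudu Cloud uses is not described here. As a rough illustration of trend-based forecasting, the sketch below fits a least-squares line to historical usage samples and estimates how many days remain until the disk is full. The function name `days_until_full` and the `(day, used_pct)` sample format are assumptions made for this example, not part of the product.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(
    samples: Sequence[Tuple[float, float]],
    capacity_pct: float = 100.0,
) -> Optional[float]:
    """Estimate days from the last sample until usage reaches capacity_pct.

    samples: (day_offset, used_pct) pairs, ordered by time (hypothetical format).
    Returns None when there is too little data or usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, so no trend can be fitted

    # Ordinary least-squares slope (percentage points per day) and intercept.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # no growth, so no projected fill date
    intercept = mean_u - slope * mean_t

    day_full = (capacity_pct - intercept) / slope
    return max(0.0, day_full - samples[-1][0])


history = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
remaining = days_until_full(history)
```

With the sample history above, usage grows by two percentage points per day, so the projection lands twelve days after the last sample. Real disk usage is rarely this linear, so treat a forecast like this as an early warning rather than a precise date.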


Fleet-Level Telemetry

The fleet dashboard aggregates resource metrics across all connected devices:

  • Average CPU, memory, and disk usage across the fleet
  • Devices under pressure — those currently above thresholds for CPU, memory, or disk
  • Health score distribution — how many devices are healthy, fair, at risk, or critical based on their resource state

High resource usage feeds into each device's health score. Devices under sustained CPU or memory pressure score lower and appear earlier in health-sorted views.
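
To make the fleet aggregates above concrete, here is a minimal sketch that computes fleet-wide averages and lists devices under pressure. The `DeviceSample` type and `fleet_summary` function are hypothetical, and the thresholds are assumed to match the built-in alert triggers (95% CPU, 95% memory, 90% disk); the dashboard's actual thresholds may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class DeviceSample:
    """Latest resource reading for one device (hypothetical shape)."""
    device_id: str
    cpu_pct: float
    memory_pct: float
    disk_pct: float


# Assumed thresholds, borrowed from the built-in alert triggers.
PRESSURE_THRESHOLDS: Dict[str, float] = {
    "cpu_pct": 95.0,
    "memory_pct": 95.0,
    "disk_pct": 90.0,
}


def fleet_summary(samples: List[DeviceSample]) -> dict:
    """Return per-metric fleet averages and the IDs of devices above any threshold."""
    if not samples:
        return {"averages": {}, "under_pressure": []}

    averages = {
        metric: mean(getattr(s, metric) for s in samples)
        for metric in PRESSURE_THRESHOLDS
    }
    under_pressure = [
        s.device_id
        for s in samples
        if any(getattr(s, m) > limit for m, limit in PRESSURE_THRESHOLDS.items())
    ]
    return {"averages": averages, "under_pressure": under_pressure}


fleet = [
    DeviceSample("a", 10.0, 20.0, 30.0),
    DeviceSample("b", 97.0, 40.0, 50.0),
    DeviceSample("c", 30.0, 60.0, 91.0),
]
summary = fleet_summary(fleet)
```

In this example, device `b` exceeds the CPU threshold and device `c` exceeds the disk threshold, so both appear in `under_pressure`.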


Group-Level Monitoring

Filter any fleet view by device group (API key) to monitor a specific segment:

  • Your Linux servers separately from Windows workstations
  • A customer site in isolation
  • A branch office or department

Within a group view, you can see aggregated resource usage across all devices in that group and quickly identify outliers against the group baseline.
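
One simple way to think about outliers against a group baseline is to compare each device with the group median. The sketch below is an illustration only: the `group_outliers` function, the use of the median as the baseline, and the 25-point margin are all assumptions, not a description of how Kudu Cloud computes its baseline.

```python
from statistics import median
from typing import Dict, List


def group_outliers(usage_by_device: Dict[str, float], margin_pts: float = 25.0) -> List[str]:
    """Return device IDs whose usage differs from the group median by more than margin_pts.

    usage_by_device maps a device ID to a usage percentage for one metric.
    """
    if not usage_by_device:
        return []
    baseline = median(usage_by_device.values())
    return sorted(
        device_id
        for device_id, value in usage_by_device.items()
        if abs(value - baseline) > margin_pts
    )


cpu_by_device = {"web-1": 20.0, "web-2": 24.0, "web-3": 22.0, "web-4": 85.0}
outliers = group_outliers(cpu_by_device)
```

Here the group median is 23%, so only `web-4` falls outside the margin. A median baseline is less sensitive to a single hot device than a mean would be, which is why it is used in this sketch.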


Telemetry Alerts

The alert system can fire on telemetry thresholds automatically:

Built-in trigger        Threshold
CPU pressure            CPU sustained above 95%
Memory pressure         Memory sustained above 95%
Disk critical           Disk capacity above 90%
Disk health degraded    S.M.A.R.T. status warning or failure

Custom Alert Rules

You can define your own telemetry alert rules:

  1. Choose a metric (CPU usage, memory usage, disk usage, etc.)
  2. Set a threshold and comparison operator (e.g., CPU greater than 80%)
  3. Set a duration — how long the condition must persist before the alert fires (e.g., 15 minutes)
  4. Choose a severity level

This lets you tailor alerting to your environment. A build server might legitimately run at high CPU during compilation — a file server probably shouldn't.
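
The four parts of a custom rule (metric, threshold and operator, duration, severity) can be modeled as a small data structure plus an evaluator. The following is a sketch under stated assumptions: `AlertRule`, `first_fire_time`, and the `(timestamp, value)` sample format are hypothetical names for illustration and do not reflect Kudu Cloud's actual rule schema or API.

```python
import operator
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Comparison operators a rule may use (assumed set).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical representation of a custom telemetry alert rule."""
    metric: str        # e.g. "cpu_pct"
    op: str            # key into OPERATORS
    threshold: float   # value the metric is compared against
    duration_s: float  # how long the condition must persist
    severity: str      # e.g. "warning"


def first_fire_time(rule: AlertRule, samples: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Return the timestamp at which the rule first fires, or None.

    samples: (timestamp_s, value) pairs for rule.metric, ordered by time.
    The rule fires once the condition has held without interruption
    for at least rule.duration_s seconds.
    """
    compare = OPERATORS[rule.op]
    breach_start: Optional[float] = None
    for ts, value in samples:
        if compare(value, rule.threshold):
            if breach_start is None:
                breach_start = ts  # condition just started holding
            if ts - breach_start >= rule.duration_s:
                return ts
        else:
            breach_start = None  # condition broken, reset the timer
    return None


rule = AlertRule(metric="cpu_pct", op=">", threshold=80.0, duration_s=900, severity="warning")
readings = [(0, 85.0), (300, 90.0), (600, 70.0), (900, 88.0),
            (1200, 91.0), (1500, 95.0), (1800, 99.0)]
fired_at = first_fire_time(rule, readings)
```

In this example the dip to 70% at 600 seconds resets the timer, so the rule does not fire until CPU has stayed above 80% for a full 15 minutes, at the 1800-second sample. This is the behavior that separates sustained load from an occasional spike.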

Alerts route to email, Slack, or webhook. See Health Scores & Alerts for full alert configuration details.


Risk Flags

Telemetry conditions that cross critical thresholds set risk flags on the device:

  • disk_critical — Disk space above 90% capacity
  • disk_health_degraded — S.M.A.R.T. disk health warning or failure

These flags appear on the dashboard and in the device detail view, and are factored into the health score.
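
As an illustration of how the two flags relate to raw telemetry, the sketch below derives them from a disk usage percentage and a S.M.A.R.T. status string. The function `risk_flags` and the status values `"ok"`, `"warning"`, and `"failure"` are assumptions for this example; only the flag names and the 90% threshold come from the list above.

```python
from typing import List

# Assumed S.M.A.R.T. status strings that count as degraded.
DEGRADED_SMART_STATUSES = {"warning", "failure"}


def risk_flags(disk_used_pct: float, smart_status: str) -> List[str]:
    """Return the risk flags implied by one device's disk telemetry."""
    flags: List[str] = []
    if disk_used_pct > 90.0:
        flags.append("disk_critical")
    if smart_status in DEGRADED_SMART_STATUSES:
        flags.append("disk_health_degraded")
    return flags


flags = risk_flags(disk_used_pct=93.5, smart_status="warning")
```

A device can carry both flags at once, as in the example call, and a disk at exactly 90% does not set `disk_critical` because the threshold is "above 90%".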


Fleet Analytics

The Analytics section provides historical resource data across your fleet:

  • Resource trends over time (CPU, memory, disk)
  • Uptime patterns
  • Disk growth forecasts
  • Health score trajectories

Reports can be exported via print-to-PDF for executive or audit purposes. See Reports for more.