When something breaks at 3 AM, you're SSHing into servers, grepping through scattered log files, and hoping you find the signal before your users do.
Logs live on the servers that produced them. Correlating an application error with a CPU spike means toggling between three SSH sessions and hoping timestamps line up.
grep and tail -f only take you so far. Querying "what was my API logging between 14:45 and 15:30 yesterday" shouldn't require a ritual.
Memory leaks, disk saturation, runaway processes: you find out when users complain, not when a metric crosses a threshold.
A single static binary runs on each machine you want to monitor. It streams metrics and logs over gRPC to your central Raven server, which stores, indexes, and surfaces everything through one dashboard.
The agent reads /proc every 10 seconds (CPU, memory, disk I/O, network, load average) and streams batches to the server over a persistent gRPC connection with TLS.
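To make the collection loop concrete, here is a minimal Python sketch of what one sampling pass over /proc could look like. It is not Raven's agent code: the function names, the batch size of 6, and the `send` callable (a stand-in for the gRPC stream) are all assumptions made for illustration. Only load average and memory are parsed, to keep it short.

```python
import time


def parse_loadavg(text: str) -> dict:
    # /proc/loadavg looks like: "0.42 0.35 0.30 1/523 12345"
    one, five, fifteen = text.split()[:3]
    return {"load1": float(one), "load5": float(five), "load15": float(fifteen)}


def parse_meminfo(text: str) -> dict:
    # /proc/meminfo lines look like: "MemTotal:       16384000 kB"
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    used_pct = round(100 * (total - available) / total, 1)
    return {"mem_total_kb": total, "mem_used_pct": used_pct}


def collect_once() -> dict:
    # One sample: a timestamp plus whatever the parsers extract (Linux only).
    sample = {"ts": time.time()}
    with open("/proc/loadavg") as f:
        sample.update(parse_loadavg(f.read()))
    with open("/proc/meminfo") as f:
        sample.update(parse_meminfo(f.read()))
    return sample


def run(send, interval_s: float = 10.0, batch_size: int = 6, max_samples=None):
    # `send` is a hypothetical callable standing in for the gRPC stream.
    # batch_size is an assumption; the source only says "batches".
    batch, taken = [], 0
    while max_samples is None or taken < max_samples:
        batch.append(collect_once())
        taken += 1
        if len(batch) >= batch_size:
            send(batch)
            batch = []
        time.sleep(interval_s)
```

Keeping the parsers separate from the file reads means they can be exercised on any platform with sample text, for example `parse_loadavg("0.42 0.35 0.30 1/523 12345")`.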
Logs land in ClickHouse with a bloom filter index on content. Full-text search across all your hosts and apps, over any time range, in sub-second queries.
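As a rough illustration of how such a search could be expressed, the Python sketch below builds a time-bounded token search against a hypothetical `logs` table. The table name, column names, and index parameters are assumptions, not Raven's actual schema. `tokenbf_v1` is one of ClickHouse's bloom-filter skip index types and `hasToken` is a function that can use it; whether Raven uses these in particular is not stated here.

```python
from datetime import datetime

# Hypothetical schema: a token bloom-filter skip index on the log content.
LOGS_DDL = """
CREATE TABLE logs (
    timestamp DateTime64(3),
    host LowCardinality(String),
    app LowCardinality(String),
    content String,
    INDEX content_tokens content TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 4
) ENGINE = MergeTree
ORDER BY (app, host, timestamp)
"""

# {name:Type} placeholders are ClickHouse's server-side query parameters.
SEARCH_SQL = """
SELECT timestamp, host, content
FROM logs
WHERE app = {app:String}
  AND timestamp >= {start:DateTime}
  AND timestamp < {end:DateTime}
  AND hasToken(content, {token:String})
ORDER BY timestamp
"""


def build_search(app: str, token: str, start: datetime, end: datetime):
    # hasToken matches one whole token, so reject anything with separators.
    if not (token.isascii() and token.isalnum()):
        raise ValueError("token must be a single alphanumeric word")
    if start >= end:
        raise ValueError("start must be before end")
    fmt = "%Y-%m-%d %H:%M:%S"
    params = {
        "app": app,
        "token": token,
        "start": start.strftime(fmt),
        "end": end.strftime(fmt),
    }
    return SEARCH_SQL, params
```

A call such as `build_search("api", "timeout", datetime(2024, 1, 1, 14, 45), datetime(2024, 1, 1, 15, 30))` returns the SQL text and a parameter dict that a ClickHouse client could submit together.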
Threshold rules with a state machine: OK → Pending → Firing → Resolved. Notifications fire only on transitions. Discord, Slack, and email supported out of the box.
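The state machine above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Raven's implementation: the `Rule` class, the `pending_for` count (consecutive breaching samples before a rule fires), and the choice to settle from Resolved back to OK on the next healthy sample are all hypothetical. `evaluate` returns the new state only when a transition happens, which is the point at which a notification would be sent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    OK = "OK"
    PENDING = "Pending"
    FIRING = "Firing"
    RESOLVED = "Resolved"


@dataclass
class Rule:
    name: str
    threshold: float
    pending_for: int = 3  # assumption: breaching samples needed before firing
    state: State = State.OK
    breaches: int = 0

    def evaluate(self, value: float) -> Optional[State]:
        """Feed one sample; return the new state on a transition, else None."""
        previous = self.state
        if value > self.threshold:
            self.breaches += 1
            if self.state in (State.OK, State.RESOLVED):
                self.state = State.PENDING
            if self.state is State.PENDING and self.breaches >= self.pending_for:
                self.state = State.FIRING
        else:
            self.breaches = 0
            if self.state is State.FIRING:
                self.state = State.RESOLVED
            elif self.state in (State.PENDING, State.RESOLVED):
                self.state = State.OK
        return self.state if self.state is not previous else None


def transitions(rule: Rule, samples):
    # Helper for experimenting: the transition (or None) for each sample.
    return [rule.evaluate(v) for v in samples]
```

Feeding a rule a run of samples shows that repeated breaching values while already Firing produce no further transitions, so a long incident yields one Firing notification and one Resolved notification rather than one per sample.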
Three steps from zero to a live dashboard. No account creation, no cloud signup, no vendor lock-in.
coming soon.
coming soon.
coming soon.