Monitoring probes and alerting service (for UEC and more)
As a UEC admin I'm alerted by UEC when physical nodes go down or services are flaky.
As a UEC admin I can integrate my UEC deployment into my existing nagios system.
Notes: nagios probes, monitoring (munin, ganglia) probes.
Blueprint information
- Status: Not started
- Approver: Robbie Williamson
- Priority: Undefined
- Drafter: None
- Direction: Needs approval
- Assignee: None
- Definition: Review
- Series goal: None
- Implementation: Deferred
- Milestone target: None
- Started by
- Completed by
Whiteboard
To be discussed:
Should monitoring (collectd) and logging (rsyslog) use a single network transport? If so, which one: collectd, RELP syslog, or reconnoiter?
Work Items:
Move collectd to main (MIR).
Refine relevant measures for UEC deployments.
Write collectd input plugins for each of them.
Refine monitoring probes for UEC deployments.
Provide nagios plugins for each of them.
Install collectd on every UEC component.
Install all monitoring and measuring probes on every UEC component.
Automatically set up collectd to send all monitoring data to the central monitoring server (CLC) using puppet recipes.
Investigate graphing solutions (munin, graphite, reconnoiter (omniti - not packaged), visage, ganglia).
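The work items above call for probes that feed both nagios and collectd. As a rough illustration only, the sketch below shows how one measured value could be reported in both forms: a nagios plugin status line with the conventional exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and a PUTVAL line of the kind collectd's exec plugin reads from a script's standard output. The plugin name "uec_probe", the label "load1", the host name and the thresholds are assumptions made for the example; nothing in this blueprint specifies them.

```python
# Minimal sketch, not a specified UEC probe. Names and thresholds are hypothetical.

# Conventional nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measured value to a nagios status (higher values are worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def nagios_line(label, value, warn, crit):
    """Return (exit_code, status line) with performance data after the '|'."""
    status = classify(value, warn, crit)
    line = (
        f"{STATUS_NAMES[status]} - {label}={value:.2f} | "
        f"{label}={value:.2f};{warn:.2f};{crit:.2f}"
    )
    return status, line


def collectd_putval(host, label, value, interval=10, plugin="uec_probe"):
    """Format one gauge reading as a PUTVAL line for collectd's exec plugin.

    "N" asks collectd to timestamp the value with the current time.
    The plugin name "uec_probe" is a placeholder chosen for this example.
    """
    return f'PUTVAL "{host}/{plugin}/gauge-{label}" interval={interval} N:{value}'
```

A wrapper script used as a nagios check would print the status line and exit with the returned code; the same measurement, run under collectd's exec plugin, would print the PUTVAL line instead. Keeping the measurement separate from the two output formats means each probe only has to be written once.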