Monitor critical LAVA resources and notify administrators of problems

Registered by Paul Larson

Uptime monitoring is covered by a separate blueprint; it must be done externally, because a server that is down cannot send its own notifications. Monitoring critical resources such as memory usage and disk utilization can give early warning of server problems that could lead to downtime. Unlike uptime monitoring, this kind of monitoring can be done locally or externally.

Blueprint information

Status: Complete
Approver: Paul Larson
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: Obsolete
Series goal: Accepted for trunk
Implementation: Unknown
Milestone target: None
Completed by: Andy Doan

Whiteboard

No code changes are needed for this; it is purely infrastructure work in the validation farm.
[doanac, 2012-11-29]: This has largely been handled by Munin and Monitis.

Meta:
Headline: N/A
Acceptance: Resources such as disk space on all LAVA-critical machines are tracked, and notifications are sent when a resource falls below a safe threshold.
Roadmap id: LAVA2012-LAB-MONITORING
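
The acceptance criterion above amounts to a threshold check on each tracked resource. As an illustration only, a minimal local check for free disk space might look like the sketch below. This is not how Munin or Monitis implement it; the names `DiskStatus`, `is_below_threshold`, and `check_disk`, and the 10% default threshold, are assumptions made for this example.

```python
import shutil
from typing import NamedTuple


class DiskStatus(NamedTuple):
    """Result of one disk check (hypothetical type for this sketch)."""
    path: str
    free_fraction: float
    below_threshold: bool


def is_below_threshold(free_bytes: int, total_bytes: int,
                       min_free_fraction: float) -> bool:
    """Return True when the free share of the disk is under the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_disk(path: str, min_free_fraction: float = 0.10) -> DiskStatus:
    """Measure free space on the filesystem holding `path`.

    The 10% default is an assumed value; the blueprint does not
    say what a "safe threshold" is.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    free_fraction = usage.free / usage.total
    low = is_below_threshold(usage.free, usage.total, min_free_fraction)
    return DiskStatus(path, free_fraction, low)


if __name__ == "__main__":
    status = check_disk(".")
    if status.below_threshold:
        # A real deployment would notify administrators here
        # (e.g. by email) rather than print to the console.
        print(f"WARNING: {status.path} has only "
              f"{status.free_fraction:.1%} free")
    else:
        print(f"OK: {status.path} has {status.free_fraction:.1%} free")
```

A check like this could be run periodically on each machine, with the print call replaced by whatever notification channel the lab uses.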

Work Items

Work items:
Draft: TODO