Enable health management for senlin
As a clustering service, Senlin can help to create/
Blueprint information
- Status: Complete
- Approver: Qiming Teng
- Priority: High
- Drafter: Cindia-blue
- Direction: Approved
- Assignee: Cindia-blue
- Definition: Approved
- Series goal: Accepted for mitaka
- Implementation: Implemented
- Milestone target: mitaka-3
- Started by: Cindia-blue
- Completed by: Cindia-blue
Whiteboard
This blueprint scopes and resolves health management for clusters and nodes created by Senlin. The health management service of Senlin keeps cluster and node status consistent and recovers nodes in “ERROR” status using operations specified by users. Instead of monitoring the whole infrastructure or cloud application across a huge domain, the health management service triggers periodic status checks on the clusters. Users can use a health_policy to define the recovery operations to run when an error happens and bind the policy to the targeted clusters. Another advantage of policy binding is that it distinguishes the clusters that require higher health consistency from the others. For these clusters, the health management service will enable an embedded listener so that status changes are processed quickly.
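The periodic status check described above can be sketched roughly as follows. This is a minimal illustration, not Senlin's actual code: `Node`, `check_cluster`, `poll_cluster`, and the `probe` and `recover` callables are hypothetical names introduced here, and the status strings other than "ERROR" are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    node_id: str
    status: str  # status as recorded by the clustering service (hypothetical model)


def check_cluster(nodes: List[Node], probe: Callable[[str], str]) -> List[Node]:
    """Sync each node's recorded status with the probed one; return nodes in ERROR."""
    failed = []
    for node in nodes:
        node.status = probe(node.node_id)  # overwrite a possibly stale record
        if node.status == "ERROR":
            failed.append(node)
    return failed


def poll_cluster(nodes, probe, recover, interval=60.0, rounds=1):
    """Run a fixed number of check rounds, calling `recover` on each failed node."""
    recovered = []
    for i in range(rounds):
        for node in check_cluster(nodes, probe):
            recover(node)
            recovered.append(node.node_id)
        if i < rounds - 1:
            time.sleep(interval)  # wait before the next periodic check
    return recovered
```

Here `probe` stands in for a query to the underlying compute service, and `recover` for whatever operation the user chose in the policy. A listener-based variant would call `recover` from an event callback instead of a timed loop.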
Use Cases
=========
Two typical use cases are listed below, but the health management service should not be limited to
these two:
A) An auto-scaling cluster needs consistent node health when scaling out or scaling in, so that the
number of nodes to change is calculated accurately.
B) When users list node or cluster status, the status provided is consistent with the underlying nova.
This allows applications or users to rely on the status kept by Senlin.
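To illustrate use case A, the small sketch below shows how stale status records can distort a scale-out calculation. The function name `nodes_to_create` and the "ACTIVE" status string are assumptions made for this example only.

```python
def nodes_to_create(desired_capacity, node_statuses):
    """Return how many nodes a scale-out must add to reach the desired capacity."""
    healthy = sum(1 for status in node_statuses if status == "ACTIVE")
    return max(desired_capacity - healthy, 0)


# Stale records: two nodes have failed, but the stored status still says ACTIVE.
stale = ["ACTIVE"] * 5
# Consistent records after a health check has run.
checked = ["ACTIVE"] * 3 + ["ERROR"] * 2

missing_with_stale_status = nodes_to_create(5, stale)      # 0: the failures go unnoticed
missing_with_checked_status = nodes_to_create(5, checked)  # 2: the real shortfall
```

With stale records the cluster appears to be at full capacity and no node is added; once the status is consistent, the real shortfall of two nodes becomes visible.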
Design
======
Three groups of functions should be implemented for the health management design:
A) Detection of status inconsistency: both polling-based and listener-based functions should be
provided.
B) Recovery of clusters: recovery actions should be provided for both clusters and nodes. To keep the
design extensible for different profile types, the detailed recovery operations should be implemented
and overridden in the profile.
C) Customization: instead of changing the Senlin engine directly, users can define and override the
health_policy to include the recovery operation and attach the policy to the clusters they think need
more health care than others.
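The three parts above could fit together along the following lines. This is a hedged sketch under stated assumptions rather than the implemented design: the class names, the dictionary-based cluster and node models, and the operation names "RECREATE" and "REBUILD" are hypothetical; only the ideas of a check method and a recover method overridden per profile, and of a health policy attached to a cluster, come from the text.

```python
class Profile:
    """Base profile: generic check and recover behaviour (hypothetical)."""

    def do_check(self, node):
        return True  # the base profile has no way to detect a failure

    def do_recover(self, node, operation="RECREATE"):
        node["status"] = "ACTIVE"
        return operation


class ServerProfile(Profile):
    """Profile type that overrides check and recover for its own backend."""

    def __init__(self, backend):
        self.backend = backend  # dict: node id -> backend status (stand-in for a compute API)

    def do_check(self, node):
        return self.backend.get(node["id"]) == "ACTIVE"

    def do_recover(self, node, operation="REBUILD"):
        self.backend[node["id"]] = "ACTIVE"  # pretend the operation succeeded
        node["status"] = "ACTIVE"
        return operation


class HealthPolicy:
    """User-defined policy naming the recovery operation to apply."""

    def __init__(self, recover_operation):
        self.recover_operation = recover_operation


def cluster_recover(cluster, profile):
    """Check every node and recover the failed ones with the policy's operation."""
    policy = cluster.get("health_policy")
    operation = policy.recover_operation if policy else "RECREATE"
    recovered = []
    for node in cluster["nodes"]:
        if not profile.do_check(node):
            node["status"] = "ERROR"
            profile.do_recover(node, operation)
            recovered.append(node["id"])
    return recovered
```

Keeping the operation inside the profile means a new profile type only has to override `do_check` and `do_recover`, while the policy lets each cluster choose its own operation without touching the engine.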
Gerrit topic: https:/
Addressed by: https:/
Implement do_check method for nova profile
Addressed by: https:/
Implement node_recover in Profile
Addressed by: https:/
Add Recover into Node Actions and Node Model
Addressed by: https:/
Add Description about Recover of Profile
Addressed by: https:/
Add Recover as a Cluster Action
Addressed by: https:/
Add Recover into RPC API
Addressed by: https:/
Expose Check Function in Profile
Addressed by: https:/
Add Doc for Check and Recover Actions