Cluster Creation and Management
Detecting host failures in an OpenStack deployment is important for the following use cases:
1. VMs running on a failed host can be evacuated to minimize disruption.
2. OpenStack services impacted by a failed host can be restarted on healthy nodes.
Distributed consensus is an effective mechanism for detecting host failures and restoring disrupted services. It also has other applications, such as:
1. Detecting network partitions in a datacenter environment.
2. Providing liveness detection for VMs and hosts, which orchestration services in OpenStack can use. This method is much quicker than OpenStack-based heartbeat mechanisms.
3. Providing a distributed key-value store along with service discovery, which can be used for liveness detection of OpenStack services.
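The way a quorum-based cluster separates a single host failure from a network partition can be illustrated with a minimal sketch. It is not the proposed service's implementation. The names `LivenessTracker` and `quorum_size`, and the 5-second timeout, are assumptions made for illustration only. A real consensus system would run an equivalent check from each member's own point of view.

```python
from dataclasses import dataclass, field


def quorum_size(member_count: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return member_count // 2 + 1


@dataclass
class LivenessTracker:
    """Hypothetical helper: tracks the last heartbeat seen from each member."""
    members: set
    timeout: float = 5.0  # assumed: seconds of silence before a host is suspected
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        # Ignore heartbeats from hosts that are not cluster members.
        if host in self.members:
            self.last_seen[host] = now

    def alive(self, now: float) -> set:
        return {
            h for h in self.members
            if h in self.last_seen and now - self.last_seen[h] <= self.timeout
        }

    def failed(self, now: float) -> set:
        return self.members - self.alive(now)

    def has_quorum(self, now: float) -> bool:
        # Without a majority, this side of the cluster may be partitioned
        # and should not act on its view of which hosts have failed.
        return len(self.alive(now)) >= quorum_size(len(self.members))
```

In this sketch, a host that stops sending heartbeats shows up in `failed()`, while a loss of quorum signals that the observer itself may be on the minority side of a partition.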
Creating and monitoring a cluster across multiple hosts to detect host failures and network partitions can be difficult and involves manual intervention. If host failures are permanent, the cluster must be manually reconfigured according to the hosts that remain available.
This blueprint proposes a new service that can set up, monitor, and reconfigure such distributed consensus clusters. In summary:
1. This service will set up and monitor the cluster on the selected hosts and attempt to address the problems mentioned earlier.
2. As hosts are added to or removed from the OpenStack environment, it will automatically re-establish the distributed quorum.
3. In case of failures, it will remove and fence off defective nodes until the problem is fixed.
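The reconfiguration behaviour summarised above can be sketched as a single reconciliation step. This is only an illustration under stated assumptions. The names `ReconcilePlan`, `plan_reconfiguration`, and `apply_plan` are hypothetical, and the blueprint does not specify how the host inventory or the failure reports are obtained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconcilePlan:
    """Hypothetical result of one reconciliation pass."""
    join: frozenset   # hosts to add to the cluster
    leave: frozenset  # hosts to remove from the cluster
    fence: frozenset  # members to isolate until they are repaired


def plan_reconfiguration(current_members, inventory_hosts, failed_hosts):
    """Compare cluster membership with the host inventory and failure reports."""
    current = frozenset(current_members)
    inventory = frozenset(inventory_hosts)
    failed = frozenset(failed_hosts)

    fence = current & failed                 # defective members
    leave = (current - inventory) | fence    # decommissioned or fenced hosts
    join = inventory - current - failed      # new hosts that are healthy
    return ReconcilePlan(join=join, leave=leave, fence=fence)


def apply_plan(current_members, plan):
    """Return the membership that results from applying the plan."""
    return (frozenset(current_members) - plan.leave) | plan.join


# Example: h3 has failed and h4 was newly added to the environment.
plan = plan_reconfiguration(
    current_members={"h1", "h2", "h3"},
    inventory_hosts={"h1", "h2", "h3", "h4"},
    failed_hosts={"h3"},
)
new_members = apply_plan({"h1", "h2", "h3"}, plan)
```

Here `h3` is fenced and removed while `h4` joins, so the cluster keeps three members and a quorum of two. A fenced host that is later repaired would rejoin on a subsequent pass, once it no longer appears in the failure reports.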
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: Pushkar Acharya
- Direction: Needs approval
- Assignee: None
- Definition: Drafting
- Series goal: None
- Implementation: Unknown
- Milestone target: None
- Started by
- Completed by