Nova scheduler may race for compute resources
The scheduler is subject to a race condition that can cause it to misjudge the resources available on a particular compute host. The problem occurs when multiple scheduler instances or threads concurrently send an instance build request (i.e. run_instance) to the same compute host. Each scheduler decides based on its own view of the host's free resources, so the combined requests may oversubscribe the host and cause one or more run_instance requests to fail.
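The race can be illustrated with a small, deterministic simulation. This is only a sketch and not Nova code: `ComputeHost`, `schedule_from_snapshot`, and the RAM figures are hypothetical names and values chosen for illustration. Two schedulers read the same host state before either build lands, both select the host, and the second build fails.

```python
from dataclasses import dataclass


@dataclass
class ComputeHost:
    """Hypothetical stand-in for a compute host; not a Nova class."""
    name: str
    free_ram_mb: int

    def run_instance(self, ram_mb: int) -> bool:
        """Build an instance; fail if the host has too little RAM left."""
        if ram_mb > self.free_ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        return True


def schedule_from_snapshot(snapshot, ram_mb):
    """Pick the first host whose snapshotted free RAM fits the request."""
    for name, free in snapshot.items():
        if free >= ram_mb:
            return name
    return None


host = ComputeHost("compute1", free_ram_mb=2048)

# Both schedulers read the host state before either build has landed.
snapshot_a = {host.name: host.free_ram_mb}
snapshot_b = {host.name: host.free_ram_mb}

# Each scheduler independently concludes that the host has room.
choice_a = schedule_from_snapshot(snapshot_a, 1536)
choice_b = schedule_from_snapshot(snapshot_b, 1536)

# The builds arrive one after the other; only the first one fits.
result_a = host.run_instance(1536)
result_b = host.run_instance(1536)
```

In this sketch both schedulers choose `compute1`, but only the first build succeeds, because neither snapshot reflects the other scheduler's decision.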
Blueprint information
- Status: Complete
- Approver: Vish Ishaya
- Priority: Medium
- Drafter: Brian Elliott
- Direction: Approved
- Assignee: Brian Elliott
- Definition: Approved
- Series goal: Accepted for folsom
- Implementation: Implemented
- Milestone target: 2012.2
- Started by: Vish Ishaya
- Completed by: Vish Ishaya
Whiteboard
Gerrit topic: https:/
Addressed by: https:/
Keep the ComputeNode model updated with usage
Addressed by: https:/
Adds generic retries for build failures.
Work Items
Added scheduling retries when build errors occur: DONE
Added resource tracking in the compute host to more gracefully control resource usage and provide up-to-date information to the scheduler: INPROGRESS
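The two work items can be sketched together as follows. This is a minimal illustration under stated assumptions, not the implementation that landed in Nova: `ResourceTracker`, `claim`, `build_with_retries`, and `max_attempts` are hypothetical names. The tracker makes the check-and-reserve step atomic on the compute host, and the retry loop moves on to another host when a claim is rejected.

```python
import threading


class ResourceTracker:
    """Hypothetical per-host tracker that serializes resource claims."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb
        self._lock = threading.Lock()

    def claim(self, ram_mb):
        """Atomically check and reserve RAM; return False if it does not fit."""
        with self._lock:
            if ram_mb > self.free_ram_mb:
                return False
            self.free_ram_mb -= ram_mb
            return True


def build_with_retries(trackers, scheduler_view, ram_mb, max_attempts=3):
    """Try hosts from a possibly stale view, skipping hosts that rejected us.

    trackers: dict mapping host name to ResourceTracker (authoritative state).
    scheduler_view: dict mapping host name to the free RAM the scheduler
    believes the host has (may be out of date).
    """
    tried = set()
    for _ in range(max_attempts):
        candidates = [
            name for name, free in scheduler_view.items()
            if name not in tried and free >= ram_mb
        ]
        if not candidates:
            return None
        name = candidates[0]
        if trackers[name].claim(ram_mb):
            return name
        # The build failed on this host; exclude it and reschedule.
        tried.add(name)
    return None


trackers = {
    "compute1": ResourceTracker("compute1", 2048),
    "compute2": ResourceTracker("compute2", 2048),
}
# The scheduler's view was taken before another request landed on compute1.
stale_view = {"compute1": 2048, "compute2": 2048}
trackers["compute1"].claim(1536)

placed_on = build_with_retries(trackers, stale_view, 1536)
```

Here the first attempt targets `compute1` based on the stale view, the claim is rejected, and the retry places the instance on `compute2`. Keeping the claim on the compute host means the host never oversubscribes, even when the scheduler's information is out of date.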