Record scheduler information for each run
Currently the scheduler records information only in the log files. We should implement some kind of (optional) monitoring for each scheduler run (i.e. the host states and the reasons for them, instance resource utilization, and the scheduler decision). This information could then be used to debug what happens during scheduling, and also to simulate and analyze the scheduler's behaviour.
Blueprint information
- Status: Not started
- Approver: Andrew Laski
- Priority: Undefined
- Drafter: None
- Direction: Needs approval
- Assignee: Qiu Yu
- Definition: Review
- Series goal: None
- Implementation: Unknown
- Milestone target: None
- Started by
- Completed by
Whiteboard
There's no assignee. Is anyone stepping up to do the work? --alaski
Also the blueprint needs to be a lot more specific on exactly what would be added. --russellb
I drafted this after this review: https:/
Recently I've spent some time thinking about and investigating this issue, so I'm willing to assign myself to this BP.
Some drafting ideas:
== Information to be recorded ==
I'd like to take the following patch from Phil Day as a starting point.
Adds useful debug logging to filter_scheduler
https:/
- instance_uuids at the start of schedule
- request_spec at the start of schedule
- hosts in weighted order and weight values
- host selected for which instance_uuid
- and others
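To make the list above concrete, here is a minimal sketch of what one recorded scheduler run could look like. The class name `SchedulerRunRecord` and its field names are hypothetical, chosen only to mirror the items listed; they are not part of Nova or of the referenced patch.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Tuple


@dataclass
class SchedulerRunRecord:
    """Hypothetical container for one scheduler run (not a Nova class)."""
    # Instances being scheduled in this run.
    instance_uuids: List[str]
    # The request_spec as received at the start of the run.
    request_spec: Dict[str, Any]
    # (host name, weight) pairs, highest weight first.
    weighed_hosts: List[Tuple[str, float]] = field(default_factory=list)
    # Mapping of instance_uuid to the host selected for it.
    selections: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the record so it can be logged or stored."""
        return json.dumps(asdict(self), sort_keys=True)


record = SchedulerRunRecord(
    instance_uuids=["uuid-1"],
    request_spec={"num_instances": 1},
    weighed_hosts=[("host-a", 2.0), ("host-b", 1.5)],
    selections={"uuid-1": "host-a"},
)
print(record.to_json())
```

A single serializable record per run would keep the same data usable for logging, notifications, or an API, whichever is eventually chosen.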
Since scheduler is now a queryable entity, and casting to scheduler run_instance logic is soon going to be replaced by build_and_
[1] https:/
== API ==
I currently have two API formats in mind.
Choice #1:
POST v2/{tenant_
Request JSON:
{
"list": {
}
}
POST v2/{tenant_
Request JSON:
{
"show": {
}
}
NOTE(qiuyu): need to investigate more about how to specify time range, or how many results to retrieve? Result paging? Sorting?
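One possible answer to the open questions in the note, sketched under assumptions: filter by an optional time range, sort newest first, and page with the `limit`/`marker` style that other OpenStack list APIs use. The function `list_actions`, the `id` and `created_at` keys, and the default limit are all hypothetical and not part of any proposed API.

```python
from datetime import datetime
from typing import Dict, List, Optional


def list_actions(actions: List[Dict], since: Optional[datetime] = None,
                 until: Optional[datetime] = None, limit: int = 50,
                 marker: Optional[str] = None) -> List[Dict]:
    """Return up to `limit` actions within [since, until], after `marker`."""
    # Keep only actions inside the requested time range, newest first.
    selected = sorted(
        (a for a in actions
         if (since is None or a["created_at"] >= since)
         and (until is None or a["created_at"] <= until)),
        key=lambda a: a["created_at"], reverse=True)
    if marker is not None:
        ids = [a["id"] for a in selected]
        # Assumption: an unknown marker restarts from the first result.
        start = ids.index(marker) + 1 if marker in ids else 0
        selected = selected[start:]
    return selected[:limit]


# Five sample actions, one minute apart (illustrative data only).
actions = [{"id": "a%d" % i, "created_at": datetime(2014, 1, 1, 0, i)}
           for i in range(5)]
page1 = list_actions(actions, limit=2)
page2 = list_actions(actions, limit=2, marker=page1[-1]["id"])
recent = list_actions(actions, since=datetime(2014, 1, 1, 0, 3))
```

Here the second page is requested by passing the `id` of the last item on the first page as the marker.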
Choice #2:
GET v2/{tenant_
Lists all scheduler actions. Permission could be specified in policy.json. By default, only admin can list actions.
GET v2/{tenant_
Gets details for a specified action for a specified scheduler request. Permission could be specified in policy.json. By default, only admin can list actions.
I'm in favor of choice #1, for two reasons:
1. Since the scheduler is now a queryable entity, this API format can be extended, for example to support a scheduling dry run.
2. It may be more flexible for handling paging logic, though I'm not sure for the moment.
--qiuyu
@qiuyu: Where are you going to store all this information? In the DB? Wouldn't this impact performance? I was thinking of a solution at a lower level, since with my operator hat on I'd prefer a plain text file so I can easily take a glance at it.
qiuyu and I have been discussing this on IRC a bit and probably should have updated the whiteboard. I share your concerns about storing this information in the DB for an API to expose. It's not just performance: there's added complexity for retention policies and cleanup, since it's unlikely operators will want it stored indefinitely. Logging or notifications seem like the simplest solutions for this, but notifications are more difficult to consume, and logging means it may get mixed in with unrelated information. But maybe for now we leave it up to deployers to deal with that and work out a mechanism for separating it later.
Please find more discussion at the following link. --qiuyu
http://
[alaski] My preference would be to start small, and then improve on it. For a first pass I think it's reasonable to log this information along with the other logging that occurs. That would provide some usefulness immediately with little overhead. From there additional value can be added based on how it's used once it's available. At this point I'm not convinced that an API is going to be useful for this but once the information is available maybe I'll see the need.
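A minimal sketch of the logging-first approach, assuming the Python standard `logging` module rather than Nova's own logging setup. The logger name `scheduler.decisions` and the helper `log_decision` are hypothetical; a dedicated logger name is one way deployers could later route these lines away from unrelated output.

```python
import io
import json
import logging

# Hypothetical logger name; a dedicated name lets deployers attach a
# separate handler or file to it through ordinary logging configuration.
DECISION_LOG = logging.getLogger("scheduler.decisions")


def log_decision(instance_uuid, weighed_hosts, selected_host):
    """Emit one DEBUG line describing a single scheduling decision."""
    DECISION_LOG.debug("scheduler decision: %s", json.dumps({
        "instance_uuid": instance_uuid,
        "weighed_hosts": weighed_hosts,
        "selected_host": selected_host,
    }, sort_keys=True))


# Demo: capture the output in memory instead of writing a real log file.
stream = io.StringIO()
DECISION_LOG.addHandler(logging.StreamHandler(stream))
DECISION_LOG.setLevel(logging.DEBUG)
log_decision("uuid-1", [["host-a", 2.0], ["host-b", 1.5]], "host-a")
output = stream.getvalue()
```

Emitting each decision as one JSON payload per line keeps the log easy to read at a glance and still simple to parse if the data is later used for analysis.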
deferred from icehouse-3 to "next": http://
Removed from next, as next is now reserved for near misses from the last milestone --johnthetubaguy