db/dir abstraction to allow both dir/file-based and db-based audit/repair
We need a way to abstract the source of scripts/
One approach is to have the config define which backend the engine should use:
entropy_backend = dir or file or sqlalchemy or something
then have an entry point defined for each backend:
dir = entropy.
sqlalchemy = entropy.
mongodb = entropy.
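To make the config-driven selection concrete, here is a minimal sketch of how the engine could map the entropy_backend value to a backend class. The class names, the BACKENDS registry, and load_backend are all hypothetical; the real entry point paths are not defined in this blueprint, so a plain dict stands in for them.

```python
class DirBackend(object):
    """Hypothetical backend that would read audits from a directory."""
    name = "dir"


class SqlAlchemyBackend(object):
    """Hypothetical backend that would read audits from a database."""
    name = "sqlalchemy"


# Stand-in for the entry point table: backend name -> backend class.
BACKENDS = {
    "dir": DirBackend,
    "sqlalchemy": SqlAlchemyBackend,
}


def load_backend(config):
    """Return a backend instance for config["entropy_backend"] (default: dir)."""
    name = config.get("entropy_backend", "dir")
    try:
        backend_cls = BACKENDS[name]
    except KeyError:
        raise ValueError("unknown entropy_backend: %r" % name)
    return backend_cls()
```

With this shape the engine only ever talks to the object returned by load_backend, so adding a new source of audits would mean adding one class and one registry entry.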
For database:
Audits are defined for "a host (fqdn or ip)" or a "group of hosts (computes | hypervisors | control nodes)"
possibly in a yaml:
Example:
compute_audits.yaml
-------
---
compute:
audits:
hypervisor_
-------
---
hypervisor:
audits:
vm_count:
other_audits.yaml
-------
---
nova-api.
audits:
vmbooter:
These are contrived examples, put down only for discussion.
Defined Tables (discuss more):
1) hosts = holds record for each host entropy can act on
2) enabled_audits [auditor specific table] = holds info on currently defined audits
3) disabled_audits [auditor specific table] = holds info on currently disabled audits
4) audits [auditor specific table] = for every host, add every enabled audit for its host type (e.g. if "host_type" == "compute", add every enabled compute audit)
The "audits" table holds a message in the following format for every audit|host pair:
(required columns or keys)
"id": id of this record
"host": name of the host (fqdn)
"ip_address": ip address of the host
"audit_name": name of audit (vmbooter)
"next_run": datetime
"created_at": datetime
"updated_at": datetime
"disabled": True or False
"status": new or processing or processed
"result": [Hold the latest result]
...
...
(add more columns, etc.)
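As a rough illustration of the audit|host pairing described above, the sketch below builds one record per host and enabled audit, using the keys listed for the "audits" table. The function name, the shape of the hosts and enabled_audits inputs, and the example host names are assumptions made for this sketch only.

```python
from datetime import datetime, timezone


def build_audit_records(hosts, enabled_audits, now=None):
    """Build one "audits" record per (host, enabled audit) pair.

    hosts: list of dicts with "host", "ip_address" and "host_type" keys.
    enabled_audits: dict mapping a host_type to a list of audit names.
    """
    now = now or datetime.now(timezone.utc)
    records = []
    for host in hosts:
        # Hosts whose type has no enabled audits produce no records.
        for audit_name in enabled_audits.get(host["host_type"], []):
            records.append({
                "id": len(records) + 1,
                "host": host["host"],
                "ip_address": host["ip_address"],
                "audit_name": audit_name,
                "next_run": None,  # set by the first run of the audit
                "created_at": now,
                "updated_at": now,
                "disabled": False,
                "status": "new",
                "result": None,
            })
    return records
```

For example, a single hypervisor host with enabled_audits = {"hypervisor": ["vm_count"]} would yield exactly one record with audit_name "vm_count" and status "new".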
The first run of an audit sets its "next_run" value. The engine will have a periodic looping call that checks
for any audit record with a next_run value < now() and sets its status to "new". The engine will have a second task that
fetches these messages and puts them on an internal queue (Queue.Queue); the engine then simply spawns new executors,
up to the defined max_workers value.
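The two engine tasks could look roughly like the sketch below. It works on plain dicts instead of database rows, uses the Python 3 spelling queue.Queue, and uses a thread pool as the executor; all function names and the run_audit placeholder are hypothetical. One assumption beyond the text: only records whose status is "processed" are reset to "new", so an audit that is still running is not queued twice.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def mark_due_audits(records, now):
    """Periodic task: reset finished, enabled audits whose next_run has passed."""
    for record in records:
        due = record["next_run"] is not None and record["next_run"] < now
        if due and not record["disabled"] and record["status"] == "processed":
            record["status"] = "new"


def enqueue_new_audits(records, work_queue):
    """Second task: move every enabled "new" record onto the internal queue."""
    for record in records:
        if record["status"] == "new" and not record["disabled"]:
            record["status"] = "processing"
            work_queue.put(record)


def run_audit(record, interval, now):
    """Placeholder executor body: store a result and schedule the next run."""
    record["result"] = "ok"
    record["status"] = "processed"
    record["next_run"] = now + interval
    return record


def drain_queue(work_queue, max_workers, interval, now):
    """Run everything currently queued on at most max_workers threads."""
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            try:
                record = work_queue.get_nowait()
            except queue.Empty:
                break
            futures.append(pool.submit(run_audit, record, interval, now))
    # Leaving the "with" block waits for all submitted audits to finish.
    return [future.result() for future in futures]
```

In a real engine the first two functions would be driven by periodic looping calls and would query and update the audits table rather than an in-memory list.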
For dir/file based (need more discussion):
Keep the current structure. However, we might have to run every audit on an executor with an infinite while loop for scheduling,
and might also have to do this for every host we want to run an audit on. Need more discussion on this.
Blueprint information
- Status:
- Not started
- Approver:
- None
- Priority:
- Undefined
- Drafter:
- Sulochan Acharya
- Direction:
- Needs approval
- Assignee:
- None
- Definition:
- New
- Series goal:
- None
- Implementation:
- Unknown
- Milestone target:
- None
- Started by
- Completed by
Whiteboard
JH: An idea, more along the lines of a central scheduler.
Every X minutes the scheduler refreshes itself (either from files or from a set of db tables). As a result of this refresh, the scheduler adjusts its own internal time/scheduling tables to match the new data inputs.
The scheduler will expose this information via a blocking method, `wait_next`. The engine itself will call into this method to get the next audit to trigger (or, when multiple audits are due at the same time, a list of those audits). This would likely use something like the iterator protocol in Python (wait_next being a thing that yields back every time an event needs to occur).
The engine would wait for wait_next to yield back one or more audits, and it would then be the engine's responsibility to fire off these audits via some mechanism (threads, a post on a message queue, or other). The engine would then go back to waiting for the next event (and repeat). This keeps the 'time table' internal to the scheduler and lets the engine be concerned only with executing the audits reliably (and not with the schedule of those audits).
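A minimal sketch of this idea, assuming a generator-based wait_next and a heap as the internal time table. Everything except the name wait_next is hypothetical: the Scheduler class, the load_audits callable (standing in for "files or db tables" and returning {audit_name: interval_seconds} with positive intervals), and the injectable clock and sleep functions, which exist only so the sketch can be exercised without real waiting.

```python
import heapq
import time


class Scheduler(object):
    def __init__(self, load_audits, refresh_interval=300,
                 clock=time.time, sleep=time.sleep):
        self._load_audits = load_audits
        self._refresh_interval = refresh_interval
        self._clock = clock
        self._sleep = sleep
        self._timetable = []   # heap of (due_time, audit_name)
        self._intervals = {}   # audit_name -> interval in seconds
        self._next_refresh = None
        self.refresh()

    def refresh(self):
        """Reload audit definitions and adjust the internal time table."""
        now = self._clock()
        new_intervals = dict(self._load_audits())
        # Keep the due time of audits that still exist with the same interval.
        kept = [(due, name) for due, name in self._timetable
                if new_intervals.get(name) == self._intervals.get(name)]
        kept_names = set(name for _, name in kept)
        # Schedule new or changed audits one interval from now.
        for name, interval in new_intervals.items():
            if name not in kept_names:
                kept.append((now + interval, name))
        heapq.heapify(kept)
        self._timetable = kept
        self._intervals = new_intervals
        self._next_refresh = now + self._refresh_interval

    def wait_next(self):
        """Generator: block until audits are due, then yield them as a list."""
        while True:
            now = self._clock()
            if now >= self._next_refresh:
                self.refresh()
            wake_at = self._next_refresh
            if self._timetable:
                wake_at = min(wake_at, self._timetable[0][0])
            if wake_at > now:
                # Nothing due yet: sleep until the next audit or refresh.
                self._sleep(wake_at - now)
                continue
            due = []
            while self._timetable and self._timetable[0][0] <= now:
                _, name = heapq.heappop(self._timetable)
                due.append(name)
                # Reschedule the audit one interval from now.
                heapq.heappush(self._timetable,
                               (now + self._intervals[name], name))
            yield due


def engine_loop(scheduler, fire):
    """Engine side: wait for due audits and hand each one to fire()."""
    for audits in scheduler.wait_next():
        for audit_name in audits:
            fire(audit_name)
```

Here engine_loop never inspects the schedule; fire could start a thread or post to a message queue, which is the separation of concerns the comment argues for.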
(keekz) For the database table layout: what advantage would we gain by having both an enabled_audits and a disabled_audits table? It seems like we could do the same thing with a single, generic "audits" table (using whatever name) that has a "disabled" column that can be flipped on/off. All audits can be in the same table, and we can enable/disable them using the disabled column as necessary. This would be more in line with the nova db format, such as the services table, where a compute can be disabled using the disabled column. Additionally, I'd like to have a "disabled_reason" column (like we have with computes), so we can add notes about why an audit is disabled.
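The single-table layout suggested here could be sketched as follows, using the standard library sqlite3 module purely for illustration (the blueprint itself mentions sqlalchemy as a possible backend). The column set is trimmed to what the comment discusses, and the helper names are hypothetical.

```python
import sqlite3

# One table for all audits; "disabled" replaces the enabled/disabled split.
SCHEMA = """
CREATE TABLE audits (
    id INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    audit_name TEXT NOT NULL,
    disabled INTEGER NOT NULL DEFAULT 0,
    disabled_reason TEXT
)
"""


def make_db():
    """Create an in-memory database holding the audits table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_audit(conn, host, audit_name):
    """Insert an enabled audit and return its id."""
    cursor = conn.execute(
        "INSERT INTO audits (host, audit_name) VALUES (?, ?)",
        (host, audit_name))
    conn.commit()
    return cursor.lastrowid


def set_disabled(conn, audit_id, disabled, reason=None):
    """Flip the disabled flag; the reason is cleared when re-enabling."""
    conn.execute(
        "UPDATE audits SET disabled = ?, disabled_reason = ? WHERE id = ?",
        (1 if disabled else 0, reason if disabled else None, audit_id))
    conn.commit()


def enabled_audits(conn):
    """Return (host, audit_name) for every audit that is not disabled."""
    rows = conn.execute(
        "SELECT host, audit_name FROM audits WHERE disabled = 0 ORDER BY id")
    return rows.fetchall()
```

Disabling an audit is then a single UPDATE rather than moving a row between two tables, and the reason travels with the row.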