Add a nova-audit service for periodic maintenance
Excerpts taken from the spec proposal: https:/
Nova is a distributed system, which means that things fail in strange
ways and data stored across multiple systems drifts out of sync with
reality. Hosts and instances come and go, as do network connectivity,
the message bus, and the database. Recently we have
gained a number of "heal $thing" routines that operators can run
either periodically or on demand to synchronize the state of various
services and data stores and so resolve or prevent problems.
In most cases, these tasks are idempotent and safe to run even when nothing is wrong.
Operators need a single mechanism for performing these maintenance tasks and
healing activities that can run periodically in the background with
minimal impact on runtime performance, ideally fixing inconsistencies
before they become acute enough to require human intervention.
We already have a number of these maintenance activities codified in
one-shot commands that can be run on demand once a problem has been
identified. Since most of them are neither harmful nor overly expensive, we
should be able to run them periodically to fix problems automatically
before the operator has to get involved.
This spec proposes a new binary called ``nova-audit`` to encapsulate
these tasks. Ideally it should be usable in multiple ways:
- As a singleton daemon that periodically runs tasks at various
  intervals according to their potential impact on the system and
  need.
- As a one-shot "fix stuff" command that can be run from cron or
  otherwise scheduled or executed.
- As a daemon or one-shot command that purely audits potential
  problems, but makes no changes.
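The three modes above can be illustrated with a minimal Python sketch. It is not taken from the spec or from any nova-audit patch: the names ``AuditTask``, ``run_once``, ``run_daemon`` and the ``dry_run`` flag are hypothetical, and the sketch assumes each task can be split into a read-only check and a separate fix step.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AuditTask:
    """Hypothetical maintenance task (not an actual nova-audit class)."""
    name: str
    interval: float                  # seconds between runs in daemon mode
    check: Callable[[], List[str]]   # read-only: returns the problems found
    fix: Callable[[str], None]       # repairs a single problem
    next_run: float = 0.0            # monotonic time of the next scheduled run


def run_task(task: AuditTask, dry_run: bool) -> List[str]:
    """Run one task; in audit-only (dry_run) mode nothing is changed."""
    problems = task.check()
    if not dry_run:
        for problem in problems:
            task.fix(problem)
    return problems


def run_once(tasks: List[AuditTask], dry_run: bool = False) -> Dict[str, List[str]]:
    """One-shot mode, e.g. invoked from cron: run every task exactly once."""
    return {task.name: run_task(task, dry_run) for task in tasks}


def run_daemon(tasks: List[AuditTask], dry_run: bool = False,
               max_ticks: Optional[int] = None,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Daemon mode: run each task whenever its own interval has elapsed.

    max_ticks, clock and sleep exist only so the loop can be bounded
    and driven by a fake clock in tests.
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        now = clock()
        for task in tasks:
            if now >= task.next_run:
                run_task(task, dry_run)
                task.next_run = now + task.interval
        ticks += 1
        sleep(1.0)
```

Because the fix step only acts on what the check step reports, running a task again when nothing is wrong is a no-op, which matches the idempotency the spec expects of most of these routines.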
A new config section of ``[audit]`` would be added with timers and
default values for each task.
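As a rough illustration of per-task timers with defaults, the sketch below reads an ``[audit]`` section and falls back to built-in values for anything the operator has not set. The option names and default intervals are invented for this example; the excerpt does not list them. Nova registers its real options through oslo.config, and the standard-library ``configparser`` is used here only to keep the example self-contained.

```python
import configparser
from typing import Dict

# Hypothetical option names and defaults (in seconds). The spec excerpt only
# says that [audit] would hold timers and default values for each task.
DEFAULT_INTERVALS: Dict[str, int] = {
    "discover_hosts_interval": 300,
    "heal_allocations_interval": 3600,
    "purge_interval": 86400,
}


def load_audit_intervals(config_text: str) -> Dict[str, int]:
    """Return the interval for every task, using defaults for unset options."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        name: parser.getint("audit", name, fallback=default)
        for name, default in DEFAULT_INTERVALS.items()
    }
```

For example, a file containing only ``discover_hosts_interval = 60`` under ``[audit]`` would override that one timer and leave the other tasks at their defaults.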
Blueprint information
- Status: Started
- Approver: Balazs Gibizer
- Priority: Medium
- Drafter: melanie witt
- Direction: Approved
- Assignee: melanie witt
- Definition: Pending Approval
- Series goal: None
- Implementation: Slow progress
- Milestone target: None
- Started by: melanie witt
- Completed by:
Whiteboard
Spec: https:/
[efried 20200214] Spec approved
Gerrit topic: https:/
Addressed by: https:/
Move nova-manage db purge to nova-audit
Addressed by: https:/
Move nova-manage db archive_
Addressed by: https:/
Move nova-manage cell_v2 discover_hosts to nova-audit
Addressed by: https:/
Move nova-manage cell_v2 map_instances to nova-audit
Addressed by: https:/
Move nova-manage placement sync_aggregates to nova-audit
Addressed by: https:/
Move nova-manage placement heal_allocations to nova-audit
[efried 20200220] Agreed in the Nova meeting to Direction:Approve all Definition:Approved blueprints http://
[gibi 20200414] we hit feature freeze in Ussuri, so deferring this to Victoria
Addressed by: https:/
nova-audit: Use cliff instead of homegrown argparse bleh
Addressed by: https:/
Re-propose nova-audit spec for Victoria
[gibi 20200526] approved as spec approved
[gibi 20200928] as we hit feature freeze I'm deferring this from Victoria
[gibi 20210715] spec has been merged so the bp is approved to Xena
Implementation: https:/
[2021-09-07 gibi]: We hit feature freeze so it is now deferred from Xena.