Controlling Disk Use and Handling Overflow
The goal here is to handle, as gracefully as possible, the case of a shard exceeding its storage capacity. Two mechanisms come to mind for handling overflow:
- operator-set per-queue maximum storage size
- automatic partitioning of set of messages for a given queue across multiple shards
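The first mechanism can be sketched in a few lines. This is a minimal, hypothetical illustration (the names QueueStore, max_bytes, and QueueOverflow are made up here, not Marconi's API): the store tracks bytes used per queue and rejects a post that would exceed the operator-set cap, rather than silently dropping messages.

```python
# Hypothetical sketch of an operator-set per-queue storage cap.
# All names here are illustrative, not part of Marconi's actual API.

class QueueOverflow(Exception):
    pass

class QueueStore:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes   # operator-configured cap for this queue
        self.used_bytes = 0
        self.messages = []

    def post(self, body):
        size = len(body)
        # Reject the post rather than silently evicting older messages.
        if self.used_bytes + size > self.max_bytes:
            raise QueueOverflow("queue storage cap exceeded")
        self.messages.append(body)
        self.used_bytes += size

store = QueueStore(max_bytes=10)
store.post(b"hello")        # 5 bytes, fits under the cap
try:
    store.post(b"world!!")  # 7 more bytes would exceed the cap
except QueueOverflow:
    pass                    # caller gets an explicit overflow error
```

Whether to reject new posts (as above) or evict the oldest messages is a policy question the blueprint would need to settle.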
Blueprint information
- Status: Not started
- Approver: None
- Priority: Medium
- Drafter: Allele Dev
- Direction: Needs approval
- Assignee: None
- Definition: New
- Series goal: None
- Implementation: Not started
- Milestone target: None
- Started by
- Completed by
Whiteboard
Schedule for Juno.
We have two issues to address:
- Noisy neighbor
- Unlimited queue length
This BP addresses the latter. See https:/
I think infinite storage can be addressed using MongoDB's built-in sharding: you could hash on the marker, but this would require a marker proxy, because we can no longer depend on the unique index.
If you shard the queue across two different backend db nodes, no matter how you look at it, you will need to have a marker proxy.
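To make the marker-proxy idea concrete, here is an illustrative sketch (all names are hypothetical, and this glosses over durability and concurrency): a single proxy hands out unique, monotonically increasing markers for a queue, and each message is routed to a backend shard by hashing its marker. Listing then has to merge shards back into marker order, which is exactly why the proxy becomes unavoidable once a queue spans backends.

```python
import hashlib
import itertools

# Hypothetical marker proxy: one counter issues unique, monotonic
# markers for a queue, replacing the per-collection unique index.
class MarkerProxy:
    def __init__(self):
        self._counter = itertools.count(1)

    def next_marker(self):
        return next(self._counter)

def shard_for(marker, num_shards):
    # Hash the marker so messages spread evenly across shards.
    digest = hashlib.md5(str(marker).encode()).hexdigest()
    return int(digest, 16) % num_shards

proxy = MarkerProxy()
shards = [[] for _ in range(2)]
for body in ("a", "b", "c", "d"):
    marker = proxy.next_marker()
    shards[shard_for(marker, len(shards))].append((marker, body))

# Listing a sharded queue means merging shards back into marker order.
merged = sorted(msg for shard in shards for msg in shard)
```

The merge step is the hidden cost of this design: every list operation touches all shards holding the queue.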
Anyway, I could be wrong, but this is my current thinking. Flavio is in favor of not sharding a queue across multiple backends, period (FWIW).
An alternative to "infinite queue length" is to solve that in the filesystem layer using something like Ceph, Gluster, SolidFire, etc.
We should also consider using filesystems that offer realtime compression, or using snappy to compress message bodies before even sending them to storage - see also this bp: https:/
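The app-level compression idea is straightforward to sketch. The note above mentions snappy; zlib stands in here only because it ships with Python, and the actual codec choice is open.

```python
import zlib

# Sketch: compress message bodies before they reach storage.
# zlib is a stand-in for snappy, which the blueprint actually suggests.

def pack(body: bytes) -> bytes:
    return zlib.compress(body)

def unpack(stored: bytes) -> bytes:
    return zlib.decompress(stored)

body = b"event-payload " * 100   # repetitive bodies compress well
stored = pack(body)
assert unpack(stored) == body    # lossless round trip
assert len(stored) < len(body)   # storage footprint shrinks
```

Snappy trades a lower compression ratio for much lower CPU cost than zlib, which matters on the hot post/claim path.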
Another possibility: if we can rebalance queues based on CPU and network usage, why not also on queue length? Move long queues to nodes with more disk. Combined with realtime compression (either in the app or in the filesystem), this would likely give an upper ceiling that is impractical for any queue to ever reach, given that clients can only post so many messages per second anyway.
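A length-based rebalancer could look something like the following. This is a toy heuristic with made-up names and thresholds, not a proposed implementation: oversized queues are considered largest-first and assigned to whichever node currently has the most free disk.

```python
# Hypothetical rebalancing heuristic: move the longest queues to the
# nodes with the most free disk. Names and thresholds are illustrative.

def plan_moves(queues, nodes, threshold_bytes):
    """Return (queue, target_node) moves for oversized queues.

    queues: {name: (current_node, size_bytes)}
    nodes:  {name: free_bytes}
    """
    moves = []
    free = dict(nodes)
    # Largest queues first, so the biggest offenders land on big disks.
    for name, (current, size) in sorted(
            queues.items(), key=lambda kv: kv[1][1], reverse=True):
        if size < threshold_bytes:
            continue  # small queues stay put
        target = max(free, key=free.get)
        if target != current and free[target] >= size:
            moves.append((name, target))
            free[target] -= size
            free[current] = free.get(current, 0) + size
    return moves

moves = plan_moves(
    queues={"q1": ("node-a", 900), "q2": ("node-a", 100)},
    nodes={"node-a": 50, "node-b": 2000},
    threshold_bytes=500,
)
# Only q1 exceeds the threshold, so it is planned onto node-b.
```

A real rebalancer would also need to weigh migration cost against the disk pressure being relieved, since moving a long queue is itself expensive.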
See also:
http://
(starting at: 2014-01-