Swift proxy-side read caching
The object servers impose relatively high overhead for small reads. Caching at the proxy node can alleviate this load, provided the proxies are provisioned with a large amount of memory.
Intended workloads:
Large quantities of small reads with few writes, e.g. CDN origin workloads
Design:
A memcache server is used to do the actual caching. For each Swift object, one cache entry is stored, composed of the cache time, an array of headers, and the actual object payload.
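As a sketch, such a cache entry could be packed into a single memcache value like this. The field names and wire layout here are illustrative assumptions, not the actual implementation:

```python
import json
import time

def make_cache_entry(headers, body):
    # Pack one Swift object into a single cache value:
    # cache time + header array as JSON metadata, followed by the raw payload.
    meta = json.dumps({"cached_at": time.time(), "headers": headers}).encode()
    return b"%d\n" % len(meta) + meta + body

def parse_cache_entry(entry):
    # Inverse of make_cache_entry: split metadata from payload.
    nl = entry.index(b"\n")
    meta_len = int(entry[:nl])
    start = nl + 1
    meta = json.loads(entry[start:start + meta_len].decode())
    return meta["cached_at"], meta["headers"], entry[start + meta_len:]
```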
A WSGI filter, placed before the proxy server, handles the caching.
The WSGI filter adds 'If-None-Match' and 'If-Modified-Since' HTTP headers if:
- The original request did not specify them.
- The object was found in the cache.
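A minimal sketch of that header injection, assuming a WSGI environ dict and cached headers stored with lowercase names (both assumptions for illustration):

```python
def add_conditional_headers(environ, cached_headers):
    # If the client sent no conditional headers of its own, derive them
    # from the cached copy so the object server can answer 304.
    hdrs = dict(cached_headers)
    if "HTTP_IF_NONE_MATCH" not in environ and "etag" in hdrs:
        environ["HTTP_IF_NONE_MATCH"] = hdrs["etag"]
    if "HTTP_IF_MODIFIED_SINCE" not in environ and "last-modified" in hdrs:
        environ["HTTP_IF_MODIFIED_SINCE"] = hdrs["last-modified"]
    return environ
```

Client-supplied conditional headers are left untouched, so a client's own cache validation still works end to end.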
For GET and HEAD requests: if the backend returns 304 Not Modified, the cached object is served in its place. Otherwise, if the response status is below 500, the cache entry is invalidated; if the status is 200 and the object is deemed cacheable, it is added to the cache.
For all other requests: if the response status is below 500, the cache entry is invalidated.
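The decision table above can be sketched as a single function; the action names are made up for illustration:

```python
def handle_response(method, status, cached, cacheable):
    # Decide what the caching filter does with a backend response.
    # Returns one of: 'serve_cached', 'cache', 'invalidate', 'pass'.
    if method in ("GET", "HEAD"):
        if status == 304 and cached is not None:
            return "serve_cached"   # backend confirms our copy is current
        if status == 200 and cacheable:
            return "cache"          # fresh object: (re)populate the cache
        if status < 500:
            return "invalidate"     # definitive non-matching answer
        return "pass"               # 5xx: leave the cache alone
    # Writes and other verbs: any definitive answer invalidates.
    return "invalidate" if status < 500 else "pass"
```

Note that 5xx responses deliberately leave the cache untouched: a transient backend error is no evidence the cached copy is stale.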
An object is considered cacheable if:
- Its size does not exceed a configured maximum
- The request does not contain a Range header
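Both criteria above fit in a small predicate; the default size limit here is an arbitrary assumption, since the blueprint only says it is configurable:

```python
def is_cacheable(environ, object_size, max_size=4 * 1024 * 1024):
    # Partial (Range) responses are never cached; neither are objects
    # larger than the configured maximum.
    if "HTTP_RANGE" in environ:
        return False
    return object_size <= max_size
```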
Further points of interest:
- We'll have to handle Range requests correctly
- Header changes are not reflected in the ETag, so we might serve stale headers
- Documentation should explain where the cache should be in the chain (e.g. after auth)
Ideas for further improvement:
- Allow storage of objects larger than the maximum memcache size.
- Allow write-through instead of write-around (i.e., cache PUT operations).
- Allow write-back (probably a bad idea).
- Leverage various Cache-Control flags to avoid contacting the object servers for cached objects.
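The Cache-Control idea in the last point could look like the following freshness check: if the cached response carried a max-age that hasn't elapsed, serve the cached copy without contacting the object servers at all. This is a sketch of the idea, not part of the design above:

```python
import time

def fresh_enough(cached_at, headers, now=None):
    # True if the cached response's Cache-Control max-age window has
    # not yet elapsed, so no revalidation round-trip is needed.
    now = time.time() if now is None else now
    cc = dict(headers).get("cache-control", "")
    for part in cc.split(","):
        part = part.strip()
        if part.startswith("max-age="):
            try:
                return (now - cached_at) < int(part.split("=", 1)[1])
            except ValueError:
                return False
    return False  # no max-age: always revalidate
```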
Blueprint information
- Status:
- Not started
- Approver:
- John Dickinson
- Priority:
- Undefined
- Drafter:
- Ondergetekende
- Direction:
- Needs approval
- Assignee:
- Ondergetekende
- Definition:
- New
- Series goal:
- None
- Implementation:
- Not started
- Milestone target:
- None
- Started by:
- Completed by:
Whiteboard
This is a very good idea. I suggest we could also cache objects (user-marked or hot) in the storage node's own memory. <email address hidden>
What advantages does this design offer over using a dedicated cache in front of the proxy server (something like Varnish)?
- Guaranteed cache validity and correct auth-token handling {ondergetekende}