Cache Invalidation
- Cache invalidation is an important part of your cache policy
- Varnish automatically invalidates expired objects
- You can proactively invalidate objects with Varnish
- You should define your cache invalidation rules before caching objects, especially in production environments
There are four mechanisms to invalidate caches in Varnish:
- HTTP PURGE
  - Use the vcl_purge subroutine
  - Invalidate caches explicitly using objects’ hashes
  - vcl_purge is called via return(purge) from vcl_recv
  - vcl_purge removes all variants of an object from cache, freeing up memory
  - The restart return action can be used to update a purged object immediately
- Banning
  - Use the built-in function ban(regex)
  - Invalidates objects in cache that match the regular expression
  - Does not necessarily free up memory at once
  - Also accessible from the management interface
- Force Cache Misses
  - Use req.hash_always_miss in vcl_recv
  - If set to true, Varnish disregards any existing objects and always (re)fetches from the backend
  - May create multiple objects as a side effect
  - Does not necessarily free up memory at once
- Surrogate keys
  - For websites that need cache invalidation at a very large scale
  - Varnish Software’s implementation of surrogate keys
  - Flexible cache invalidation based on cache tags
  - Available as hashtwo VMOD in Varnish Plus 4.0
  - Available as xkey VMOD in Varnish Cache 4.1 and later
Purge - Bans - Cache Misses - Surrogate Keys
Which one to use, and when?
| | Purge | Soft Purge | Bans | Force Cache Misses | Surrogate keys |
|---|---|---|---|---|---|
| Targets | Specific object (with all its variants) | Specific object (with all its variants) | Regex patterns | One specific object (with all its variants) | All objects with a common hashtwo/xkey key |
| Frees memory | Immediately | After grace time | After pattern is checked and matched | No | Immediately |
| Scalability | High | High | High if used properly | High | High |
| CLI | No | No | Yes | No | No |
| VCL | Yes | Yes | Yes | Yes | Yes |
| Availability | Varnish Cache | softpurge VMOD (varnish-modules) | Varnish Cache | Varnish Cache | Hashtwo VMOD in Varnish Plus 4.0 or xkey VMOD in Varnish Cache 4.1 |
Whenever you deal with caching, you eventually have to deal with the challenge of cache invalidation, or content update. Varnish has different mechanisms to address this challenge, but which one should you use?
There is rarely a need to pick only one solution, as you can implement many of them. However, you can try to answer the following questions:
- Am I invalidating one or many specific objects?
- Do I need to free up memory or just replace the content?
- How long does it take to replace the content?
- Is this a regular or a one-off task?
Alternatively, follow these guidelines:
- If you need to invalidate more than one item at a time, consider using bans or hashtwo/xkey.
- If it takes a long time to pull content from the backend into Varnish, consider forcing cache misses by using req.hash_always_miss.
The rest of the chapter teaches you more about these cache invalidation mechanisms.
Note
Purge and hashtwo/xkey work very similarly. The main difference is that they act on different hash keys.
HTTP PURGE
- If you know exactly what to remove, use HTTP PURGE
- Frees up memory, removes all Vary:-variants of the object
- Leaves it to the next client to refresh the content
- Often combined with return(restart);
- As easy as handling any other HTTP request
A purge is what happens when you pick out an object from the cache and discard it along with its variants.
A resource can exist in multiple Vary:-variants.
For example, you could have a desktop version, a tablet version and a smartphone version of your site, and use the Vary HTTP header field in combination with device detection to store different variants of the same resource.
Usually a purge is invoked through HTTP with the method PURGE.
An HTTP PURGE is just another request method, like HTTP GET.
In fact, you can name the PURGE method whatever you like, but PURGE has become the de facto naming standard.
Squid, for example, uses the PURGE method name for the same purpose.
Purges apply to a specific object, since they use the same lookup operation as in vcl_hash.
Therefore, purges find and remove objects really fast!
There are, however, two clear downsides. First, purges cannot use regular expressions, and second, purges evict content from cache regardless of the availability of the backend. That means that if you purge some objects and the backend is down, Varnish will end up having no copy of the content.
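For reference, the built-in vcl_hash that ships with Varnish 4 is reproduced below. Since a purge reuses this lookup, a PURGE request only removes objects whose hash matches, which by default means the same URL and the same Host header (or the same server IP when no Host header is sent). If you override vcl_hash, purges follow your custom hash instead. Check builtin.vcl in your own installation, since its exact contents may differ between versions.

```vcl
# Built-in vcl_hash of Varnish 4, reproduced for reference.
# A purge performs this same lookup, so a PURGE request must carry
# the same URL and Host header as the requests that cached the object.
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```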
VCL – vcl_purge
- You may add actions to be executed once the object and its variants are purged
- Called after the purge has been executed
sub vcl_purge {
return (synth(200, "Purged"));
}
Note
Cache invalidation with purges is done by calling return (purge); from vcl_recv in Varnish 4.
The keyword purge; from Varnish 3 has been retired.
Example: PURGE
vcl/purge.vcl
sub vcl_recv {
if (req.method == "PURGE"){
return (purge);
}
}
In the example above, return (purge) ends execution of vcl_recv and jumps to vcl_hash.
When vcl_hash calls return(lookup), Varnish purges the object and then calls vcl_purge.
You can test this code with HTTPie by issuing:
http -p hH --proxy=http:http://localhost PURGE www.example.com
Alternatively, you can test it with varnishtest as in the subsection PURGE in varnishtest.
In order to control the IP addresses that are allowed to send PURGE, you can use Access Control Lists (ACLs).
A purge example using ACLs is in the Access Control Lists (ACLs) section.
Exercise: PURGE an article from the backend
- Send a PURGE request to Varnish from your backend server after an article is published.
- Simulate the article publication.
- The result is that the article is evicted in Varnish.
You are provided with article.php, which fakes an article.
It is recommended to create a separate php file to implement purging.
article.php
<?php
header("Cache-Control: max-age=10");
$utc = new DateTimeZone("UTC");
$date = new DateTime("now", $utc);
$now = $date->format( DateTime::RFC2822 );
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body>
<h1>This article is cached for 10 seconds</h1>
<h2>Cache timestamp: <?php echo $now; ?></h2>
<a href="<?=$_SERVER['PHP_SELF']?>">Refresh this page</a>
</body>
</html>
If you need help, see Solution: PURGE an article from the backend.
Tip
Remember to place your php files under /var/www/html/.
PURGE with restart return action
- Start the VCL processing again from the top of vcl_recv
- Any changes made are kept
acl purgers {
"127.0.0.1";
"192.168.0.0"/24;
}
sub vcl_recv {
# allow PURGE from localhost and 192.168.0...
if (req.method == "PURGE") {
if (!client.ip ~ purgers) {
return (synth(405, "Purging not allowed for " + client.ip));
}
return (purge);
}
}
sub vcl_purge {
set req.method = "GET";
return (restart);
}
The restart return action allows Varnish to re-run the VCL state machine with different variables.
This is useful in combination with PURGE, because a purged object can be immediately replaced by a newly fetched object.
Every time a restart occurs, Varnish increments the req.restarts counter.
If the number of restarts is higher than the max_restarts parameter, Varnish emits a guru meditation error.
In this way, Varnish safeguards against infinite loops.
Warning
Restarts are likely to cause a hit against the backend, so do not increase max_restarts thoughtlessly.
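To observe a restart in practice, you can expose req.restarts in the response. The sketch below assumes a debugging header named X-Restarts, a hypothetical name that Varnish does not set on its own. Combined with the vcl_purge above, a PURGE from an allowed client should come back as a freshly fetched GET response reporting X-Restarts: 1. Remove the header before going to production.

```vcl
sub vcl_deliver {
    # Hypothetical debugging header: how many times this request was restarted.
    # The INT value of req.restarts is converted to a string for the header.
    set resp.http.X-Restarts = req.restarts;
}
```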
Softpurge
- Sets TTL to 0
- Allows Varnish to serve stale content to users if the backend is unavailable
- Asynchronous and automatic backend fetching to update object
Softpurge is a cache invalidation mechanism that sets TTL to 0 but keeps the grace value of a cached object. This is useful if you want to build responses using the cached object while it is being updated.
Softpurge is a VMOD that is part of varnish-modules https://github.com/varnish/varnish-modules. For installation and usage details, please refer to its documentation https://github.com/varnish/varnish-modules/blob/master/docs/vmod_softpurge.rst.
Tip
The xkey VMOD has the softpurge functionality too.
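The following sketch is modelled on the example in the vmod_softpurge documentation linked above; verify the function name against the version you install. The ACL name purgers and its address are placeholders. Unlike return (purge), the PURGE request here goes through a normal lookup with return (hash), and softpurge.softpurge() is called on both hit and miss, so that the matching object gets a TTL of 0 while keeping its grace.

```vcl
import softpurge;

# Placeholder ACL: adjust to the hosts allowed to send soft purges.
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Purging not allowed for " + client.ip));
        }
        # Do a normal cache lookup instead of return (purge).
        return (hash);
    }
}

sub vcl_hit {
    if (req.method == "PURGE") {
        # Set TTL to 0 but keep grace, so stale content can still be served.
        softpurge.softpurge();
        return (synth(200, "Successful softpurge"));
    }
}

sub vcl_miss {
    if (req.method == "PURGE") {
        softpurge.softpurge();
        return (synth(200, "Successful softpurge"));
    }
}
```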
Banning
- Use ban to invalidate caches on cache hits
- Frees memory when cached objects match ban patterns
- Examples in the varnishadm command line interface:
ban req.url ~ /foo
ban req.http.host ~ example.com && obj.http.content-type ~ text
ban.list
- Example in VCL:
ban("req.url ~ /foo");
- Example of VCL code to act on the HTTP BAN request method:
sub vcl_recv {
    if (req.method == "BAN") {
        ban("req.http.host == " + req.http.host + " && req.url == " + req.url);
        # Throw a synthetic page so the request won't go to the backend.
        return(synth(200, "Ban added"));
    }
}
Banning in the context of Varnish refers to adding a ban expression that prohibits Varnish from serving certain objects from the cache. Ban expressions are most useful when combined with regular expressions.
Bans work on objects already in the cache, i.e., they do not prevent new content from entering the cache or being served.
Cached objects that match a ban are marked as obsolete.
Obsolete objects are expunged by the expiry thread like any other object with obj.ttl == 0.
Ban expressions match against req.* or obj.* variables.
Think of a ban expression as: “the requested URL starts with /sport”, or “the cached object has a header field with value matching lighttpd”.
You can add ban expressions in three ways: 1) in VCL code, 2) through a custom HTTP request method, or 3) by issuing commands in the varnishadm CLI.
Ban expressions are inserted into a ban-list. The ban-list contains:
- ID of the ban,
- timestamp when the ban entered the ban-list,
- counter of objects that have matched the ban expression,
- a C flag for completed that indicates whether a ban is invalid because it is duplicated,
- the ban expression.
To inspect the current ban-list, issue the ban.list command in the CLI:
0xb75096d0 1318329475.377475 10 obj.http.x-url ~ test0
0xb7509610 1318329470.785875 20C obj.http.x-url ~ test1
Varnish tests bans whenever a request hits a cached object. A cached object is checked only against bans added after the last ban it was checked against. That means that each object is checked against a given ban expression only once.
Bans that match only against obj.* are also checked by a background worker thread called the ban lurker.
The parameter ban_lurker_sleep controls how often the ban lurker tests obj.* bans.
The ban lurker can be disabled by setting ban_lurker_sleep to 0.
Bans can free memory in a very scalable manner if used properly. Bans free memory only after a ban expression hits an object. However, since bans do not prevent new backend responses from being inserted in the cache, client requests that trigger the eviction of an object will most likely insert a new one matching the ban. Therefore, ban lurker banning is more effective when freeing memory, as we shall see next.
Note
You should avoid ban expressions that match against req.*, because these expressions are tested only by client requests, not by the ban lurker.
In other words, a req.* ban expression will be removed from the ban list only after a request matches it.
Consequently, you risk accumulating a very large number of ban expressions.
This might impact CPU usage and thereby performance.
Therefore, we recommend that you avoid req.* variables in your ban expressions and use obj.* variables instead.
Ban expressions using only obj.* are called lurker-friendly bans.
Note
If the cache is completely empty, only the last added ban stays in the ban-list.
Lurker-Friendly Bans
- Ban expressions that match only against obj.*
- Evaluated asynchronously by the ban lurker thread
- Similar to the concept of garbage collection
Ban expressions are checked in two cases: 1) when a request hits a cached object, or 2) when the ban lurker wakes up. The first case is efficient only if you know that the cached objects to be banned are frequently accessed. Otherwise, you might accumulate a lot of ban expressions in the ban-list that are never checked. The second case is a better alternative because the ban lurker can help you keep the ban-list at a manageable size. Therefore, we recommend creating ban expressions that are checked by the ban lurker. Such ban expressions are called lurker-friendly bans.
Lurker-friendly ban expressions are those that use only obj.* variables, not req.* variables.
Since lurker-friendly ban expressions lack req.*, you might need to copy some of the req.* contents into the obj structure.
In fact, this copy operation is a mechanism to preserve the context of the client request in the cached object.
For example, you may want to copy useful parts of the client context, such as the requested URL, from req to obj.
The following snippet shows an example of how to preserve the context of a client request in the cached object:
sub vcl_backend_response {
set beresp.http.x-url = bereq.url;
}
sub vcl_deliver {
# The X-Url header is for internal use only
unset resp.http.x-url;
}
Now imagine that you just changed a blog post template, so that all cached blog posts must be invalidated. For this, you can issue a ban such as:
$ varnishadm ban 'obj.http.x-url ~ ^/blog'
Since it uses a lurker-friendly ban expression, the ban inserted in the ban-list will be gradually evaluated against all cached objects until all blog posts are invalidated.
The snippet below shows how to insert the same expression into the ban-list in the vcl_recv subroutine:
sub vcl_recv {
if (req.method == "BAN") {
# Assumes the X-Ban header is a regex,
# this might be a bit too simple.
ban("obj.http.x-url ~ " + req.http.x-ban);
return(synth(200, "Ban added"));
}
}
Exercise: Write a VCL program using purge and ban
- Write a VCL program that handles the PURGE and BAN HTTP methods.
- When handling the BAN method, use the request header fields req.http.x-ban-url and req.http.x-ban-host
- Use Lurker-Friendly Bans
- To build further on this, you can also use the REFRESH HTTP method that fetches new content, using req.hash_always_miss, which is explained in the next subsection
To test this exercise, you can use HTTPie:
http -p hH PURGE http://localhost/testpage
http -p hH BAN http://localhost/ 'X-Ban-Url: .*html$' \
'X-Ban-Host: .*\.example\.com'
http -p hH REFRESH http://localhost/testpage
For information on cache invalidation in varnishtest, refer to the subsection Cache Invalidation in varnishtest.
If you need help, see Solution: Write a VCL program using purge and ban.
Force Cache Misses
- set req.hash_always_miss = true; in vcl_recv
- Causes Varnish to look the object up in cache, but ignore any copy it finds
- Useful way to do a controlled refresh of a specific object
- If the server is down, the cached object is left untouched
- Useful to refresh slowly generated content
Setting a request in pass mode instructs Varnish to always ask a backend for content, without storing the fetched object into cache.
vcl_purge removes old content, but what if the web server is down?
Setting req.hash_always_miss to true tells Varnish to look up the content in cache, but to always treat the lookup as a miss.
This means that Varnish first calls vcl_miss, then (presumably) fetches the content from the backend, caches the updated object, and delivers the updated content.
The distinctive behavior of req.hash_always_miss occurs when the backend server is down or unresponsive.
In this case, the current cached object is untouched.
Therefore, client requests that do not enable req.hash_always_miss keep getting the old and untouched cached content.
Two important use cases for req.hash_always_miss are when you want to:
1) control who takes the penalty for waiting around for the updated content (e.g. a script you control), and
2) ensure that content is not evicted before it is updated.
Note
Forcing cache misses does not evict old content. This means that Varnish may end up with multiple copies of the content in cache. In such cases, the newest copy is always used. Keep in mind that duplicated objects stay in cache as long as their time-to-live is positive.
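A minimal sketch of a controlled refresh, assuming a hypothetical X-Refresh request header sent by a script you trust and a placeholder ACL named refreshers. Apart from the forced miss, the request is handled as usual by the built-in vcl_recv. Note that req.hash_always_miss only matters for requests that reach a cache lookup, so it has no effect on requests that end up in pass mode (for example, requests carrying cookies under the built-in VCL).

```vcl
# Placeholder ACL: hosts allowed to force a refresh.
acl refreshers {
    "127.0.0.1";
}

sub vcl_recv {
    # X-Refresh is a hypothetical trigger header, not a Varnish convention.
    if (req.http.X-Refresh) {
        if (!client.ip ~ refreshers) {
            return (synth(405, "Refresh not allowed for " + client.ip));
        }
        # Look the object up but ignore any cached copy; the new backend
        # response is cached and used from then on. If the fetch fails,
        # the old copy stays in cache untouched.
        set req.hash_always_miss = true;
        unset req.http.X-Refresh;
    }
}
```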
Hashtwo/Xkey (Varnish Software Implementation of Surrogate Keys)
- Hashtwo and xkey are Varnish Software’s implementations of surrogate keys
- Hashtwo is available in Varnish Cache Plus 3.x and 4.0 only
- Xkey is open source and is available in Varnish Cache 4.1 or later
- Cache invalidation based on cache tags
- Makes it easy to add patterns to match against
- Highly scalable
The idea is that you can use any arbitrary string for cache invalidation. You can then key your cached objects on, for example, product ID or article ID. In this way, when you update the price of a certain product or a specific article, you have a key to evict all those objects from the cache.
So far, we have discussed purges and bans as methods for cache invalidation. The important distinction between them is that purges remove a single object (with its variants), whereas bans perform cache invalidation based on matching expressions. However, there are cases where neither of these mechanisms is optimal.
Hashtwo/xkey creates a second hash key to link cached objects based on cache tags. These hash keys provide the means to invalidate cached objects with common cache tags.
In practice, hashtwo/xkey create cache invalidation patterns, which can be tested and invalidated immediately, just as purges are. In addition, hashtwo/xkey is much more efficient than bans for two reasons: 1) looking up hash keys is much more efficient than traversing ban-lists, and 2) every time you test a ban expression, it checks every object in the cache that is older than the ban itself.
The hashtwo and xkey VMODs are pre-built for supported versions and can be installed using regular package managers from the Varnish Software repositories. Once your repository is properly configured, as indicated in Solution: Install Varnish, issue the following commands to install the hashtwo VMOD:
On Debian or Ubuntu:
apt-get install libvmod-hashtwo
On Red Hat Enterprise Linux:
yum install libvmod-hashtwo
Finally, you can use this VMOD by importing it in your VCL code:
import hashtwo;
Xkey is a part of varnish-modules https://github.com/varnish/varnish-modules. For installation and usage details, please refer to its own documentation https://github.com/varnish/varnish-modules/blob/master/docs/vmod_xkey.rst.
Tip
The xkey VMOD has a softpurge function as well.
Example Using Hashtwo or Xkey
Use case: E-commerce site
Same logic for hashtwo and xkey
HTTP response header from web page containing three products: 8155054, 166412 and 234323:
HTTP/1.1 200 OK
Server: Apache/2.2.15
X-HashTwo: 8155054
X-HashTwo: 166412
X-HashTwo: 234323
HTTP request header to purge pages containing product 166412:
GET / HTTP/1.1
Host: www.example.com
X-HashTwo-Purge: 166412
VCL example code for hashtwo:
import hashtwo;

sub vcl_recv {
    if (req.http.X-HashTwo-Purge) {
        if (hashtwo.purge(req.http.X-HashTwo-Purge) != 0) {
            return (purge);
        } else {
            return (synth(404, "Key not found"));
        }
    }
}
On an e-commerce site, the backend application adds the X-HashTwo HTTP header field for every product that is included in a web page.
The header for a certain page might look like the one above.
If you use xkey instead of hashtwo, you should rename that header so you do not get confused.
Normally the backend is responsible for setting these headers. If you were to do it in VCL, it would look something like this:
sub vcl_backend_response {
set beresp.http.X-HashTwo = "secondary_hash_key";
}
In the VCL code above, the hashtwo key to be purged is the value in the X-HashTwo-Purge HTTP header.
In order to keep the web pages in sync with the database, you can set up a trigger in your database.
In that way, when a product is updated, an HTTP request towards Varnish is triggered.
For example, the request above invalidates every cached object whose hashtwo header matches the key, through hashtwo.purge(req.http.X-HashTwo-Purge), or through xkey.purge(req.http.X-Key-Purge) when using the xkey VMOD.
After purging, Varnish should respond with something like:
HTTP/1.1 200 Purged
Date: Thu, 24 Apr 2014 17:08:28 GMT
X-Varnish: 1990228115
Via: 1.1 Varnish
The objects are now cleared.
Warning
You should protect purges with ACLs from unauthorized hosts.
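For comparison, the sketch below is modelled on the example in the vmod_xkey documentation linked earlier; check it against the version you install. In the versions we are aware of, xkey reads the keys from a response header named xkey holding one or more whitespace-separated keys (for example, xkey: 8155054 166412 234323) instead of X-HashTwo. Using a request header also named xkey to trigger the purge, and the placeholder ACL named purgers, are choices of this sketch.

```vcl
import xkey;

# Placeholder ACL: hosts allowed to invalidate by key.
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(403, "Forbidden"));
        }
        if (req.http.xkey) {
            # Invalidate every object tagged with any of the given keys;
            # xkey.purge() returns the number of objects it invalidated.
            # xkey.softpurge() would soft-purge them instead, keeping grace.
            set req.http.n-gone = xkey.purge(req.http.xkey);
            return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
        } else {
            # No key given: fall back to a regular purge of this URL.
            return (purge);
        }
    }
}
```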