VCL Subroutines¶
- Typical subroutines to customize: vcl_recv, vcl_pass, vcl_backend_fetch, vcl_backend_response, vcl_hash, vcl_hit, vcl_miss, vcl_deliver, and vcl_synth
- If your VCL subroutine does return, you skip the built-in VCL subroutine
- The built-in VCL subroutines are always appended to yours
This chapter covers the VCL subroutines where you customize the behavior of Varnish. VCL subroutines can be used to: add custom headers, change the appearance of the Varnish error message, add HTTP redirect features in Varnish, purge content, and define which parts of a cached object are unique. After this chapter, you should know where to add your custom policies and you will be ready to dive into more advanced features of Varnish and VCL.
Note
It is strongly advised to rely on the built-in subroutines whenever possible. The built-in subroutines are designed with safety in mind, which often means that they handle any flaws in your VCL code in a reasonable manner.
Tip
Looking at the code of the built-in subroutines can help you understand how to build your own VCL code.
Built-in subroutines are in the file /usr/share/doc/varnish/examples/builtin.vcl.gz or {varnish-source-code}/bin/varnishd/builtin.vcl.
The first location may change depending on your distro.
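The following is a minimal sketch of how the built-in code is appended to yours, assuming Varnish 4.x syntax. The backend address and the /admin path are hypothetical placeholders, not part of the original text.

```vcl
vcl 4.0;

# Hypothetical backend, declared only so that the file can be loaded.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.url ~ "^/admin") {
        # Terminating action: the built-in vcl_recv is skipped for these requests.
        return (pass);
    }
    # No return here: execution falls through to the built-in vcl_recv,
    # which is appended after this code.
}
```

Requests that do not match /admin still go through every safety check of the built-in vcl_recv, because this subroutine only returns in the one case it cares about.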
VCL – vcl_recv
¶
- Normalize client input
- Pick a backend web server
- Re-write client-data for web applications
- Decide caching policy based on client input
- Access Control Lists (ACL)
- Security barriers, e.g., against SQL injection attacks
- Fixing mistakes, e.g., index.htlm -> index.html
vcl_recv is the first VCL subroutine executed, right after Varnish has parsed the client request into its basic data structure.
vcl_recv has four main uses:
- Modifying the client data to reduce cache diversity, e.g., removing any leading “www.” in the Host: header.
- Deciding which web server to use.
- Deciding caching policy based on client data. For example, not caching POST requests, or caching only specific URLs.
- Executing re-write rules needed for specific web applications.
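As an illustration of the second use, deciding which web server to use, here is a short sketch in Varnish 4.x syntax. The two backends, their addresses, and the /images/ prefix are assumptions made for the example.

```vcl
vcl 4.0;

# Hypothetical backends; hosts and ports are placeholders.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

backend images {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    if (req.url ~ "^/images/") {
        # Send image requests to the dedicated backend.
        set req.backend_hint = images;
    } else {
        set req.backend_hint = default;
    }
    # No return: the built-in vcl_recv still decides between pass, pipe and hash.
}
```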
In vcl_recv you can perform the following terminating actions:
pass: It passes over the cache lookup, but it executes the rest of the Varnish request flow. pass does not store the response from the backend in the cache.
pipe: This action creates a full-duplex pipe that forwards the client request to the backend without looking at the content. Backend replies are forwarded back to the client without caching the content. Since Varnish no longer tries to map the content to a request, any subsequent request sent over the same keep-alive connection is also piped. Piped requests do not appear in any log.
hash: It looks up the request in cache.
purge: It looks up the request in cache in order to remove it.
synth: It generates a synthetic response from Varnish. This synthetic response is typically a web page with an error message. synth may also be used to redirect client requests.
It’s also common to use vcl_recv to apply some security measures.
Varnish is not a replacement for intrusion detection systems, but can still be used to stop some typical attacks early.
Simple Access Control Lists (ACLs) can be applied in vcl_recv too.
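A simple ACL combined with the purge and synth actions might look like the sketch below. It is a fragment for a Varnish 4.x VCL file that already declares its version and a backend; the ACL name purgers, the address range, and the use of a PURGE request method are assumptions for the example.

```vcl
# Hypothetical ACL: only these clients may purge content.
acl purgers {
    "localhost";
    "192.0.2.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            # Reject clients outside the ACL with a synthetic response.
            return (synth(405, "Not allowed"));
        }
        # Look up the object in cache in order to remove it.
        return (purge);
    }
}
```

Returning early for PURGE also keeps such requests away from the built-in vcl_recv, which would otherwise pipe an unknown method to the backend.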
For further discussion about security in VCL, take a look at the Varnish Security Firewall (VSF) application at https://github.com/comotion/VSF. The VSF supports Varnish 3 and above. You may also be interested in the Security.vcl project at https://github.com/comotion/security.vcl. The Security.vcl project, however, supports only Varnish 3.x.
Tip
The built-in vcl_recv subroutine may not cache everything you want, but it is often better not to cache some content than to deliver the wrong content to the wrong user.
There are exceptions, of course, but if you cannot understand why the default VCL does not let you cache some content, it is almost always worth investigating why instead of overriding it.
Revisiting built-in vcl_recv
¶
sub vcl_recv {
    if (req.method == "PRI") {
        /* We do not support SPDY or HTTP/2.0 */
        return (synth(405));
    }
    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }
    if (req.method != "GET" && req.method != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (hash);
}
Example: Basic Device Detection¶
One way of serving different content for mobile devices and desktop browsers is to run some simple parsing on the User-Agent header. The following VCL code is an example to create custom headers. These custom headers differentiate mobile devices from desktop computers.
sub vcl_recv {
    if (req.http.User-Agent ~ "iPad" ||
        req.http.User-Agent ~ "iPhone" ||
        req.http.User-Agent ~ "Android") {
        set req.http.X-Device = "mobile";
    } else {
        set req.http.X-Device = "desktop";
    }
}
You can read more about different types of device detection at https://www.varnish-cache.org/docs/trunk/users-guide/devicedetection.html
This simple VCL creates a request header called X-Device, which contains either mobile or desktop.
The web server can then use this header to determine what page to serve, and inform Varnish about it through Vary: X-Device.
It might be tempting to just send Vary: User-Agent, but that requires you to normalize the User-Agent header itself, because there are many tiny variations in the description of similar User-Agents.
This normalization, however, discards detailed information about the browser.
If you pass the User-Agent header without normalization, the cache size may inflate drastically, because Varnish would keep possibly hundreds of variants of each object, one per distinct User-Agent string.
For more information on the Vary HTTP response header, see the Vary section.
Note
If you do use Vary: X-Device, you might want to send Vary: User-Agent to the users after Varnish has used it.
Otherwise, intermediary caches will not know that the page looks different for different devices.
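One way to do this, sketched here under the assumption that the backend sends Vary: X-Device as described above, is to rewrite the Vary header in vcl_deliver. At that point Varnish has already used the header to select the right variant.

```vcl
sub vcl_deliver {
    if (resp.http.Vary ~ "X-Device") {
        # Clients and intermediary caches never see X-Device,
        # so advertise the header it was derived from instead.
        set resp.http.Vary = regsub(resp.http.Vary, "X-Device", "User-Agent");
    }
}
```

Because the change is made on the response copy sent to the client, the object stored in cache keeps its original Vary: X-Device header.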
Exercise: Rewrite URL and Host Header Fields¶
- Copy the Host header field (req.http.Host) and URL (req.url) to two new request headers: req.http.x-host and req.http.x-url.
- Ensure that www.example.com and example.com are cached as one, using regsub().
- Rewrite all URLs under http://sport.example.com to http://example.com/sport/. For example: http://sport.example.com/index.html to http://example.com/sport/index.html.
- Use HTTPie to verify the result.
- Extra: Make sure / and /index.html are cached as one object.
- Extra 2: Make the redirection work for any domain with sport. at the front. E.g: sport.example.com, sport.foobar.example.net, sport.blatti, etc.
For the first point, use set req.http.headername = "value"; or set req.http.headername = regsub(...);
In point 2, change req.http.host by calling the function regsub(str, regex, sub).
str is the input string, in this case, req.http.host.
regex is the regular expression matching whatever content you need to change.
Use ^ to anchor the match at the beginning of the string, followed by www, and finish the regular expression with \. to match a literal dot. The resulting expression is "^www\."
sub is what you want to replace the match with; an empty string "" removes whatever matches regex.
For point 3, you can check host headers against a specific domain name, for example: if (req.http.host == "sport.example.com")
An alternative is to check for all hosts that start with sport, regardless of the domain name: if (req.http.host ~ "^sport\.")
In the first case, setting the host header is straightforward: set req.http.host = "example.com";
In the second case, you can set the host header by removing the string that precedes the domain name: set req.http.host = regsub(req.http.host,"^sport\.", "");
Finally, you rewrite the URL in this way: set req.url = regsub(req.url, "^", "/sport");
To simulate client requests, you can either use HTTPie or varnishtest.
If you need help, see Solution: Rewrite URL and Host Header Fields.
Tip
Remember that man vcl contains a reference manual with the syntax and details of functions such as regsub(str, regex, sub).
We recommend that you leave the default VCL file untouched and create a new file for your VCL code.
Remember to update the location of the VCL file in the Varnish configuration file and reload it.
VCL – vcl_pass
¶
- Called upon entering pass mode
sub vcl_pass {
    return (fetch);
}
The vcl_pass subroutine is called after a previous subroutine returns the pass action.
This action sets the request in pass mode.
vcl_pass typically serves as an important catch-all for features you have implemented in vcl_hit and vcl_miss.
vcl_pass may return three different actions: fetch, synth, or restart.
When returning the fetch action, the ongoing request proceeds in pass mode.
Fetched objects from requests in pass mode are not cached, but passed to the client.
The synth and restart return actions call their corresponding subroutines.
hit-for-pass¶
- Used when an object should not be cached
- hit-for-pass object instead of fetched object
- Has TTL
Some requested objects should not be cached.
A typical example is when a requested page contains the Set-Cookie response header, and therefore it must be delivered only to the client that requests it.
In this case, you can tell Varnish to create a hit-for-pass object and store it in the cache, instead of storing the fetched object.
Subsequent requests are processed in pass mode.
When an object should not be cached, the beresp.uncacheable variable is set to true.
As a result, the cacher process keeps a hash reference to the hit-for-pass object.
In this way, the lookup operation for requests translating to that hash finds a hit-for-pass object.
Such requests are handed over to the vcl_pass subroutine, and proceed in pass mode.
Like any other cached object, hit-for-pass objects have a TTL. Once the object’s TTL has elapsed, the object is removed from the cache.
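The Set-Cookie case described above can be sketched as follows, in Varnish 4.x syntax. This is a fragment meant to be merged into an existing VCL file, and the 120 second TTL is an arbitrary value chosen for the example.

```vcl
sub vcl_backend_response {
    if (beresp.http.Set-Cookie) {
        # Store a hit-for-pass object instead of the fetched object.
        set beresp.uncacheable = true;
        # How long subsequent requests for this hash proceed in pass mode.
        set beresp.ttl = 120s;
        return (deliver);
    }
}
```

A short TTL limits how long Varnish keeps passing requests if the backend later stops sending Set-Cookie for that resource.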
VCL – vcl_backend_fetch
¶
sub vcl_backend_fetch {
    return (fetch);
}
vcl_backend_fetch can be called from vcl_miss or vcl_pass.
When vcl_backend_fetch is called from vcl_miss, the fetched object may be cached.
If vcl_backend_fetch is called from vcl_pass, the fetched object is not cached even if the obj.ttl or obj.keep variables are greater than zero.
A relevant variable is bereq.uncacheable.
This variable indicates whether the object requested from the backend may be cached or not.
However, objects from pass requests are never cached, regardless of the bereq.uncacheable variable.
vcl_backend_fetch has two possible terminating actions, fetch or abandon.
The fetch action sends the request to the backend, whereas the abandon action calls the vcl_synth subroutine.
The built-in vcl_backend_fetch subroutine simply returns the fetch action.
The backend response is processed by vcl_backend_response or vcl_backend_error depending on the response from the server.
If Varnish receives a syntactically correct HTTP response, Varnish passes control to vcl_backend_response.
Syntactically correct HTTP responses include HTTP 5xx error codes.
If Varnish does not receive an HTTP response, it passes control to vcl_backend_error.
VCL – vcl_hash
¶
- Defines what is unique about a request.
- vcl_hash is always visited after vcl_recv or when another subroutine returns the hash action keyword.
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
vcl_hash defines the hash key to be used for a cached object.
Hash keys differentiate one cached object from another.
The default VCL for vcl_hash adds the hostname or IP address, and the requested URL to the cache hash.
One usage of vcl_hash is to add a user-name in the cache hash to identify user-specific data.
However, be warned that caching user-data should only be done cautiously.
A better alternative might be to hash cache objects per session instead.
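Hashing per session could be sketched as below. The X-Session-Id request header is hypothetical: the example assumes that your own vcl_recv code has already extracted a session identifier into it, and that those requests reach the hash action instead of being passed by the built-in vcl_recv because of their cookies.

```vcl
sub vcl_hash {
    if (req.http.X-Session-Id) {
        # Add the session identifier to the hash key, so that
        # each session gets its own cached copy.
        hash_data(req.http.X-Session-Id);
    }
    # No return: the built-in vcl_hash is appended here, adds the URL
    # and the host (or server IP), and returns lookup.
}
```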
The vcl_hash subroutine returns the lookup action keyword.
Unlike other action keywords, lookup is an operation, not a subroutine.
The next state to visit after vcl_hash depends on what lookup finds in the cache.
When the lookup operation does not match any hash, it creates an object with a busy flag and inserts it in cache.
Then, the request is sent to the vcl_miss subroutine.
The busy flag is removed once the request is handled, and the object is updated with the response from the backend.
Subsequent similar requests that hit busy flagged objects are sent into a waiting list. This waiting list is designed to improve response performance, and it is explained in the Waiting State section.
Note
One cache hash may refer to one or many object variations. Object variations are created based on the Vary header field. It is better practice to keep several variations under one cache hash than to create one hash per variation.
VCL – vcl_hit
¶
- Executed after the lookup operation, called by vcl_hash, finds (hits) an object in the cache.
sub vcl_hit {
    if (obj.ttl >= 0s) {
        // A pure unadultered hit, deliver it
        return (deliver);
    }
    if (obj.ttl + obj.grace > 0s) {
        // Object is in grace, deliver it
        // Automatically triggers a background fetch
        return (deliver);
    }
    // fetch & deliver once we get the result
    return (fetch);
}
The vcl_hit subroutine typically terminates by calling return() with one of the following keywords: deliver, restart, or synth.
deliver returns control to vcl_deliver if the TTL + grace time of an object has not elapsed.
If the elapsed time is more than the TTL, but less than the TTL + grace time, then deliver triggers a background fetch in parallel to vcl_deliver.
The background fetch is an asynchronous call that inserts a fresher copy of the requested object in the cache.
Grace time is explained in the Grace Mode section.
restart restarts the transaction, and increases the restart counter.
If the number of restarts is higher than the max_restarts parameter, Varnish emits a guru meditation error.
synth(status code, reason) returns the specified status code to the client and abandons the request.
VCL – vcl_miss
¶
- Subroutine called if a requested object is not found by the lookup operation.
- Contains policies to decide whether or not to attempt to retrieve the document from the backend, and which backend to use.
sub vcl_miss {
    return (fetch);
}
vcl_hit and vcl_miss are closely related.
It is rare that you customize them, because modification of HTTP request headers is typically done in vcl_recv.
However, if you do not wish to send the X-Varnish header to the backend server, you can remove it in vcl_miss or vcl_pass.
For that case, you can use: unset bereq.http.x-varnish;
VCL – vcl_deliver
¶
- Common last exit point for all request workflows, except requests through vcl_pipe
- Often used to add and remove debug-headers
sub vcl_deliver {
    return (deliver);
}
The vcl_deliver subroutine is simple, and it is very useful for modifying the output of Varnish.
If you need to remove a header, or add one that is not supposed to be stored in the cache, vcl_deliver is the place to do it.
The variables most useful and common to modify in vcl_deliver are:
resp.http.*
- Headers that are sent to the client. They can be set and unset.
resp.status
- The status code (200, 404, 503, etc).
resp.reason
- The HTTP status message that is returned to the client.
obj.hits
- The count of cache-hits on this object. Therefore, a value of 0 indicates a miss. This variable can be evaluated to easily reveal whether a response comes from a cache hit or miss.
req.restarts
- The number of restarts issued in VCL - 0 if none were made.
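A short sketch of typical vcl_deliver housekeeping, in Varnish 4.x syntax, follows. The header names X-Powered-By and X-Restarts are illustrative choices, not something Varnish defines.

```vcl
sub vcl_deliver {
    # Remove a header that the backend added and clients do not need.
    unset resp.http.X-Powered-By;

    # Add a debug header that is never stored in the cache.
    set resp.http.X-Restarts = req.restarts;

    # No return: the built-in vcl_deliver follows and returns deliver.
}
```

Headers changed here affect only the response sent to the client, so the cached object stays untouched.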
VCL – vcl_synth
¶
- Used to generate content within Varnish
- Error messages can be created here
- Other use cases: redirecting users (301/302 redirects)
vcl/default-vcl_synth.vcl:
sub vcl_synth {
    set resp.http.Content-Type = "text/html; charset=utf-8";
    set resp.http.Retry-After = "5";
    synthetic( {"<!DOCTYPE html>
<html>
<head>
<title>"} + resp.status + " " + resp.reason + {"</title>
</head>
<body>
<h1>Error "} + resp.status + " " + resp.reason + {"</h1>
<p>"} + resp.reason + {"</p>
<h3>Guru Meditation:</h3>
<p>XID: "} + req.xid + {"</p>
<hr>
<p>Varnish cache server</p>
</body>
</html>
"} );
    return (deliver);
}
You can create synthetic responses, e.g., personalized error messages, in vcl_synth.
To call this subroutine you do:
return (synth(status_code, "reason"));
Note that synth is not a keyword, but a function with arguments.
You must explicitly pass the status code to synth(); the reason argument may be omitted, as in the return (synth(405)); call of the built-in vcl_recv.
Headers on synthetic responses are set through resp.http.
Note
From vcl/default-vcl_synth.vcl, note that {" and "} can be used to make multi-line strings.
This is not limited to the synthetic() function, but can be used anywhere.
Note
A vcl_synth defined object is never stored in cache, contrary to a vcl_backend_error defined object, which may end up in cache.
vcl_synth and vcl_backend_error replace vcl_error from Varnish 3.
Example: Redirecting requests with vcl_synth
¶
sub vcl_recv {
    if (req.http.host == "www.example.com") {
        set req.http.location = "http://example.com" + req.url;
        return (synth(750, "Permanently moved"));
    }
}
sub vcl_synth {
    if (resp.status == 750) {
        set resp.http.location = req.http.location;
        set resp.status = 301;
        return (deliver);
    }
}
Redirecting with VCL is fairly easy and fast. Basic HTTP redirects work when the HTTP response is either 301 Moved Permanently or 302 Found. These responses have a Location header field telling the web browser where to redirect.
Note
The 301 response can affect how browsers prioritize history and how search engines treat the content. 302 responses are temporary and do not affect search engines as 301 responses do.
Exercise: Modify the HTTP response header fields¶
- Add a header field holding the string HIT if the requested resource was found in cache, or MISS otherwise
- “Rename” the Age header field to X-Age
Exercise: Change the error message¶
- Make the default error message more friendly.