VCL Basics¶
In this chapter, you will learn the following topics:
- The Varnish Configuration Language (VCL) is a domain-specific language
- VCL as a finite state machine
- States as subroutines
- Varnish includes built-in subroutines
- Available functions, legal return actions and variables
The Varnish Configuration Language (VCL) is a domain-specific language designed to describe request handling and document caching policies for Varnish Cache.
When a new configuration is loaded, the VCC process, created by the Manager process, translates the VCL code to C.
This C code is compiled, typically with gcc, into a shared object.
The shared object is then loaded into the cacher process.
This chapter focuses on the most important tasks to write effective VCL code.
For this, you will learn the basic syntax of VCL, and the most important VCL built-in subroutines: vcl_recv
and vcl_backend_response
.
All other built-in subroutines are taught in the next chapter.
Tip
Remember that Varnish has many reference manuals.
For more details about VCL, check its manual page by issuing man vcl
.
Varnish Finite State Machine¶
- VCL workflow seen as a finite state machine – See Fig. 23 in the book
- States are conceptualized and implemented as subroutines, e.g., sub vcl_recv
- Built-in subroutines start with vcl_, which is a reserved prefix
- return (action) terminates subroutines, where action is a keyword that indicates the next step to do
Snippet from vcl_recv subroutine
sub vcl_recv {
if (req.method != "GET" && req.method != "HEAD") {
return (pass);
}
return (hash);
}
VCL is often described as a finite state machine.
Certain parameters are available in each state for use in your VCL code.
For example, response HTTP headers are only available after the vcl_backend_fetch state.
Fig. 23 shows a simplified version of the Varnish finite state machine. This version shows by no means all possible transitions, but only a typical set of them. Fig. 24 and Fig. 25 show the detailed version of the state machine for the frontend and backend worker respectively.
States in VCL are conceptualized as subroutines, with the exception of the waiting state, described in the Waiting State section.
Subroutines in VCL take neither arguments nor return values.
Each subroutine terminates by calling return (action)
, where action
is a keyword that indicates the desired outcome.
Subroutines may inspect and manipulate HTTP header fields and various other aspects of each request.
Subroutines instruct how requests are handled.
Subroutine example:
sub pipe_if_local {
if (client.ip ~ local) {
return (pipe);
}
}
To call a subroutine, use the call
keyword followed by the subroutine’s name:
call pipe_if_local;
Varnish has built-in subroutines that hook into the Varnish workflow.
These built-in subroutines are all named vcl_*
.
Your own subroutines cannot start their name with vcl_
.
Detailed Varnish Request Flow for the Client Worker Thread¶
The VCL Finite State Machine¶
- Each request is processed separately
- Each request is independent from others at any given time
- States are related, but isolated
- return(action); exits the current state and instructs Varnish to proceed to the next state
- Built-in VCL code is always present and appended below your own VCL
Before we begin looking at VCL code, we should learn the fundamental concepts behind VCL. When Varnish processes a request, it starts by parsing the request itself: it separates the request method from the headers, verifies that it is a valid HTTP request, and so on. When this basic parsing has completed, the very first policies are checked to make decisions.
Policies are a set of rules that the VCL code uses to make a decision.
Policies help to answer questions such as: should Varnish even attempt to find the requested resource in the cache?
In this example, the policies are in the vcl_recv
subroutine.
Warning
If you define your own subroutine and execute return (action);
in it, control is passed to the Varnish Run Time (VRT) environment.
In other words, your return (action);
skips the built-in subroutine.
VCL Syntax¶
- VCL files start with vcl 4.0;
- //, # and /* foo */ for comments
- Subroutines are declared with the sub keyword
- No loops, state-limited variables
- Terminating statements with a keyword for the next action as argument of the return() function, i.e., return(action)
- Domain-specific:
  - include "foo.vcl"; to include a VCL file
  - import foo; to load Varnish modules (VMODs)
Starting with Varnish 4.0, each VCL file must start by declaring its version with a special vcl 4.0;
marker at the top of the file.
If you have worked with a programming language or two before, the basic syntax of Varnish should be reasonably straightforward.
VCL is inspired mainly by C and Perl.
Blocks are delimited by curly braces, statements end with semicolons, and comments may be written as in C, C++ or Perl according to your own preferences.
Subroutines in VCL neither take arguments, nor return values. Subroutines in VCL can exchange data only through HTTP headers.
VCL has terminating statements, not traditional return values.
Subroutines end execution when a return(action) statement is reached.
The action tells Varnish what to do next.
For example, “look this up in cache”, “do not look this up in the cache”, or “generate an error message”.
To check which actions are available at a given built-in subroutine, see the Legal Return Actions section or see the manual page of VCL.
VCL has two directives to use contents from another file.
These directives are include
and import
, and they are used for different purposes.
include
is used to insert VCL code from another file.
Varnish looks for files to include in the directory specified by the vcl_dir
parameter of varnishd
.
Note the quotation marks in the include
syntax.
import
is used to load VMODs and make their functions available in your VCL code.
Varnish looks for VMODs to load in the directory specified by the vmod_dir
parameter of varnishd
.
Note the lack of quotation marks in the import
syntax.
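For example, both directives at the top of a VCL file might look as follows (the file name devices.vcl is hypothetical; std is a VMOD bundled with Varnish):

```vcl
vcl 4.0;

# import loads a VMOD (no quotation marks); searched for in vmod_dir
import std;

# include inserts VCL code from another file (quotation marks required); searched for in vcl_dir
include "devices.vcl";
```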
You can use the include and import directives in varnishtest.
To learn more on how to test your VCL code in a VTC, refer to the subsection VCL in varnishtest.
VCL Built-in Functions and Keywords¶
Functions:
regsub(str, regex, sub)
regsuball(str, regex, sub)
ban(boolean expression)
hash_data(input)
synthetic(str)
Keywords:
call subroutine
return(action)
new
set
unset
All functions are available in all subroutines, except those listed in the table below.
Function | Subroutines |
---|---|
hash_data | vcl_hash |
new | vcl_init |
synthetic | vcl_synth, vcl_backend_error |
VCL offers many simple to use built-in functions that allow you to modify strings, add bans, restart the VCL state engine and return control to the Varnish Run Time (VRT) environment. This book describes the most important functions in later sections, so the description at this point is brief.
regsub() and regsuball() take a string str as input, search it with the regular expression regex, and replace the matched part with the string sub.
regsub() changes only the first match, and regsuball() changes all occurrences.
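As a hypothetical illustration of the difference between the two functions:

```vcl
sub vcl_recv {
    # regsub(): strip the query string (replaces only the first match)
    set req.url = regsub(req.url, "\?.*$", "");

    # regsuball(): collapse repeated slashes anywhere in the path
    set req.url = regsuball(req.url, "//+", "/");
}
```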
The ban(boolean expression)
function invalidates all objects in cache that match the boolean expression.
Banning and purging are detailed in the Cache Invalidation chapter.
Legal Return Actions¶
subroutine | scope | deliver | fetch | restart | hash | pass | pipe | synth | purge | lookup |
---|---|---|---|---|---|---|---|---|---|---|
vcl_deliver | client | x | x | x | ||||||
vcl_hash | client | x | ||||||||
vcl_hit | client | x | x | x | x | x | ||||
vcl_miss | client | x | x | x | x | |||||
vcl_pass | client | x | x | x | ||||||
vcl_pipe | client | x | x | |||||||
vcl_purge | client | x | x | |||||||
vcl_recv | client | x | x | x | x | x | ||||
vcl_synth | client | x | x |
subroutine | scope | fetch | deliver | abandon | retry | ok | fail |
---|---|---|---|---|---|---|---|
vcl_backend_fetch | backend | x | x | ||||
vcl_backend_response | backend | x | x | x | |||
vcl_backend_error | backend | x | x | ||||
vcl_init | vcl.load | x | x | ||||
vcl_fini | vcl.discard | x |
The table above shows the VCL built-in subroutines and their legal returns.
return
is a built-in keyword that ends execution of the current VCL subroutine and continues to the next step, given by action, in the request-handling state machine.
Legal return actions are: lookup, synth, purge, pass, pipe, fetch, deliver, hash, restart, retry, and abandon.
Note
In Varnish 4, purge
is used as a return action.
Variables in VCL subroutines¶
subroutine | req. | bereq. | beresp. | obj. | resp. |
---|---|---|---|---|---|
vcl_backend_fetch | R/W | ||||
vcl_backend_response | R/W | R/W | |||
vcl_backend_error | R/W | R/W | |||
vcl_recv | R/W | ||||
vcl_pipe | R | R/W | |||
vcl_pass | R/W | ||||
vcl_hash | R/W | ||||
vcl_purge | R/W | ||||
vcl_miss | R/W | ||||
vcl_hit | R/W | R | |||
vcl_deliver | R/W | R | R/W | ||
vcl_synth | R/W | R/W |
Table 16 shows the availability of variables in each VCL subroutine and whether the variables are readable (R) or writable (W).
The variables in this table are listed per subroutine and follow the prefix req.
, bereq.
, beresp.
, obj.
, or resp.
.
However, predefined variables do not strictly follow the table; for example, req.restarts
is readable but not writable.
In order to see the exact description of predefined variables, consult the VCL man page or ask your instructor.
Most variables are self-explanatory, but the way they influence each other is not; thus a brief explanation follows:
Values of request (req.
) variables are automatically assigned to backend request (bereq.
) variables.
However, those values may slightly differ, because Varnish may modify client requests.
For example, HEAD
requests coming from clients may be converted to GET
requests towards the backend.
Changes in backend response (beresp.
) variables affect response (resp.
) and object (obj.
) variables.
Many of the obj.
variables are set in resp.
, which are to be sent to the clients.
Additional variable prefixes in Table 16 are client., server., local., remote., and storage.
These prefixes are accessible from the subroutines at the frontend (client) side.
Yet another variable is now
, which is accessible from all subroutines.
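A common illustration of these access rules (the X-Cache header name is a convention, not part of VCL): obj.hits is readable in vcl_deliver, and resp.* header fields are writable there, so you can expose the cache status to clients:

```vcl
sub vcl_deliver {
    # obj.hits is readable here; resp.* header fields are writable
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```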
Support for global variables with a lifespan across transactions and VCLs is achieved with the variable VMOD. This VMOD keeps the variables and its values as long as the VMOD is loaded. Supported data types are strings, integers and real numbers. For more information about the variable VMOD, please visit https://github.com/varnish/varnish-modules/blob/master/docs/vmod_var.rst.
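A minimal sketch of the var VMOD (assuming varnish-modules is installed; see its documentation linked above for the full API):

```vcl
vcl 4.0;
import var;

sub vcl_recv {
    # Global variables survive across transactions as long as the VMOD is loaded
    var.global_set("last_url", req.url);
}

sub vcl_deliver {
    set resp.http.X-Last-URL = var.global_get("last_url");
}
```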
Note
Recall that every transaction in Varnish is always in a state, and each state is represented by its correspondent subroutine.
Built-in vcl_recv
¶
sub vcl_recv {
if (req.method == "PRI") {
/* We do not support SPDY or HTTP/2.0 */
return (synth(405));
}
if (req.method != "GET" &&
req.method != "HEAD" &&
req.method != "PUT" &&
req.method != "POST" &&
req.method != "TRACE" &&
req.method != "OPTIONS" &&
req.method != "DELETE") {
/* Non-RFC2616 or CONNECT which is weird. */
return (pipe);
}
if (req.method != "GET" && req.method != "HEAD") {
/* We only deal with GET and HEAD by default */
return (pass);
}
if (req.http.Authorization || req.http.Cookie) {
/* Not cacheable by default */
return (pass);
}
return (hash);
}
- We will revisit
vcl_recv
after we learn more about built-in functions, keywords, variables and return actions
The built-in VCL for vcl_recv
is designed to ensure a safe caching policy even with no modifications in VCL.
It has two main uses:
- Only handle recognized HTTP methods.
- Cache only requests with the GET and HEAD methods.
Policies for not caching data are to be defined in your VCL.
Built-in VCL code is executed right after any user-defined VCL code, and is always present.
You cannot remove built-in subroutines; however, you can bypass them if your VCL code reaches one of the terminating actions: pass
, pipe
, hash
, or synth
.
These terminating actions return control from the VRT (Varnish Run-Time) to Varnish.
For a well-behaving Varnish server, most of the logic in the built-in VCL is needed. Consider either replicating all the built-in VCL logic in your own VCL code, or let your client requests be handled by the built-in VCL code.
We will revisit and discuss the vcl_recv subroutine in more detail in VCL Built-in Subroutines, but first, let's learn more about built-in functions, keywords, variables and return actions.
Exercise: Configure vcl_recv
to avoid caching all requests to the URL /admin
¶
- Find and open the built-in.vcl code, and analyze the vcl_recv subroutine
- Create your VCL code to avoid caching all URLs under /admin
- Compile your VCL code to C language, and analyze how the built-in.vcl code is appended
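One possible sketch for the second step (not the only valid solution; the built-in VCL still runs after it for all other requests):

```vcl
vcl 4.0;

sub vcl_recv {
    # Bypass the cache for anything under /admin
    if (req.url ~ "^/admin") {
        return (pass);
    }
}
```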
Detailed Varnish Request Flow for the Backend Worker Thread¶
- See Fig. 25 in the book
- Review of return actions: fetch, deliver, retry and abandon
Fig. 25 shows the vcl_backend_fetch
, vcl_backend_response
and vcl_backend_error
subroutines.
These subroutines are the backend-counterparts to vcl_recv
.
You can use data provided by the client in vcl_recv
or even vcl_backend_fetch
to define your caching policy.
An important difference is that you have access to bereq.*
variables in vcl_backend_fetch
.
As detailed in Legal Return Actions, vcl_backend_fetch
can return fetch
or abandon
, vcl_backend_response
can return deliver
, retry
or abandon
, and vcl_backend_error
can return deliver
or retry
.
The fetch
action transmits the request to the backend.
The abandon
action discards any possible response from the backend.
The deliver
action builds a response from the backend response and sends it to the client.
An important difference between deliver
and abandon
is that deliver
stores the response in the cache, whereas abandon
does not.
You can leverage this difference with stale objects.
For example, on a 5xx server error, you might want to build a response with a stale object instead of sending the error to the client.
The retry
action re-enters vcl_backend_fetch
as further detailed in retry Return Action.
The functionality of these return actions is the same in all subroutines where they are valid.
You will learn more about vcl_backend_fetch in the next chapter, but first we review vcl_backend_response, because the backend response is normally processed there.
VCL – vcl_backend_response
¶
- Override cache time for certain URLs
- Strip Set-Cookie header fields that are not needed
- Strip buggy Vary header fields
- Add helper-headers to the object for use in banning (more information in later sections)
- Sanitize server response
- Apply other caching policies
Fig. 25 shows that vcl_backend_response
may terminate with one of the following actions: deliver
, retry
and abandon
.
The deliver
terminating action may or may not insert the object into the cache depending on the response of the backend.
The retry
action makes Varnish transmit the request to the backend again by calling the vcl_backend_fetch
subroutine.
The abandon
action discards any response from the backend.
Backends might respond with a 304 HTTP response.
304 responses happen when the requested object has not been modified since the timestamp given in the If-Modified-Since header field of the request.
If the request hits a non-fresh object (see Fig. 2), Varnish adds the If-Modified-Since
header with the value of t_origin
to the request and sends it to the backend.
304
responses do not contain a message body.
Thus, Varnish builds the response using the body from cache.
304
responses update the attributes of the cached object.
vcl_backend_response
¶
built-in vcl_backend_response
sub vcl_backend_response {
if (beresp.ttl <= 0s ||
beresp.http.Set-Cookie ||
beresp.http.Surrogate-control ~ "no-store" ||
(!beresp.http.Surrogate-Control &&
beresp.http.Cache-Control ~ "no-cache|no-store|private") ||
beresp.http.Vary == "*") {
/*
* Mark as "Hit-For-Pass" for the next 2 minutes
*/
set beresp.ttl = 120s;
set beresp.uncacheable = true;
}
return (deliver);
}
The vcl_backend_response
built-in subroutine is designed to avoid caching in conditions that are most probably undesired.
For example, it avoids caching responses with cookies, i.e., responses with Set-Cookie
HTTP header field.
This built-in subroutine also avoids request serialization described in the Waiting State section.
To avoid request serialization, beresp.uncacheable
is set to true
, which in turn creates a hit-for-pass
object.
The hit-for-pass section explains this object type in detail.
If you still decide to skip the built-in vcl_backend_response
subroutine by having your own and returning deliver
, be sure to never set beresp.ttl
to 0
.
If you skip the built-in subroutine and set 0 as the TTL value, you are effectively removing objects from the cache that could otherwise be used to avoid request serialization.
Note
Varnish 3.x has a hit_for_pass return action.
In Varnish 4, this action is achieved by setting beresp.uncacheable
to true
.
The hit-for-pass section explains this in more detail.
The Initial Value of beresp.ttl
¶
Before Varnish runs vcl_backend_response
, the beresp.ttl
variable has already been set to a value.
beresp.ttl
is initialized with the first value it finds among:
- The s-maxage directive in the Cache-Control response header field
- The max-age directive in the Cache-Control response header field
- The Expires response header field
- The default_ttl parameter of varnishd
Only the following status codes will be cached by default:
- 200: OK
- 203: Non-Authoritative Information
- 300: Multiple Choices
- 301: Moved Permanently
- 302: Moved Temporarily
- 304: Not modified
- 307: Temporary Redirect
- 410: Gone
- 404: Not Found
You can cache other status codes than the ones listed above, but you have to set the beresp.ttl
to a positive value in vcl_backend_response
.
Since beresp.ttl
is set before vcl_backend_response
is executed, you can modify the directives of the Cache-Control
header field without affecting beresp.ttl
, and vice versa.
Cache-Control
directives are defined in RFC7234 Section 5.2.
A backend response may include the s-maxage directive in the Cache-Control header field, the maximum age for shared caches.
This field overrides all max-age
values throughout all Varnish servers in a multiple Varnish-server setup.
For example, if the backend sends Cache-Control: max-age=300, s-maxage=3600
, all Varnish installations will cache objects with an Age
value less or equal to 3600 seconds.
This also means that responses with Age
values between 301 and 3600 seconds are not cached by the clients' web browsers, because Age
is greater than max-age
.
A sensible approach is to use the s-maxage
directive to instruct Varnish to cache the response.
Then, remove the s-maxage
directive using regsub()
in vcl_backend_response
before delivering the response.
In this way, you can safely use s-maxage
as the cache duration for Varnish servers, and set max-age
as the cache duration for clients.
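A sketch of this approach (the regular expression is illustrative and assumes a well-formed Cache-Control field):

```vcl
sub vcl_backend_response {
    if (beresp.http.Cache-Control ~ "s-maxage") {
        # beresp.ttl has already been set from s-maxage at this point;
        # strip the directive so downstream caches fall back to max-age
        set beresp.http.Cache-Control =
            regsub(beresp.http.Cache-Control, ",?\s*s-maxage\s*=\s*[0-9]+", "");
    }
}
```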
Warning
Bear in mind that removing or altering the Age
response header field may affect how responses are handled downstream.
The impact of removing the Age
field depends on the HTTP implementation of downstream intermediaries or clients.
For example, imagine that you have a three Varnish-server serial setup.
If you remove the Age
field in the first Varnish server, then the second Varnish server will assume Age=0
.
In this case, you might inadvertently be delivering stale objects to your client.
Example: Setting TTL of .jpg URLs to 60 seconds¶
sub vcl_backend_response {
if (bereq.url ~ "\.jpg$") {
set beresp.ttl = 60s;
}
}
Responses with a Set-Cookie field are not cached.
Example: Cache .jpg for 60 seconds only if s-maxage
is not present¶
sub vcl_backend_response {
if (beresp.http.cache-control !~ "s-maxage" && bereq.url ~ "\.jpg$") {
set beresp.ttl = 60s;
}
}
The purpose of the above example is to allow a gradual migration to using a backend-controlled caching policy.
If the backend does not supply s-maxage
, and the URL is a jpg file, then Varnish sets beresp.ttl
to 60 seconds.
The Cache-Control
response header field can contain a number of directives.
Varnish parses this field and looks for s-maxage
and max-age
.
By default, Varnish sets beresp.ttl
to the value of s-maxage
if found.
If s-maxage
is not found, Varnish uses the value max-age
.
If neither exists, Varnish uses the Expires
response header field to set the TTL.
If none of those header fields exist, Varnish uses the default TTL, which is 120 seconds.
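The lookup order described above can be summarized as follows (assuming the default default_ttl of 120 seconds):

```
Cache-Control: max-age=300, s-maxage=3600  ->  beresp.ttl = 3600s
Cache-Control: max-age=300                 ->  beresp.ttl = 300s
Expires only (a future date)               ->  TTL derived from Expires
none of the above                          ->  beresp.ttl = 120s (default_ttl)
```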
The default parsing and TTL assignment are done before vcl_backend_response
is executed.
The TTL changing process is recorded in the TTL
tag of varnishlog
.
Exercise: Avoid Caching a Page¶
- Write a VCL which avoids caching the index page at all
- Your VCL should cover both resource targets: / and /index.html
When trying this out, remember that Varnish keeps the Host
header field in req.http.host
and the requested resource in req.url
.
For example, in a request to http://www.example.com/index.html, the http:// part is not seen by Varnish at all, req.http.host
has the value www.example.com and req.url
the value /index.html.
Note how the leading /
is included in req.url
.
If you need help, see Solution: Avoid caching a page.
Exercise: Either use s-maxage or set TTL by file type¶
Write a VCL that:
- uses Cache-Control: s-maxage when present,
- caches .jpg for 30 seconds if s-maxage is not present,
- caches .html for 10 seconds if s-maxage isn't present, and
- removes the Set-Cookie header field if s-maxage or the above rules indicate that Varnish should cache.
If you need help, see Solution: Either use s-maxage or set TTL by file type.
Tip
Divide and conquer! Most somewhat complex VCL tasks are easily solved when you divide the tasks into smaller problems and solve them individually. Try solving each part of the exercise by itself first.
Note
Varnish automatically parses s-maxage
for you, so you only need to check if it is there or not.
Remember that if s-maxage
is present, Varnish has already used it to set beresp.ttl
.
Waiting State¶
- Request serialization is an undesired side-effect that is handled in the vcl_backend_response subroutine
- Designed to improve response performance
The waiting state is reached when a request n arrives while a previous identical request 0 is being handled at the backend. In this case, request 0 is set as busy and all subsequent requests n are queued in a waiting list. If the fetched object from request 0 is cacheable, all n requests in the waiting list call the lookup operation again. This retry will hopefully hit the desired object in cache. As a result, only one request is sent to the backend.
The waiting state is designed to improve response performance. However, a counterproductive scenario, namely request serialization, may occur if the fetched object is uncacheable, and so, recursively, is each subsequent request in the waiting list. This situation forces every single request in the waiting list to be sent to the backend in a serial manner. Serialized requests should be avoided because their performance is normally poorer than sending multiple requests in parallel. The built-in vcl_backend_response subroutine avoids request serialization.
Summary of VCL Basics¶
- VCL is all about policies
- Built-in VCL subroutines map the Varnish finite state machine
- Each request is handled independently
- Recommendation: Building a VCL file is done one line at a time
VCL provides subroutines that allow you to affect the handling of any single request almost anywhere in the execution chain. This has pros and cons, as with any programming language.
This book is not a complete reference guide to how you can deal with every possible scenario in VCL, but if you master the basics of VCL you can solve complex problems that nobody has thought about before. And you can usually do it without requiring too many different sources of documentation.
Whenever you are working on VCL, you should think of what that exact line you are writing has to do. The best VCL is built by having many independent sections that do not interfere with each other more than what they have to.
Remember that there is a built-in VCL.
If your own VCL code does not reach a return statement, the built-in VCL subroutine is executed after yours.
If you just need a little modification of a subroutine, you can use the code from {varnish-source-code}/bin/varnishd/builtin.vcl
as a template.