Content Composition¶
This chapter is for the web-developer course only
This chapter teaches you how to glue content from independent sources into one web page.
- Cookies and how to work with them
- Edge Side Includes (ESI) and how to compose a single client-visible page out of multiple objects
- Combining ESI and Cookies
- AJAX and masquerading AJAX requests through Varnish
A Typical Website¶
Most websites follow a pattern and have easily distinguishable parts:
- A front page
- Articles or sub-pages
- A login-box or “home bar”
- Static elements, like CSS, JavaScript and graphics
To truly utilize Varnish to its full potential, start by analyzing the structure of the website. Ask yourself this:
- What makes web pages on your server different from each other?
- Do the differences apply to entire pages, or only to parts of them?
- How can you let Varnish know about those differences?
Beginning with the static elements should be easy. Previous chapters of this book cover how to handle static elements. But how do you proceed with dynamic content?
An easy solution is to only cache content for users that are not logged in. For newspapers, that is probably enough, but not for web-shops.
Web-shops re-use objects frequently. If you can isolate the user-specific bits, like the shopping cart, you can cache the rest. You can even cache the shopping cart, if you tell Varnish when to change it.
The most important lesson is to start with what you know.
Cookies¶
- Be careful when caching cookies!
- Cookies are frequently used to identify unique users, or user’s choices.
- They can be used for anything from identifying a user-session in a web-shop to opting for a mobile version of a web page.
- Varnish can handle cookies coming from two different sources:
  - the req.http.Cookie header field from clients
  - the beresp.http.Set-Cookie header field from servers
By default, Varnish does not cache a page if req.http.Cookie or beresp.http.Set-Cookie is present.
This is for two main reasons:
1) to avoid littering the cache with a large number of copies of the same content, and
2) to avoid delivering cookie-based content to a wrong client.
It is far better to cache multiple copies of the same content, one per user, or to cache nothing at all, than to cache personal, confidential or private content and deliver it to the wrong client. In other words, the worst outcome is to jeopardize users’ privacy in order to save backend resources. Therefore, it is strongly advised to take your time to write a correct VCL program and test it thoroughly before caching cookies in production deployments.
Despite cookie-based caching being discouraged, Varnish can be forced to cache content based on cookies.
If a client request contains req.http.Cookie, use return (hash); in vcl_recv.
If the cookie is a Set-Cookie HTTP response header field from the server, use return (deliver); in vcl_backend_response.
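The two return statements described above could be sketched as follows. This is a minimal sketch, assuming Varnish 4.x VCL and a hypothetical backend on 127.0.0.1:8080. The restriction to GET and HEAD requests is an added precaution, not something the text requires. On its own, this VCL does not make the cookie part of the cache key, so it is only safe when the response does not depend on the cookie value.

```vcl
vcl 4.0;

# Assumed backend; adjust host and port to your web server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # The built-in vcl_recv passes any request that carries a Cookie header.
    # Returning hash here skips the built-in logic and forces a cache lookup.
    if (req.http.Cookie && (req.method == "GET" || req.method == "HEAD")) {
        return (hash);
    }
}

sub vcl_backend_response {
    # The built-in vcl_backend_response treats responses with Set-Cookie as
    # uncacheable. Returning deliver here skips that check, so the response
    # is stored together with its Set-Cookie header.
    if (beresp.http.Set-Cookie) {
        return (deliver);
    }
}
```

Because both branches bypass the built-in safety checks, a setup like this is best kept to a test environment until the hashing and Set-Cookie handling have been verified.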
Note
If you need to handle cookies, consider using the cookie VMOD from https://github.com/varnish/varnish-modules/blob/master/docs/vmod_cookie.rst.
This VMOD handles cookies with convenient parsing and formatting functions without the need for regular expressions.
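As an illustration of what the VMOD offers, the sketch below keeps a single cookie and drops every other one before the cache lookup. It assumes Varnish 4.x, the cookie VMOD from varnish-modules being installed, a hypothetical backend on 127.0.0.1:8080 and a hypothetical cookie named user. The function names are those documented for that VMOD; other versions may differ.

```vcl
vcl 4.0;

import cookie;

# Assumed backend; adjust host and port to your web server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.http.Cookie) {
        # Load the request cookies into the VMOD's internal list.
        cookie.parse(req.http.Cookie);

        # Keep only the (hypothetical) "user" cookie and drop the rest.
        cookie.filter_except("user");

        # Write the filtered list back to the request header.
        set req.http.Cookie = cookie.get_string();

        # Remove the header completely if no cookie is left.
        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
}
```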
Vary and Cookies¶
- Used to cache content that varies on cookies
- By default, Varnish does not store responses when cookies are involved
- The Vary response header field can be used to store responses that are based on the value of cookies
- Cookies are widely used, but not Vary: Cookie
Varnish uses a different hash value for each cached resource.
Resources with several representations, i.e. variations containing the Vary response header field, share the same hash value in Varnish.
Despite this common hash value, caching based on the Vary: Cookie response header is not advised because of its poor performance: Varnish stores a separate variant for every distinct value of the Cookie request header, so clients with different cookies rarely share a cached copy.
For a more detailed explanation on Vary, please refer to the Vary subsection.
Note
Consider using Edge Side Includes to let Varnish build responses that combine content with and without cookies, i.e. combining caches and responses from the origin server.
Best Practices for Cookies¶
- Remove all cookies that you do not need.
- Organize the content of your web site in a way that lets you easily determine if a page needs a cookie or not. For example:
  - /common/ – no cookies
  - /user/ – has user-cookies
  - /voucher/ – has only the voucher-cookie
  - etc.
- Add the req.http.Cookie request header to the cache hash by issuing hash_data(req.http.Cookie); in vcl_hash.
- Never cache a Set-Cookie header. Either remove the header before caching or do not cache the object at all.
- To ensure that all cached pages are stripped of Set-Cookie, finish vcl_backend_response with something similar to:
  if (beresp.ttl > 0s) { unset beresp.http.Set-Cookie; }
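The practices above could be combined roughly as follows. This is a sketch under assumptions: Varnish 4.x, a hypothetical backend on 127.0.0.1:8080, and the /common/ and /user/ path layout used as an example above. The return (hash); for /user/ is an addition needed to get past the built-in cookie check, and it is limited to GET and HEAD as a precaution.

```vcl
vcl 4.0;

# Assumed backend; adjust host and port to your web server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.url ~ "^/common/") {
        # Shared content needs no cookie: remove it so that the built-in
        # VCL treats the request as cacheable.
        unset req.http.Cookie;
    } elsif (req.url ~ "^/user/" && req.http.Cookie
             && (req.method == "GET" || req.method == "HEAD")) {
        # Per-user content: force a lookup. vcl_hash keeps users apart.
        return (hash);
    }
}

sub vcl_hash {
    # Make the cookie part of the cache key for per-user pages only.
    if (req.url ~ "^/user/" && req.http.Cookie) {
        hash_data(req.http.Cookie);
    }
    # No return statement: the built-in vcl_hash still adds URL and Host.
}

sub vcl_backend_response {
    # Strip Set-Cookie from everything that is going to be cached.
    if (beresp.ttl > 0s) {
        unset beresp.http.Set-Cookie;
    }
}
```

Hashing on the whole Cookie header creates one object per distinct header value, so removing unneeded cookies first keeps the number of cached copies down.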
Exercise: Handle Cookies with Vary and hash_data with HTTPie¶
In this exercise you have to use two cache techniques; first Vary and then hash_data().
The exercise uses the Cookie header field, but the same rules apply to any other field.
For that, prepare the testbed and test with HTTPie:
- Copy the file material/webdev/cookies.php to /var/www/html/cookies.php.
- Send different requests in HTTPie changing /cookies.php and user=Alice for /article.html and user=Bob, e.g.:
  http http://localhost/cookies.php "Cookie: user=Alice"
Vary: Part 1:
- Write a VCL program to force Varnish to cache client requests with cookies.
- Send two client requests for the same URL; one for user Alice and one for user Bob.
- Does Varnish use different backend responses to build and deliver the response to the client?
- Make cookies.php send the Vary: Cookie response header field, then analyze the response to the client.
- Remove beresp.http.Vary in vcl_backend_response and see if Varnish still honors the Vary header.
Vary: Part 2:
- Purge the cached object for resource /cookies.php.
- Check if it affects all, none or just one of the objects in cache (e.g., change the value of the cookie and see if the PURGE method has purged all of them).
hash_data(): Part 1:
- Write another VCL program or add conditions to differentiate requests handled by Vary and hash_data().
- Add hash_data(req.http.Cookie); in vcl_hash.
- Check how multiple values of Cookie give individual cached objects.
hash_data(): Part 2:
- Purge the cache again and check the result after using hash_data() instead of Vary: Cookie.
The Vary and hash mechanisms can also be tested and learned through varnishtest.
If you have time and are curious enough, please do the Exercise: Handle Cookies with Vary and hash_data() in varnishtest.
After solving these exercises, you will understand very well how Vary and hash_data() work.
Edge Side Includes¶
- What is ESI?
- How to use ESI?
- Testing ESI without Varnish
- ESI has a linear growth complexity
- Serial ESI available in Varnish Cache
- Parallel ESI in Varnish Plus only
Edge Side Includes or ESI is a small markup language for dynamic web page assembly at the reverse proxy level. The reverse proxy analyses the HTML code, parses ESI specific markup and assembles the final result before flushing it to the client. Fig. 27 depicts this process.
With ESI, Varnish can be used not only to deliver objects, but to glue them together. The most typical use case for ESI is a news article with a most recent news box at the side. The article itself is most likely written once and possibly never changed, and can be cached for a long time. The box at the side with most recent news, however, changes frequently. With ESI, the article can include a most recent news box with a different TTL.
When using ESI, Varnish fetches the news article from a web server, then parses the <esi:include src="/url" /> ESI tag, and fetches the URL via a normal request, either finding it already cached or getting it from a web server and inserting it into the cache.
The TTL of the ESI element can be 5 minutes while the article is cached for two days. Varnish delivers the two different objects in one glued page. Thus, Varnish updates the parts independently and makes it possible to combine content with different TTLs.
Basic ESI usage¶
Enabling ESI in Varnish is simple enough:
sub vcl_backend_response {
    set beresp.do_esi = true;
}
To include a page in another, the <esi:include> ESI tag is used:
<esi:include src="/url" />
You can also strip off cookies per ESI element. This is done in vcl_recv.
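A minimal sketch of such per-element cookie stripping, assuming Varnish 4.x, a hypothetical backend on 127.0.0.1:8080 and the /url include target from the tag above:

```vcl
vcl 4.0;

# Assumed backend; adjust host and port to your web server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # req.esi_level is 0 for the request sent by the client and greater
    # than 0 for requests that Varnish generates from <esi:include> tags.
    if (req.esi_level > 0 && req.url == "/url") {
        # The included fragment does not depend on the user, so drop the
        # cookie and let the fragment be cached once for everybody.
        unset req.http.Cookie;
    }
}
```

The check on req.esi_level limits the stripping to the include itself; a client that requests /url directly keeps its cookies.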
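Varnish only supports three ESI tags:
- <esi:include>: calls the page defined in the src attribute and replaces the ESI tag with the content of src.
- <esi:remove>: removes any code inside this opening and closing tag.
- <!--esi (content) -->: when ESI processing is enabled, Varnish removes the <!--esi and --> markers and processes (content) as ESI. When the page is delivered without ESI processing, the whole construct is an ordinary HTML comment and (content) stays hidden from the browser. E.g., the <esi:include> tag in the following is only acted on when ESI processing is enabled:
  <!--esi <esi:include src="example"/> -->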
varnishtest is a useful tool to understand how ESI works.
The subsection Understanding ESI in varnishtest contains a Varnish Test Case (VTC) using ESI.
Note
Varnish outputs ESI parsing errors in varnishstat and varnishlog.
Example: Using ESI¶
Copy material/webdev/esi-date.php to /var/www/html/.
This file contains an ESI include tag:
<HTML>
<BODY>
<?php
    header( 'Content-Type: text/plain' );
    print( "This page is cached for 1 minute.\n" );
    echo "Timestamp: \n"
         . date("Y-m-d H:i:s");
    print( "\n" );
?>
<esi:include src="/cgi-bin/esi-date.cgi"/>
</BODY>
</HTML>
Copy material/webdev/esi-date.cgi to /usr/lib/cgi-bin/.
This file is a simple CGI that outputs the date of the server:
#! /bin/sh
echo "Content-Type: text/plain"
echo ""
echo "ESI content is cached for 30 seconds."
echo "Timestamp: "
date "+%Y-%m-%d %H:%M:%S"
For ESI to work, load the following VCL code:
sub vcl_backend_response {
    if (bereq.url == "/esi-date.php") {
        set beresp.do_esi = true; // Do ESI processing
        set beresp.ttl = 1m;      // Set a higher TTL on the main object
    } elsif (bereq.url == "/cgi-bin/esi-date.cgi") {
        set beresp.ttl = 30s;     // Set a lower TTL on the included object
    }
}
Then reload your VCL (see Table 5 for reload instructions) and issue the command http http://localhost/esi-date.php.
The output should show you how Varnish replaces the ESI tag with the response from esi-date.cgi.
Note the different TTLs of the glued objects.
Exercise: Enable ESI and Cookies¶
- Use material/webdev/esi-top.php and material/webdev/esi-user.php to test ESI.
- Visit esi-top.php and identify the ESI tag.
- Enable ESI for esi-top.php in VCL and test.
- Strip all cookies from esi-top.php and make it cache.
- Let esi-user.php cache too. It emits Vary: Cookie, but might need some help.
See the suggested solutions of Exercise: Handle Cookies with Vary and hash_data() in varnishtest to get an idea on how to solve this exercise.
Try to avoid return (hash); in vcl_recv and return (deliver); in vcl_backend_response as much as you can.
This is a general rule for making Varnish setups safer.
During the exercise, make sure you understand all the cache mechanisms at play.
You can also try removing the Vary: Cookie header from esi-user.php.
You may also want to try PURGE.
If so, you have to purge each of the objects, because purging just /esi-top.php does not purge /esi-user.php.
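If your VCL does not handle PURGE yet, a minimal sketch like the following can be used for trying it out. It assumes Varnish 4.x and a hypothetical backend on 127.0.0.1:8080. The ACL name purgers and its single localhost entry are illustrative choices.

```vcl
vcl 4.0;

# Assumed backend; adjust host and port to your web server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Hypothetical ACL: only requests from the local machine may purge.
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Not allowed"));
        }
        # Removes the object stored under this request's hash, including
        # all of its Vary variants.
        return (purge);
    }
}
```

With this in place, each object needs its own request, for example one PURGE for http://localhost/esi-top.php and another for http://localhost/esi-user.php.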
Testing ESI without Varnish¶
- Test ESI Using JavaScript to fill in the blanks.
During development of different web pages to be ESI-glued by Varnish, you might not need Varnish all the time. One important reason is to avoid caching during the development phase. There is a JavaScript-based solution that interprets ESI syntax without using Varnish at all. You can download the library at the following URL:
Once downloaded, extract it in your code base, include esiparser.js and include the following JavaScript code to trigger the ESI parser:
$(document).ready( function () { do_esi_parsing(document); });
Masquerading AJAX requests¶
[Figure: two panels, "This works" and "This does not work"]
Exercise: write a VCL that masquerades XHR calls¶
material/webdev/ajax.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/1.4/jquery.min.js">
</script>
<script type="text/javascript">
function getNonMasqueraded()
{
$("#result").load( "http://www.google.com/robots.txt" );
}
function getMasqueraded()
{
$("#result").load( "/masq/robots.txt" );
}
</script>
</head>
<body>
<h1>Cross-domain Ajax</h1>
<ul>
<li><a href="javascript:getNonMasqueraded();">
Test a non masqueraded cross-domain request
</a></li>
<li><a href="javascript:getMasqueraded();">
Test a masqueraded cross-domain request
</a></li>
</ul>
<h1>Result</h1>
<div id="result"></div>
</body>
</html>
Use the provided ajax.html page.
Note that the function getNonMasqueraded() fails because the origin of the page differs from the google.com domain.
The function getMasqueraded() can do the job if proper VCL code handles it.
Write the VCL code that masquerades the Ajax request to http://www.google.com/robots.txt.
If you need help, see Solution: Write a VCL that masquerades XHR calls.