Varnish: Dealing with web analytics

If you run Varnish as a cache server for a high-traffic website, you are probably also using web analytics software such as Google Analytics to measure user behaviour and engagement.

The problem: you receive a lot of requests with unique query string parameters such as fbclid and gclid, and they directly hurt your website's performance. By default, Varnish uses the full URL, query string included, as part of the cache key, so it assumes each of these requests needs a new copy of the page.

For example:

GET /products/product1.html
GET /products/product1.html?fbclid=id1
GET /products/product1.html?gclid=id2

The URLs above all point to the same product, but because of those unique query string parameters Varnish goes to the backend for every single parameter value, even though the response to the first request could serve them all.

Fig 1. Varnish returns `X-Varnish-Cache: MISS` responses, producing unnecessarily high load on our web server.
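
Note that `X-Varnish-Cache` is not a header Varnish sends on its own; it has to be set in VCL. A minimal sketch of how such a header is commonly produced, assuming Varnish 4 or later and assuming your setup uses this header name (merge it into your existing VCL file rather than using it standalone):

```vcl
sub vcl_deliver {
    # obj.hits counts how many times this cached object has been served.
    # Zero means the response was just fetched from the backend.
    if (obj.hits > 0) {
        set resp.http.X-Varnish-Cache = "HIT";
    } else {
        set resp.http.X-Varnish-Cache = "MISS";
    }
}
```

With something like this in place, requesting the same product with different fbclid or gclid values should show `MISS` every time until the parameters are stripped.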

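This behaviour is called Cache Content Duplication (CCD). We can fix it by stripping those parameters with the regsuball and regsub functions in vcl_recv, so that only the content of the base URL is cached. This does not break the integration: the rewrite only changes the URL Varnish uses internally, so analytics scripts that run in the browser still see the original URL.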

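sub vcl_recv {
    # Remove tracking parameters that follow another parameter ("&name=value").
    set req.url = regsuball(req.url, "&(fbclid|utm_source|utm_term|utm_medium|utm_campaign|gbraid|utm_content|gclid|mc|cx|ie|cof|siteurl)=([A-Za-z0-9_\-\.%25]+)", "");
    # Remove a tracking parameter that comes first ("?name=value"), keeping the "?".
    set req.url = regsuball(req.url, "\?(fbclid|utm_source|utm_term|utm_medium|utm_campaign|gbraid|utm_content|gclid|mc|cx|ie|cof|siteurl)=([A-Za-z0-9_\-\.%25]+)", "?");
    # Tidy up: turn a leftover "?&" into "?", then drop a trailing "?".
    set req.url = regsub(req.url, "\?&", "?");
    set req.url = regsub(req.url, "\?$", "");
}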
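
The value pattern above only matches a fixed set of characters and requires at least one of them, so an empty value or a value containing other characters would not be stripped cleanly. A possible variant, shown here as a sketch rather than a drop-in replacement, matches everything up to the next `&` instead. The shortened parameter list is an assumption for readability; extend it with the names you need.

```vcl
sub vcl_recv {
    # Variant: "[^&]*" matches any value, including an empty one,
    # up to the next "&" or the end of the URL.
    set req.url = regsuball(req.url, "&(fbclid|gclid|gbraid|utm_source|utm_medium|utm_campaign|utm_term|utm_content)=[^&]*", "");
    set req.url = regsuball(req.url, "\?(fbclid|gclid|gbraid|utm_source|utm_medium|utm_campaign|utm_term|utm_content)=[^&]*", "?");
    # Same clean-up steps as before.
    set req.url = regsub(req.url, "\?&", "?");
    set req.url = regsub(req.url, "\?$", "");
}
```

Use one version or the other, not both, and check the result against your own traffic before relying on it.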

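Now, if we inspect the traffic with varnishlog, we can see that the parameters were stripped correctly. Each `set req.url` statement logs a new ReqURL record, so every request shows the original URL followed by the result of each of the four rewrites.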

$ varnishlog -g request -i requrl
*   << Request  >> 32770
-   ReqURL         /?fbclid=IwAR22zu9EPpIbzZo7CrJeXS
-   ReqURL         /?fbclid=IwAR22zu9EPpIbzZo7CrJeXS
-   ReqURL         /?
-   ReqURL         /?
-   ReqURL         /

*   << Request  >> 5
-   ReqURL         /?a=1&fbclid=IwAR22zu9EPpIbzZo7CrJeXS
-   ReqURL         /?a=1
-   ReqURL         /?a=1
-   ReqURL         /?a=1
-   ReqURL         /?a=1
**  << BeReq    >> 6