Varnish: dealing with web analytics
April 9, 2024 #varnish
If you are running Varnish as a cache server on a high-traffic website, you are probably also using web analytics software such as Google Analytics to measure user behaviour and engagement.
The problem is that you receive a lot of requests carrying unique query string parameters, such as fbclid and gclid, and this directly hurts your website's performance. By default, Varnish treats every distinct URL, query string included, as a separate cache object, so it assumes that a new copy of the page is needed.
For example:
|
|
The URLs above all point to the same product, but because each one carries a unique query string parameter value, Varnish goes to the backend for every single value, even though the response to the first request would have been enough to serve them all.
This behaviour is called Cache Content Duplication (CCD). We can fix it by stripping those parameters with regsuball and regsub expressions, so that only the content of the base URL is cached. This does not break the analytics integration, because the tracking scripts run in the browser and still see the full URL.
|
|
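As a minimal sketch of such a rule, assuming VCL 4.0 syntax, a hypothetical backend on 127.0.0.1:8080 and an illustrative parameter list (fbclid and gclid, plus utm_* added here as an assumption), the stripping might look like this:

```vcl
vcl 4.0;

# Hypothetical backend, used only to make the sketch a complete VCL file.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Only rewrite URLs that actually carry one of the tracking parameters.
    if (req.url ~ "[?&](fbclid|gclid|utm_[a-z]+)=") {
        # Remove each tracking parameter together with its value
        # and the "&" that follows it, if any.
        set req.url = regsuball(req.url, "(fbclid|gclid|utm_[a-z]+)=[^&]*&?", "");

        # Drop a "?" or "&" left dangling at the end of the URL.
        set req.url = regsub(req.url, "[?&]+$", "");
    }
}
```

Because req.url is rewritten before the cache lookup, every variant of the same page is hashed under its base URL, and the backend also receives the stripped URL. The pattern in regsuball is not anchored to the start of a parameter name, so a parameter whose name merely ends in one of these strings would be affected too; the list should be adapted to the parameters your own site uses.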
Now, if we inspect the traffic with varnishlog, we can see that the parameters are stripped correctly.
|
|