Automating tests for the GTM data layer

At Travix we are constantly analyzing application and user behavior on our websites in order to offer the best experience to our customers. One of the tools employed for this purpose is Google Tag Manager (also called GTM), alongside a data layer. A data layer is a JavaScript object that is used to pass information from the website to the Tag Manager container[1], like product views and purchases.

While implementing new features and refactoring parts of the frontend application, we frequently faced an issue: some GTM events would go missing, be duplicated, dispatch at the wrong time, or just lack important dimensions. This can heavily impact our ability to analyze the data, and thus demanded a lot of manual testing to ensure that everything was working as intended.

We already have end-to-end tests in place that perform many user interactions throughout the website, which in turn push events to the data layer. What if we could extend these tests to also check whether the information in the data layer is consistent? Since the data layer is basically an array of JavaScript objects, we can do a sort of snapshot testing, comparing the current values with what we expect them to be.

In this article, I will cover how we automated GTM data layer testing using our end-to-end test framework of choice, TestCafe. The same principles can be easily applied to other test frameworks though.

Retrieving the data layer

The data layer is assigned to a global dataLayer variable. To retrieve the data layer items in end-to-end tests, we must execute code in the browser’s context. In TestCafe, this can be done with a ClientFunction:

import { ClientFunction } from 'testcafe'

const getDataLayer = ClientFunction(() => window.dataLayer)

However, if you try to run this function in your tests, you may face the following error:

ClientFunction cannot return DOM elements. Use Selector functions for this purpose.

This happens because some events, like gtm.click, contain references to DOM nodes, which cannot be serialized. One way to fix this is to traverse all items and remove any such references before returning the data. I will leave this as an exercise, mainly because the problem went away once we started filtering out default GTM events, as I will explain in the next section.
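For readers who want a starting point for that exercise, the sketch below shows one way the stripping could look. The helper names `isDomNode` and `stripDomReferences` are hypothetical, the node check is a duck-typing heuristic rather than anything GTM or TestCafe provides, and circular references are not handled.

```javascript
// Heuristic (assumption): DOM nodes expose a numeric nodeType and a string nodeName.
const isDomNode = value =>
  value !== null &&
  typeof value === 'object' &&
  typeof value.nodeType === 'number' &&
  typeof value.nodeName === 'string'

// Returns a copy of `value` with every DOM node reference dropped.
// Arrays and plain objects are walked recursively; other values pass through.
const stripDomReferences = value => {
  if (Array.isArray(value)) {
    return value
      .filter(item => !isDomNode(item))
      .map(stripDomReferences)
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value)
        .filter(([, nested]) => !isDomNode(nested))
        .map(([key, nested]) => [key, stripDomReferences(nested)])
    )
  }
  return value
}
```

Inside a ClientFunction, both helpers would have to be available in the browser context, for example by passing them through the ClientFunction `dependencies` option, before calling `stripDomReferences(window.dataLayer)`.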

Filtering out default GTM events

One thing I noticed after printing the data layer a couple of times was that default GTM events (e.g., gtm.load, gtm.click) would fire at different points in time, thus their order would not be the same and our tests would often fail. To avoid this issue, I decided to simply filter out these default events, since they are not very relevant to us—we care more about the custom events we fire ourselves.

All default GTM events start with gtm, so we can just ignore them with a filter on the event name:

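const getDataLayer = ClientFunction(() => window.dataLayer
  // Keep items without a string event name; drop only default gtm events
  .filter(({ event }) => typeof event !== 'string' || !event.startsWith('gtm'))
)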

Comparing the data layer

Now that we have the data layer in hand, we can make assertions on it. Write your reference data layer snapshot and do a deep equality check against it[2]:

import { t } from 'testcafe'

const dataLayerSnapshot = [
  { event: "productClick" },
  { event: "addToCart" },
  { event: "removeFromCart" },
  { event: "promotionClick" },
  { event: "checkout" },
  { event: "checkoutOption" }
]

await t
  .expect(getDataLayer()).eql(dataLayerSnapshot)

If the data layer does not match the snapshot, the test will fail:

AssertionError: expected [ Array(5) ] to deeply equal [ Array(6) ]

Then it is a matter of fixing the code if it is a regression issue, or (manually) updating the snapshot.

Bonus: improving test failure output

You probably noticed that the error message is not very helpful—it does not tell you exactly what the difference is between the expected and the received values.

We can work around this by doing a string comparison instead, stringifying both the data layer and the snapshot before the assertion:

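const getDataLayer = ClientFunction(() => JSON.stringify(
  window.dataLayer
    // Keep items without a string event name; drop only default gtm events
    .filter(({ event }) => typeof event !== 'string' || !event.startsWith('gtm'))
))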
const dataLayerSnapshot = JSON.stringify([
  { event: "productClick" },
  { event: "addToCart" },
  { event: "removeFromCart" },
  { event: "promotionClick" },
  { event: "checkout" },
  { event: "checkoutOption" }
])

Not pretty, but it does the job—although it is still a bit difficult to spot what the actual problem is:

AssertionError: expected

'[{"event":"productClick"},{"event":"addToCart"},{"event":"promotionClick"},{"event":"checkout"},{"event":"checkoutOption"}]'
   to deeply equal

'[{"event":"productClick"},{"event":"addToCart"},{"event":"removeFromCart"},{"event":"promotionClick"},{"event":"checkout"},{"event":"checkoutOption"}]'

In a follow-up post I will explain how we managed to improve this even further by using the expect module inside TestCafe for a more Jest-like assertion output.
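As a stopgap that is separate from the approach covered in the follow-up post, a small helper can report the first position where the received events and the snapshot diverge. This is only a sketch: `firstDifference` is a hypothetical name, it expects both arguments as arrays (before stringifying them), and like the string comparison above it is sensitive to key order.

```javascript
// Returns the first index where the two event lists differ, together with
// the stringified items at that index, or null when the lists match.
const firstDifference = (actual, expected) => {
  const length = Math.max(actual.length, expected.length)
  for (let index = 0; index < length; index += 1) {
    // JSON.stringify(undefined) is undefined, which marks a missing item.
    const received = JSON.stringify(actual[index])
    const wanted = JSON.stringify(expected[index])
    if (received !== wanted) {
      return { index, received, wanted }
    }
  }
  return null
}

const difference = firstDifference(
  [{ event: 'productClick' }, { event: 'addToCart' }, { event: 'promotionClick' }],
  [{ event: 'productClick' }, { event: 'addToCart' }, { event: 'removeFromCart' }, { event: 'promotionClick' }]
)
// difference.index is 2: "removeFromCart" was expected but "promotionClick" was received.
```

The returned object could then be used as the custom message of the assertion, so the failing position shows up directly in the test report.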

Conclusion

Manually testing the data layer after each frontend change is a very time-consuming process. Inspection tools like dataslayer can help, but they are no match for proper automation. By leveraging the power of end-to-end tests, we can save developers and data analysts valuable time, while being more confident that changes to the codebase will not negatively impact sales and performance tracking.


  1. See this GTM help center article for more information.

  2. For a single-page application (SPA), this could be the very last step of the test.

How to add a Netlify deploy status badge to your project

Ever since I moved this blog to Netlify, I have wanted to add a badge to the repository’s README displaying the deploy status. The Shields.io service doesn’t support Netlify badges yet, but luckily I found out that you can build dynamic badges by querying structured data from any public URL.

After digging into the Netlify REST API, I managed to make a badge that fetches all deploys for my site and extracts the status of the last deploy:

[![Deploy status](https://img.shields.io/badge/dynamic/json.svg?url=https://api.netlify.com/api/v1/sites/rbardini.com/deploys&label=deploy&query=$[0].state&colorB=blue)](https://app.netlify.com/sites/rbardini/deploys)

Which looks like this:

[Image: deploy status badge]

One shortcoming is that you cannot set a different color depending on the status, which is why I’m using a “neutral” blue background here. Also, I assume deploy logs must be public for the link (and possibly the badge itself) to work.
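If you would rather generate the badge URL than write it by hand, the sketch below assembles it with the standard `URL` and `URLSearchParams` classes, which also percent-encode the JSON query. The function name `buildDynamicBadgeUrl` and its option names are hypothetical; the query parameters (`url`, `label`, `query`, `colorB`) are simply the ones used in the badge above.

```javascript
// Builds a Shields.io dynamic JSON badge URL from its parts.
const buildDynamicBadgeUrl = ({ dataUrl, label, query, color }) => {
  const badgeUrl = new URL('https://img.shields.io/badge/dynamic/json.svg')
  // URLSearchParams takes care of percent-encoding characters such as "$" and "[".
  badgeUrl.search = new URLSearchParams({
    url: dataUrl,
    label,
    query,
    colorB: color
  }).toString()
  return badgeUrl.toString()
}

const deployBadgeUrl = buildDynamicBadgeUrl({
  dataUrl: 'https://api.netlify.com/api/v1/sites/rbardini.com/deploys',
  label: 'deploy',
  query: '$[0].state', // state of the first deploy in the returned list
  color: 'blue'
})
```

The resulting string can be dropped into the image part of the Markdown badge in place of the handwritten URL.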

Is forgetting a child in the backseat of a car a crime?

Gene Weingarten:

“Death by hyperthermia” is the official designation. When it happens to young children, the facts are often the same: An otherwise loving and attentive parent one day gets busy, or distracted, or upset, or confused by a change in his or her daily routine, and just… forgets a child is in the car.

This is one of the most disturbing, eye-opening articles I’ve ever read, not because those parents are monsters, but because it could potentially happen to any of us.

Quoting David Diamond, a professor of molecular physiology at the University of South Florida:

“The quality of prior parental care seems to be irrelevant,” he said. “The important factors that keep showing up involve a combination of stress, emotion, lack of sleep and change in routine, where the basal ganglia is trying to do what it’s supposed to do, and the conscious mind is too weakened to resist. What happens is that the memory circuits in a vulnerable hippocampus literally get overwritten, like with a computer program. Unless the memory circuit is rebooted – such as if the child cries, or, you know, if the wife mentions the child in the back – it can entirely disappear.”

I think these cases also say a lot about how innocent people, like rape victims, are sometimes blamed by others who are trying to cope with the harsh reality that the world is not inherently fair:

Humans, Hickling said, have a fundamental need to create and maintain a narrative for their lives in which the universe is not implacable and heartless, that terrible things do not happen at random, and that catastrophe can be avoided if you are vigilant and responsible.

In hyperthermia cases, he believes, the parents are demonized for much the same reasons. “We are vulnerable, but we don’t want to be reminded of that. We want to believe that the world is understandable and controllable and unthreatening, that if we follow the rules, we’ll be okay. So, when this kind of thing happens to other people, we need to put them in a different category from us. We don’t want to resemble them, and the fact that we might is too terrifying to deal with. So, they have to be monsters.”

Jeff Atwood wrote a great piece on this behavior in relation to internet harassment, so please check it out.

Migrating from Second Crack to Metalsmith

I’ve just migrated this blog from Second Crack to Metalsmith, mainly because I wanted to switch away from a PHP-based static site generator. I considered using Hugo, especially because it is pretty fast (and it would be nice to learn Go), but I found its template syntax a little off-putting.

It’s now being served by GitHub Pages too, with the code available here. I’ll write some follow-up posts soon detailing the plugins and scripts I’ve used to build and deploy the site.

Stay tuned.