Canning Google Analytics Spam

Canning Google Analytics Spam: A guide to kicking spam and ghost bot traffic out of your business's data

Have you checked your Google Analytics data lately? Noticed an unusual bump in referral traffic? On the surface, it may seem like a good thing, but digging a bit deeper often uncovers something like this:

list of spam referrers in Google Analytics

Your first thoughts may be along the lines of: What the hell? Why are all these crappy sites linking to our company’s website? Have we been hacked? Kind of.

These crappy sites are spam referrers, and they’ve been a growing problem for site admins for some time. Google’s head of web spam, Matt Cutts, explained referrer spam in a video back in 2013, and the gist of his advice was to ignore it.

Just ignore it? Maybe that was sound advice two years ago, when only a few visits were coming from spam. However, with more and more janky referrers popping up each month, many site admins (especially those who manage sites for SMBs) are seeing a significant portion of their site traffic coming from these bots rather than from people. Besides skewing analytics data (which impairs the ability to make sound, data-based decisions for the business), having so many spam bots hit your site day after day can increase page load time, which can be a big turn-off for your actual human users.

So what can you do to remedy this? First and foremost: DO NOT CLICK THROUGH TO ANY OF THESE SITES! Doing so rewards them with the inbound traffic bump they’re seeking and earns them ad dollars for your visit, and many of these sites are infested with malware and terrible design. If there are any sites listed among your referrers that you’re not sure about, open a Google search page and type the address of the site in the search bar (NOT your browser address bar). Chances are, if it’s a spam referrer, many of the results you get from searching for that domain will be articles about spam referrers and/or questions from other site admins about how to block that particular domain.

Type the address in the search field, NOT the address bar.

Once you have a list of all the spam referrers wreaking havoc on your data, you can begin work to weed them out of your Analytics data and prevent them from coming to your site at all.

Types of Spam Referrers

There are a couple of common ways spammers blow up your Analytics data:

Spam referral traffic comes from bots that send fake HTTP Referer headers (the header that names the linking page, which the bots set to the URL of a site they wish to promote) with repeated requests to a site. Those requests show up in the site’s logs and reports as referral visits. Some sites publish these logs publicly, which results in a link back to the spam referrer.

Ghost referrers are bots that never actually visit a site, but still appear in the site’s traffic reports. How in the hell? These bots pick random UA codes (the Google Analytics tracking ID, in the form UA-########-1) and use automated scripts to send fake hits straight to Google Analytics’ servers through the Measurement Protocol, so their referrer data lands in your reports without a single request ever reaching your site.

I feel dirty just typing that. Let’s get back to cleaning up this mess.
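The two types leave different fingerprints: spam referrers show up in your server’s access logs, while ghosts never do. If you can get at your raw logs, one rough way to tell them apart is to compare the referrers your server actually recorded with the referrers Analytics reports for the same date range. Below is a minimal Python sketch, assuming logs in the common Apache/Nginx “combined” format; the helpers `referer_hosts` and `likely_ghosts`, the log lines, and the example.com / *.example hostnames are all hypothetical. Treat the result as a heuristic, not proof: a legitimate referrer can also be missing from your origin logs if, say, a CDN served the page.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache/Nginx "combined" format: ip ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referer_hosts(log_lines, own_host):
    """Count external hostnames found in the Referer field of access-log lines."""
    counts = Counter()
    for line in log_lines:
        match = COMBINED.match(line)
        if not match or match.group("referer") in ("", "-"):
            continue  # unparseable line, or no Referer header was sent
        host = urlsplit(match.group("referer")).hostname or ""
        if host and host != own_host:
            counts[host] += 1
    return counts

def likely_ghosts(analytics_referrers, log_hosts):
    """Referrers that Analytics reports but the server never saw in a Referer header."""
    reported = {host.lower() for host in analytics_referrers}
    return sorted(reported - {host.lower() for host in log_hosts})

# Hypothetical log lines and Analytics referral list for the same date range.
LOG = [
    '203.0.113.7 - - [10/May/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "http://spam-referrer.example/" "Mozilla/5.0"',
    '203.0.113.7 - - [10/May/2015:10:00:05 +0000] "GET / HTTP/1.1" 200 512 "http://spam-referrer.example/" "Mozilla/5.0"',
    '198.51.100.2 - - [10/May/2015:10:01:00 +0000] "GET /about HTTP/1.1" 200 900 "https://www.example.com/" "Mozilla/5.0"',
    '198.51.100.3 - - [10/May/2015:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
ANALYTICS_REFERRERS = ["spam-referrer.example", "ghost-one.example", "Ghost-Two.example"]

seen = referer_hosts(LOG, own_host="www.example.com")
print("in server logs:", dict(seen))
print("likely ghosts:", likely_ghosts(ANALYTICS_REFERRERS, seen))
```

Here spam-referrer.example appears in both places (a spam referrer that really hit the server), while the two ghost domains appear only in Analytics.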

With your list of spam referrers in hand, we’ll create some filters in Google Analytics to exclude this traffic from reports. To do this, we’ll need to set up a custom view. I want to emphasize the importance of keeping one view in your account unfiltered and untouched. For us, that’s the default “All Web Site Data” view. This ensures that should anything go wrong, we always have the original data to work with. If you’ve never set up a custom view in Google Analytics before, here’s a quick guide on how to do it. Also, if this is your first filter, read this guide first to get a better understanding of how filters work.

To stop tracking referral spam traffic in your reports, go to the Admin tab in Analytics and select All Filters under the Account column. Then click the button to add a new filter.

Create a new filter for spam referrers in Google Analytics
  1. Give the filter a descriptive name. Since there were a lot of spam domains, and the Filter Pattern field (step 4) only allows 255 characters, I split them up into two separate filters.
  2. Select Custom for Filter Type.
  3. Exclude based on Campaign Source. This allows you to just use the domains you see in the referral report, instead of adding each individual URL from the domain.
  4. For Filter Pattern, if you have several domains to add, we’re going to get fancy and use regular expressions (regex). Don’t worry, it’s not complicated! Simply list each domain from your spam referrer list with a pipe character ( | ) between them. No spaces! We also don’t need to escape periods with a backslash here, as Google Analytics doesn’t require it. (Strictly speaking, an unescaped period matches any single character, so the pattern is a little looser than it looks, but it still matches the spam domains.) If you have a bunch, you might draft this in a document that shows the character count as you type. Remember, you only get 255 characters in this field.
  5. Apply the filter to the custom view you created for this, and save. If you need to create other filters, you can skip adding the view for now and just save. Create the rest of the filters you need, and then apply them all to the view from the Admin panel in Analytics.
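If your spam list is long, packing it into 255-character patterns by hand gets tedious. Here’s a minimal Python sketch of that packing step; `build_filter_patterns` and the `spam-site-NN.example` domains are hypothetical, and the optional `escape_dots` flag is there in case you’d rather have each period match only a literal period.

```python
import re

MAX_PATTERN_LENGTH = 255  # the Filter Pattern character limit described in the steps above

def build_filter_patterns(domains, limit=MAX_PATTERN_LENGTH, escape_dots=False):
    """Pack domains into as few 'a|b|c' patterns as possible, each at most `limit` chars."""
    patterns, current = [], ""
    # Normalize, drop blanks, and de-duplicate while keeping the original order.
    unique = dict.fromkeys(d.strip().lower() for d in domains if d.strip())
    for domain in unique:
        term = domain.replace(".", r"\.") if escape_dots else domain
        if len(term) > limit:
            raise ValueError(f"{domain!r} alone exceeds {limit} characters")
        candidate = f"{current}|{term}" if current else term
        if len(candidate) <= limit:
            current = candidate       # still fits: extend the current pattern
        else:
            patterns.append(current)  # full: start a new filter pattern
            current = term
    if current:
        patterns.append(current)
    return patterns

spam = [f"spam-site-{n:02d}.example" for n in range(1, 21)]
patterns = build_filter_patterns(spam)
for pattern in patterns:
    print(len(pattern), pattern)

# Sanity check: every spam domain is caught by at least one pattern.
assert all(any(re.search(p, d) for p in patterns) for d in spam)
```

With these 20 made-up domains, the sketch produces two patterns (251 and 167 characters), which you would paste into two separate filters.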

Keep in mind that filters are not applied retroactively, so don’t freak out when you go to test the view and see zero data! Check that the date range you’re viewing starts on the day you created the filter.

It’s a good idea to make an annotation in your main traffic chart in the All Web Site Data view indicating the date you blocked spam referrers, and additional annotations for any previous spikes in traffic that you are able to attribute to spam referrals.

Now we’ll exorcise those ghosts.

In your Analytics reporting dashboard, go to Audience > Technology > Network and select Hostname as the Primary Dimension.

Traffic by Hostname in Google Analytics

The only hostnames I want to see here are my own website properties. Just about everything else, including the (not set) data, is garbage. The exceptions: translate.googleusercontent.com and webcache.googleusercontent.com aren’t ghosts! They’re instances where people viewed content on your site using Google’s translation tool, or viewed a cached version of a page on your site. These are legitimate, so be sure to include them in the next step!
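If your hostname report is long, a small script can do the first sorting pass. This is a minimal Python sketch, assuming example.com stands in for your own domain and that you own all of its subdomains (tighten the check if you don’t); `triage_hostnames` is a hypothetical helper and the report rows are invented.

```python
# Google hostnames that show real visitors your content (see above).
GOOGLE_OK = {"translate.googleusercontent.com", "webcache.googleusercontent.com"}

def triage_hostnames(report_hostnames, own_domains):
    """Split report hostnames into (valid, suspect) lists."""
    valid, suspect = [], []
    for host in report_hostnames:
        h = host.strip().lower()
        # Exact match, or a subdomain of one of your domains.
        is_own = any(h == d or h.endswith("." + d) for d in own_domains)
        if is_own or h in GOOGLE_OK:
            valid.append(host)
        else:
            suspect.append(host)
    return valid, suspect

report = [
    "www.example.com",
    "example.com",
    "translate.googleusercontent.com",
    "(not set)",
    "free-traffic.example.net",
    "example.com.evil.example",
]
valid, suspect = triage_hostnames(report, own_domains=["example.com"])
print("valid:", valid)
print("suspect:", suspect)
```

Note that example.com.evil.example lands in the suspect pile: it merely starts with your domain name rather than ending with it.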

Now let’s set up another filter, but this time we’ll only include the valid hostnames:

Filter by valid hostnames only
  1. Name the filter. (I got creative with it.)
  2. Select Custom for the filter type.
  3. Select Include.
  4. Choose Hostname in the Filter Field.
  5. Use regex to include your valid hostnames: your own domains plus translate.googleusercontent.com and webcache.googleusercontent.com, separated by pipes.
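Because an include pattern only has to list a handful of hostnames, it’s worth making it strict. Here’s a minimal Python sketch with hypothetical example.com hostnames: it escapes the dots and anchors the pattern with ^ and $, so a lookalike such as example.com.evil.example can’t sneak through a regex search that matches anywhere in the value. `hostname_include_pattern` is an illustrative helper, not a Google Analytics feature.

```python
import re

def hostname_include_pattern(hostnames, limit=255):
    r"""Build an anchored pattern like ^(www\.example\.com|example\.com)$."""
    alternatives = "|".join(re.escape(h.strip().lower()) for h in hostnames)
    pattern = f"^({alternatives})$"
    if len(pattern) > limit:
        raise ValueError(f"pattern is {len(pattern)} characters; limit is {limit}")
    return pattern

pattern = hostname_include_pattern([
    "www.example.com",
    "example.com",
    "translate.googleusercontent.com",
    "webcache.googleusercontent.com",
])
print(pattern)

# Quick check against a valid host and two lookalikes.
for host in ["www.example.com", "example.com.evil.example", "wwwxexample.com"]:
    print(host, bool(re.search(pattern, host)))
```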

Save the filter and apply it to a view. You may want to test this filter in its own custom view for a few days to make sure it’s working properly. For instance, we tested a view with only the referral spam exclusion filters and a view with only the valid hostname inclusion filter before combining those into a view that excludes spam referrals and includes only valid hostnames.
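Before trusting the combined view, it can help to reason through what each filter will drop. The Python sketch below is only an offline approximation of a view with both kinds of filters, not how Google Analytics processes hits internally; the `Hit` records, patterns, and hostnames are hypothetical. It shows why the hostname filter matters: a ghost that isn’t on your exclusion list still gets dropped.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    hostname: str  # the Hostname dimension
    source: str    # the Campaign Source dimension, e.g. a referring domain

def apply_view_filters(hits, exclude_source_patterns, include_hostname_pattern):
    """Keep hits that pass every exclusion filter and the hostname inclusion filter."""
    kept = []
    for hit in hits:
        if any(re.search(p, hit.source) for p in exclude_source_patterns):
            continue  # dropped by a referral-spam exclusion filter
        if not re.search(include_hostname_pattern, hit.hostname):
            continue  # dropped by the valid-hostname inclusion filter
        kept.append(hit)
    return kept

hits = [
    Hit("www.example.com", "google"),
    Hit("www.example.com", "spam-referrer.example"),
    Hit("(not set)", "ghost-one.example"),  # ghost: not on the exclusion list
    Hit("translate.googleusercontent.com", "(direct)"),
]
kept = apply_view_filters(
    hits,
    exclude_source_patterns=["spam-referrer.example|other-spam.example"],
    include_hostname_pattern=r"^(www\.example\.com|translate\.googleusercontent\.com)$",
)
print([h.source for h in kept])
```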

These methods should keep your Google Analytics data much cleaner, but there is still one more task to complete our spam-canning mission: Block spam referral traffic from hitting our website at all. In my next post, we’ll learn how to harness the power of the .htaccess file to prevent spam referrers from accessing a website and wasting precious server bandwidth.

Until then, hit me with your questions! I’m also interested to know if you’ve used any other methods to block spammers from your site and what your experiences have been so far.