This post is part of Made @ HubSpot, an in-house thought leadership series in which we draw lessons from experiments conducted by our own HubSpotters.
Have you ever tried carrying your clean laundry upstairs by hand, only to have things keep falling out of the giant clump of clothes in your arms? That’s what trying to grow organic website traffic can feel like.
Your content calendar is full of new ideas, but with every new page published, an older page drops in the search engine rankings.
Getting SEO traffic is difficult, but sustaining SEO traffic is a whole other ball game. Content tends to “expire” over time, whether because of new content created by competitors, constantly changing search engine algorithms, or a variety of other reasons.
You fight to keep overall traffic trending upward, but traffic loss keeps creeping back in if you don’t pay attention.
Recently, the two of us (Alex Birkett and Braden Becker) developed a way to automatically detect this traffic loss at scale, before it even happens.
The problem with traffic growth
At HubSpot, we’re increasing our organic traffic by taking two trips from the laundry room instead of one.
The first trip carries new content, targeting new keywords that we haven’t ranked for yet.
The second trip carries updated content, using part of our editorial calendar to find the content that’s losing the most traffic (and leads) and reinforce it with new information and SEO-focused tweaks so it better serves certain keywords. It’s a concept we (and many marketers) call “historical optimization.”
There is one problem with this growth strategy, however.
As our website’s traffic grows, keeping track of every single page becomes an unwieldy process. Choosing the right pages to update is even harder.
Last year, we wondered if there was a way to find blog posts whose organic traffic was merely at risk of declining, diversify our selection of updates, and potentially make traffic more stable as our blog grows.
Recovering traffic versus protecting traffic
Before we get to the absurdity of trying to recover traffic that hasn’t yet been lost, let’s look at the benefits of recovering traffic the usual way.
When you view a page’s performance, a drop in traffic is easy to spot. For most growth-minded marketers, a downward traffic trend line is hard to ignore, and nothing is as satisfying as seeing that trend rebound.
Recovering all of your traffic comes at a cost, however: because you can’t know where you’ll lose traffic until you’ve lost it, the time between the traffic drop and its recovery sacrifices leads, demos, free users, subscribers, or whichever growth metric you capture from your most interested visitors.
You can see this in the organic trend chart below for a single blog post. Even though the traffic recovered, the dip represents a missed opportunity to support your sales efforts downstream.
If you had a way to find and protect (or even increase) that traffic before it ever needed recovering, you wouldn’t have to make the sacrifice shown in the image above. The question is: how?
How to forecast falling traffic
To our delight, we didn’t need a crystal ball to predict traffic decay. What we did need was SEO data suggesting that certain blog posts’ traffic could soon disappear if nothing were done. (We also had to write a script that could extract this data for the entire website – more on that in a minute.)
High keyword rankings drive organic traffic to a website, and the lion’s share of that traffic goes to sites lucky enough to rank on the first page. The reward is even greater for keywords that receive a particularly high number of searches per month.
If a blog post slips off the first page of Google for a high-volume keyword like that, its traffic is toast.
Given the relationship between keywords, keyword search volume, ranking position, and organic traffic, we knew that falling keyword rankings would be the prelude to traffic loss.
And luckily, the SEO tools at our disposal can show us rankings falling over time:
The image above shows a table of keywords that a single blog post is ranking for.
This blog post ranks in position 14 for one of those keywords (page 1 of Google consists of positions 1-10). The red boxes highlight this ranking and the keyword’s high volume of 40,000 monthly searches.
Even sadder than this post’s position-14 ranking is how it got there.
As you can see in the teal trend line above, this blog post was once a high-ranking result, then steadily dropped over the following weeks. The post’s traffic confirmed what we saw: a noticeable drop in organic page views shortly after the post fell off page 1 for that keyword.
You can see where this is going. We wanted to catch these ranking losses as posts were about to fall off page 1, and in doing so protect the traffic we were at risk of losing. And we wanted to do it automatically, for dozens of blog posts at once.
The “At Risk” traffic tool
The way the At Risk tool works is actually pretty simple. We thought about it in three parts:
- Where do we get our input data from?
- How do we clean it?
- What outputs from this data will help us make better content optimization decisions?
First, where do we get the data from?
1. Keyword data from SEMrush
What we wanted was keyword data at the property level. In other words, we wanted to see all of the keywords that hubspot.com, and specifically blog.hubspot.com, ranks for, along with the associated data for those keywords.
The fields most valuable to us are our current search engine ranking, our previous search engine ranking, the keyword’s monthly search volume, and possibly the keyword’s value (estimated via keyword difficulty or CPC).
To get this data, we used the SEMrush API (specifically the Domain Organic Search Keywords report):
Then, using R – a programming language popular with statisticians and analysts, as well as marketers (specifically, we use the ‘httr’ library to work with APIs) – we pulled the top 10,000 keywords driving traffic to blog.hubspot.com, as well as our Spanish, German, French, and Portuguese properties. We currently do this once a quarter.
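We can’t share the production R script, but a minimal sketch of the fetch step looks like the following. It’s written in Python rather than R for illustration; the endpoint and export columns follow SEMrush’s domain_organic report (Ph = keyword, Po = current position, Pp = previous position, Nq = monthly search volume, Cp = CPC), and the API key is a placeholder:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://api.semrush.com/"
API_KEY = "YOUR_SEMRUSH_API_KEY"  # placeholder

def report_url(domain: str, limit: int = 10000, database: str = "us") -> str:
    """Build the request URL for SEMrush's 'Domain Organic Search Keywords'
    report for a given property (e.g. blog.hubspot.com)."""
    params = {
        "type": "domain_organic",
        "key": API_KEY,
        "domain": domain,
        "database": database,
        "display_limit": limit,
        "export_columns": "Ph,Po,Pp,Nq,Cp",
    }
    return BASE + "?" + urlencode(params)

def fetch_keywords(domain: str) -> str:
    # The API responds with delimited text, one keyword per line.
    with urlopen(report_url(domain)) as resp:
        return resp.read().decode("utf-8")

url = report_url("blog.hubspot.com")
```

Running `fetch_keywords` once per property, once a quarter, mirrors the cadence described above.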
This is a lot of raw data that is useless on its own. So we have to clean up the data and bring it into a format that is useful for us.
Next, how do we clean up the data and build formulas that tell us which content needs updating?
2. Clean up the data and create the formulas
We do most of the data cleaning in our R script as well. Before our data ever hits another storage destination (whether Sheets or a database table), most of it is cleaned and formatted the way we want it.
We do this with a few short lines of code:
In the code above, after fetching 10,000 rows of keyword data from the API, we parse them into a readable format and build them into a data table. We then subtract the current ranking from the previous ranking to get the ranking difference (so if we used to be in position 4 and are now in position 9, the ranking difference is -5).
We then filter so that only rows with a negative ranking difference remain (i.e., only keywords we have lost rankings for, not those we have gained or that have stayed the same).
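As a sketch of that cleaning step (again in Python rather than R, with made-up sample rows and illustrative column names), the parse, compute, and filter logic might look like this:

```python
def parse_report(raw: str):
    """Parse semicolon-delimited keyword rows into dicts, compute the
    ranking difference (previous - current, so a drop from position 4
    to position 9 yields -5), and keep only keywords that lost rank."""
    lines = raw.strip().splitlines()
    header = lines[0].split(";")
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split(";")))
        row["Position"] = int(row["Position"])
        row["Previous Position"] = int(row["Previous Position"])
        row["Search Volume"] = int(row["Search Volume"])
        row["Rank Diff"] = row["Previous Position"] - row["Position"]
        rows.append(row)
    # Negative difference = we lost ground; drop gains and no-changes.
    return [r for r in rows if r["Rank Diff"] < 0]

sample = (
    "Keyword;Position;Previous Position;Search Volume\n"
    "email marketing;9;4;40500\n"
    "crm software;3;3;60500\n"
    "blog ideas;5;7;12100\n"
)
losers = parse_report(sample)
```

With the sample above, only the first row survives: it fell from position 4 to 9, a difference of -5.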
We then send this cleaned and filtered data table to Google Sheets, where we apply a ton of custom formulas and conditional formatting.
Finally, we had to ask: what are the outputs, and how do we actually use them to make content optimization decisions?
3. Outputs of the at-risk content tool: how we make decisions
Based on the input columns (keyword, current position, historical position, position difference, and monthly search volume) and the formulas above, we calculate a categorical variable as an output.
Each URL/row can be tagged one of the following:
- “AT RISK”
- “VOLATILE”
- Blank (no value)
Blank outputs, or rows with no value, mean we can essentially ignore those URLs for now. They either haven’t lost significant rankings, or they were already on page 2 of Google.
“Volatile” means the page is losing rank but isn’t old enough to warrant action. New pages constantly jump around in the rankings as they age; at some point, they build enough “topical authority” to generally stay put for a while. For content supporting a product launch or an otherwise important marketing campaign, we might give these posts some TLC while they’re still maturing, so it’s worth labeling them.
“At risk” is mostly what we’re looking for: blog posts that were published more than six months ago, have fallen in the rankings, and now rank between positions 8 and 10 for a high-volume keyword. We see this as a “red zone” for at-risk content, less than three positions away from slipping from page 1 to page 2 of Google.
The spreadsheet formula for these three tags is shown below – basically a compound IF statement checking for a page-1 ranking, a negative ranking difference, and the distance between the publish date and the current day.
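We can’t reproduce the exact spreadsheet formula here, but as a sketch, the same compound IF logic reads roughly as follows in Python (the six-month age check and the 8-10 position band come from the description above; treat the exact cutoffs as tunable assumptions):

```python
from datetime import date, timedelta

def tag_row(position: int, rank_diff: int, publish_date: date,
            today: date = None) -> str:
    """Tag a keyword row: 'AT RISK' for older posts slipping toward
    page 2, 'VOLATILE' for young posts still settling, '' otherwise."""
    today = today or date.today()
    on_page_one = 1 <= position <= 10
    lost_rank = rank_diff < 0
    if not (on_page_one and lost_rank):
        return ""  # nothing lost, or already off page 1
    older_than_six_months = (today - publish_date) > timedelta(days=182)
    in_red_zone = 8 <= position <= 10  # within 3 spots of page 2
    if older_than_six_months and in_red_zone:
        return "AT RISK"
    if not older_than_six_months:
        return "VOLATILE"
    return ""
```

For example, a post published over six months ago that dropped five spots to position 9 gets tagged “AT RISK”, while the same drop on a month-old post gets tagged “VOLATILE”.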
What we learned
In short, it works! The tool described above has become a regular, if not frequent, part of our workflow. Not every predictive update saves the traffic in time, though. In the example below, a blog post fell off page 1 after being refreshed, and only later climbed back to a higher position.
And that’s okay.
We have no control over when, or how often, Google decides to recrawl and re-rank a page.
Sure, you can resubmit the URL to Google and ask for a recrawl (an extra step that can be worthwhile for critical or time-sensitive content). But the goal is to minimize the time this content underperforms and stop the bleeding – even if that means leaving a quick recovery to chance.
And while you never really know exactly how many page views, leads, signups, or subscribers you’re losing on any given page, the precautions you take now will save you the time you’d otherwise spend figuring out why all of your website traffic fell off a cliff last week.