Google Panda Getting You(r rankings) Down?

Posted by: Marketing Guy | Date posted: February 14th, 2012 | Published in: Google, SEO Reviews

Ah, all those years of a free ride on the GoogleBus finally coming to an end, are they?  The big G finally stuck it to the SEO world last year with the rollout of the much talked about Panda update – a move which has resulted in much discussion in the industry.  And by discussion, I of course mean outrage and wild speculation, with theories ranging from URL structure changes to ad to content ratio (not to be confused with the recent “above the fold” update, although this is most likely part of the same, larger process) being considered as factors.

So what’s the score here?  Is it game over for SEO, or just a game-changer?  This article takes a peek at Google Panda and offers some speculation on the logic behind it and some insights into how to depandify your site.

Let’s start with what we know

SER (Search Engine Roundtable) is a good place to start if you’re looking for some credible information on what’s happened with Panda – Barry has done a great job of reporting the facts, updates and discussions on the subject.  Check out the Panda category.  Here’s an overview of the updates so far:

  • Panda 3.2 on January 17/18th 2012 (this is probably the last Panda update)
  • Panda 3.1 on November 18th
  • Panda 2.5.3 on October 19/20th
  • Panda 2.5.2 on October 13th
  • Panda 2.5.1 on October 9th
  • Panda 2.5 on September 28th
  • Panda 2.4 in August
  • Panda 2.3 on around July 22nd
  • Panda 2.2 on June 18th or so
  • Panda 2.1 on May 9th or so
  • Panda 2.0 on April 11th or so (this was the first international rollout)
  • Panda 1.0 on February 24th 2011

Google confirmed this week that they have improved how Panda integrates with their overall ranking system, which basically will mean we probably won’t see any major Panda updates again.

Shortly after the first iteration of Panda, Matt Cutts and Amit Singhal were interviewed by Wired and the resulting article was just about the only information anyone has from Google on the subject.  The takeaways from the article were a number of self-help questions for webmasters:

  • Would you be comfortable giving this site your credit card?
  • Would you be comfortable giving medicine prescribed by this site to your kids?
  • Do you consider this site to be authoritative?
  • Would it be okay if this was in a magazine?
  • Does this site have excessive ads?

 

Search Quality Rating Guidelines

But the article did make mention of quality raters, Google’s team of external minions who review search results and provide feedback to the almighty one.  Back in October, a manual for raters was leaked.  Not sure if the document is still available, but it was an interesting read.

What I took from this more than anything was how Google is training their raters to categorise documents (web pages).  How many of you (I’m assuming mostly SEOs will be reading this) have thought about content largely in the context of original vs duplicate?  Quite a few I imagine – that’s how I approached the subject for a long time (albeit to varying degrees at both sides of the coin).  But looking at the guidelines, it seems Google has set out, probably for a long time, to categorise web pages in the context of the search query (which makes sense).  These terms are used:

  • Vital – is this result vital to the search query?
  • Useful – helpful for most users.
  • Relevant – helpful for many or some users.
  • Slightly relevant – not very helpful for most users, but somewhat related to the query.
  • Off-topic or useless – helpful for very few or no users.

A lot of people assumed that these ratings (despite Google saying they don’t have a direct impact on rankings) are a means for the average Joe Bloggs quality rater to randomly slap their site with a penalty.  They aren’t.  Think bigger.  Think automated.  Think like Google.

Google is all about the automation of search – which is not only one of their core business values, but a reality of the market, given the logistics involved.  So having random people randomly rank random websites is never going to work.  But they aren’t rating websites, are they?  They are rating search results.  But what use is this, other than for some internal quality control exercise?

 

Machine Learning

One approach to machine learning is clustering.  Wikipedia defines it as:

Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense.

Makes no sense to you?  In marketing speak, this would be akin to defining customer personas for your market.  I.e. “John Smith – 30-45 – AB1 professional – likes golf, theatre and pies”.  And no, this doesn’t mean Google is assigning your website to a specific persona or cluster.  What I believe is happening with the Panda updates, and something Google is getting gradually better at, is that Google is using the data from the quality raters to determine a profile / persona / cluster of different search queries and using that data to map out what each query should look like.  Panda (I theorise) is simply the addition of this data into the algorithm to varying degrees.

For example, the average top 10 for these queries “should” contain:

  • Cluster 1 (brand query – “Virgin”) – 1 vital result (the brand), 6 useful results (brand pages or references), 3 relevant results (lesser known references to brand)
  • Cluster 2 (service query – “SEO Services”) – 0 vital results, 10 useful results (SEO providers)
  • Cluster 3 (information – “IT careers”) – 0 vital results, 5 useful results (IT recruiters), 5 relevant results (career advice)

Now, what does Google do if (for example) their data shows that the results for a brand query actually include only 1 vital result but 9 slightly relevant results (in an extreme, fictional example)?  Time to take a look at those sites.
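To make the template idea a bit more concrete, here is a minimal Python sketch of comparing an observed top 10 against the expected mix for its query cluster.  Everything in it is an assumption for illustration: the `TEMPLATES` numbers are just the fictional ones from the list above, and `rating_gap`, `needs_review` and the `tolerance` threshold are made-up names, not anything Google has described.

```python
from collections import Counter

# Hypothetical templates: the expected rating mix of a top 10 per query cluster.
# These are the illustrative numbers from the article, not real Google data.
TEMPLATES = {
    "brand": {"vital": 1, "useful": 6, "relevant": 3},
    "service": {"useful": 10},
    "information": {"useful": 5, "relevant": 5},
}


def rating_gap(cluster, observed_ratings):
    """Return observed minus expected count for every rating label involved."""
    expected = TEMPLATES[cluster]
    observed = Counter(observed_ratings)
    labels = set(expected) | set(observed)
    return {label: observed.get(label, 0) - expected.get(label, 0) for label in labels}


def needs_review(cluster, observed_ratings, tolerance=2):
    """Flag a result set when any rating count is off by more than `tolerance`."""
    gaps = rating_gap(cluster, observed_ratings)
    return any(abs(gap) > tolerance for gap in gaps.values())


# The extreme fictional case: 1 vital result and 9 slightly relevant ones.
extreme = ["vital"] + ["slightly relevant"] * 9
print(rating_gap("brand", extreme))
print(needs_review("brand", extreme))
```

A top 10 that matches the template exactly produces gaps of zero and is not flagged; the extreme case is flagged because it has far too many “slightly relevant” results and far too few “useful” ones.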

Think about it.  How many sites with pretty poor content are propelled to the top of competitive terms entirely due to SEO magic?  Google needs to counterbalance this and Panda could be the mechanism by which this is accomplished.

How many sites have content ranking for terms via bland articles?  I.e. ranking for a service term when you don’t actually offer the service (but you want the traffic to drive to ad clicks, related services, etc)?  We’ve all done it – shot for money terms to pull in the mass traffic.

 

Time to rethink your content strategy

Content?  I thought Panda was about design, trust, ad to copy ratio, etc?

Well, it’s likely that this is a misconception based on a variety of theories that have floated around.  Some of these factors can be built into an automated algorithm, but not all of them.

Panda is about relevance.  Not how relevant YOU think your website is for any particular query, but how relevant RATERS think it is, and how GOOGLE filters this perception into the larger algorithm.

A practical example;

I have a hobby site (10 years old) that was hit by Panda (the first international roll out).  Only the root content was hit – not the discussion forum subdirectory.  Didn’t touch it until January 2012 (yeh, I slack).  It’s only been 5 weeks and the rankings have almost fully recovered.

The process I followed was the same as everyone else’s:

  • Sorted out the design (it was hideous).
  • Ditched the crap content (10% of the articles were thin SEO copy).
  • Ditched the forum and all 40,000 URLs (yes, I removed the part of the site that wasn’t affected by Panda – I’m so rock and roll!  It was a business decision…).
  • Integrated ads better (didn’t think of this as a Panda solution – just messed around with positioning and format until I was happy with how it looked).
  • Began adding more content (the site hadn’t been updated since 2006).
  • 301 redirected all redundant URLs to a close match page, or to the homepage if there was none.
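The last step, pointing each redundant URL at a close match or falling back to the homepage, could be planned with something like the sketch below.  It is only an illustration: “close match” is assumed here to mean the nearest parent path that still exists on the site, and `redirect_target`, `build_redirect_map` and the example paths are hypothetical.  The output is just a mapping; turning it into actual 301 rules depends on your server.

```python
from urllib.parse import urlsplit

HOMEPAGE = "/"


def redirect_target(old_url, live_paths):
    """Pick a 301 target: the nearest surviving parent path, else the homepage."""
    path = urlsplit(old_url).path.rstrip("/")
    # Walk up the path one segment at a time until a live page is found.
    while path:
        if path in live_paths:
            return path
        path = path.rsplit("/", 1)[0]
    return HOMEPAGE


def build_redirect_map(old_urls, live_paths):
    """Map every redundant URL to the page it should 301 to."""
    return {url: redirect_target(url, live_paths) for url in old_urls}


# Hypothetical site: the forum is gone, the articles section survives.
live = {"/articles", "/articles/wordpress"}
old = ["/forum/thread/123", "/articles/wordpress/old-post", "/articles/retired/"]
for source, target in build_redirect_map(old, live).items():
    print(source, "->", target)
```

With the forum removed, every forum URL falls through to the homepage, while retired articles land on the closest surviving section page.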

The results;

  • Content keyword rankings (i.e. not the homepage stuff) recovered within days (I believe the new content was ranked initially without Panda penalty added), then dropped, before recovering well.
  • Certain keywords still not ranking.  They are service related keywords (i.e. like “free WordPress themes”), but my ranking page was just an article on “why not to use free WordPress themes” – i.e. “slightly relevant”.  Ranked for a few days, dropped again, came back again (Google is still picking up the 301s so I think it’s just being recalculated – possibly reviewed to see if it’s still the same content that was pandised), and finally dropped back out again.
  • Homepage keywords are beginning to rank top 10 again.  4th actually – was 5th-10th for 6-7 years.

My takeaway from this is that a percentage of my content was low relevance (it was rubbish) and it appeared in enough search results (for a high enough volume of terms) for enough quality raters to mark it as low relevance that Google applied the Panda filter on that portion of the site.  My redesign (URL changes and 301s) circumvented the filter temporarily, but there’s no getting away from it – same content that caused the filter = you’re still in the doghouse.

My next step is to revise the low relevance pages that aren’t ranking and start offering content on them that I think more accurately matches the queries they used to rank for.  I fully believe the site and its pages retain their former ranking weight – so rethinking the content strategy will essentially unlock this unused potential.

So don’t abandon your pandised sites just yet!

 

Not following?

Think about the process like this;

  1. Google sends out a set of keywords to raters to do their thing.
  2. Google uses the data to create clusters / templates of different types of search results.
  3. Google applies the Panda filter to downgrade those who are using SEO magic to shoot above their weight.

When you make changes to your site to a certain degree – perhaps significant content or structure changes – it’s enough to trigger an evaluation where your site gets to go back to quality raters and re-sit the relevance exam.  If you pass, you’re in, if not, then it’s back to the drawing board.

There could be varying flavours of this;

  • Your content sucked and still sucks.
  • Part of your content sucked and you haven’t removed or improved it yet.
  • Your content was OK, but your design sucked and the low paid student types didn’t take the time to read your content properly.
  • Your content is good (well written) but still low relevance for the target query.

On the subject of design…

Raters are low paid student types and their attention to detail isn’t quite that of the Borg that have been assimilated by the ‘plex.  This isn’t good.  What’s the problem here?  The geeks in the quality control department at Google have looked at the data and say that (for example) 10% of the documents marked as “relevant”, are in fact “useful”.

How did this happen?   Both groups use the same set of guidelines.  The only difference is that one lot is just a bunch of random people and the other lot is a group of internally trained search geeks.  But, we can’t go live with this much dirty data, so let’s look into it further.  Let’s check the comments from the raters.

What do you think the top reasons were for people rating the content of websites lower than it should be? I would guess…

  • I don’t feel comfortable giving this site my credit card details.
  • I wouldn’t be comfortable giving medicine prescribed by this site to my kids.
  • I don’t consider this site to be authoritative.
  • I wouldn’t read this article if it was in a magazine.
  • This site has excessive ads.

Just because Google says one of their engineers compiled this master list of questions doesn’t mean he didn’t get it from feedback sent via the quality raters!  The SEOmoz Whiteboard Friday article from June has a really great explanation of how this process works:

The sites that people like more, they put in one group. The sites that people like less, they put in another group. Then they look at tons of metrics. All these different metrics, numbers, signals, all sorts of search signals that many SEOs suspect come from user and usage data metrics, which Google has not historically used as heavily. But they think that they use those in a machine learning process to essentially separate the wheat from the chaff. Find the ones that people like more and the ones that people like less. Downgrade the ones they like less. Upgrade the ones they like more. Bingo, you have the Panda update.
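As a rough illustration of the process described in that quote (group the liked and less liked sites, then learn from their metrics which group a new site resembles), here is a tiny nearest-centroid classifier in Python.  This is a deliberately simple stand-in, not a claim about what Google runs: the two metrics, the sample numbers and the function names are all made up for the example.

```python
from math import dist
from statistics import mean


def centroid(rows):
    """Average each metric column across a group of sites."""
    return tuple(mean(column) for column in zip(*rows))


def train(liked, disliked):
    """Summarise each group of rated sites by its average metrics."""
    return {"liked": centroid(liked), "disliked": centroid(disliked)}


def classify(model, site):
    """Assign a site to whichever group's centroid it sits closest to."""
    return min(model, key=lambda label: dist(model[label], site))


# Made-up metrics per site: (share of the page taken by ads, share of visitors who stay).
liked = [(0.10, 0.80), (0.15, 0.70), (0.05, 0.90)]
disliked = [(0.60, 0.20), (0.50, 0.30), (0.70, 0.10)]

model = train(liked, disliked)
print(classify(model, (0.55, 0.25)))
print(classify(model, (0.12, 0.75)))
```

The point is only that once people have sorted a sample of sites into two piles, a machine can score unseen sites by how closely their measurable signals resemble each pile.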

However, I don’t believe it’s simply a case that Google is differentiating between good and bad sites in those very black and white terms (there are loads of reasons why – it would be a huge misuse of their monopoly for example and would probably land them in trouble) – it’s a very subjective concept which I don’t believe Google has either the right or the ability to judge.

Their own search results however, are a different story.  Google entirely has the right to judge what is relevant and what isn’t for any given query – that is their end product, after all.

 

Gradually diminishing updates

If you think about how this process is working – Google is gradually refining how results should be categorised.  They have the template, they just need to tweak their algorithm to perfect it.  That’s why we’ve seen a lot of updates over the past year – data goes back to the raters and they mark it accordingly.  Then it comes back to Google who have to essentially a) fix their mistakes and b) reinclude any sites that were filtered – updates are just data refreshes with perhaps minor algo tweaks thrown in.

Each update is essentially the result of a few weeks’ or months’ rating work to perfect the process until it moves to the point that the error margins are low enough to reassess on an ongoing basis – which is roughly where we are now.  It probably won’t mean quicker turnaround for sites in terms of recovery, but it may well mean recovery happens on a keyword by keyword basis rather than a site wide or folder wide basis as it does just now.

Of course, there are elements of the feedback from quality raters that Google can effectively automate in some way – for example the above the fold algorithm update.

That is unless you want to put your tin foil hat on and believe that there is no actual above the fold algorithm!  Perhaps Google just had a set of results they knew they were going to blitz in the next update (because raters commented that they were too ad heavy)?  Maybe they just decided to go live with that change and simply call it an “above the fold” update, when in fact it was just a data push and a bit of PR spin to encourage people to clean up their sites?  I guess we’ll never really know, but the effect is the same – somehow, either via automated means or via human intervention, Google is moving to a far more qualitative approach to serving search results.

SEO has become (and for many has always been) about returning relevant results for your targeted search terms.  Not just copy that chats about your targeted search term.  Not just an ecom that looks OK and offers the same basic services as your competitors.  But content as a product – content that will set you apart from your competitors – content that deserves to be number one.

That’s the real difference between what constitutes “content” and “thin content”.  The latter is probably what caused your Panda penalty in the first place.

 

Panda recovery – what to do?

Almost as loaded a question as “how do I build a successful website?”.  I’ll outline the basic process I followed – it may or may not work for you, but these are at least some options worth exploring.

  1. Be realistic about the quality of the content you want to rank for specific terms.  Seriously, check out the competition – is there anything they are doing that you aren’t?  I assume that my site will fall under some kind of qualitative review – even if it doesn’t, then it’s still a good approach to take.
    1. And try to think about this in the context of the clustering concept I discussed above.  While you might have a “fairly relevant” piece of content, does that level of relevance have a place in searches for the term it used to rank for?  In real terms, this might mean your 300 word article on using Google Adwords will never rank for “Google Adwords”, regardless of the SEO techniques you use.  Think bigger.  Think better.
  2. Don’t abandon previously strong pages.  A lot of people have been removing old content as part of a Panda solution (including me).  I don’t think this is necessary.  What is necessary is bringing this content up to a reasonable quality – if you can’t do that, then ditch it.  But if you can, then do so and you could reasonably expect to retrieve old rankings.
  3. Does your design suck?  It’s surprisingly common.  Take the time to design a good looking site that works for your visitors.  Even if my wild theory of quality raters and machine learning is completely off, real people still visit your site – what more can you be doing to improve their experience?
  4. On the subject of conversions…how can you improve them?  Are you using In-Page Analytics to see where people are ending up on your site?  Do you have a solid call to action on all of your pages?  What about additional marketing channels – social, e-campaigns, user generated content?  What actions do you want people to be taking?  Are you providing enough information for your customers?  At what stage of their decision making journey do they arrive on your site?  What feedback have you had from your existing customers?  Can you provide testimonials on your site?
  5. Go social. Build up your network and take the burden off your reliance on SEO.  Use social as a metric to measure your performance in terms of the new design.  Google Analytics tracks social actions – are these increasing?  Why not?  If you don’t have at least some content that is social worthy (in that people actively share it), then why do you think it deserves to rank for competitive terms?
  6. Be real.  Not in the 80s sense!  How does your site appear to first time visitors?  Are your products up to date?  Has your blog been posted to recently?  Is your Twitter feed active?  Have you updated your copyright year?  Does your site appear stale and dated, or modern and vibrant?  First impressions are key – here are some things I did to my blog:
    1. Recent posts widget (always good to have)
    2. Popular posts widget (chances are a lot of people want to read these)
    3. Random posts (in the footer – keeps the page fresh)
    4. Related posts (after the article – increases the time on site)
    5. Large social buttons (convert to action much better than small ones)
    6. Regular tweeting & feed widget (the appearance of dates gives the impression of freshness)
    7. Regular & guest posting (lots of reasons to come back)
    8. Engage with Twitter followers (shows activity and results in more readers, guest authors and retweets)
    9. These aren’t Panda recovery tools – it’s just good content marketing.
  7. Research, review and plan.  What’s the next phase of your business?  How can you target more customers?  How can you engage more customers?  How can you sell more to existing customers?  The nature of SEO means that sometimes it’s easy to become lazy with other areas of business (if SEO is bringing in big traffic levels at a low cost) – time to revise your strategy on that front.

Getting over the technical barrier that is Panda will be relatively easy.  Convincing Google that your site once again deserves to be number one will be decidedly more difficult.  Regardless of what the truth about Panda is, the main change for the SEO world is that the industry has finally begun to shift from a process that targets the algorithm to a process that needs to target the end user.  Many people have been doing this successfully for a long time now, and I know many are happy the market is evolving this way.  The question is, can you adapt to these changes?

Of course you can – adaptation is what SEOs do best!  Well, adaptation and link whoring.

 

The Panda Challenge

Panda Blues getting you down?  Worry not!  For the next couple of weeks, I’ll open this post up to a Panda Q&A.  If you’d like to put your Panda hit site forward for review and don’t mind discussing it publicly, I (and anyone else who’d like to chime in) would be happy to offer up some specific advice on how to recover.

Post your details and some background below and we’ll take it from there.

Also, if anyone has any comments on the stuff I’ve talked about in this article, I’d love to hear them.  Most of this is just theory based on some superficial research with limited resources – by no means am I suggesting this is a definitive overview of what Panda actually is – no one knows that.  But SEO tends to be lots of theory which each individual contextualises in their own mind in order to develop strategy – this is the approach I’ve taken here – a solid theory that leads to a sensible strategy.  Make of it what you will!

Scott

