Analysing Google’s Penguin 4 update: What do we see and think so far!
Reviewing Google Penguins Vs the Porkies
1) The update actually took place on the 2nd of September, NOT on Friday the 23rd when it was announced!
That has been typical of Google over the years: other major changes, such as Hummingbird (which we noticed before anyone else) and the first wave of Penguin, were also announced some weeks after the actual launch. This allows Google to roll a change back without telling the SEO community what’s going on if things don’t seem right or it all goes wrong.
2) This update has been embedded into the core algorithm.
This has been a massive internal change in Google’s engine. Why? First, it has taken Google over two years to implement the latest Penguin update, which is a hell of a long time given the gravity of the situation: the many sites affected were left in limbo throughout. So why did it take so long to implement this as a core, ongoing, realtime penalty? The fact is that the entire engine we know as the core algorithm would have had to be substantially rewritten to accommodate the change. Look at the evidence. When Penguin was first announced it looked for signs of manipulation on specific keywords, and the negative effects were sometimes applied to specific terms only. This was most likely implemented by hacking other existing features of the core algorithm, which was a limited and far from ideal solution. Moreover, it placed sites in limbo by not giving them their due reward when they received further positive signals (links): once trust was lost, a domain could get away with a lot less, making subsequent gains almost impossible. Thus, if a domain was caught, it would often be unworkable from an SEO point of view. That rendered all SEO work on it almost worthless, called into question any resources used to link to already penalised domains, and cast doubt on the efforts of the SEO agencies tasked with helping such projects recover or improve.
When Google talk about making the change to a realtime ‘granular’ penalty, they are conceding that Penguin’s prior implementations were far from ideal. While better than nothing, it was no doubt difficult to rerun, missed many instances of actual link manipulation, and relied on hacking other aspects of the core algorithm or limited the core algorithm in some other way, to the point that an entire rewrite of the code became necessary. During this time it seems as if all efforts have been focused on the rewrite, to the detriment of the many other tasks and general work that would normally occupy the engineers and the people in the spam team who spot and remove spam.
What effect is this Granular thing they talk about going to have?
Google seem to have paused manual spam detection since about the end of November 2015. This is also evidenced by the lack of link network discoveries reported over the last year by blogs such as Search Engine Roundtable. To me it’s as if the focus has been placed on higher-level projects (such as this rewrite). Hunting for link networks by hand using spam teams is something that would be designed out of the algorithm as soon as possible, especially if the benefits of that work would be short-lived or no longer used. Any rewrite of the core algorithm would change almost everything that had come before, and the reliance on manual IP blocks (blacklists) or hand-compiled data about domains would probably be superseded by new approaches, worked out and implemented through machine learning techniques, that can then be applied granularly.
Rather than relying solely on spam reports (from your competitors) and manual reviews to detect link networks or paid links, one of the most interesting methods is to build analytical tools that work through the link graph looking for unnatural patterns of links, where domains link to uncommon resources in unnatural ways. For example, domains A and B both link to both domains C and D. This could happen in the wild without any manipulation intended, but such correlations have been of interest for spam detection for some time. What makes it incredibly challenging is the sheer size of the link graph and the way in which Google have developed. If Google tried to analyse the whole link graph for such anomalies every time they found a new link, it would be too slow, and anything that’s too slow is unworkable over such large datasets. With recent changes, though, especially going back about a year and following the Hummingbird engine update, Google have been able to find these needles in haystacks more efficiently.
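To make the "A and B both link to C and D" pattern concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how Google actually do it: the function name shared_target_pairs, the min_shared threshold and the example domains are all made up for this post. It simply lists pairs of linking domains that share at least two link targets.

```python
from collections import defaultdict
from itertools import combinations


def shared_target_pairs(links, min_shared=2):
    """Find pairs of source domains that link to the same targets.

    links: iterable of (source_domain, target_domain) tuples.
    Returns {(source_a, source_b): set_of_shared_targets} for every pair
    that shares at least min_shared targets. Hypothetical helper, not a
    description of any real search engine's method.
    """
    # Index the graph by target so we only compare sources that
    # actually have a target in common.
    sources_by_target = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-links
            sources_by_target[target].add(source)

    # For each target, record every pair of sources that link to it.
    shared = defaultdict(set)
    for target, sources in sources_by_target.items():
        for a, b in combinations(sorted(sources), 2):
            shared[(a, b)].add(target)

    return {pair: targets for pair, targets in shared.items()
            if len(targets) >= min_shared}


# Made-up link data: A and B both link to C and D, E links to C only.
links = [
    ("a.example", "c.example"),
    ("a.example", "d.example"),
    ("b.example", "c.example"),
    ("b.example", "d.example"),
    ("e.example", "c.example"),
]
suspicious = shared_target_pairs(links)
print(suspicious)
```

Only the A/B pair is reported here, because it is the only pair sharing two targets. Note that the pairwise comparison grows quickly for any target with a lot of inbound links, which is a small-scale version of the "too slow over large datasets" problem described above.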
Getting back to what has happened in this update: Google have taken the above concept of using link graphs to find spam and applied it in another interesting way. They have recognised that entities who own more than one domain often engage in similar practices across them. So if a penalty situation occurs on one domain, or if the practices across a group of domains owned by one entity amount to manipulation, then by grouping all of the domains owned by that entity Google can detect a whole lot more potential Penguin-like activity and target specific keywords across multiple domains. If you own a bunch of domains all trying to rank for the same or similar terms, then in this update you can expect targeted effects whereby all of those domains are hit on those specific keywords. It’s not like a duplicate content filter, whereby one page wins; it seems to have a negative effect across the whole group.
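If you want to check your own portfolio for this kind of overlap, a rough sketch might look like the following. Everything in it is an assumption for illustration: the owner labels, the keyword lists and the function keywords_shared_by_owner are hypothetical, and how Google identify common ownership is not something we know.

```python
from collections import defaultdict


def keywords_shared_by_owner(sites, min_domains=2):
    """Group domains by owner and list keywords targeted by several of them.

    sites: iterable of (owner, domain, keywords) tuples, where keywords is
    a list of target terms. Returns {owner: {keyword: set_of_domains}} for
    keywords targeted by at least min_domains domains of the same owner.
    """
    domains_by_owner_keyword = defaultdict(lambda: defaultdict(set))
    for owner, domain, keywords in sites:
        for keyword in keywords:
            # Compare keywords case-insensitively.
            domains_by_owner_keyword[owner][keyword.lower()].add(domain)

    flagged = {}
    for owner, by_keyword in domains_by_owner_keyword.items():
        hits = {kw: doms for kw, doms in by_keyword.items()
                if len(doms) >= min_domains}
        if hits:
            flagged[owner] = hits
    return flagged


# Made-up portfolio: owner-1 targets "cheap widgets" from two domains.
sites = [
    ("owner-1", "widgets-a.example", ["cheap widgets", "blue widgets"]),
    ("owner-1", "widgets-b.example", ["Cheap Widgets", "widget repair"]),
    ("owner-2", "gadgets.example", ["cheap widgets"]),
]
flagged = keywords_shared_by_owner(sites)
print(flagged)
```

In this example only "cheap widgets" is flagged, and only for owner-1, since owner-2 targets it from a single domain.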
3) No link graph refreshes since December 2015.
The fact that Google have not performed any link graph data refreshes for the last year is odd. Typically we would see between one and four such refreshes every year. A refresh is when Google reanalyse pages whose links they have already factored into the SERPs. For example, if you have a page with a link on it and you later edit the anchor text or remove the link, Google would not notice the change until they did another link graph refresh. These rare refreshes sometimes used to cause huge upheaval in the SERPs as Google caught up with the changes that had actually happened. Contrary to what most people in SEO think, Google do not have the resources to continually reanalyse pages (as they crawl) looking for minor edits once they have already factored a page. So why no new refreshes? This could very well be connected to the current rolling Penguin update: maybe the teams were too busy building this implementation of the algorithm to run a refresh, or something about the last Penguin 3 update rendered it impossible or not worth doing. That said, despite the new rolling Penguin 4 update going live, a data refresh still has not happened. Again this is odd, since big updates were previously preceded by such link graph data refreshes. I suspect it is yet to happen and may well be affected by the rolling nature of Google’s core algorithm. Certainly something is interfering with Google’s ability to run either of these previously manual events, and most likely the data refreshing has also been baked into the current implementation of Penguin.
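The refresh behaviour described in this section can be pictured with a toy model. This is only a sketch of the idea as I understand it, not Google’s implementation: the LinkIndex class, its crawl and refresh methods and the example URLs are all invented for illustration.

```python
class LinkIndex:
    """Toy model: a link is factored once and only re-read on a refresh."""

    def __init__(self):
        # (source_page, target_domain) -> anchor text used for ranking
        self.factored = {}

    def crawl(self, live_links):
        """Routine crawl: factor in links not seen before.

        Edits to, or removals of, already factored links are ignored.
        """
        for key, anchor in live_links.items():
            self.factored.setdefault(key, anchor)

    def refresh(self, live_links):
        """Link graph refresh: replace the stored view with the live state."""
        self.factored = dict(live_links)


link = ("blog.example/post", "shop.example")

index = LinkIndex()
index.crawl({link: "cheap widgets"})

# The anchor text is edited later; a routine crawl does not pick it up.
live = {link: "Shop Example"}
index.crawl(live)
stale_anchor = index.factored[link]

# Only a full refresh brings the stored view in line with the live page.
index.refresh(live)
fresh_anchor = index.factored[link]
print(stale_anchor, "->", fresh_anchor)
```

The same applies to removed links in this model: a removed link stays in the factored data until refresh is called, which is why clean-up work can go unrewarded for so long.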
4) The unannounced Panda?
At the start of June, Google implemented what appeared to be the start of this rolling update algorithm engine change. That’s when I first noticed that something significant (and Panda-like) had changed at Google. Following a long hiatus during which Google seemed to be asleep, this update signalled the start of the current wave of changes. But it was not like any single Penguin or Panda update: it was new, connected to both, and had some unique traits (such as results disappearing off the radar without trace). At some point Google will have to do another data refresh, but given the recent changes it’s highly likely to be a granular event from now on, whereby Google plan to recrawl and refactor what they know about each page in isolation from the others, as they go. So there will most likely be no more noticeable major link graph refreshes; instead we will see rolling effects that find and refactor link data according to Panda and Penguin in granular portions.
5) The Effects Are Still To Come…
Given there has been no sign of the link graph data refreshing, the Penguin changes that Google implemented on the 2nd of September would reflect link changes (edits/removals) made before December 2015, along with more current disavow files. That means that if you did your clean-up work prior to 2016, you should have seen its effects in the 2nd of September 2016 update. If your clean-up work was done during 2016, you will still be waiting for Google to reflect your link removal work, which they will do when they start refactoring pages that were already analysed and factored into the results but have since changed or been removed.