Interconnected Networks & Bipartite Graphs – Penguin4

As Google continues to improve its ability to detect manipulation of the link graph, and consequently penalises yet more subsets of the Internet, those reliant on search traffic have taken an increasing interest in understanding the specific nuances that may affect their rankings and, in turn, their businesses.

We noticed that this latest (Penguin4) update was better able to understand and target websites that have unnatural linking patterns. These patterns can be detected by looking for domains that link to a target and also commonly link out to the same other resources, i.e. where some domains link out to many targets that other domains replicate or closely match in their own outbound links. By looking at the correlations between different domains through their outbound links and applying further filters to that data, it is possible to discover link networks such as directory networks, PBNs (private blog networks) and other link schemes where manipulation of the link graph has ‘more likely’ been an intentional aspect of the linking practices.
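As a rough illustration of the outbound-link correlation described above, here is a minimal Python sketch. It is not Google's method or the exact logic of our tool; the function names, the example domains and the 0.5 threshold are all assumptions made for illustration. It compares every pair of domains that link to a target and scores how much of their remaining outbound links they share (Jaccard overlap).

```python
from itertools import combinations


def jaccard(a, b):
    """Share of outbound targets two domains have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_pairs(outbound, target, threshold=0.5):
    """Return pairs of linking domains whose other outbound links overlap heavily.

    outbound: dict mapping a domain to the set of domains it links out to.
    threshold: hypothetical cut-off; a real system would tune this on data.
    """
    linking = sorted(d for d, outs in outbound.items() if target in outs)
    pairs = []
    for d1, d2 in combinations(linking, 2):
        # Ignore the shared target itself so it does not inflate the score.
        score = jaccard(outbound[d1] - {target}, outbound[d2] - {target})
        if score >= threshold:
            pairs.append((d1, d2, round(score, 2)))
    return pairs


# Made-up data: two directories with identical outbound links, one news site.
outbound = {
    "dir-a.example": {"target.example", "shop1.example", "shop2.example", "shop3.example"},
    "dir-b.example": {"target.example", "shop1.example", "shop2.example", "shop3.example"},
    "news.example": {"target.example", "wiki.example", "gov.example"},
}
pairs = suspicious_pairs(outbound, "target.example")
print(pairs)
```

A high overlap alone does not prove manipulation (popular authorities are co-cited naturally), which is why further filtering is applied on top of this kind of raw signal.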

Interconnected Network

Following Google’s Penguin 4.0 update on 23/09/2016, I hypothesised that one specific improvement Google has made to the algorithm is its ability to use bipartite graphs (dense networks) through co-citation, combined with other manipulation signals, to detect isolated networks that were more likely created to manipulate the search results.

Naturally Google has a patent on this concept dating back to 2007, but what we are observing with Penguin4 suggests that Google’s implementation of this concept has recently been improved and is now better suited to distinguishing between naturally occurring dense networks and isolated manipulative networks.
This aspect of manipulation detection via dense network comparison is not widely understood within the SEO community and discussion of the concept is thin on the ground, save for two very interesting resources that have covered these concepts in more detail. One is Bill Slawski of SEOBYTHESEA, who has talked about this area ever since Google published their patent, which Bill covered in this article. The other is this excellent article by Dan Petrovic, which goes into even more detail about link graphs and how co-citation works to provide insights into relevancy, and which he expanded upon in his research paper titled Relationships in Large-Scale Graph Computing. I am sure there are more relevant articles from the SEO community that I have missed, and I will continue to post any relevant information I come across in the footer of this article.

Outside of Google there were no commercially available tools which examined the link graph of a domain and allowed you to visualise or extract these dense interconnected networks. So over the last three months I developed one for The Link Auditors, known internally as the Interconnected Network Tool, which is now offered as one of several link auditing tools that make up our link audit package. Developing this tool has already been an incredible journey. Thinking through the concepts and methods needed to extract this data and make sense of it, in the same way as previous academics and researchers within organisations such as Google, has been extremely stimulating and has taught me many important lessons relevant to link graphs and SEO manipulation. As I develop the overlaying filtering and sorting algorithms further, these insights continue to evolve.

Our Interconnected Network Tool


STEP 6 Interconnected Networks
NEW TOOL Penguin4 launched 23/09/2016 targeted domains that have interconnected networks via co-citation and bipartite graph analysis. This new tool traces these dense networks and applies further filtering to separate natural authorities from manipulative link networks.

Visualising the link graph of your domain is not just beautiful but also incredibly useful, both for competitive analysis and for understanding why Google promotes and penalises domains based on link graph anomalies. Comparing different domains by visually looking at their link graphs can provide huge insights that are otherwise hidden. Uncovering relationships between entities is almost impossible using any other method, and having large data sets allows detection of subtle anomalies that can only be noticed at scale. No doubt removing manipulation based on these signals can have a huge impact on search results.

Clearly this is where Google has been focusing for a long time (especially noticeable in the latest Penguin4 update), and given the lack of insight into these concepts within the SEO community at large, it is no wonder that a chasm has opened between the SEO community and Google. Most people are failing to grasp what is going on and many others have given up entirely on trying to manipulate Google with SEO, while Google has seemingly become resistant to many forms of manipulation. That’s not to say Google is winning outright, and there are still large amounts of manipulation evident for all to see, only that the bar has been raised consistently and most people who formerly engaged in this art are without the tools or the insights to understand the science of this endeavour. For every 1 that we now see successfully managing to manipulate Google, we can see another 50 that have been thwarted. Those getting good results now are either lucky or have a deeper insight into these concepts and can join the art and science together to make manipulation work.

Our Subsequent Findings

We are actively reverse engineering the results and establishing a pattern that is consistent with the assertions made, by comparing domains that we know have manipulative links pointing to them. In this case study we are looking at two domains (RED & BLUE) which differ in the way their links are intertwined: the BLUE domain has hardly any interconnected links, whereas the RED domain has many of its links shared with other interconnected domains. Both domains have purely manipulative links, and yet with the launch of the Penguin 4 update the RED domain was immediately negatively impacted whereas the BLUE domain steadily rose to the first position for a highly profitable search term!

BLUE domain interconnected links:

This image clearly shows only a handful of interconnected links despite a relatively large link profile, while the RED domain (below) has many more interconnected links and a significant section of an isolated network.

Isolated Network

This image (below) shows the isolated section of the RED domain, in which only the manipulative links that are isolated have been extracted. These links would be considered to be linking in a way that is statistically unlikely, given the overall signals which can be obtained by comparing how they differ from other, more natural hubs and nodes.

I am slightly surprised that it has taken us in the community so long to emulate what Google is doing in these areas. Bill Slawski has been almost alone in deciphering what they are doing from their patents, and maybe only a handful of us have been trying to decode Google by building tools to unravel their inner workings. These challenges are fairly difficult given the hurdles that need to be overcome. Having the history, knowledge, insight and resources to develop such tools probably falls to only a handful of people or organisations, but I would have thought some of the bigger outfits would have been all over these concepts and built tools specifically targeting such areas. Clearly though, having unlimited resources to employ academics and smart programmers pays off over time, and that is what is making the difference between winning and losing this battle.

Google claim that people who try to manipulate their search positions hurt everyone who uses the search engine, and that much of the manipulation is pure spam or relates to porn or scams. This premise is used to justify the continuous efforts to weed out artificial linking practices that aim to manipulate the search results. Yet anyone who has a website has a legitimate interest in being found for relevant search terms, and that drive is ultimately what has led to the commercial success of Google’s AdWords program (i.e. paid placements) and the success of both Google and Alphabet Inc. Maybe if Google were only penalising porn sites, scam sites and spammy sites then there would be some understanding for Google’s sanctimonious, righteous crusade. But while the majority of sites being penalised by these updates belong to normal small business owners who are simply trying to improve their traffic, it is very difficult to maintain the facade that these penalties are not in some way driven by increased profits for the giant pay-to-play information highway.


Further links to relevant resources and discussion from the SEO community:

Search Quality: The Link Graph Theory by Dan Petrovic
Relationships in Large-Scale Graph Computing Article by Dan Petrovic
Google Patent on Web Spam, Doorway Pages, and Manipulative Articles 11/27/2007 by Bill Slawski

Further links to relevant resources and discussion from the academic community:

Authoritative Sources in a Hyperlinked Environment Jon M. Kleinberg

Insights by Steven Carroll. Steven is the CTO for The Link Auditors as well as VolatilityNow, both tools were developed to reverse engineer Google penalties and understand what changes were taking place inside the blackbox of Google. You can SUBSCRIBE FOR EMAIL UPDATES HERE: or follow thelinkauditors on Twitter.

What happens to your Google Rankings when you disallow all in robots.txt?


Something Not Good


Though it would be worse if it was also applied to the dot com version – wait it is…


But what sort of organic traffic loss would we be expecting here?

Wait that’s thousands of lost business leads or sales PER DAY.

There are over 200 people with an interest in SEO working at STUBHUB (stubborn hub?), so one of them must be interested I guess:


Would someone please let them know…


Penguin 4 Will Not Take Effect For Months Yet!

Google's New Penguin Pussy cat

Reading between the lines and looking at the data reveals some further interesting insights (following on from our Penguin 4 review) about the nature of the current implementation of Google Penguin 4 and the removal of Google Penguin 3. Let me jump in and make some statements:

1) Penguin 4 is no longer a penalty, instead it's as soft as this pussy cat.
2) Google's manual spam team has been largely designed out with Penguin 4!
3) Google are implying that disavow files are now largely redundant (do they finally get it?)
4) Google's new granular formula takes a long time to fully compute and it will take months to come into full effect...
5) Penguin 3 was largely removed around 2nd September to make way for Penguin 4, introduced 23rd Sept

Understanding the changes

In the old format of detecting link spam (even before Penguin), spam was mostly detected with algorithms that would flag suspicious activity such as multiple links from one IP to another, gaining too many links too quickly, too many (spammy) anchor texts on a domain, spam reports by competitors, etc. Once Google's suspicions had been aroused, the spam team might take a manual look, then kill the linking domain and leave notes about any penalised target domain. All of this would be totally invisible to the public apart from the ranking changes they saw. Google might also add a penalty score to the guilty parties that would automatically expire after some variable time.

Google's New Pussy Cat

The new method of detecting link spam improves on these systems and introduces a few massive changes: comparing every new link they find against the entire link graph of that domain, and also against all the other sites that are linked to, looking for any unnatural comparisons. Iterating through arrays of links in such a way is a slow, time-consuming task; it's a massive task to do on any scale no matter how much computing power you have! BUT, this is the most compelling and intelligent method of finding unnatural links, because domains that spam have one thing in common: they don't do it in isolation, they do it as a common practice, and these practices can be detected by comparing everything against everything else (read slowly).

Google's new method includes a function to classify networks owned by single entities. Hence if they discover that one entity owns more than one domain, that group of domains will be grouped and classified as a network. These groups will be the key to understanding the value of a link, and the system will be able to tell natural links within networks from unnatural links according to the commonalities between the domains linked within the networks (read granular) and the cumulative level of suspicion between the two entities (groups).
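To make the grouping idea concrete, here is a small Python sketch of how domains might be clustered into networks once ownership hints are known. How Google actually identifies ownership is not public; the hint strings below (a registrant email, an analytics ID) and the function name are purely hypothetical. The technique shown is a plain union-find: any two domains sharing a hint end up in the same group.

```python
def group_by_owner(signals):
    """Group domains that share at least one ownership hint.

    signals: dict mapping a domain to a set of hint strings. The hints are
    hypothetical examples (registrant email, analytics ID), not known inputs.
    Returns a sorted list of groups containing more than one domain.
    """
    parent = {domain: domain for domain in signals}

    def find(domain):
        # Follow parent pointers to the group root, compressing the path.
        while parent[domain] != domain:
            parent[domain] = parent[parent[domain]]
            domain = parent[domain]
        return domain

    first_seen = {}  # hint -> first domain observed with that hint
    for domain in sorted(signals):
        for hint in signals[domain]:
            if hint in first_seen:
                parent[find(domain)] = find(first_seen[hint])
            else:
                first_seen[hint] = domain

    groups = {}
    for domain in signals:
        groups.setdefault(find(domain), set()).add(domain)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Made-up data: a and b share a registrant, b and c share an analytics ID.
signals = {
    "a.example": {"reg:x@mail.example"},
    "b.example": {"reg:x@mail.example", "ga:UA-1"},
    "c.example": {"ga:UA-1"},
    "d.example": {"reg:other@mail.example"},
}
networks = group_by_owner(signals)
print(networks)
```

Note that c.example joins the group only through b.example, which is the point: ownership signals chain together, so one shared hint can tie a whole network to a single entity.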

That means that you will be judged on your total indiscretions across multiple domains, and it's a cumulative penalty not isolated to each domain. The focus is now on all domains owned by one entity, and once suspicion has been aroused they will be comparing other related domains and looking for patterns. But all of this focus is more specific to the target domains (domains higher in the ranks) than it is to the general link graph. Because it is Penguin 4 and related to manipulation of search ranks, the whole focus is now on removing manipulation and doing it algorithmically. There is going to be more white noise or collateral damage with this approach, hence it has been necessary to simply remove the positive effects of suspect links rather than actually penalise domains with any certainty.

How accurate can this new method be? (Read Aggressive Tiger or a soft Penguin?)
This is a highly intelligent (processing heavy) method of eliminating the effect of manipulation (spam), so the effects of the new system are going to be much more granular and as a result much more effective (read laser precision weapons instead of carpet bombing towns). So much so that the confidence Google has in this new method has allowed it to retire the old spam team and their old methods, no doubt with huge savings on manual labour, offset by the cost of much more processing power now being applied to the problem. This will also mean that the effects on the SERPs will be significantly greater than what has been seen in the past. In my estimate the noticeable effects will be something like:

Penguin 3: percentage of the SERPS affected 6%
Penguin 4: percentage of the SERPS affected 18%

However the changes will not come into full effect in one huge update, but instead this should take many months to fully recompute the current link graph and then reclassify the link graph as they know it!


Google Penguin 4 Review

Analysing Google’s Penguin 4 update: What do we see and think so far!

Reviewing Google Penguins Vs the Porkies

1) The update actually took place on the 2nd of September, NOT the Friday 23rd when it was announced!

Announcing some weeks after the actual launch has been a typical trait of Google over the years with other major changes, such as Hummingbird (which we noticed before anyone else) and the first wave of Penguin. This allows Google to roll back without informing the SEO community of what’s going on if things don’t seem right or it all goes wrong.

2) This update has been embedded into the core algorithm.

This has been a massive internal change in Google’s engine. Why? First, it has taken Google over two years to implement the latest Penguin update, which is a hell of a long time given the gravity of the situation: the many sites affected were left in limbo throughout. So why did it take so long to implement this as a core, ongoing, realtime penalty? The fact is that the entire engine that we know as the core algorithm would have had to be substantially rewritten to accommodate this change. Look at the evidence. When Penguin was first announced it looked for signs of manipulation on specific keywords, and the negative effects were sometimes applied to specific terms only. This was most likely implemented by hacking other existing features in the core algorithm, which was a limited and far from ideal solution. Moreover, it would place sites into limbo by not giving them their due rewards upon receiving further positive signals (links); once trust was lost, domains could get away with a lot less, making subsequent gains almost impossible. Thus, if affected (caught), those specific domains would often be unworkable from an SEO point of view. This rendered all SEO work almost worthless and called into question any resources used to link to already penalised domains, as well as the efforts of the SEO agencies tasked with helping to recover or improve such projects.

When Google talk about making the change to a realtime ‘granular’ penalty, they concede that in its prior implementations Penguin was far from ideal. While better than nothing, it was no doubt difficult to rerun, missed many instances of actual link manipulation, and relied on hacking other aspects of the core algorithm or limited the core algorithm in some other way, to the point that an entire rewrite of the code was necessary. During this time it seems as if all efforts were focused on this rewrite, to the detriment of many other tasks and the general work that would normally occupy the engineers or the people in the spam team who spot and remove spam.

What effect is this Granular thing they talk about going to have?

Google seem to have paused manual spam detection since about the end of November 2015. This is also evidenced by the lack of link networks reportedly discovered over the last year by blogs such as Seroundtable. To me it’s as if the focus has been placed on higher level projects (such as this rewrite). Hunting for link networks by hand using spam teams would seem to be something that would be designed out of the algorithm ASAP, especially if the benefits of such work would be short lived or no longer used. Any rewrite of the core algorithm would change almost everything that had come before, and the reliance on manual IP blocks (blacklists) or hand compiled data about domains would probably be superseded by new ideas that would be figured out and implemented by machine learning techniques, and hence could subsequently be applied granularly.

Rather than relying solely on spam reports (from your competitors) and manual reviews to detect link networks or paid links, one of the most interesting methods is to build analytical tools that work through link graphs looking for unnatural patterns of links, whereby domains link to uncommon resources in unnatural ways. For example, domains A and B both have links to both domain C and domain D. While this could happen in the wild (without manipulation intended), finding such correlations has been of interest for spam detection for some time, although it is made incredibly challenging by the sheer size of the link graph and the way in which Google’s systems have developed. If Google were to try and analyse the whole link graph looking for such anomalies every time they found a new link, it would be too slow, and anything that’s too slow is unworkable over such large datasets. With recent changes though, especially going back about 1 year and following the Hummingbird engine update, Google have been able to find these needles in haystacks more efficiently.
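The "A and B both link to C and D" pattern is the smallest dense bipartite structure, and it can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the function name and the `min_shared` parameter are made up), not a description of Google's implementation. Instead of comparing every domain with every other domain, it inverts the graph first so that only domains which already share a target are ever compared.

```python
from collections import defaultdict
from itertools import combinations


def co_linking_pairs(links, min_shared=2):
    """Find pairs of source domains that link to the same targets.

    links: iterable of (source, target) tuples.
    min_shared: hypothetical minimum number of shared targets to report.
    Returns a dict mapping (source_a, source_b) to their shared targets.
    """
    # Invert the graph: for each target, which sources link to it?
    inbound = defaultdict(set)
    for source, target in links:
        inbound[target].add(source)

    # Only sources that share a target are paired, which avoids comparing
    # every domain against every other domain in the graph.
    shared = defaultdict(set)
    for target, sources in inbound.items():
        for a, b in combinations(sorted(sources), 2):
            shared[(a, b)].add(target)

    return {pair: sorted(targets)
            for pair, targets in shared.items()
            if len(targets) >= min_shared}


# The example from the text: A and B both link to C and D; E links only to C.
links = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("E", "C")]
result = co_linking_pairs(links)
print(result)
```

Even with the inverted index, a very popular target produces a huge number of source pairs, which gives some feel for why this kind of analysis is expensive at the scale of the whole web.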

Getting back to what has happened in this update, Google have taken the above concept of using link graphs to find spam and have applied it in another interesting way. They have recognised that entities who own more than one domain often engage in similar practices. So if a penalty situation occurs on one domain, or if the practices affecting a group of domains owned by one entity amount to manipulation, then by grouping and examining all of the domains owned by that entity they can detect a whole lot more potential Penguin-like activity and target specific keywords across multiple domains. As such, if you own a bunch of domains all trying to rank for the same or similar terms, then in this update you can expect to see targeted effects whereby all of those domains are hit on those specific keywords. It’s not like a duplicate content filter whereby one page will win; it seems to have a negative total effect.

3) No link graph refreshes since last December 2015.

The fact that Google have not performed any link graph data refreshes for the last year is odd. Typically we would see 1 to 4 such refreshes every year. This is when Google reanalyse pages whose links they had already factored into the SERPs. So for example, if you have a page with a link on it and you edit the anchor text or remove the link sometime later, Google would not notice this change until they did another link graph refresh. These rare refreshes used to result in sometimes huge upheaval in the SERPs while Google reflected the actual changes that had happened. Contrary to what most people in SEO think, Google do not have the resources to continually reanalyse (as they crawl) pages looking for minor edits once they have already factored that page in. So why no new refreshes? This could very well be connected to the current rolling Penguin update: maybe the teams were too busy building this implementation of the algorithm to implement the refresh, or something about the last Penguin 3 update rendered it impossible or not worth doing. That said, despite the new rolling Penguin 4 update going live, a data refresh still has not happened. Again this is odd, since previously big updates would be preceded by such link graph data refreshes. I suspect this is yet to happen and may well be impacted by the rolling nature of Google’s core algorithm. Certainly something is conflicting with Google’s ability to run either of these previously manual events, and most likely the data refreshing has also been baked into this current implementation of Penguin.

4) The unannounced Panda?

At the start of June, Google implemented what appeared to be the start of this rolling update algorithm engine change. That’s when I first noticed that something significant (Panda like) had changed at Google. Following a long hiatus whereby Google seemed to be asleep, this update signalled the start of this wave of changes. But it was a significant change not like any singular Penguin or Panda update: it was new, connected to both, and had some unique traits (such as results disappearing off the radar without trace). At some point Google will have to do another data refresh, but given the recent changes it’s highly likely that it will be a granular event from now on, whereby Google are planning to recrawl and refactor what they know and understand about all pages in isolation from others, as they go. So no doubt there will be no more noticeable major link graph refreshes; instead we will see rolling effects that find and refactor link data according to Panda and Penguin in granular portions.

5) The Effects Are Still To Come…

Given there has been no sign of the link graph data refreshing, the Penguin changes that Google have just implemented on the 2nd of September would date back to link changes (edits/removals) made before December 2015 and more current disavow files. That means that if you did clean up work prior to 2016 then you should have seen these effects on the 2nd September 2016 update / implementation, whereas if your cleanup work was done during 2016, you will be still waiting for Google to reflect these changes (your link removal work), which they will do when they start refactoring pages that were already analysed and factored into the results but have changed or been removed during 2016.


Google Penalty Recovery Service

It’s often thought that recovering from a penalty consists of inspecting the back links, finding the toxic ones and then disavowing them. On the contrary, this simplistic approach has been shown time and again to delay the recovery process, so much so that companies sometimes waste years in a helpless state, thinking they have done everything they can according to Google and that somehow Google is victimising them unfairly. Whilst that viewpoint is often the most comfortable perspective for professionals working in this space to espouse, we at The Link Auditors have a very different perspective.

Companies working to remove penalties have an obligation to undo the manipulation in order to show good faith to Google and satisfy their demand for toxic links to be removed, thereby undergoing the difficult work of reaching out to the linking sites and asking them to accommodate the removal of any links placed with the intention of manipulating Google.

That often turns out to be a difficult conversation, sometimes involving payments and at the very least a lot of work on both sides of the fence, for a bunch of people who are otherwise not interested in your company’s plight.

Having such a conversation demands a difficult task of its own, that of making contact and thus finding the contact details of all said parties. That may be an email address, a contact form on their website or a whois record, which amounts to a massive task when faced with sometimes hundreds of webmeisters that may be linking to your site.

Toxic Link Removal

One of the most important tools we have developed here at The Link Auditors is our own in-house outreach tool, which not only finds email addresses and sends polite link removal requests, but also locates contact forms and then actually posts the same polite message into those forms, breaking any Captcha challenges that are placed to prevent the forms being posted to by robots with spam. Evading such technological barriers demands some very smart technology, and this is one of the most difficult challenges we had to overcome when we set out to make such a tool. Not only was there the problem of breaking the Captcha challenges but, believe it or not, actually finding these images on the page was also extremely difficult from a technical point of view.

Of course humans can easily distinguish what is a captcha challenge, but if you are a computer program looking at a web page, understanding where on the page it is located so it can then be broken is actually really hard. One might assume it would be easy as the input field may be called captcha or something similar, but disparate forms have all manner of naming conventions for the input fields that must be understood. Some may be named with just numbers, and so the challenge becomes making sense of all the input fields in order to put the message into the message field and the subject in the subject field.
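To give a flavour of the field-naming problem, here is a deliberately naive Python sketch of guessing what a form field is for from its name. The keyword lists and the function name are hypothetical and chosen only for illustration; this is not our production logic, which has to cope with far messier input.

```python
# Hypothetical keyword hints per role; real forms need far more than this.
ROLE_HINTS = {
    "email": ("email", "e-mail", "mail"),
    "subject": ("subject", "topic", "title"),
    "message": ("message", "comment", "body", "enquiry", "inquiry"),
    "name": ("name",),
    "captcha": ("captcha",),
}


def guess_role(field_name, tag="input"):
    """Guess the purpose of a form field from its name attribute and tag."""
    lowered = field_name.lower()
    for role, hints in ROLE_HINTS.items():
        if any(hint in lowered for hint in hints):
            return role
    # A field with an uninformative name (e.g. just a number) can sometimes
    # still be placed: a textarea is most likely the message body.
    if tag == "textarea":
        return "message"
    return "unknown"


print(guess_role("your-subject"))            # subject
print(guess_role("field_3", tag="textarea")) # message
print(guess_role("field_7"))                 # unknown
```

The last case is exactly the situation described above: when a field is named with nothing but a number, the name alone tells you nothing and other clues (labels, position, field type) have to be used instead.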

On top of this you sometimes have two such forms on one page, one of which may be a search box. Another problem is other unexpected questions that might be asked (what is your message about, etc.). Also perplexing is understanding the HTML code: sometimes it does not conform to any standards and has poor code and bugs, which makes it even harder to parse. Sometimes forms post to new pages with no standard use of relative or absolute paths, meaning that just working out where the form is supposed to be posted to (that is, /sendmessage.php or such vs ../ etc.) can take quite some interpretation, with many potential pathways having to be considered and much work to make sense of anything that you come across.
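For well-formed pages, the relative vs absolute path problem has a standard answer: resolve the form's action attribute against the URL of the page it was found on. The short Python sketch below uses the standard library's `urllib.parse.urljoin`; the helper name and the example URLs are made up for illustration, and it does not handle a `<base href>` tag or the broken markup described above.

```python
from urllib.parse import urljoin


def resolve_form_action(page_url, action):
    """Work out the absolute URL a form will post to.

    page_url: the URL of the page containing the form.
    action: the raw value of the form's action attribute (may be empty).
    """
    # An empty or missing action means the form posts back to the page itself.
    if not action:
        return page_url
    return urljoin(page_url, action)


page = "https://site.example/contact/index.html"

print(resolve_form_action(page, "/sendmessage.php"))
# https://site.example/sendmessage.php
print(resolve_form_action(page, "../sendmessage.php"))
# https://site.example/sendmessage.php
print(resolve_form_action(page, "sendmessage.php"))
# https://site.example/contact/sendmessage.php
print(resolve_form_action(page, ""))
# https://site.example/contact/index.html
```

The hard cases in practice are the pages that break these rules, which is where the interpretation and multiple candidate pathways come in.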

Lastly there is the issue of cookies and sessions, which have to be managed effectively to emulate the path any human user would take. It is no wonder that we are the only company in the world in this sector who can boast of offering such a feature, and it is by this measure that any company in this space should be judged. If they cannot remove the toxic links, they are wasting someone else’s time and money!

Let’s take a look at how this works

Contact Us to get your toxic links found and removed TODAY!


What Went Wrong Here?

The TOP SEO Company in London has JUST been hit with a Google penalty!

Here at The Link Auditors we help companies understand why they are losing positions in Google, and we work on removing penalties for our clients. SEO companies often make mistakes for their clients, and evidently sometimes even negatively affect themselves, when they use low quality link resources and apply them in unnatural ways.

Case Study

We have provided a deep analysis of the back links of UWPGROUP.CO.UK and isolated their toxic and low quality links, to better understand the general types of links they have which were once valued by Google while they enjoyed Google’s top spot, and also to understand why those links have now resulted in the company being penalised by Google.

So what did This Top SEO Company Do Wrong?

Have a look at the LINK AUDIT WE DID ON THEM, let me know what you think?

We can also show that the drop was NOT part of some industry-wide Google update: looking at the general activity for that day shows that this was an isolated drop. In fact we can also show that this website dropped more than 100k other sites that day:

Google volatility

OK, let’s take a look at some of the issues found by our Link Audit:

And according to the number of site-wide links they have we can also see issues there: