Interconnected Networks & Bipartite Graphs – Penguin4

As Google keeps improving its ability to detect manipulation of the link graph, and consequently penalises ever more of the Internet, those who rely on search traffic have become increasingly interested in understanding the specific nuances that may affect their rankings and, in turn, their businesses.

We noticed that this latest update (Penguin4) is better able to identify and target websites with unnatural linking patterns. These patterns can be detected by looking for domains that link to a target and that also commonly link out to the same other resources, i.e. where some domains link out to many targets that other domains also link to, or otherwise have very similar outbound links. By looking at the correlations between different domains through their outbound links, and applying further filters to that data, it is possible to discover link networks such as directory networks, PBNs (private blog networks) and other link schemes where manipulation of the link graph has ‘more likely’ been an intentional aspect of the linking practices.
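
To make the idea of correlating domains through their outbound links concrete, here is a minimal sketch in Python. It is not the method Google or our own tool uses; it simply scores pairs of linking domains by the Jaccard overlap of their outbound targets. The function name outbound_overlap, the min_shared threshold and all of the example domains are assumptions made up for illustration.

```python
from itertools import combinations

def outbound_overlap(outlinks, min_shared=2):
    """Rank pairs of linking domains by the Jaccard overlap of their outbound targets.

    outlinks maps a linking domain to the set of domains it links out to.
    Pairs sharing fewer than min_shared targets are ignored (assumed threshold).
    """
    pairs = []
    for a, b in combinations(sorted(outlinks), 2):
        shared = outlinks[a] & outlinks[b]
        if len(shared) >= min_shared:
            union = outlinks[a] | outlinks[b]
            pairs.append((a, b, len(shared), len(shared) / len(union)))
    # Highest overlap first: these pairs are the most "replicated" link profiles.
    pairs.sort(key=lambda p: p[3], reverse=True)
    return pairs

# Hypothetical data: two directories with near-identical outbound links, one news site.
outlinks = {
    "dir-one.example": {"target.example", "shop-a.example", "shop-b.example"},
    "dir-two.example": {"target.example", "shop-a.example", "shop-b.example", "shop-c.example"},
    "news.example": {"target.example", "bbc.example"},
}

for a, b, shared, score in outbound_overlap(outlinks):
    print(a, b, shared, round(score, 2))
```

In this toy data only the two directories are reported, because the news site shares just the one target with each of them. On real data the overlap score would only be a first signal, to be combined with the further filters mentioned above.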

Interconnected Network

Following Google’s Penguin 4.0 update on 23/09/2016, I hypothesised that one specific improvement Google made to the algorithm was its ability to use bipartite graphs (dense networks) through co-citation, combined with other manipulation signals, to detect isolated networks that were more likely created to manipulate the search results.
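
As a rough illustration of what a dense bipartite structure looks like in link data, the sketch below treats linking domains as one side of the graph and link targets as the other, and looks for groups of linkers that all point at the same set of targets (a complete bipartite core). This is an assumption about how such a check could be done, not a description of Google's implementation; the function name find_bipartite_cores, both thresholds and the domains are hypothetical.

```python
from itertools import combinations

def find_bipartite_cores(outlinks, min_sources=3, min_targets=2):
    """Find groups of linking domains that all link to the same set of targets.

    outlinks maps a linking domain to the set of domains it links out to.
    Returns a set of (sources, targets) pairs, both frozensets, in which every
    source links to every target.
    """
    cores = set()
    for a, b in combinations(sorted(outlinks), 2):
        # Targets co-cited by this pair of linkers.
        common = frozenset(outlinks[a] & outlinks[b])
        if len(common) < min_targets:
            continue
        # Every linker whose outbound links cover the whole co-cited set.
        sources = frozenset(s for s, t in outlinks.items() if common <= t)
        if len(sources) >= min_sources:
            cores.add((sources, common))
    return cores

# Hypothetical data: three blogs citing the same two sites, plus one unrelated news site.
outlinks = {
    "blog1.example": {"money.example", "sister.example", "x.example"},
    "blog2.example": {"money.example", "sister.example"},
    "blog3.example": {"money.example", "sister.example", "y.example"},
    "news.example": {"money.example", "wikipedia.example"},
}

for sources, targets in find_bipartite_cores(outlinks):
    print(sorted(sources), "->", sorted(targets))
```

A core like this is only a candidate. Popular natural resources are also co-cited heavily, which is why co-citation would need to be combined with other manipulation signals before anything is judged manipulative.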

Naturally, Google has a patent on this concept dating back to 2007, but what we are observing with Penguin4 suggests that Google’s implementation has recently been improved and is now better suited to distinguishing between naturally occurring dense networks and isolated manipulative networks.
This aspect of manipulation detection via dense network comparison is not widely understood within the SEO community, and discussion of the concept is thin on the ground, save for two very interesting resources that cover it in more detail. One is by Bill Slawski of SEOBYTHESEA, who has written about this area ever since Google published its patent, which he covered in this article. The other is this excellent article by Dan Petrovic, which goes into even more detail about link graphs and how co-citation works to provide insights into relevancy, and which he expanded upon in his research paper titled Relationships in Large-Scale Graph Computing. I am sure there are more relevant articles from the SEO community that I have missed, and I will continue to add any relevant information I come across to the footer of this article.

Outside of Google, there were no commercially available tools that examined the link graph of a domain and allowed you to visualise or extract these dense interconnected networks (we have now developed one). So, over the last three months, I developed a tool for The Link Auditors, known internally as the Interconnected Network Tool, which is now offered as one of the several link auditing tools that make up our link audit package. Developing this tool has already been an incredible journey. Thinking through the concepts and methods needed to extract this data and make sense of it, in the same way as earlier academics and researchers within organisations such as Google, has been extremely stimulating and has taught me many important lessons about link graphs and SEO manipulation. As I develop the overlaying filtering and sorting algorithms further, these insights continue to evolve.

Our Interconnected Network Tool


STEP 6 Interconnected Networks
NEW TOOL: Penguin4, launched 23/09/2016, targeted domains that have interconnected networks, using co-citation and bipartite graph analysis. This new tool traces these dense networks and applies further filtering to separate natural authorities from manipulative link networks.

Visualising the link graph of your domain is not just beautiful but also incredibly useful, both for competitive analysis and for understanding why Google promotes and penalises domains based on link graph anomalies. Comparing different domains by visually inspecting their link graphs can provide huge insights that are otherwise hidden. Uncovering relationships between entities is almost impossible using any other method, and large data sets allow the detection of subtle anomalies that would not be noticeable in smaller ones. No doubt removing manipulation based on these signals can have a huge impact on search results.

Clearly this is where Google has been focusing for a long time (especially noticeable in the latest Penguin4 update), and given the lack of insight into these concepts within the SEO community at large, it is no wonder that a chasm has opened between the SEO community and Google. Most people fail to grasp what is going on, and many others have given up entirely on trying to manipulate Google with SEO, while Google has seemingly become resistant to many forms of manipulation. That’s not to say Google is winning outright. There are still large amounts of manipulation evident for all to see, only the bar has been raised consistently and most people who formerly engaged in this art are without the tools or the insights to understand the science of this endeavour. For every 1 that we now see successfully managing to manipulate Google, we can see another 50 that have been thwarted. Those getting good results now are either lucky or have a deeper insight into these concepts and can join the art and science together to make manipulation work.

Our Subsequent Findings

We are actively reverse engineering the results and establishing a pattern that is consistent with these assertions, by comparing domains that we know have manipulative links pointing to them. In this case study we are looking at two domains (RED & BLUE) that differ in the way their links are intertwined: the BLUE domain has hardly any interconnected links, whereas the RED domain has many of its links shared with other interconnected domains. Both domains have purely manipulative links, and yet with the launch of the Penguin 4 update the RED domain was immediately negatively impacted, whereas the BLUE domain steadily rose to the first position for a highly profitable search term!
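
The kind of RED versus BLUE comparison described here could be reduced to a single number per domain, for example the share of a domain's linking domains that are interconnected with one another. The sketch below is one possible way to compute such a share and is not how our tool measures it. The function name interconnected_share, the min_shared threshold and the data are all invented for illustration and are not taken from the case study.

```python
def interconnected_share(target, outlinks, min_shared=2):
    """Share of a target's linking domains that are interconnected with each other.

    Two linking domains count as interconnected when, besides the target itself,
    they link out to at least min_shared of the same other domains (assumed rule).
    """
    linkers = [d for d, targets in outlinks.items() if target in targets]
    interconnected = set()
    for i, a in enumerate(linkers):
        for b in linkers[i + 1:]:
            shared_elsewhere = (outlinks[a] & outlinks[b]) - {target}
            if len(shared_elsewhere) >= min_shared:
                interconnected.update((a, b))
    return len(interconnected) / len(linkers) if linkers else 0.0

# Hypothetical link data, made up for this example only.
outlinks = {
    "a.example": {"red.example", "p.example", "q.example"},
    "b.example": {"red.example", "p.example", "q.example"},
    "c.example": {"red.example", "p.example", "q.example", "blue.example"},
    "d.example": {"blue.example", "r.example"},
    "e.example": {"blue.example", "s.example"},
    "f.example": {"red.example", "t.example"},
}

print("red:", interconnected_share("red.example", outlinks))
print("blue:", interconnected_share("blue.example", outlinks))
```

In this toy data three of the four domains linking to red.example share the same other outbound links, while none of the domains linking to blue.example do, which mirrors the shape of the difference between the two link graphs.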

BLUE domain interconnected links:

This image clearly shows only a handful of interconnected links, despite a relatively large link profile, while the RED domain (below) has many more interconnected links and a significant section of an isolated network.

Isolated Network

This image (below) shows the isolated section of the RED domain, where only the manipulative links that are isolated have been extracted. These links would be considered to be linking in a way that is statistically unlikely, given the overall signals that can be obtained by comparing how they differ from other, more natural hubs and nodes.
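
One simple way to express how isolated a group of domains is, compared with a natural hub, is to measure what fraction of the links touching the group stay inside it. The sketch below does that. It is an assumed, simplified stand-in for the statistical comparison described above, and the function name isolation_score and the edge data are hypothetical.

```python
def isolation_score(edges, group):
    """Fraction of links touching a group of domains that stay inside the group.

    edges is an iterable of (source_domain, target_domain) links.
    A score near 1.0 means the group mostly links to itself; natural hubs
    tend to have many links to and from the wider web, giving a lower score.
    """
    group = set(group)
    internal = external = 0
    for src, dst in edges:
        src_in, dst_in = src in group, dst in group
        if src_in and dst_in:
            internal += 1
        elif src_in or dst_in:
            external += 1
    total = internal + external
    return internal / total if total else 0.0

# Hypothetical links: a closed ring of blogs feeding one site, and an ordinary news site.
edges = [
    ("pbn1.example", "pbn2.example"),
    ("pbn2.example", "pbn3.example"),
    ("pbn3.example", "pbn1.example"),
    ("pbn1.example", "money.example"),
    ("news.example", "bbc.example"),
    ("blogger.example", "news.example"),
]

network = {"pbn1.example", "pbn2.example", "pbn3.example", "money.example"}
print(isolation_score(edges, network))
print(isolation_score(edges, {"news.example", "bbc.example"}))
```

The closed ring scores 1.0 because nothing outside it links in or out, whereas the news pair scores 0.5 because an outside blogger links to it. A real filter would need far more data and additional signals before treating a high score as evidence of manipulation.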

I am slightly surprised that it has taken us in the community so long to emulate what Google is doing in these areas. Bill Slawski has been almost alone in deciphering what they are doing from their patents, and maybe only a handful of us have been trying to decode Google by building tools to unravel their inner workings. These challenges are fairly difficult given the hurdles that need to be overcome. The history, knowledge, insight and resources needed to develop such tools probably belong to only a handful of people or organisations, but I would have thought some of the bigger outfits would have been all over these concepts and built tools specifically targeting such areas. Clearly, though, having unlimited resources to employ academics and smart programmers pays off over time, and that is what is making the difference between winning and losing this battle.

Google claims that people who try to manipulate their search positions hurt everyone who uses the search engine, and that much of the manipulation is pure spam or relates to porn or scams. This premise is used to justify the continuous efforts to weed out artificial linking practices that aim to manipulate the search results. Yet anyone who has a website has a legitimate interest in being found for search terms relevant to that website, and that drive is ultimately what has led to the commercial success of Google’s AdWords program, i.e. paid placements, and the success of both Google and Alphabet Inc. Maybe if Google were only penalising porn sites, scam sites and spammy sites, there would be some understanding for Google’s sanctimonious, righteous crusade. But while the majority of sites being penalised by these updates belong to normal small business owners who are simply trying to improve their traffic, it is very difficult to maintain the facade that these penalties are not in some way driven by increased profits for the giant pay-to-play information highway.


Further links to relevant resources and discussion from the SEO community:

Search Quality: The Link Graph Theory by Dan Petrovic
Relationships in Large-Scale Graph Computing Article by Dan Petrovic
Google Patent on Web Spam, Doorway Pages, and Manipulative Articles 11/27/2007 by Bill Slawski

Further links to relevant resources and discussion from the academic community:

Authoritative Sources in a Hyperlinked Environment by Jon M. Kleinberg