Link Shaping and Bot Herding
September 22, 2008 – 8:27 am by Dave Snyder
My last post on NoFollow and NoIndex was a way for me to:
1) Try to explain an issue that many starting out in this industry do not completely grasp, or understand the importance of. I myself had some misinformation on the topic that some were good enough to set me straight on.
2) Kick up some discussion on the link siloing/link funneling/link juice sculpting/Page Rank Sculpting concepts around the industry.
This post will be dedicated to exploring the latter concept more completely.
Differences in Theory
I think all SEOs believe it is important to control how the search bots view their site.
All quality SEOs would agree this control begins with a quality site and information architecture.
However, no matter how strong your IA is, you are likely to give the search bots small gateways and crevices to move into undesired portions of your site, and to distribute your link equity to parts of your site that have no use beyond the on-site user.
Remember that the game isn’t about how many pages you can get in the engines. The game is figuring out how to get as much conversion driven traffic to your site as you can. In the end the only numbers that matter have dollar signs in front of them.
To dominate conversion driven traffic in this game, control over indexable content and the shaping of link equity disbursement are both important.
Some search professionals think that the use of the rel=”NoFollow” attribute alone is enough to shape their link equity, and control indexable pages. Others think that site or even page level robot control is enough to guide their strategy.
From Site Wide to Granular
The control of bots, as discussed in part by my previous post, can happen on one of three levels.
You can control bots on the site wide level through your robots.txt file, you can control them on the page level through the Meta Robots tag, or down to the link level through HTML attributes. (Another option is the x-robots-tag I discuss below, and which is really deserving of its own post.)
For maximum effect on bot control and link equity distribution, you should utilize all three. However, utilizing all of these concepts, and making sure they all agree, can be tricky. (I give an example of how to utilize all three later in this post.)
Here is a full breakdown of all of the REP capabilities.
Utilizing the REP to Control Link Equity
As stated above, there are two basic schools of thought here.
One is that by utilizing the NoFollow attribute on the link level you can help sculpt your PageRank or link juice on site.
The second is that by utilizing robots tags you can shape this same equity and control bots on a larger scale.
Liken the attribute level to carving with a scalpel, and utilizing the site and page level as carving with a butcher knife.
The Difference in These Theories
The link level NoFollow attribute was designed by Matt Cutts and Google in conjunction with the two other major engines as a spam deterrent.
Cutts later came out with statements that utilizing this attribute was an acceptable way to help distribute your PageRank within your site.
In my opinion, there is a problem here.
On one hand the attribute was designed to affect outbound links. On the other we have Cutts telling us it is suitable for the distribution and flow of internal PageRank.
Before the NoFollow attribute this concept was already in play in the form of strategically used JavaScript, and NoFollow on a page level through Meta robots tags.
By utilizing the robots tag you can help shape your equity on a page level. By utilizing a NoIndex tag you are telling the bots that they can crawl your page and even follow links, but not index that page.
However, as Matt Cutts pointed out, “A NoIndex page can accumulate PageRank, because the links are still followed outwards from a NoIndex page.”
So even if you use robots.txt and Meta NoIndex tags you are likely to still pass equity.
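To make the page level concrete, here is a minimal sketch of how you might read the Meta robots directives off a page before deciding how it fits into your shaping plan. It uses only Python's standard html.parser. The sample markup and the names MetaRobotsParser and robots_directives are my own assumptions for illustration, not part of any engine's tooling.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of lower-cased Meta robots directives found in html."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page: kept out of the index, but its links are still followed.
page = (
    '<html><head><meta name="robots" content="NoIndex, Follow"></head>'
    "<body></body></html>"
)

directives = robots_directives(page)
print("indexable:", "noindex" not in directives)
print("links followed:", "nofollow" not in directives)
```

A page like the sample one is exactly the case Cutts describes: it stays out of the index, yet its links are still followed.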
Some Issues with Each
There are issues with each of these strategies.
First, by utilizing the NoFollow attribute aggressively you are signaling to the engines that your site is being actively optimized. Some believe this draws unneeded attention to yourself and your site.
Secondly, in terms of the NoFollow attribute, there is some disagreement about how much the engines actually follow this rule for in-site links. Also remember that NoFollow does not literally mean “don’t follow this link.” The page it points to can still be indexed.
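If you want to see how heavily a page leans on the link level attribute, a quick audit helps. The sketch below lists every link on a page and flags the ones carrying rel="nofollow". It is only an illustration built on Python's standard html.parser; the sample markup and the names LinkRelParser and audit_links are assumptions of mine.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Records each <a href> and whether its rel attribute includes nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if "href" not in attr_map:
            return
        # rel is a space-separated token list, e.g. rel="nofollow external".
        rel_tokens = attr_map.get("rel", "").lower().split()
        self.links.append((attr_map["href"], "nofollow" in rel_tokens))


def audit_links(html):
    """Return (href, is_nofollow) for every link found in html."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links


# Hypothetical navigation snippet.
snippet = (
    '<a href="/products/">Products</a> '
    '<a href="/privacy/" rel="NoFollow">Privacy</a>'
)

for href, is_nofollow in audit_links(snippet):
    print(href, "nofollow" if is_nofollow else "followed")
```

Keep in mind that this only reports what the markup asks for. As noted above, a flagged link does not guarantee the target page stays out of the index.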
Page level sculpting takes place when you want to limit the amount of equity a page is receiving through the Meta robots tag. You take the pages linked to from that page out of the linear linking equation. The problem here is that the blunt nature of the shaping does not allow for nuance.
The Perfect Plan
Well, I mostly say that because it’s my plan.
In the end you need to follow the route that is best for your site.
But here is how I am currently approaching the concept of bot and link equity control:
1) Create a quality site and information architecture
Here is a pretty good reference by Adam Audette on SEO Information Architecture.
2) For entire directories or a grouping of similar pages I utilize the robots.txt file to disallow the crawl.
Robotstxt.org can give you more information on your REP .txt files.
3) For individual pages I want to keep out of the index, I add the Meta robots NoIndex tag to them. These cannot be the same pages as referenced in the robots.txt file, as those pages will follow the REP at the higher level. However, I will still cap any possible links from indexed and link juice passing pages with a rel=”NoFollow.”
Again NoArchive.net has an amazing reference to all Meta Robot tags.
4) For deep pages that I may want to keep from sending equity out or recycling equity, I use the Meta Robots NoFollow.
5) And in the rare case that you deal with outbound links to good information that, for whatever reason, you do not want to pass equity to, I would use rel=”NoFollow.” But I cannot think of a case where I would utilize it for this, and if you start utilizing it too often you may see repercussions in your own link portfolio from those that link back to you.
A Visual Representation
Here we will look at a visual representation for a honey company’s website.
This company’s site offers the ability to buy products, articles on honey, and a dynamic directory of retailers.
PC = Product Category
AC = Articles Category

A Breakdown of the Strategy

The pages represented in red have been cut off for search bots.
In the case of the retailer directory, I have used a robots.txt file. Some would opt to use Meta Robots NoIndex, NoFollow on each of these pages, but if you are dealing with a large number of pages robots.txt might be the way to go. I would also want to utilize a rel=”NoFollow” attribute on the link from the home page to the beginning of the directory to limit equity disbursement.
Another option for a large number of pages is utilizing the x-robots-tag in your HTTP header. The x-robots-tag, like the robots.txt file, can allow you to make site wide changes to how robots crawl your site, while also having indexer directives similar to the page level Meta robots tag.
Some great references on the use of the X-Robots-Tag include:
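As a rough sketch of the robots.txt approach for the retailer directory, the snippet below uses Python's standard urllib.robotparser to confirm which URLs a compliant bot would be allowed to crawl. The /retailers/ path, the sample URLs, and the crawl_allowed helper are assumptions for illustration; the real directory path on your own site will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the honey site: block the whole retailer
# directory for every bot, leave everything else crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /retailers/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(path, agent="Googlebot"):
    """Return True if the parsed robots.txt lets `agent` crawl `path`."""
    return parser.can_fetch(agent, path)


for path in ("/retailers/ohio/", "/products/clover-honey/", "/articles/"):
    print(path, "allowed" if crawl_allowed(path) else "blocked")
```

A check like this is handy before you push a robots.txt change live, since one overly broad Disallow line can cut off sections you wanted crawled.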
- Playing with the X-Robots-Tag HTTP Header - by Joost de Valk
- X-Robots Tag NoArchive Examples - by NoArchive.net
- Managing Robot’s Access To Your Website - Vanessa Fox
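To give a feel for how the header approach works, here is a minimal sketch of a WSGI wrapper that adds an X-Robots-Tag header to every response under a given path prefix. It is written against the plain WSGI interface with no framework. The names base_app, x_robots_middleware, and headers_for, along with the /retailers/ prefix, are assumptions I made up for the example; in practice you would more likely set the header in your web server configuration.

```python
def base_app(environ, start_response):
    """Stand-in for the real site: answers every request with a tiny page."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page body"]


def x_robots_middleware(app, rules):
    """Wrap a WSGI app so paths matching a (prefix, value) rule get X-Robots-Tag."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "/")

        def patched_start(status, headers, exc_info=None):
            headers = list(headers)
            for prefix, value in rules:
                if path.startswith(prefix):
                    headers.append(("X-Robots-Tag", value))
                    break  # first matching rule wins
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped


# Hypothetical rule: keep the whole retailer directory out of the index.
app = x_robots_middleware(base_app, [("/retailers/", "noindex, nofollow")])


def headers_for(path):
    """Call the app directly (no server needed) and return its headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    return dict(captured["headers"])


print(headers_for("/retailers/ohio/").get("X-Robots-Tag"))
print(headers_for("/products/").get("X-Robots-Tag"))
```

One caution: a bot has to fetch a URL to see this header, so it only works as an alternative to a robots.txt Disallow for those pages, not in combination with one.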

I did utilize the Meta NoIndex, NoFollow to disallow the indexing of the Privacy page, and to limit equity from spreading from that page. Again I would also utilize a rel=”NoFollow” on the link from the home page to the Privacy section.

The pages in blue are pages I have allowed the robots to freely access, index, and pass link equity to and from.
You would utilize sound SEO tactics such as SEO siloing in your Product Category and Article Category sections.

The pages in green coming off of the Product Categories are actual product pages. The green represents that they have been allowed to be indexed, but I have put a NoFollow tag in the Meta information to keep any strange pages, such as a shopping cart, from being followed via links and equity being passed.
Again the x-robots-tag could be an option here.

The pages in yellow, coming off of the Articles Categories, are where I would utilize granular link level attributes to cut off outbound links that I might not want to pass any equity to or be associated with. Again, I am not sure I would utilize this anywhere on a planned out site, but on a cleanup job it might be needed.
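The page level part of the breakdown above can be summarized as a small lookup table. This is just one way to write it down, assuming a template system that asks for a Meta robots tag per section; the section names, the STRATEGY table, and the meta_robots_tag helper are hypothetical labels of mine for the colors in the diagram.

```python
# Hypothetical section names mapped to the Meta robots content described above.
# None means the section is handled elsewhere (robots.txt), so no tag is emitted.
STRATEGY = {
    "product_category": "index, follow",   # blue: freely crawled and indexed
    "article_category": "index, follow",   # blue: freely crawled and indexed
    "product": "index, nofollow",          # green: indexed, links not followed
    "article": "index, follow",            # yellow: rel nofollow on chosen links only
    "privacy": "noindex, nofollow",        # red: kept out of the index
    "retailer_directory": None,            # red: disallowed in robots.txt instead
}


def meta_robots_tag(section):
    """Return the Meta robots tag for a section, or "" if none is needed."""
    content = STRATEGY[section]
    if content is None:
        return ""
    return f'<meta name="robots" content="{content}">'


for section in STRATEGY:
    print(f"{section}: {meta_robots_tag(section) or '(robots.txt)'}")
```

Keeping the rules in one table like this also makes it easier to check that the page level tags agree with what robots.txt and the link level attributes are doing.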
Just another method, Just another theory
In the end, this is just another answer to the conundrum of bot control and link equity disbursement.
However, I think I have pointed out that simply using granular link level NoFollow link attributes is not the answer. It is not what the attribute was designed for, and in the end it does not even keep a bot from “following” a link.
In the case above, we have a site we have had control of from the beginning. If you are working on a site you have not had control of from its inception, you are likely going to need to look at requesting that URLs be removed from the SE indexes.
Here are some resources on how to get content removed from the indexes.
All of these concepts should be monitored via Google and Microsoft Webmaster Tools, as well as advanced site queries. A healthy crawl is important for a number of reasons, and should really be one of the foundations of your SEO efforts, not an afterthought.
