RSS
October 14, 2008 | ryan | Comments 0

URL Canonicalization and SEO: The Ultimate Guide

If you're new to our blog, you may want to subscribe to our RSS feed SEO Blog Feed. StumbleUpon visitors please Rate Us. Thanks for visiting!

Canonicalization can be a confusing area for webmasters, so let’s take a look at what it is, and ways to avoid it causing problems.

What Is Canonicalization?

Canonicalization is the process by which URLs are standardized. For example, www.acme.com and www.acme.com/ are treated as the same page, even though the syntax of the URL is different.

Why Is Canonicalization An Issue For ?

Problems can occur when the search engine doesn’t normalize URLs properly.

For example, a search engine might see http://www.acme.com and http://aceme.com as different pages. In this instance, the search engine has the host names confused.

Why Is This a Problem?

If the search engines sees a page as being published at many separate URLs, the search engine may rank your pages lower than they would otherwise, or not rank them at all.

Canonicalization issues can split link juice between pages if people link to variants of the URL. Not only does this affect rank (less PageRank = lower rank), but it can also affect crawl depth (if PageRank is spent on duplicate content it is not being spent getting other unique content indexed).

To appreciate what a dramatic effect canonicalization issues can have on search traffic look at the following example, and notice that for the given example proper canonicalization increased traffic for that keyword by 300%

Link Equity Google Ranking Position % of Search Traffic Daily Traffic Volume Traffic Increase
split 1 60% 8 3% 50 -
split 2 40% 15, filtered = 0 0% 0 -
canonical 100% 2 12% 200 300%

What Conditions Can Cause This Problem?

Google

There are various conditions, but the following are amongst the most common:

  • Different host names i.e. www.acme.com vs acme.com
  • Redirects pointing to different URLs i.e. 302 used inappropriately
  • Forwarding multiple URLs to the same content, and/or publishing the same content on multiple domains
  • Improperly configured dynamic URLs i.e. any url rewriting based on changing conditions
  • Two index pages appearing in the same location i.e. Index.htm vs Index.html
  • Different protocols i.e. https://www vs http://www
  • Multiple slashes in the filepath i.e. www.acme.com/ vs www.acme.com//
  • Scripts that generate alternate URLs for the same content i.e. some blogging and forum software, ecommerce software that adds tracking URLs
  • Port numbers in the domain name i.e. acme.com/4430 : can sometimes be seen in virtual hosting environments.
  • Capitalization - i.e. www.acme.com/Index.html vs www.acme.com/index.html
  • URLs “built” from the path you take to reach a page i.e. tracking software may incorporate the click path in the URL for statistical purposes.
  • Trailing questions marks, with or without parameters i.e. www.acme.com/? or www.acme.com/?source=cnn (a common tagging strategy amongst ad buys)

How Can I Tell If Canonicalization Issues Are Affecting My Site?

Besides working through the checklist performing a manual check, you can also use Google’s cache date.

Previously, you would have been able to use Google’s supplemental index marker, although Google have recently done away with this feature.

The supplemental index is a secondary index, seperate from Google’s main index. It is a graveyard, of sorts, containing outdated pages, pages with low trust scores, duplicate content, and other erroneous pages. As duplicate pages often reside in the supplemental index, appearing in the supplemental index can be an indicator you may have canonicalization issues, all else being equal.

Before Google removed the supplemental index label, many SEOs noticed that supplemental pages had an old cache date and that cache date is a good proxy for trust. If your page is not indexed frequently, and you think it should be, chances are the page is residing in the supplemental index.

Michael Gray at Wolf-Howl” outlines a method to easily check for this data. In summary, you add a date and unique field to each page, wait a couple of months, then search on this term.

How Can I Avoid Canonicalization Issues?

Good Site Planning

Using good site planning and architecture, from the start, can save you a lot of problems later on. Pick a convention for linking, and stick with it.

Maintain Consistent Linking Conventions

It’s an important point, so I’ll repeat it ;) Always link to www.acme.com, rather than sometimes linking to acme.com/index.htm, and sometimes linking to www.acme.com.

301 Redirect Non-www to www , Or Vice Versa

You can force resolution to one URL only. To do this, you create a 301 redirect.

Here’s a typical 301 redirect script:

RewriteEngine On RewriteCond %{HTTP_HOST} ^seobook.com [NC] RewriteRule ^(.*)$ http://www.seobook.com/$1 [L,R=301]

For a more detailed analysis on how to use redirects, see .htaccess, 301 Redirects & SEO.

READ MORE and FIND GREAT TOOLS AT SEOBook.com: URL Canonicalization: The Missing Manual

Share and Enjoy:
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google
  • De.lirio.us
  • Furl
  • Live
  • NewsVine
  • Reddit
  • Simpy
  • Slashdot
  • Spurl
  • StumbleUpon
  • Technorati
  • TwitThis
  • YahooMyWeb
Tags: , , , , ,

Related posts

Entry Information

Filed Under: Link BuildingSEO

Tags:

About the Author:

RSSPost a Comment  |  Trackback URL

By submitting a comment here you grant this site a perpetual license to reproduce your words and name/web site in attribution.