Canonical URLs, SEO and .htaccess
By Tino Triste in Search Engine Optimisation on Thursday, July 17, 2008 @ 08:58
It is very common for web sites to have multiple urls pointing to the same page. This happens especially with a site's home page, and where content management systems generate dynamic urls. It wouldn't be wrong to say that the following urls are all the same:
http://helloworld.com
http://helloworld.com/index.html
http://www.helloworld.com
http://www.helloworld.com/index.html
In practice they usually are: all of the above web addresses would typically display the same web page. From an SEO perspective, that's where the problem lies.
Search engines are aware of this issue, but they also know that a web server could display distinct pages for each of the urls above, so they treat them as separate urls. They then try to pick the best url in the group, causing you to lose control of how your site is displayed in the SERPs. The Google engineer, Matt Cutts, explains canonical urls in more detail in his blog.
The key to preventing duplicate urls is to be consistent. It is important that you use "/" instead of "index.html" when you link to the home page from other pages within your site. Further, pick the url you prefer (with or without the www) and stick to it for both incoming and internal links.
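As a rough illustration of what "pick one url and stick to it" means, here is a minimal Python sketch that maps the four example addresses onto a single form. The function name canonical_url, the choice of the www host and the list of default documents are assumptions made for this example only; it is not something the web server does for you.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the www form is the preferred host and
# index.html is the only default document in use.
CANONICAL_HOST = "www.helloworld.com"
DEFAULT_DOCUMENTS = ("index.html",)


def canonical_url(url):
    """Return the preferred form of a helloworld.com url."""
    parts = urlsplit(url)
    # An empty path and "/" address the same home page.
    path = parts.path or "/"
    # Drop a trailing default document so ".../index.html" becomes ".../".
    for name in DEFAULT_DOCUMENTS:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    return urlunsplit(
        (parts.scheme, CANONICAL_HOST, path, parts.query, parts.fragment)
    )
```

A helper along these lines could be used when generating internal links, so that every link to the home page comes out in the same form.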
If your web site is hosted on an Apache server, you can use a file called .htaccess to instruct the web server to 301 redirect every non-canonical url to the canonical one.
There are two ways of doing this:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^helloworld\.com [NC]
RewriteRule ^(.*)$ http://www.helloworld.com/$1 [L,R=301]
The above rule tells the server to permanently (301) redirect any request for "helloworld.com" to the same path on "http://www.helloworld.com".
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.helloworld\.com$
RewriteRule (.*) http://www.helloworld.com/$1 [R=301,L]
This rule tells the server to redirect to "http://www.helloworld.com" whenever anything other than "www.helloworld.com" was requested.
I personally prefer the second one because it's more flexible: it catches every host name that isn't the canonical one, not just the bare domain.
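To make the difference between the two conditions concrete, here is a minimal Python sketch that models only the RewriteCond tests, with the dots escaped. The function names and the subdomain used in the usage note are hypothetical; the sketch imitates the host matching and nothing else Apache does.

```python
import re

# First rule: redirect when the host starts with the bare domain.
# The [NC] flag corresponds to a case-insensitive match.
RULE_ONE = re.compile(r"^helloworld\.com", re.IGNORECASE)

# Second rule: the pattern describes the canonical host, and the
# leading "!" in the .htaccess file negates the match.
RULE_TWO = re.compile(r"^www\.helloworld\.com$")


def rule_one_redirects(host):
    """True when the first rule's condition would trigger a redirect."""
    return RULE_ONE.search(host) is not None


def rule_two_redirects(host):
    """True when the second rule's condition would trigger a redirect."""
    return RULE_TWO.search(host) is None
```

For a hypothetical subdomain such as blog.helloworld.com, only rule_two_redirects reports a redirect. Whether that is desirable depends on whether subdomains are meant to serve their own content.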



Tino wrote:
I'm not yet sure how to resolve the:
http://www.helloworld.com/somedir VS http://www.helloworld.com/somedir/
I'll look into that later.
The following htaccess code is an extension of Jeffrey's code.
It redirects index.htm OR index.html OR index.php to "/" in the root directory and all sub-folders. :-)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(htm|html|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(htm|html|php)$ http://www.helloworld.com/$1 [R=301,L]
Don't forget: before you use this on a production site, test, re-test and test again!
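One cheap way to start that testing is to try the RewriteRule pattern against a few sample paths before it goes anywhere near a server. The following is a minimal Python sketch that uses the same regular expression, written with a plain "|" as the alternation character. The function name redirect_target is made up for the example. The sketch assumes the path is matched without its leading slash, as happens in .htaccess context, and it does not model the RewriteCond on THE_REQUEST.

```python
import re

# Same pattern as the RewriteRule; group 1 captures the directory part.
INDEX_PATTERN = re.compile(r"^(([^/]+/)*)index\.(htm|html|php)$")
CANONICAL_ROOT = "http://www.helloworld.com/"


def redirect_target(path):
    """Return the url a default-document path would be redirected to.

    Returns None when the pattern does not match, meaning no redirect.
    """
    match = INDEX_PATTERN.match(path)
    if match is None:
        return None
    # $1 in the RewriteRule corresponds to group 1 here.
    return CANONICAL_ROOT + match.group(1)
```

Paths such as "index.html" and "somedir/index.php" should come back as directory urls, while "somedir/page.html" should come back as None. This only checks the pattern, so the real redirects still need testing in a browser.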
Ankit wrote:
Keep up the good work! Nice article!
BTW:
http://helloworld.com/somedir
http://helloworld.com/somedir/
http://www.helloworld.com/somedir
http://www.helloworld.com/somedir/
This is also a common example. Even Matt Cutts was having this problem on his blog ;)
Tino wrote:
Hi Jeffrey,
The code samples in this post only covered www versus non-www canonical issues.
Thanks for posting this .htaccess rule. I've tried to create rules to fix the default document issue, but they ended up in an infinite loop.
I'll test your version today! :-)
Thanks for your comments.
Tino
Jeffrey Olchovy wrote:
The code you wrote still doesn't solve the issue of your index or default page getting a 301 to your canonical domain. Try it out in your browser. Or not. Check your analytics: I just visited that page explicitly, and it will show up in your page loads as its own page rather than as a visit to your canonical domain.
In order to make this redirect work, append the following two lines of code to your .htaccess file:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.php\ HTTP/
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
Be sure to replace index with whatever file name you use for your default document, and change the file extension from php to whatever you use.