Add robots.txt to WordPress Site

Posted on September 4, 2007
Filed Under WordPress Admin, WordPress SEO

Almost all webmasters know what robots.txt does, but many don’t use it. Why? They don’t believe it makes a significant difference to the website, and they are too lazy to implement it. But if you take a little extra effort to put it in, over time you do gain advantages.

A robots.txt file lets you control the behavior of search engine robots (known as “bots”) on your site. With robots.txt you set restrictions on the search engine robots that crawl the web. These bots are automated, and before they access the pages of a site, they check whether a robots.txt file exists that prevents them from accessing certain pages.
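To make that check concrete, here is a minimal sketch in Python of what a well-behaved bot does before requesting a page. It uses the standard library’s urllib.robotparser. The rules, the www.example.com URLs and the “ExampleBot” name are assumptions for illustration only, and a real bot would download the rules from the site instead of hard-coding them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; a real bot fetches them from the site root.
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
"""


def is_allowed(rules_text, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/wp-admin/options.php",
        "http://www.example.com/2007/09/hello-world/",
    ):
        print(url, "allowed" if is_allowed(RULES, url) else "blocked")
```

The first URL falls under a Disallow line and is skipped, while the ordinary post URL is not covered by any rule and may be crawled.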

There are folders and pages in a WordPress site that you don’t want search engines to crawl. By using robots.txt, you direct search engines to index content pages only and stop them from crawling unwanted pages. The result is that your site is indexed more efficiently, and search engines don’t waste your bandwidth crawling unnecessary content.

It’s easy to create a robots.txt file. I use a simple one for each of my WordPress sites:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: /rss/
Disallow: /feed
Disallow: */feed
Disallow: /comments
Disallow: */comments
Disallow: /category/*/*
Disallow: /*?*
Disallow: /*?
Sitemap: http://www.example.com/sitemap.xml

As you can see, the robots.txt file tells search engines not to crawl WordPress folders such as wp-admin and wp-includes, which makes good sense. Once you create this file, just upload it to the site; no further work is needed. It is worth the initial effort. The robots.txt file must reside in the root of the domain and must be named “robots.txt”. Note that the Sitemap line should give the full URL of your sitemap, so replace www.example.com with your own domain.
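Several of the rules in the file rely on the * wildcard, which is an extension honoured by the major search engines and was not part of the original robots.txt convention. As a rough sketch of how such patterns are commonly interpreted (a simplification that ignores Allow lines and rule precedence, not any search engine’s actual code), the Python below translates a Disallow pattern into a regular expression. The helper names and sample paths are made up for illustration.

```python
import re


def rule_to_regex(pattern):
    """Translate a Disallow pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, starting at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any Disallow pattern matches the start of `path`."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# A few of the rules from the file above.
SAMPLE_RULES = ["/wp-admin", "*/feed", "/category/*/*", "/*?"]

if __name__ == "__main__":
    for path in ("/2007/09/my-post/feed", "/?p=12", "/category/seo", "/2007/09/my-post/"):
        print(path, "blocked" if is_blocked(path, SAMPLE_RULES) else "allowed")
```

Under this reading, “*/feed” catches the feed of any individual post and “/*?” catches every URL that carries a query string, while normal permalinks stay crawlable.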

If your site sells digital products such as ebooks from a directory like “download”, make sure you keep this subfolder out of the index by adding a line like this to robots.txt:

Disallow: /download

Some webmasters suggest adding a separate robots.txt to this subfolder with contents like:

User-agent: *
Disallow: /

This does not work because a robots.txt file located in a subdirectory isn’t valid, as bots only check for this file in the root of the domain.
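The reason is easy to see if you sketch how a bot might work out where to look. In the Python below (the function name and the www.example.com URLs are assumptions for illustration), whatever page the bot is about to visit, the path is thrown away and replaced with /robots.txt at the root of the host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the only robots.txt location a bot consults for `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; drop the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/download/ebook.pdf"))
    print(robots_txt_url("http://www.example.com/2007/09/hello-world/?replytocom=5"))
```

Both calls give http://www.example.com/robots.txt, so a file saved as /download/robots.txt is never requested.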

Alternatively, you can prevent the pages in a subdirectory from being indexed by using the Robots META tag in each page. The Robots META tag allows HTML authors to indicate to visiting robots whether a document may be indexed or used to harvest more links. No server administrator action is required.

The Robots META tag is used as follows:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Then a robot should neither index this document, nor analyse it for links.
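If you want to verify that your pages really carry the tag, a small script can read it back the way a robot might. This is only a sketch using Python’s built-in html.parser. The class and function names are made up for this example, and real crawlers may treat edge cases differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> works too.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
    found = robots_directives(page)
    print("index this page:", "noindex" not in found)
    print("follow its links:", "nofollow" not in found)
```

For the sample page both answers come out False, which is what the tag asks for.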

More information can be found in Google’s webmasters knowledge-base:
http://www.google.com/support/webmasters/
