Monday, November 3, 2008

Understanding a Robots.txt File

The robots.txt file lets you control the behaviour of the web crawlers and spiders that visit your site. Most web crawlers are harmless and simply collect data for purposes such as search engine listings, internet archiving, link validation, and security scanning. It's always a good idea to create a robots.txt file to tell crawlers where they can and cannot go.
A crawler should always follow the Robots Exclusion Protocol, so whenever it arrives at a web site to crawl it, it first requests the robots.txt file from the root of the site:


www.yourdomain.com/robots.txt
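
As a rough illustration of that first step, the sketch below shows how a crawler might work out where the robots.txt file lives for any page it wants to visit. The function name robots_url and the sample page address are made up for this example; only the standard urllib.parse module is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address on the example domain used in this post.
print(robots_url("http://www.yourdomain.com/articles/page.html?id=7"))
# prints http://www.yourdomain.com/robots.txt
```

Whatever page the crawler starts from, the file is always looked up at the top level of the host.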


Once it has processed the robots.txt file, it proceeds to the rest of your site, usually starting at the index file and following links from there. Web sites often have places that do not need to be crawled, such as image or data directories, and these are what you list in your robots.txt file.

The "/robots.txt" file is simply a text file that contains one or more records. A single record looks like this:

User-agent: *
Disallow: /


The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the web site.
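
To make the record structure more concrete, here is a minimal sketch of how the file could be split into records by hand. It assumes records are separated by blank lines and only looks at the User-agent and Disallow fields; parse_records is a name invented for this example, and a real crawler would need a more complete parser.

```python
def parse_records(text):
    """Split robots.txt text into (user_agents, disallowed_paths) records."""
    records = []
    agents, paths = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            # A blank line ends the current record, if one was started.
            if agents:
                records.append((agents, paths))
            agents, paths = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            paths.append(value)
    if agents:
        records.append((agents, paths))
    return records


print(parse_records("User-agent: *\nDisallow: /\n"))
# prints [(['*'], ['/'])]
```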

A basic robots.txt example

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
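
One way to check what a set of rules like this allows is Python's standard urllib.robotparser module. The sketch below feeds it the same three Disallow lines directly, without fetching anything over the network. The crawler name ExampleBot and the page addresses are placeholders chosen for this example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse the text directly, no download

# ExampleBot is a made-up crawler name; the "*" record applies to it.
print(parser.can_fetch("ExampleBot", "http://www.yourdomain.com/index.html"))
# prints True
print(parser.can_fetch("ExampleBot", "http://www.yourdomain.com/tmp/cache.dat"))
# prints False
```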


Allowing a single crawler

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
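
The effect of combining an empty Disallow line for one crawler with a blanket Disallow for everyone else can be tried out with urllib.robotparser. This sketch assumes the allowed crawler identifies itself as Googlebot; OtherBot and the page address are placeholders.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

page = "http://www.yourdomain.com/index.html"  # hypothetical page

# An empty Disallow value means nothing is off limits for that crawler.
print(parser.can_fetch("Googlebot", page))  # prints True
# Every other crawler falls through to the "*" record and is shut out.
print(parser.can_fetch("OtherBot", page))   # prints False
```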


Excluding a single robot

User-agent: BadBot
Disallow: /
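
As a quick check of this last case, the sketch below again uses urllib.robotparser. Because the file has no "*" record, any crawler that is not named is allowed everywhere. GoodBot, the version suffix on BadBot and the page address are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

page = "http://www.yourdomain.com/index.html"  # hypothetical page

# The named robot is blocked, even when it reports a version number.
print(parser.can_fetch("BadBot/1.0", page))  # prints False
# No record mentions GoodBot and there is no "*" record, so it is allowed.
print(parser.can_fetch("GoodBot", page))     # prints True
```

Keep in mind that this only describes what a compliant robot will do; the file itself cannot force a crawler to obey it.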
