Saturday, June 12, 2010

Differences between robots.txt and .htaccess

The robots.txt file consists of directives telling search engine spiders (robots) which files and folders you do or do not want crawled and indexed. These directives are advisory, however: they will not necessarily prevent spiders from following links into those folders, and some spiders do not respect the robots.txt file at all (all of the major search engines do, but there are still quite a few unscrupulous bots to worry about). Additionally, robots.txt directives do not prevent human visitors from accessing those folders and directories if they know they are there (or if they are simply guessing their way in, e.g., by looking for index.html or index.php files).
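
To make the advisory nature of robots.txt concrete, here is a minimal sketch of how a well-behaved bot might consult the file before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" user agent, and the example.com URLs are all made up for illustration. Note that the check runs entirely on the bot's side: nothing forces a bot to perform it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a polite crawler performs voluntarily.
    The web server itself never enforces it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/index.html",
        "http://example.com/private/report.html",
    ):
        # "ExampleBot" is a made-up user agent name.
        print(url, allowed_by_robots(ROBOTS_TXT, "ExampleBot", url))
```

With this sample file, the first URL is reported as allowed and the second as disallowed. A bot that skips the check, or a person typing the address into a browser, can still request /private/report.html directly.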

In contrast, the .htaccess file, depending on how you set it up, actually blocks access to certain files or folders on the server itself. This applies to both human visitors and bots.
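
The key point is that this kind of blocking is enforced by the server, no matter who is asking. The sketch below is not Apache and does not read an .htaccess file; it is a small Python stand-in, assuming a hypothetical /private/ folder, that shows what server-side enforcement looks like from the client's point of view: the request is refused with a 403 status whether it comes from a bot or a browser.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical protected folder, standing in for an .htaccess deny rule.
BLOCKED_PREFIXES = ("/private/",)


class BlockingHandler(BaseHTTPRequestHandler):
    """Toy server that refuses requests under the blocked prefixes."""

    def do_GET(self):
        if self.path.startswith(BLOCKED_PREFIXES):
            # The server decides; the client's intentions are irrelevant.
            self.send_error(403, "Forbidden")
            return
        body = b"public page\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_status(url: str) -> int:
    """Return the HTTP status code for url, ignoring any system proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def demo() -> tuple:
    """Start the toy server on a free local port and request two paths."""
    server = HTTPServer(("127.0.0.1", 0), BlockingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base = f"http://127.0.0.1:{server.server_port}"
        return (
            fetch_status(base + "/index.html"),
            fetch_status(base + "/private/report.html"),
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Unlike the robots.txt case, the client here has no say in the outcome: the public page is served and the request into the protected folder is rejected by the server.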