robots.txt

There is a lot of confusion about this little text file. Maybe you've seen requests for it in your log files and wondered what it is. Maybe somebody told you it was a way to secure your site. Maybe you were told that it helps keep "bad bots" from visiting your site. Which is it?

The robots.txt file is a way for you to tell web crawlers which parts of your site they are and are not allowed to crawl. For example, if you do not want your /images/ folder indexed, you can add that folder as an exclusion in your robots.txt file.

You do not need a robots.txt file if you want web crawlers to find every part of your site. But you can create an empty text file named robots.txt if you do not want "file not found" errors in your logs.

The simplest robots.txt file is one that disallows everything for every robot:

User-agent: *
Disallow: /

This tells every web crawler that visits that it should not visit any pages. In place of the asterisk you can also use a specific robot's name.

A few notes about this file. First, it must go in the web root. That is, it must be at www.yoursite.com/robots.txt. It cannot be at www.yoursite.com/whatever/robots.txt; there it will not be found or used. Second, browsers are unaffected by this file. Even if you list the user-agent for IE or Firefox, they will still open the files that you disallow. Third, nothing requires robots to follow these rules; the file is more of a suggestion. Many robots, mostly spammers, will ignore your robots.txt file.
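If you want to see how a well-behaved crawler would read these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robot name "ExampleBot", the helper names, and the sample rules are assumptions made up for illustration; only www.yoursite.com and the /images/ folder come from the text above. It only shows what a polite robot would conclude and does nothing to stop a robot that ignores the file.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

SITE = "http://www.yoursite.com"  # placeholder site from the article


def robots_url(page_url):
    """Return the only place a crawler looks for robots.txt: the web root."""
    return urljoin(page_url, "/robots.txt")


def allowed(rules_text, path, agent="ExampleBot"):
    """Check whether a rule-following robot may fetch `path`.

    `agent` is a hypothetical robot name used only for this example.
    """
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, SITE + path)


# Hypothetical rule sets matching the cases described above.
BLOCK_IMAGES = "User-agent: *\nDisallow: /images/\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
EMPTY_FILE = ""

print(robots_url(SITE + "/whatever/page.html"))
print(allowed(BLOCK_IMAGES, "/images/logo.png"))  # False
print(allowed(BLOCK_IMAGES, "/index.html"))       # True
print(allowed(BLOCK_ALL, "/index.html"))          # False
print(allowed(EMPTY_FILE, "/images/logo.png"))    # True
```

Note that the empty file behaves the same as having no rules at all: everything is allowed, and the only difference is that the request for robots.txt no longer shows up as an error in your logs.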
