Robots.txt and Search Engine Bots Management

What is a robots.txt file and what is it for?

A robots.txt file is a plain text file that contains instructions for search engine bots (crawlers). With it, you may allow or disallow different search engines to crawl your site or certain parts of it.

Why would someone want to disallow search engines from indexing their content?

First of all, you certainly do not want your system and configuration files to show up in search results. Secondly, you may need to protect your site from unwanted bots, which can significantly increase your server's resource consumption by flooding it with requests.
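As you will see below, this is done with Disallow rules. For instance, a minimal sketch that hides a few typical private directories while leaving the rest of the site open to crawling could look like this (the directory names here are placeholders, adjust them to your own site's structure):

User-agent: *
Disallow: /admin/
Disallow: /config/
Disallow: /tmp/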

How does it work?

When a search engine bot is about to crawl your site, the first thing it requests is the robots.txt file at the root of your domain (for example, https://example.com/robots.txt). If the file is found, the bot reads it to learn which parts of the site it is allowed to crawl.

For example, a search engine bot wants to crawl a part of your site. It checks whether the site has a robots.txt file, finds one and sees the following directives:

User-agent: *
Disallow: /

This means the bot cannot crawl the site at all: Disallow: / denies access to everything, and User-agent: * applies that rule to all bots. You may use such a robots.txt file if you are not yet ready for your site to appear in search engines.
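For comparison, the opposite case is a Disallow directive with an empty value: it blocks nothing, so all bots may crawl the entire site (which is also what happens when no robots.txt file exists at all):

User-agent: *
Disallow: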

You may allow or disallow crawling for individual bots as well. Let's say you do not need traffic from China, but your site is being visited heavily by the Baidu search engine bot. To block it, add the following lines to your robots.txt:

User-agent: Baiduspider
Disallow: /
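The same approach works for any other crawler that puts too much load on your server: add a separate User-agent group for each bot you want to block. The sketch below blocks a few well-known SEO crawlers as an example; check your access logs for the exact user-agent names of the bots that actually hit your site:

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /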

Or let's imagine you wish to be indexed by Google only. In that case your robots.txt should look like this:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Google's crawler identifies itself as Googlebot; the empty Disallow line leaves the whole site open to it, while all other bots are blocked completely.

If you do not update your site's content very frequently, it may be wise to limit how often bots are allowed to crawl it:

User-agent: *
Crawl-delay: 86400

Crawl-delay sets the minimum pause between two consecutive requests from the same bot and is specified in seconds, so the example above tells bots that honor the directive to make no more than one request per 24 hours.
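Crawl-delay can also be set for a single bot, and a much smaller value is usually enough to ease the load. A sketch (the 10-second value is only an illustration):

User-agent: Bingbot
Crawl-delay: 10

Keep in mind that Crawl-delay is not part of the official robots.txt standard, and some crawlers, most notably Googlebot, ignore it, which is where the next section comes in.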

Google Webmaster Tools vs. robots.txt

There is plenty of evidence on the web that Google no longer fully respects robots.txt directives: a page that is blocked in robots.txt can still appear in search results if other sites link to it. Google's own support now suggests using robots meta tags or HTTP headers to set such restrictions. While they are effective, they may require additional time and knowledge to configure. Here you will find a full article on using meta tags for Google.
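To give you an idea of what these restrictions look like (a simplified sketch; see Google's documentation for the full list of supported values), the robots meta tag goes into the page's <head> section, while the equivalent X-Robots-Tag is sent as an HTTP response header:

<meta name="robots" content="noindex, nofollow">

X-Robots-Tag: noindex, nofollow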

As an alternative, we recommend controlling Google's crawlers with the help of Google Webmaster Tools. Here you may sign up for the service, and here is the manual on changing the crawl rate.

