A robots.txt file is a text file that contains special instructions for search engine bots (crawlers). You may allow or disallow different search engines to crawl your site or certain parts of it.
First of all, you certainly do not want your system and configuration files to appear in search results. Secondly, you may need to protect your site from unwanted bots, which can significantly increase your server's resource consumption by bombarding it with requests.
When a search engine tries to index your site, it requests the robots.txt file first. If the file exists, the bot reads it to learn which parts of the site it is allowed to crawl.
For example, suppose a search engine bot wants to crawl the page domain.com/homepage.php. It checks whether domain.com has a robots.txt file, finds it, and sees the following directives:
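```
User-agent: *
Disallow: /
```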
This means the bot cannot crawl domain.com at all: Disallow: / denies access to the entire site, and User-agent: * applies the rule to all bots. You may use such a robots.txt file if you are not yet ready for your site to appear in search engines.
You may allow or disallow indexing for individual bots as well. Let's say you do not need traffic from China, but the Baidu search engine bot is visiting your site heavily. To block it, add the following lines to your robots.txt:
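```
# Baidu's crawler identifies itself as Baiduspider
User-agent: Baiduspider
Disallow: /
```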
Or imagine you wish to be indexed by Google only. In that case, robots.txt should look like this:
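```
# Allow Google's crawler; block all other bots
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```

An empty Disallow value means nothing is disallowed, so Googlebot may crawl the whole site while the catch-all rule blocks everyone else.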
If you do not update your site content very frequently, it makes sense to let bots crawl your site only once in a certain period of time:
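```
User-agent: *
Crawl-delay: 86400
```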
Adding this example to your robots.txt allows bots to crawl your site once every 24 hours (the crawl delay is specified in seconds, and 24 hours equals 86400 seconds).
There is a lot of evidence on the web that Google no longer fully respects robots.txt directives; for instance, it ignores the Crawl-delay directive. Google's support now suggests using robots meta tags or HTTP headers to set such restrictions. While these are effective, they may require additional time and knowledge to configure. A full article on using meta tags for Google is available here: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
As an alternative, we recommend controlling Google crawlers with the help of Google Webmaster tools. You may sign up for the service here: https://www.google.com/webmasters/tools/, and here is the manual on changing the crawl rate: https://support.google.com/webmasters/answer/48620?hl=en.