Robots.txt is a file that lives at the root of your website and tells search engines which pages and files they can and can't crawl. Whenever a search engine crawler visits your domain, it checks your robots.txt file first and only crawls the pages it's allowed to.
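For example, a minimal robots.txt might look like this (the path is purely illustrative):

    # Applies to all crawlers
    User-agent: *
    # Don't crawl anything under /admin/
    Disallow: /admin/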
What is a robots.txt file used for?
You can use a robots.txt file to prevent search engines from crawling pages and files that you don't want showing up in search results. If search engines can't crawl a page, they can't read its content, which normally keeps it out of the index, and a page that isn't indexed can't rank.
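If you want to sanity-check how a rule will be interpreted, Python's built-in urllib.robotparser applies standard robots.txt logic; here's a small self-contained sketch (the rule and URLs are made up for illustration):

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    # parse() takes the file's lines directly, so we can test rules
    # without hosting anything
    rp.parse([
        "User-agent: *",
        "Disallow: /internal-search/",
    ])

    # Blocked by the rule above
    print(rp.can_fetch("*", "https://www.example.com/internal-search/results"))  # False
    # Not covered by any rule, so crawling is allowed
    print(rp.can_fetch("*", "https://www.example.com/about/"))  # True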
Things can get a little complicated, because if a page is already indexed and is then blocked by your robots.txt file, it may remain indexed for a while before eventually dropping out of search results; once the page is blocked, search engines can no longer crawl it, so they can't see any signal (like a noindex tag) telling them to remove it.
But overall, robots.txt is a solid method for preventing crawling of pages and can play a part in preventing indexing too. One common use case is a new website that's live on a test server before launch. You don't want search engines to crawl the test site and end up indexing two versions of your website, so a robots.txt rule like the one below is one tool that can prevent this.
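On a test server, the usual approach is to block everything with a single rule (assuming the whole staging site should stay out of search engines):

    User-agent: *
    Disallow: /

That one trailing slash blocks the entire site, which is also exactly why the same rule is so dangerous if it slips onto your live site.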
You can also use a robots.txt file to reference your XML sitemaps, so that when search engines crawl your website, they can find your sitemaps too.
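The Sitemap directive takes an absolute URL and can appear multiple times (these URLs are placeholders):

    Sitemap: https://www.example.com/sitemap.xml
    Sitemap: https://www.example.com/news-sitemap.xml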
Why would I monitor my robots.txt file?
Your robots.txt file is pretty powerful, and search engines follow its rules strictly. So if, for example, someone accidentally adds a rule that tells search engines not to crawl your entire website, you'd want to know!
If this does happen and you're not aware of it, it could be days or even weeks before you notice, and during that time you'll probably have lost a lot of organic search traffic.
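As a rough sketch of what an automated check could look like, this Python snippet fetches a robots.txt file and flags a site-wide Disallow: / rule. The domain is a placeholder, and a real monitoring tool would do far more (track every change, diff old and new rules, and alert you):

    from urllib.request import urlopen
    from urllib.error import URLError

    # Placeholder domain; swap in your own site
    ROBOTS_URL = "https://www.example.com/robots.txt"

    def check_robots(url):
        try:
            body = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except URLError as err:
            # Covers missing files (404) and network failures alike
            print(f"Warning: could not fetch robots.txt ({err})")
            return
        for line in body.splitlines():
            # Normalise the line: drop comments, lower-case, strip all whitespace
            rule = "".join(line.split("#")[0].lower().split())
            if rule == "disallow:/":
                print("Warning: robots.txt blocks the entire site (Disallow: /)")
                return
        print("No site-wide Disallow rule found")

    check_robots(ROBOTS_URL)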
It sounds extreme, but most SEOs have seen this happen at some point, and it's surprisingly common.
What if I don’t have a robots.txt file?
Most platforms will generate a robots.txt file by default and apply some standard rules, but this isn't always the case. If you don't have a robots.txt file live, search engines will assume that every page and file on your website can be crawled and indexed. This may be fine for smaller websites, but on larger ones it can burn through a lot of crawl budget.
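If you'd rather have an explicit file in place that allows everything (and there are good reasons to, if only so you can monitor it), the conventional form is an empty Disallow rule:

    User-agent: *
    Disallow:

An empty Disallow value means nothing is off-limits, which is the same behaviour search engines assume when no robots.txt exists at all.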
This is another reason monitoring matters: if your robots.txt file is removed for some reason, any rules you set for search engines will simply stop applying.