Robots.txt

Robots.txt is a simple text file that tells search engines which pages of a website they can visit and which pages they should avoid.

What Is Robots.txt?

Robots.txt is a small text file that lives on your website. It gives rules to search engine robots, also called crawlers, about which pages they can or cannot visit.

Definition

Robots.txt is a standard file stored at the root of a website, for example https://example.com/robots.txt. It uses short directives to allow or block specific user agents, such as Googlebot, from crawling certain parts of the site.

Why Robots.txt Matters

Robots.txt is important because it helps you guide search engines. With it you can:

  • Keep private or unfinished pages from being crawled.
  • Save crawl budget, so robots focus on your most important pages.
  • Reduce crawling of duplicate or thin content that could hurt SEO.
  • Control access to large files or folders that do not need crawling.

How Robots.txt Works

When a search engine visits your site, it looks for the robots.txt file first. It reads the rules inside, then decides which URLs it is allowed to crawl.

The file is written with simple lines like:

  • User-agent: tells which crawler the rule is for, for example User-agent: * means all robots.
  • Disallow: tells robots not to crawl a folder or page.
  • Allow: tells robots they can crawl a folder or page.
  • Sitemap: gives the URL of your XML sitemap.

Important note: robots.txt controls crawling, not indexing. If another site links to a blocked page, that page may still appear in search results, just without content. To keep a page out of search entirely, use noindex tags or other methods.
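The crawl check described above can be sketched with Python's standard-library urllib.robotparser. The rules and URLs below are placeholders for illustration, not a real site's file:

```python
import urllib.robotparser

# A polite crawler downloads /robots.txt from the site root first,
# then consults the parsed rules before requesting any other URL.
rp = urllib.robotparser.RobotFileParser()
# rp.set_url(...) plus rp.read() would fetch a live file over the
# network; here we supply placeholder rules directly instead.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Only URLs the rules permit should be requested.
blocked = rp.can_fetch("MyBot", "https://example.com/private/page")  # False
allowed = rp.can_fetch("MyBot", "https://example.com/blog/")         # True
```

Because the rules target User-agent: *, they apply to any crawler name passed to can_fetch.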

Robots.txt vs Related Terms

  • Robots.txt vs meta robots tag
    Robots.txt controls crawling at the site or folder level. A meta robots tag is placed inside a page and controls indexing and crawling for that single page.
  • Robots.txt vs XML sitemap
    Robots.txt tells robots where not to go. An XML sitemap tells robots which pages you want them to find and crawl.

Example of Robots.txt

Here is a simple robots.txt example for a website:

User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /

Sitemap: https://example.com/sitemap.xml

What it means:

  • All crawlers are targeted because of User-agent: *.
  • The /admin/ and /cart/ folders must not be crawled.
  • All other pages are allowed.
  • The sitemap is at the given URL to help robots find important pages.
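This reading of the example can be verified with Python's standard-library urllib.robotparser, feeding it the same directives line by line (the /admin/settings and /blog/post paths are made up for the check):

```python
import urllib.robotparser

# The example robots.txt from above, as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /cart/",
    "Allow: /",
    "",
    "Sitemap: https://example.com/sitemap.xml",
]
rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

admin_ok = rp.can_fetch("Googlebot", "https://example.com/admin/settings")  # False
blog_ok = rp.can_fetch("Googlebot", "https://example.com/blog/post")        # True
sitemaps = rp.site_maps()  # ["https://example.com/sitemap.xml"]
```

Note that site_maps() is available from Python 3.8 onward.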

FAQs

Q. Where should I put my robots.txt file?
A. Place it in the root folder of your domain, so it is reachable at https://yourdomain.com/robots.txt. Search engines will look there automatically.

Q. What happens if I have no robots.txt file?
A. If there is no robots.txt, search engines usually crawl any page they can find, as long as there is no other rule blocking them.

Q. Can robots.txt remove pages from Google?
A. Not always. Robots.txt can stop crawling, but pages might still appear if they are linked from other sites. Use noindex tags or removal tools to fully remove pages.

Q. Does robots.txt stop people from viewing a page?
A. No. It only gives instructions to well-behaved crawlers. Visitors can still open the URL in their browser if they know it.

Q. Should I block my whole site with robots.txt?
A. Only if the site is private or in development. For normal live sites, blocking the whole site will prevent search engines from finding and ranking your pages.
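For reference, a robots.txt that blocks the whole site for every crawler looks like this:

User-agent: *
Disallow: /

Remember to remove the Disallow: / line before the site goes live, or search engines will not crawl any of your pages.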

Written by:

Team Bluelinks Agency

Posts authored by Team Bluelinks Agency represent official, verified content meticulously crafted using credible and authentic sources by Bluelinks Agency LLC. To learn more about the talented contributors behind our work, visit the Team section on our website.