robots.txt · seo · web crawling · google

robots.txt Guide: How to Control Search Engine Crawling

Learn how to write a robots.txt file to control what search engines can and cannot crawl on your website.

June 25, 2024 · 5 min read

What is robots.txt?

robots.txt is a text file at the root of your website that tells search engine crawlers which pages they can and cannot access. It's part of the Robots Exclusion Standard.

Location: https://yourdomain.com/robots.txt

Basic Syntax

User-agent: *         # Applies to all crawlers
Disallow: /admin/     # Block this path
Allow: /              # Allow everything else

Sitemap: https://example.com/sitemap.xml
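To check how a crawler would interpret rules like these, Python's standard-library `urllib.robotparser` can parse them directly. A minimal sketch, using the example rules above (the domain `example.com` is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The example rules above, parsed from a string instead of a live URL.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Any crawler may fetch the homepage, but not anything under /admin/.
print(parser.can_fetch("*", "https://example.com/"))        # True
print(parser.can_fetch("*", "https://example.com/admin/"))  # False
```

In production you would call `parser.set_url("https://yourdomain.com/robots.txt")` followed by `parser.read()` to fetch the live file instead.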

Common Rules

Block all crawlers

User-agent: *
Disallow: /

Block a specific crawler

User-agent: AhrefsBot
Disallow: /

Block specific paths

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/

Allow everything

User-agent: *
Disallow:
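The same `urllib.robotparser` check works for any of the rule sets above. As a sketch, here is how the "block specific paths" example behaves for a few sample paths (paths and domain are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The "block specific paths" rules from above.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Paths outside the Disallowed prefixes remain crawlable.
for path in ["/", "/blog/post", "/admin/users", "/private/"]:
    allowed = parser.can_fetch("*", f"https://example.com{path}")
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```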

robots.txt vs noindex

                     robots.txt Disallow    noindex meta tag
Prevents crawling    ✓                      ✗
Prevents indexing    ✗                      ✓
For JS rendering     ✗                      ✓
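For comparison, noindex lives in the page itself rather than in robots.txt. A page that should stay crawlable but out of the index carries this tag in its <head>:

```html
<meta name="robots" content="noindex">
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so a page that is Disallowed in robots.txt cannot reliably be de-indexed with noindex.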

Important: robots.txt is Advisory

Well-behaved search engines respect robots.txt, but nothing enforces it, and malicious bots ignore it entirely. A Disallowed page can also still appear in search results if other sites link to it. For sensitive content, use proper authentication - not just robots.txt.

Generate your robots.txt with our Robots.txt Generator.