robots.txt · seo · web crawling · google

robots.txt Guide: How to Control Search Engine Crawling

Learn how to write a robots.txt file to control what search engines can and cannot crawl on your website.

June 25, 2024 · 5 min read

What is robots.txt?

robots.txt is a text file at the root of your website that tells search engine crawlers which pages they can and cannot access. It's part of the Robots Exclusion Standard.

Location: https://yourdomain.com/robots.txt

Basic Syntax

User-agent: *         # Applies to all crawlers
Disallow: /admin/     # Block this path
Allow: /              # Allow everything else

Sitemap: https://example.com/sitemap.xml
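To check how a crawler would interpret rules like these, Python's standard-library `urllib.robotparser` can parse them directly. A minimal sketch, using the example rules above (the domain `example.com` is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The example rules above, parsed from a string instead of a live URL.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Any crawler may fetch the homepage, but not anything under /admin/.
print(parser.can_fetch("*", "https://example.com/"))        # True
print(parser.can_fetch("*", "https://example.com/admin/"))  # False
```

In production you would call `parser.set_url("https://yourdomain.com/robots.txt")` followed by `parser.read()` to fetch the live file instead.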

Common Rules

Block all crawlers

User-agent: *
Disallow: /

Block a specific crawler

User-agent: AhrefsBot
Disallow: /

Block specific paths

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/

Allow everything

User-agent: *
Disallow:
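The same `urllib.robotparser` check works for any of the rule sets above. As a sketch, here is how the "block specific paths" example behaves for a few sample paths (paths and domain are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The "block specific paths" rules from above.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Paths outside the Disallowed prefixes remain crawlable.
for path in ["/", "/blog/post", "/admin/users", "/private/"]:
    allowed = parser.can_fetch("*", f"https://example.com{path}")
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```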

robots.txt vs noindex

                     robots.txt Disallow    noindex meta tag
Prevents crawling    ✓                      ✗
Prevents indexing    ✗                      ✓
For JS rendering     ✗                      ✓
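For comparison, noindex lives in the page itself rather than in robots.txt. A page that should stay crawlable but out of the index carries this tag in its <head>:

```html
<meta name="robots" content="noindex">
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so a page that is Disallowed in robots.txt cannot reliably be de-indexed with noindex.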

Important: robots.txt is Advisory

Well-behaved search engines respect robots.txt, but nothing enforces it, and malicious bots ignore it entirely. A Disallowed page can also still appear in search results if other sites link to it. For sensitive content, use proper authentication - not just robots.txt.

Generate your robots.txt with our Robots.txt Generator.