robots.txt Guide: How to Control Search Engine Crawling
Learn how to write a robots.txt file to control what search engines can and cannot crawl on your website.
June 25, 2024 · 5 min read
What is robots.txt?
robots.txt is a plain-text file at the root of your website that tells search engine crawlers which paths they may and may not access. It's part of the Robots Exclusion Protocol, standardized as RFC 9309.
Location: https://yourdomain.com/robots.txt
Basic Syntax
User-agent: * # Applies to all crawlers
Disallow: /admin/ # Block this path
Allow: / # Allow everything else
Sitemap: https://example.com/sitemap.xml
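You can sanity-check rules like these with Python's standard-library urllib.robotparser before deploying them. A minimal sketch (the domain and paths are placeholders, not from a real site):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules matching the example above
rules = """
User-agent: *
Disallow: /admin/
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Paths under /admin/ are blocked; everything else is allowed
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True
```

Note that matching is prefix-based: Disallow: /admin/ blocks every URL whose path starts with /admin/.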
Common Rules
Block all crawlers
User-agent: *
Disallow: /
Block a specific crawler
User-agent: AhrefsBot
Disallow: /
Block specific paths
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow everything
User-agent: *
Disallow:
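A quick way to confirm how these variants behave is to wrap urllib.robotparser in a small helper. The bot names and URLs below are illustrative only:

```python
from urllib.robotparser import RobotFileParser

def allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch(agent, url)

# Rules targeting one crawler don't affect other crawlers
block_ahrefs = "User-agent: AhrefsBot\nDisallow: /"
print(allowed(block_ahrefs, "AhrefsBot", "https://example.com/"))  # False
print(allowed(block_ahrefs, "Googlebot", "https://example.com/"))  # True

# An empty Disallow means "block nothing", i.e. allow everything
allow_all = "User-agent: *\nDisallow:"
print(allowed(allow_all, "AnyBot", "https://example.com/private/"))  # True
```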
robots.txt vs noindex
| | robots.txt Disallow | noindex meta tag |
|---|---|---|
| Prevents crawling | ✅ | ❌ |
| Prevents indexing | ❌ | ✅ |
| Can block JS/CSS resources | ✅ | ❌ |

Note: a URL disallowed in robots.txt can still be indexed (without its content) if other sites link to it. To keep a page out of search results, let crawlers fetch it and use noindex instead.
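For reference, the noindex directive lives in the page itself rather than in robots.txt. A typical placement (the comment shows the equivalent HTTP-header form):

```html
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">
<!-- Equivalent HTTP response header: X-Robots-Tag: noindex -->
```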
Important: robots.txt is Advisory
Well-behaved search engines respect robots.txt, but compliance is voluntary; malicious bots simply ignore it. The file is also publicly readable, so listing sensitive paths can actually advertise them. For sensitive content, use proper authentication, not just robots.txt.
Generate your robots.txt with our Robots.txt Generator.