Technical SEO · AuditJet Glossary
Robots.txt
A text file at a website's root that instructs search engine crawlers which pages or sections of a site should not be crawled.
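Because the file lives at the root of a host, a crawler finds it by dropping the path, query, and fragment from any page URL and requesting /robots.txt on the same scheme, host, and port. Below is a minimal Python sketch of that lookup using only the standard library. The helper name robots_txt_url and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    robots.txt rules are scoped to one scheme + host + port, so the page's
    path, query string, and fragment are discarded.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # netloc is the host plus any explicit port (and userinfo, if present)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://shop.example.com/products/shoes?color=red"))
    # https://shop.example.com/robots.txt
    print(robots_txt_url("http://example.com:8080/docs/intro#setup"))
    # http://example.com:8080/robots.txt
```

Each host has its own file. https://example.com/robots.txt does not govern https://blog.example.com/, so every subdomain needs its own file.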
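Robots.txt follows the Robots Exclusion Protocol (standardized as RFC 9309). Rules are grouped under User-agent lines, and Disallow and Allow directives tell each bot which URL paths it may or may not crawl. The file is advisory, so it only affects crawlers that choose to honor it. It also blocks crawling, not indexing: a disallowed page can still be indexed, often without a description, if other pages link to it. Critically, robots.txt also lets site owners control AI crawler access. OpenAI's GPTBot, Perplexity's PerplexityBot, and Anthropic's ClaudeBot are all documented by their operators as respecting robots.txt directives.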
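You can check how User-agent groups and Disallow rules interact with Python's standard-library urllib.robotparser module. This sketch parses a hypothetical robots.txt that blocks GPTBot from the whole site and keeps every other crawler out of /private/. It then asks which URLs each bot may fetch. The file contents and example.com URLs are illustrative assumptions.

One caveat: urllib.robotparser applies the first matching rule in a group, not Google's most-specific-match precedence. Treat it as an approximation of any particular crawler's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot everywhere, keep all other
# crawlers out of /private/, and advertise the XML sitemap.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes an iterable of lines; no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("GPTBot", "https://www.example.com/blog/post"),
        ("ClaudeBot", "https://www.example.com/blog/post"),
        ("ClaudeBot", "https://www.example.com/private/report"),
        ("Googlebot", "https://www.example.com/private/report"),
    ]
    for agent, url in checks:
        # robotparser matches on the text before the first "/", so pass the
        # crawler's product token ("GPTBot"), not a full User-Agent header.
        verdict = "allowed" if rp.can_fetch(agent, url) else "blocked"
        print(f"{agent:<10} {url}: {verdict}")
    print("Sitemaps:", rp.site_maps())
```

GPTBot is blocked for every URL. ClaudeBot and Googlebot have no group of their own, so they fall through to the * group and are blocked only under /private/.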
Related terms
Crawl budget: The number of pages Googlebot can and wants to crawl on a website within a given timeframe, determined by the crawl rate limit and crawl demand.
Sitemap: A file, usually XML, that lists the URLs a site wants search engines to crawl and index, helping them discover pages more efficiently.
GPTBot: OpenAI's web crawler, used to collect content that may be used to train its AI models. ChatGPT's search features use a separate crawler, OAI-SearchBot.
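The sitemap file mentioned above usually follows the sitemaps.org XML protocol. Its root is a urlset element in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, with one url entry holding a loc child per page. This minimal Python sketch builds such a file with the standard-library xml.etree.ElementTree. The build_sitemap helper and the example URLs are hypothetical, and optional tags such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemaps.org XML document listing the given URLs."""
    # ElementTree emits xmlns as a literal attribute, which declares the
    # default namespace in the output; the unprefixed child tags belong to it.
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page_url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes "&", "<", and ">" in text, as the protocol requires.
        ET.SubElement(url_el, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        "https://www.example.com/",
        "https://www.example.com/products?id=1&color=red",
    ]))
```

A single sitemap file may list at most 50,000 URLs and be at most 50 MB uncompressed. The finished file can be advertised to crawlers with a Sitemap: line in robots.txt.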