Technical SEO · AuditJet Glossary

Robots.txt

A plain-text file served from a website's root (/robots.txt) that tells search engine crawlers which pages or sections of the site they should not crawl.

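Robots.txt uses the Robots Exclusion Protocol (standardized as RFC 9309) to communicate with bots: rules are grouped under User-agent lines, and Disallow directives list the URL paths that group should not crawl. The file is advisory. Compliant crawlers honor it, but it is not an access control mechanism. It also limits crawling, not indexing: a disallowed page can still be indexed if other pages link to it, typically without a content snippet because the crawler never fetched the page. Robots.txt applies to AI crawlers as well: the operators of GPTBot (OpenAI), PerplexityBot (Perplexity), and ClaudeBot (Anthropic) state that these bots respect robots.txt directives.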
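
As a rough illustration of how User-agent and Disallow rules are evaluated, the sketch below feeds a small robots.txt to Python's standard-library urllib.robotparser. The file contents, the example.com URLs, and the build_parser helper are hypothetical and chosen only for the example; real crawlers may resolve rule precedence somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the rules and paths are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# GPTBot has its own group, which disallows the whole site.
gptbot_blog = parser.can_fetch("GPTBot", "https://example.com/blog/post")

# Googlebot has no dedicated group, so it falls back to the "*" group.
googlebot_blog = parser.can_fetch("Googlebot", "https://example.com/blog/post")
googlebot_admin = parser.can_fetch("Googlebot", "https://example.com/admin/settings")

print(gptbot_blog, googlebot_blog, googlebot_admin)
```

Under these assumed rules, GPTBot is blocked from every path, while other crawlers are blocked only from /admin/. The result describes what a compliant crawler should do; it says nothing about whether a URL ends up indexed.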

Related terms

Crawl Budget

The number of pages Googlebot will crawl on a website within a given timeframe, determined by crawl rate limit and crawl demand.

XML Sitemap

A file that lists the URLs a site owner wants search engines to crawl, helping them discover and index pages more efficiently.

GPTBot

OpenAI's web crawler, used to collect content that may be used to train its generative AI models. ChatGPT's search features rely on a separate OpenAI crawler, OAI-SearchBot.
