Cloudflare AI Bot Tools Guide: Block, Charge, or Allow AI Crawlers?
Cloudflare now offers site owners tools to block AI crawlers, tools to charge AI crawlers, and tools to make AI crawlers’ jobs easier, and they’re all marketed to the same customers in the same dashboard. If that sounds contradictory, it’s because it is. Understanding these tools as a product landscape, rather than a unified strategy, is essential to making good decisions about how your site handles AI traffic.
Here’s what’s available, what each tool actually does, and how to decide which posture fits your business.
The Product Landscape
Cloudflare has released five distinct AI-related features over the past two years. Each one implies a different relationship between your site and AI systems, so let’s go over each of them first, and then we’ll talk strategy.
AI Crawl Control (General Availability)
Formerly called AI Audit, this is your observability and policy enforcement layer. It identifies which AI crawlers are hitting your site, how often, what they’re requesting, and whether they’re following your robots.txt directives. You can set per-crawler allow or block rules, and on paid plans you get extended analytics with referrer data, bandwidth consumption breakdowns, URI pattern grouping, and status code distribution.
On the free plan, detection relies on user-agent string matching; it catches the well-behaved bots that identify themselves honestly. Paid plans add Cloudflare’s Bot Management detection ID, which uses machine learning and behavioral fingerprinting to catch bots that don’t self-identify. Free plans also limit the Metrics tab to the past 24 hours, while paid plans give you longer date ranges.
This is the foundation. Regardless of which strategic posture you choose, you need observability first.
Block AI Bots (All Plans)
This is a one-click toggle that blocks all known AI crawlers using Cloudflare’s managed rules. Since July 1, 2025 (what Cloudflare called “Content Independence Day”), this is enabled by default on all newly created domains. The toggle blocks verified AI crawlers and includes signatures for bots that don’t follow rules or attempt to disguise themselves.
You have three options: don’t block (off), block on all pages, or block only on hostnames with ads. If your domain was created before July 2025, your default may still be wide open. If it was created after, you may be blocking crawlers you’d actually want to allow. Either way, you should double check your settings.
AI Labyrinth
AI Labyrinth is a defensive tool that feeds bad bots into a tarpit of AI-generated content designed to waste their resources. Instead of serving a simple block page, Cloudflare generates plausible-looking but meaningless content that keeps the crawler busy, consuming its compute budget on garbage. This targets crawlers that ignore robots.txt and other policy signals.
Pay Per Crawl (Closed Beta)
This allows site owners to set a per-request fee for AI crawler access. When a verified AI crawler requests content, it must either pay (receiving a 200 OK response) or get a 402 Payment Required status code with pricing details and licensing contact information, and Cloudflare acts as the merchant of record. This is currently limited to Enterprise customers and beta participants.
However, even without pay-per-crawl access, any paid Cloudflare customer can now send customizable 402 response codes when blocking crawlers, effectively telling the bot operator “we’re open for business, but let’s talk terms first” rather than just slamming the door.
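From a crawler operator’s side, the 402 flow is simple to reason about. Here’s a minimal sketch of how a client might interpret these responses; the header names (`crawler-price`, `licensing-contact`) are illustrative placeholders, not Cloudflare’s documented field names:

```python
def handle_crawl_response(status_code: int, headers: dict) -> dict:
    """Interpret a site's response to a crawl request.

    Header names ("crawler-price", "licensing-contact") are
    illustrative assumptions, not Cloudflare's actual fields.
    """
    if status_code == 200:
        # Content served: either the site allows crawling or payment cleared.
        return {"access": "granted"}
    if status_code == 402:
        # Payment Required: the site is open to commercial terms.
        return {
            "access": "payment_required",
            "price": headers.get("crawler-price"),
            "contact": headers.get("licensing-contact"),
        }
    if status_code == 403:
        # A bare block: no invitation to negotiate.
        return {"access": "blocked"}
    return {"access": "unknown", "status": status_code}
```

The difference between the last two branches is the whole point of the customizable 402: a 403 ends the conversation, while a 402 starts one.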
Markdown for Agents (Beta, Pro+)
Markdown for Agents is Cloudflare’s newest addition. It automatically converts your HTML to Markdown when AI crawlers request it via the Accept: text/markdown header, reducing token overhead by up to 80%. I’ve written a detailed critique of this feature covering the cloaking vulnerability, the default content signals, and why the evidence for enabling it is thin.
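The mechanism is ordinary HTTP content negotiation on the `Accept` header. A simplified sketch of the server-side decision (not Cloudflare’s implementation, which also verifies the requester is a known AI crawler) looks like this:

```python
def negotiate_format(accept_header: str) -> str:
    """Pick the representation to serve based on the Accept header.

    Simplified sketch: a real parser would also honor quality
    values (q=) and wildcard media types.
    """
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "text/markdown" in accepted:
        return "text/markdown"
    return "text/html"

# An AI agent asking for Markdown:
negotiate_format("text/markdown, text/html;q=0.9")  # → "text/markdown"
# A normal browser:
negotiate_format("text/html,application/xhtml+xml")  # → "text/html"
```

The token savings come from the second representation dropping navigation, scripts, and markup overhead, which is exactly why the cloaking concern matters: two audiences can receive materially different content for the same URL.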
The Strategic Contradiction
These tools pull in opposite directions. AI Labyrinth and Block AI Bots restrict access and punish non-compliant crawlers; Pay Per Crawl introduces friction and monetization; and Markdown for Agents removes friction and makes consumption easier and cheaper for the same bots the other tools are designed to control.
To be clear, Cloudflare isn’t confused; it’s building a platform that supports multiple postures, so that every site owner, regardless of strategy, depends on Cloudflare’s infrastructure. That’s good business for Cloudflare, but it means you can’t just enable everything and expect coherent results. You need to choose a posture.
Three Postures for Site Owners
Posture A: Maximize AI Visibility
You want AI systems to discover and cite your content. Your business model benefits from being referenced in AI-generated answers because AI-mediated discovery could drive conversions.
In this posture, you would allow search-oriented AI crawlers (ChatGPT-User, PerplexityBot, OAI-SearchBot), ensure your content is server-rendered and semantically structured, implement JSON-LD schema for your key content types, and potentially enable Markdown for Agents if the cloaking issue gets resolved.
You’d still want to block training-only crawlers that scrape for model building without sending referral traffic (Bytespider, CCBot), and monitor crawl-to-referral ratios to make sure the bots you allow are actually delivering value.
The risk? AI systems can summarize your content without linking back, so you stay invisible even as they rely on your data. Cloudflare’s own numbers show crawl-to-referral ratios of 1,700:1 for some AI operators. You’re betting that visibility in AI-generated answers translates to business value. That bet may pay off, but you need to measure it rather than assume it.
Posture B: Restrict and Monetize
Your content has clear value and you’d rather be compensated for AI access than give it away. Maybe your revenue model is already under pressure from AI systems summarizing your content without driving traffic.
In this posture, you’d block most AI crawlers by default, explore pay-per-crawl or custom 402 responses with licensing contact information, selectively allow crawlers that send meaningful referral traffic, and potentially negotiate direct licensing deals with AI operators whose products depend on your data.
The risk? You could become invisible to AI-mediated discovery entirely. If your competitors allow crawlers and you don’t, AI systems will simply cite them instead. This is defensible if your content is truly unique and high-value, but for substitutable content (information an AI system can get elsewhere just as easily), it’s a losing play.
Posture C: Selective Access
This would be the pragmatic middle ground. You allow AI crawlers that provide value through search visibility or referral traffic, while blocking crawlers whose primary purpose is training data extraction with nothing returned to you.
This is probably the right posture for most small and medium business sites. It requires understanding which crawlers do what. In broad terms, search-oriented bots (ChatGPT-User, PerplexityBot, OAI-SearchBot) are more likely to send referral traffic. Training-oriented bots (GPTBot when used for training, Bytespider, CCBot, Meta-ExternalAgent) primarily extract content for model building. Google’s crawlers are a special case — Googlebot serves both traditional search and AI features, making it impractical to block.
The AI Crawl Control dashboard is where you implement this. Review each crawler’s activity and referral patterns, then set per-crawler policies accordingly.
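Expressed as robots.txt directives, the selective posture looks something like the fragment below. The user-agent tokens shown are the ones these operators publish, but names change; verify each operator’s current documentation before deploying:

```
# Allow search/answer crawlers that can send referral traffic
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block training-oriented crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /
```

robots.txt only binds compliant crawlers, which is why the dashboard’s proxy-level enforcement is the backstop; the two should express the same policy.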
The Biggest Mistake: Passive Defaults
The worst posture is no posture at all, and that’s where most sites currently sit. Cloudflare data from mid-2024 found that roughly 39% of the top million domains on their network were being crawled by AI bots, but only about 3% had any blocking or throttling in place. That gap has narrowed since the July 2025 default changes, but plenty of older domains are still wide open with no conscious policy.
If your domain predates July 2025, your default is probably “allow everything.” If it postdates July 2025, your default is “block everything.” Neither may be what you actually want.
The cost of passivity is real. Some sites see 20–40% of their traffic from AI bots. As mentioned above, Cloudflare’s data shows crawl-to-referral ratios ranging from 1,700:1 to 73,000:1, meaning tens of thousands of page requests for every click sent back. That’s bandwidth you’re paying for, server resources being consumed, and analytics being polluted, all without conscious authorization.
The Pay-Per-Crawl Reality Check
Pay-per-crawl is conceptually attractive: if AI systems rely on your content, you should be compensated. But the odds of it becoming meaningful revenue for most sites are low.
Advertising monetizes user attention, but pay-per-crawl monetizes content extraction, and only high-demand, high-authority, high-uniqueness content can command meaningful per-crawl fees: financial data, proprietary research, timely journalism, niche technical datasets. A typical small business website or blog is easily substitutable; AI systems can pull similar information elsewhere.
The precedents for AI companies paying for content involve major publishers: Reddit’s $60M+ annual deals, licensing agreements with news organizations, etc. It’s important to recognize that these are negotiated relationships between companies with leverage. Pay-per-crawl attempts to standardize this at scale, but the per-request fees are described as “pennies” (which is far below advertising CPMs) and AI companies have every incentive to route around sites that charge by scraping free alternatives or negotiating bulk deals only with large players.
For most site owners, then, pay-per-crawl is defensive leverage, not a revenue strategy. It may reduce free exploitation and create negotiating power, but expecting it to rival ad revenue is unrealistic outside the top tier of premium publishers. The customizable 402 response is arguably more immediately useful because it opens a communication channel with AI operators, signaling that you’re open to commercial terms without requiring the full pay-per-crawl infrastructure.
How to Set This Up
Here’s the practical sequence:
Step 1: Check your current state. Log into the Cloudflare dashboard, select your domain, and navigate to AI Crawl Control. Look at the Crawlers tab and see who’s visiting. Check the Metrics tab for volume, bandwidth, and referral patterns; look at the Robots.txt tab to see who’s violating your directives.
Step 2: Decide your posture. Based on your business model, content type, and what the data shows, decide whether you want to maximize visibility, restrict and monetize, or take the selective approach. Most small business sites should start with selective access.
Step 3: Set per-crawler policies. In the Crawlers tab, set allow or block for each identified AI crawler. For crawlers you block, consider using a 402 response with licensing contact information rather than a bare 403.
Step 4: Review your Block AI Bots toggle. Under Security > Bots, check whether the one-click AI blocking is on or off and whether the setting reflects your intended posture.
Step 5: Update robots.txt. Your Cloudflare policies and your robots.txt should align. If you block a crawler in the dashboard but allow it in robots.txt (or vice versa), you’re sending conflicting signals. The dashboard enforcement takes precedence at the proxy level, but robots.txt is the standard that compliant crawlers check first.
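You can spot-check that alignment programmatically. Here’s a sketch using Python’s standard-library robots.txt parser against a policy map you maintain by hand; the crawler names and policies in the usage example are illustrative:

```python
from urllib.robotparser import RobotFileParser

def audit_robots_txt(robots_txt: str, dashboard_policy: dict[str, str]) -> list[str]:
    """Report crawlers whose robots.txt treatment conflicts with your
    dashboard policy ("allow" or "block"). Checks site-root access only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    conflicts = []
    for crawler, policy in dashboard_policy.items():
        allowed = parser.can_fetch(crawler, "/")
        if (policy == "block" and allowed) or (policy == "allow" and not allowed):
            conflicts.append(crawler)
    return conflicts

robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
# GPTBot blocked in both places, PerplexityBot allowed in both: no conflicts.
audit_robots_txt(robots, {"GPTBot": "block", "PerplexityBot": "allow"})  # → []
```

Any crawler the audit returns is sending mixed signals: the proxy enforces one policy while robots.txt advertises another.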
Step 6: Revisit quarterly. The AI crawling landscape is moving fast. New bots appear, existing bots change behavior, and the referral value equation shifts as AI products evolve. Set a calendar reminder to review your AI Crawl Control dashboard and adjust policy as needed.
The tools exist; make sure you’re using them deliberately instead of letting defaults make your decisions for you.
Related Reading
- Don't Enable Cloudflare's Markdown for Agents Yet: The newest tool in Cloudflare's AI suite has an unresolved cloaking vulnerability and defaults that broadcast training consent. Read this before toggling it on.
- What Is GEO? The Next Evolution of SEO in the Age of AI: Learn how Generative Engine Optimization (GEO) helps your business stay visible in AI-driven search results.
- Why is AI recommending your competitors—but not you—even though your website is live?: The 2025 visibility killer most businesses miss: AI has no idea which services actually belong to you… unless you force it to learn with GEO.