Cloudflare makes history by blocking AI queries
I want to share an article I found very interesting and comment on it in this post. It explains how Cloudflare made history by blocking artificial intelligence (AI) queries to websites, websites like ours, that is.
If you don't have the full context, don't worry: I'll explain a couple of key things first before diving into the post. By the way, the original article:
https://share.google/6RSdg28Fn7T4bP8on
What is Cloudflare?
Cloudflare is a reverse proxy service that can also act as a VPN. It reportedly sits in front of around 79.9% of the websites that deliver their content through a reverse proxy, which makes it a widespread and essential piece of web infrastructure.
You may remember that a few years ago, a Cloudflare outage took half the internet offline. This demonstrates the importance of this service in today's web infrastructure.
And what does AI have to do with all this?
For artificial intelligence to work—as is the case with ChatGPT, Gemini, and other models—it needs to be trained with vast amounts of data. And where does that data come from? Well, in many cases, it comes from articles like ours, published on blogs and websites.
Thousands of people, including myself, have spent years painstakingly writing original content to help others solve their problems. The problem is that these AIs consume our content without asking permission, offering compensation, or acknowledging our work. The AI ends up appropriating something that was created for a human audience, not to fuel someone else's business model.
What exactly does Cloudflare block?
The key point of this article is that Cloudflare has begun blocking AI crawlers. What is a crawler? Basically, it's a "robot" that crawls the web looking for content. Google, for example, uses these robots (also called spiders) to index content and rank it in search results.
AIs, for their part, also use these crawlers to gather content and train their models. And they often do so without respecting the basic rules that we, as creators, can establish.
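To give you an idea of what a crawler actually does, here is a minimal sketch in Python using only the standard library. The start URL is just a placeholder, and real crawlers add things like rate limiting, large-scale deduplication and, ideally, robots.txt checks on top of this basic loop:

```python
# Minimal crawler sketch: fetch a page, extract its links, and queue them.
# The start URL is a placeholder; real crawlers layer politeness rules
# (rate limits, robots.txt checks, deduplication) on top of this loop.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=5):
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="ignore")
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))  # resolve relative links
        print(f"Crawled {url}, found {len(parser.links)} links")


crawl("https://example.com")  # placeholder start URL
```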
But can't it be avoided with robots.txt?
Yes and no. We can limit access to our site using the robots.txt file, where we indicate which parts of our website should or should not be crawled and which types of bots are allowed. However, this file is only a suggestion: any malicious crawler can ignore it and do whatever it wants.
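To give you an idea, a robots.txt that asks the best-known AI crawlers to stay away could look something like the snippet below. GPTBot (OpenAI), CCBot (Common Crawl) and Google-Extended (Google's AI training token) are examples of documented user agents; even so, the file remains a polite request, not an enforcement mechanism:

```
# Ask known AI training crawlers to stay away (voluntary, not enforced)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Regular search indexing can still be allowed
User-agent: *
Allow: /
```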
That's why it's so important that Cloudflare now actively blocks these bots, although the final decision to activate this protection rests with each customer (i.e., each website owner).
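Conceptually, blocking by user agent looks something like the sketch below. To be clear, this is just my own illustration of the idea in Python, not how Cloudflare actually does it: their bot management runs at the edge of their network and relies on many more signals than the User-Agent header, and the bot names in the list are only examples:

```python
# Conceptual sketch of user-agent based blocking (not Cloudflare's actual
# implementation, which runs at the edge and uses many more signals).
BLOCKED_AI_BOTS = ("GPTBot", "CCBot", "ClaudeBot", "Google-Extended")


def should_block(user_agent: str) -> bool:
    """Return True if the request identifies itself as a known AI crawler."""
    return any(bot.lower() in user_agent.lower() for bot in BLOCKED_AI_BOTS)


def handle_request(headers: dict) -> tuple[int, str]:
    """Tiny request handler: 403 for AI crawlers, 200 for everyone else."""
    ua = headers.get("User-Agent", "")
    if should_block(ua):
        return 403, "AI crawlers are not allowed on this site"
    return 200, "Welcome, human (or well-behaved search bot)"


# Example: a request that declares itself as GPTBot gets rejected.
print(handle_request({"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"}))
print(handle_request({"User-Agent": "Mozilla/5.0 (Windows NT 10.0)"}))
```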
Final reflection: what about content creators?
One of the most important statements in the article reads:
“We strongly believe that all content creators and publishers should be compensated when their content is used in training AI models.”
And it makes perfect sense. We work hard at creating content, and it's not fair for third parties to use it to train their models without giving anything back.
Analyzed article: "Cloudflare makes history by blocking AI queries and lays the groundwork for business transformation"
- Andrés Cruz