How to block bad bots with Cloudflare

Matt B, April 25, 2024 | May 7, 2024

Intro

A common problem for web server administrators is the excessive number of requests sent by crawlers such as Bytespider or Claudebot. These are sometimes known as “bad bots” because they often don’t respect robots.txt and send excessive traffic to your site.

Fortunately, you can block bad bots with Cloudflare easily by using a custom WAF rule. This guide will show you how to block these two bots, and how to identify and block other bots as well.

What are web crawlers?

Web crawlers are a type of software bot. They “crawl” the web to find all the pages available on a website. This can be very good if, for example, the bot belongs to Google and is crawling your site in order to index it for its SERP (search engine results page).

However, crawlers can be harmful if they are gathering information for purposes other than SERP fulfilment. This may be for data harvesting/scraping, or just to be disruptive.

Additionally, many bots don’t respect the rules in the site’s robots.txt file that determine where and how frequently they can crawl. Instead, these bots send massive numbers of requests to your server, resulting in higher bandwidth costs and often degraded performance, essentially DoS’ing your server.

How to block bad bots with Cloudflare using WAF rules

Blocking bots in Cloudflare is very easy. You create a WAF (web application firewall) rule that blocks requests if the requester’s User-Agent string contains a specified term/substring.

User-Agent strings are small pieces of information sent with HTTP/S requests that tell the server which browser and device type is being used. Bots also send User-Agent strings, and this can be helpful in blocking them.
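To make the idea of matching on a User-Agent substring concrete, here is a minimal Python sketch. It only illustrates the logic and is not how Cloudflare implements its rules; the function name `is_blocked`, the `BLOCKED_TERMS` list, and the sample browser User-Agent are assumptions made for this example.

```python
# Illustrative only: a plain, case-sensitive substring test on the User-Agent.
BLOCKED_TERMS = ["Bytespider", "Claudebot"]  # hypothetical block list


def is_blocked(user_agent: str, blocked_terms: list[str]) -> bool:
    """Return True if any blocked term appears in the User-Agent string."""
    return any(term in user_agent for term in blocked_terms)


# A crawler User-Agent that ends with the identifying term "Bytespider".
bytespider_ua = (
    "Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.1511.1269 "
    "Mobile Safari/537.36; Bytespider"
)

# A made-up desktop browser User-Agent with no blocked term in it.
browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(is_blocked(bytespider_ua, BLOCKED_TERMS))  # True
print(is_blocked(browser_ua, BLOCKED_TERMS))     # False
```

Each extra term in the list behaves like an additional OR condition: a request counts as blocked as soon as any one term matches.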
In this example, we want to block the Bytespider bot, whose User-Agent looks like this:

Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.1511.1269 Mobile Safari/537.36; Bytespider

In the above User-Agent, we can see the string “Bytespider”, which we can use as the filter in our WAF rule. To create this rule, first sign in to Cloudflare and go to your domain settings, then go to Security -> WAF, and click Create Rule.

Give your rule a title, then configure it to match requests whose User-Agent contains “Bytespider” OR whose User-Agent contains “Claudebot”, and set the action to Block.

These settings will block crawl requests from Bytespider and Claudebot. If you want to block other bots, you can simply add another condition to the same rule (making sure you select OR instead of AND) and include the unique string from that bot’s User-Agent. Cloudflare’s “contains” operator is case-sensitive, so enter each term exactly as it appears in the User-Agent.

It’s worth taking care to get this right: if the term you enter isn’t specific enough, you may inadvertently end up blocking legitimate requests.

Finding other crawlers

While this guide uses the examples of Bytespider and Claudebot, there are many, many web crawlers out there that you may wish to block to prevent issues with bandwidth and performance.

A good way to find these is to view your web traffic logs, either as raw access files or with a tool like AWStats. This will allow you to see how many requests are coming in from each User-Agent.

Additionally, a Google search shows several repositories on GitHub in which people list the bots that they know to be bad. An example of this can be found here.

Future-proofing

Much like spam, bad web bots are an ongoing issue that requires frequent review. New bots are created and released daily, so regardless of how many blocking rules you have in place, you’re likely to find new ones attacking your server on occasion.
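Spotting those newcomers comes down to counting requests per User-Agent in your access logs. As a rough sketch of that review, the Python below tallies User-Agents from log lines. It assumes the common "combined" access log layout, where the User-Agent is the last double-quoted field on each line; the function name `count_user_agents` and the sample log lines are made up for illustration, so adjust the pattern if your server logs in a different format.

```python
import re
from collections import Counter

# Assumption: combined log format, where the User-Agent is the final
# double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count how many log lines were sent by each User-Agent."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; a real run would iterate over an access log file.
sample_log = [
    '203.0.113.5 - - [25/Apr/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
    '198.51.100.7 - - [25/Apr/2024:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0; Bytespider"',
    '198.51.100.7 - - [25/Apr/2024:10:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0; Bytespider"',
]

# Print the busiest User-Agents first.
for user_agent, hits in count_user_agents(sample_log).most_common(5):
    print(hits, user_agent)
```

Because the function accepts any iterable of lines, an open file handle for your own access log can be passed in directly. User-Agents near the top of the list that you don't recognise are the candidates to investigate before adding them to the WAF rule.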
It’s therefore worth periodically reviewing your WAF rule to ensure you are blocking any bots that are causing issues.

Wrapping up

This guide has shown you how to identify bad bots and how to block them with Cloudflare. If you’ve got any questions or run into any problems, please feel free to leave a comment.