Hello,
Is there a way to prevent bot traffic?
I've implemented this snippet:
if ($http_user_agent ~* (Ahrefs|majestic|SemrushBot)) {
    return 403;
}
This works when added to an individual site's configuration, but I want to apply it universally across all sites hosted on my VPS.
In the nginx.conf file, I placed the following in the http block:
server {
    if ($http_user_agent ~* (Ahrefs|majestic|SemrushBot)) {
        return 403;
    }
}
However, it's not functioning as expected.
The nginx.conf attempt fails because of where the check lives, not because of the if itself. A server block added to nginx.conf is just one more virtual host: its rules apply only to requests that nginx routes to that block, so the sites defined in your other server blocks never see them. The if directive is also not allowed directly in the http context, so you can't simply move it up a level. (The usual warnings about if mostly concern location blocks; an if with a plain return at server level is fine.) Instead, use the map directive in the http block to define a variable once, then check that variable in each server block.
Concretely: define a map in the http block that flags unwanted user agents, then test the resulting variable in each server block with a short if that returns 403. Note that the deny directive won't work for this, since it accepts only addresses, CIDR ranges, or all, not variables. This keeps the pattern list in one place, and each site needs only a few lines.
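A minimal sketch of that map approach, reusing the three patterns from the question. The variable name $block_bot, the listen port, and the server_name are placeholders I made up for illustration, and this is a fragment to merge into an existing http block rather than a complete nginx.conf:

```nginx
http {
    # $block_bot is a hypothetical name: 1 when the User-Agent matches, 0 otherwise.
    # The "~*" prefix makes the regex case-insensitive.
    map $http_user_agent $block_bot {
        default                          0;
        "~*(Ahrefs|majestic|SemrushBot)" 1;
    }

    server {
        listen 80;
        server_name example.com;  # placeholder

        # An if on a bare variable is false when the value is empty or "0".
        if ($block_bot) {
            return 403;
        }

        # ... rest of the site configuration ...
    }
}
```

The if still has to appear in every server block (directly or through an include), but the list of patterns is maintained only in the map.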
Here's a streamlined approach to implementing a blocklist:
First, create a configuration file named "blacklist.conf" (here /etc/nginx/blacklist.conf) containing all the directives for whatever you want to block.
Next, for every site in the sites-available directory, include that file in the server block like this:
server {
    include /etc/nginx/blacklist.conf;
    return 301 https://blah-blah;
}
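With this setup, and assuming blacklist.conf holds if/return rules like the one in the question, matching requests are rejected before the return 301 is reached, because nginx runs server-level if and return directives in order. All other requests are redirected as usual.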
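For illustration, blacklist.conf could contain something as simple as the rule from the question. This is only an assumed example of its contents, since the answer doesn't show the actual file:

```nginx
# /etc/nginx/blacklist.conf
# Meant to be included inside a server block; if is not valid at http level.
if ($http_user_agent ~* (Ahrefs|majestic|SemrushBot)) {
    return 403;
}
```

After editing, nginx -t checks the configuration and a reload applies it to every site that includes the file.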
Here's the code snippet that does the trick:
I've created a regex pattern that matches a wide range of known bot user agents, including SEO crawlers such as Ahrefs and SEMrush as well as malicious bots. If a request matches any of these patterns, the variable $bad_useragent is set to 1, which triggers the bot-blocking logic.
Next, I configure the server to listen on port 443 with SSL and HTTP/2 support. Inside it, an if statement checks whether $bad_useragent is 1 and, if so, returns 444. That is a non-standard, nginx-specific code that closes the connection without sending any response, which effectively blocks the bot from the site.
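The snippet itself isn't shown above, so here is a rough sketch of what that description could look like. It assumes the variable is set with a map, uses only the three patterns from the question instead of the longer list mentioned, and the server_name and certificate paths are placeholders:

```nginx
# In the http block. The short pattern list is an assumption; extend as needed.
map $http_user_agent $bad_useragent {
    default                          0;
    "~*(Ahrefs|majestic|SemrushBot)" 1;
}

server {
    # Recent nginx releases prefer a separate "http2 on;" directive,
    # but the listen parameter matches the description above.
    listen 443 ssl http2;
    server_name example.com;                            # placeholder

    ssl_certificate     /etc/nginx/ssl/example.crt;     # placeholder path
    ssl_certificate_key /etc/nginx/ssl/example.key;     # placeholder path

    # 444 closes the connection without sending a response.
    if ($bad_useragent = 1) {
        return 444;
    }
}
```

Compared with 403, returning 444 gives the bot nothing to read at all, at the cost of being harder to debug if a legitimate client is matched by mistake.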