How to block search engine robots from accessing a site folder?

Started by jeyavinoth, Jul 09, 2022, 01:45 AM

jeyavinoth (Topic starter)

Good day.
A few years ago, I was warned that robots could access some folders on the server even when robots.txt disallows them.
Now I have real evidence of this: Google Webmaster Tools states outright that it indexed a page despite the indexing ban.

I need to close the /css folder to robots.
How can I block robots from accessing the /css folder while still having the page display correctly when it is viewed in a browser?
Thanks in advance for your help.

lizatailor23

This article shows how to block access either by known IP addresses or by who is visiting (in this case, a robot):
https://www.inmotionhosting.com/support/website/block-unwanted-users-from-your-site-using-htaccess/

List the bots, mark them as bad users, and deny them access.
Keep in mind that if a bot identifies itself with a user agent other than its real one, it will still get access.
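
The matching logic behind such a rule can be sketched in Python. This is only an illustration of the idea, not the .htaccess configuration from the article: the bot names in BLOCKED_BOTS and the is_blocked helper are hypothetical, and the /css/ prefix is taken from the question above.

```python
import re

# Hypothetical bot names; replace them with the user agents seen in your own logs.
BLOCKED_BOTS = ["BadBot", "EvilScraper"]

# Case-insensitive pattern that matches any listed name anywhere in the User-Agent header.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(name) for name in BLOCKED_BOTS), re.IGNORECASE
)


def is_blocked(user_agent: str, path: str, protected_prefix: str = "/css/") -> bool:
    """Return True if this request should be denied access to the protected folder."""
    if not path.startswith(protected_prefix):
        return False  # only the protected folder is restricted
    return bool(_BLOCKED_RE.search(user_agent))


# A bot that announces itself honestly is denied.
print(is_blocked("Mozilla/5.0 (compatible; BadBot/1.0)", "/css/style.css"))
# An ordinary browser, or a bot pretending to be one, is let through.
print(is_blocked("Mozilla/5.0 (Windows NT 10.0) Firefox/102.0", "/css/style.css"))
```

Since the check relies only on the User-Agent string, ordinary browsers still receive the CSS files, but so does any bot that disguises itself as a browser.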

brandsmith

All this is done through directives in the robots.txt file, which must be located in the root of the site.
Here's how you can prevent bots from processing a certain directory:

User-agent: *
Disallow: /catalog_1/

In this case, the full (absolute) URL of this directory is:

http://mysite.com/catalog_1/
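
To see how a well-behaved crawler would read this rule, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name "ExampleBot" and the page names are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
rules = """\
User-agent: *
Disallow: /catalog_1/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(url: str) -> bool:
    """Ask whether a crawler called "ExampleBot" (hypothetical) may fetch the URL."""
    return parser.can_fetch("ExampleBot", url)


print(allowed("http://mysite.com/catalog_1/"))           # False: the directory itself
print(allowed("http://mysite.com/catalog_1/item.html"))  # False: a file inside it
print(allowed("http://mysite.com/index.html"))           # True: outside the directory
```

Note that this check is performed by the crawler itself. robots.txt is a request, not an access control, so a robot that ignores it can still fetch the files.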

In the example below, files (images, web pages, etc.) in the "premier" directory are blocked from crawling, as is any object whose URL path begins with this string of characters:

User-agent: *
Disallow: /premier

That is, the ban covers, for example, the following URLs (some of which may be web pages):

site.com/premier;
site.com/premiers.html;
site.com/premierpro.htm;
site.com/premier-x/file_1.html.

This is because robots.txt rules are matched as URL path prefixes, so a "*" wildcard is implied at the end of the "/premier" rule even though it is not written out. In other words, for crawlers that support the "*" wildcard, the above entry is equivalent to this one:

User-agent: *
Disallow: /premier*
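
The prefix behaviour can be checked with a short sketch based on Python's standard urllib.robotparser module. The is_disallowed helper and the "ExampleBot" name are hypothetical, and an http:// scheme is added to the URLs so they parse correctly. As far as I know, this parser does plain prefix matching and does not interpret "*" inside paths, so only the form without the wildcard is tested here.

```python
from urllib.robotparser import RobotFileParser


def is_disallowed(disallow_path: str, url: str) -> bool:
    """Return True if a "Disallow: <disallow_path>" rule for all agents blocks the URL."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return not parser.can_fetch("ExampleBot", url)


urls = [
    "http://site.com/premier",
    "http://site.com/premiers.html",
    "http://site.com/premierpro.htm",
    "http://site.com/premier-x/file_1.html",
]

for url in urls:
    # Compare the bare prefix rule with a rule that ends in a slash.
    print(url, is_disallowed("/premier", url), is_disallowed("/premier/", url))
```

With "/premier" every URL in the list is blocked, while "/premier/" blocks none of them, because none of their paths start with "/premier/". If only the directory should be closed, the trailing slash is the safer choice.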