Hosting & Domaining Forum

Hosting & Domaining development => SEO / SEM/ SMO Discussions => Topic started by: jeyavinoth on Jul 09, 2022, 01:45 AM

Title: How to block SE robots access to a website folder?
Post by: jeyavinoth on Jul 09, 2022, 01:45 AM
Hello there.

Some years back, I received a warning that robots could gain entry to specific folders on the server regardless of being prohibited in robots.txt.

Recently, there has been tangible proof of this. Google Search Console (formerly Webmaster Tools) states outright that it indexed the page even though crawling it was disallowed.

I now need a way to close off the /css folder from robots. How can I prevent robots from accessing the /css folder while still ensuring that the content is correctly displayed when viewed by a browser?

I appreciate any assistance you can provide. Thank you.
Title: Re: How to block SE robots access to a site folder?
Post by: lizatailor23 on Jul 09, 2022, 02:05 AM
The article below describes how to block known IPs or specific user agents, such as robots: you build a list of the identified bots and deny them access via .htaccess. Keep in mind that if a bot returns under a different user agent, it may still get through. The article walks through the .htaccess rules step by step.

https://www.inmotionhosting.com/support/website/block-unwanted-users-from-your-site-using-htaccess/
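As a minimal sketch of the approach described in that article (assuming an Apache server with mod_rewrite enabled; the bot names below are placeholders, not a real blocklist — substitute the user agents you actually see in your access logs):

```
# .htaccess -- deny requests whose User-Agent matches listed bot names
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilCrawler) [NC]
RewriteRule .* - [F,L]
</IfModule>
```

The [NC] flag makes the match case-insensitive, and [F] returns a 403 Forbidden instead of serving the file.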
Title: Re: How to block SE robots access to a site folder?
Post by: brandsmith on Sep 09, 2022, 12:45 PM
To prevent bots from accessing specific directories, you use directives in the robots.txt file, which must be located in the root of the website. The first example blocks bot access to a directory called "catalog_1." The second blocks crawling of a directory named "premier" and of any path that begins with that string.

A trailing asterisk (*) on the "/premier" rule is implied by default, so it doesn't need to be stated explicitly; the two forms behave identically. Note that robots.txt does not support full regular expressions — major crawlers only understand the * wildcard and the $ end-of-path anchor, so keep the rules simple.
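The two examples described above appear to have been lost from the post; they were presumably along these lines (directory names taken from the text):

```
User-agent: *
Disallow: /catalog_1/
```

```
User-agent: *
Disallow: /premier
```

Note that the second form, without a trailing slash, also matches any sibling path starting with the same string, such as /premier-sale/.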
Title: Re: How to block SE robots access to a website folder?
Post by: icellular01 on Jul 25, 2023, 02:57 AM
To prevent robots from accessing the /css folder while still allowing the content to be correctly displayed when viewed by a browser, you can make use of the Robots Exclusion Protocol and modify your robots.txt file. Here's how:

1. Create a robots.txt file in the root directory of your website if you haven't already done so.
2. Open the robots.txt file and add the following lines:

```
User-agent: *
Disallow: /css/
```

These lines instruct all web robots (User-agent: *) to avoid accessing the /css/ folder.

3. Save the robots.txt file and upload it to the root directory of your server.

By doing this, you are explicitly disallowing all web robots from accessing the /css/ folder. Browsers, on the other hand, do not consult robots.txt at all and will still fetch and display the content correctly.

It's important to note that while most well-behaved robots will respect the instructions in robots.txt, there may still be some rogue robots or malicious actors that ignore these instructions. Therefore, this solution provides an additional layer of protection but is not foolproof.

Regularly monitoring your server logs and working with your web hosting provider can help you stay updated on any unusual activities and ensure the security of your website.
Title: Re: How to block SE robots access to a website folder?
Post by: sophiaWindsor02 on Jul 27, 2023, 12:05 AM

You can use the robots.txt file to disallow bots from indexing your /css folder. However, to ensure total restriction, consider implementing server-side rules using the .htaccess file (for Apache servers) to block all bot access.
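As a sketch of that server-side approach (assuming Apache with mod_rewrite enabled; the crawler names listed are examples only, not a complete list), an .htaccess file placed inside the /css directory could return 403 to crawlers while leaving browsers untouched:

```
# /css/.htaccess -- refuse common crawler user agents, serve everyone else
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [F,L]
</IfModule>
```

Browsers send ordinary user agents, so they still load the stylesheets and pages render normally.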
Title: Re: How to block SE robots access to a website folder?
Post by: rahul verma on Apr 06, 2024, 08:46 AM
In robots.txt, use the Disallow directive. It lets you prevent search engines from crawling specific files, pages, or sections of your website.
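For example (the paths here are illustrative, not from the original post):

```
User-agent: *
Disallow: /private/          # a whole section
Disallow: /page.html         # a single page
Disallow: /files/report.pdf  # a single file
```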