In this post, we answer five frequently asked questions about closing a site from search engines.
Search engine crawlers scan all data on the Internet. Nevertheless, website owners can limit or deny access to their resource. This requires closing the site from indexing via the robots.txt system file.
If you don’t need to close the site completely, block search indexing of individual pages. Users shouldn’t see the site’s back office, personal accounts, or outdated information from the promotions or calendar sections in their search results. It is also worth closing scripts, pop-up windows, banners, and heavy files from indexing. This helps reduce indexing time and server load.
How to close the site completely?
A website is usually closed from indexing completely during development or a redesign. Websites where webmasters are learning or experimenting are also often closed. You can prohibit indexing of the site for all search engines, for a single bot, or ban all except one.
| Indexing ban | robots.txt rules |
| --- | --- |
| For all robots | User-agent: *<br>Disallow: / |
| For an individual robot | User-agent: Googlebot-Image<br>Disallow: / |
| For all but one robot | User-agent: *<br>Disallow: /<br>User-agent: Googlebot<br>Allow: / |
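For example, while a site is under development, a minimal robots.txt placed in the site root (here https://example.com/robots.txt is only a placeholder address) could look like this:
Code:
# Block every crawler from the entire site (remove this before launch)
User-agent: *
Disallow: /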
How to close individual pages?
Small business-card websites don’t usually require hiding individual pages. For websites with a lot of service information, close the following pages and whole sections:
- administration panel;
- system directories;
- personal account;
- registration forms;
- order forms;
- product comparison;
- favorites;
- recycle bin;
- captcha;
- pop-ups and banners;
- site search;
- session IDs.
| Block indexing of | robots.txt rules |
| --- | --- |
| A single page | User-agent: *<br>Disallow: /contact.html |
| A section | User-agent: *<br>Disallow: /catalog/ |
| The entire site, except for one section | User-agent: *<br>Disallow: /<br>Allow: /catalog |
| An entire section, except for one subsection | User-agent: *<br>Disallow: /product<br>Allow: /product/auto |
| Site search | User-agent: *<br>Disallow: /search |
| The administration panel | User-agent: *<br>Disallow: /admin |
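As a sketch, several of these bans can be combined in a single file; the paths below (/admin, /search, /product) are taken from the table above and should be replaced with your site’s actual structure:
Code:
User-agent: *
# service sections
Disallow: /admin
Disallow: /search
# close the product section but keep one subsection open
Disallow: /product
Allow: /product/auto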
How to close other information?
With the robots.txt file, you can close folders, files, scripts, and utm-tags from indexing, either completely or selectively, and apply the ban to all robots or only to individual ones.

| Indexing ban | robots.txt rules |
| --- | --- |
| A file type | User-agent: *<br>Disallow: /*.jpg |
| A folder | User-agent: *<br>Disallow: /images/ |
| A folder, except for one file | User-agent: *<br>Disallow: /images/<br>Allow: /images/file.jpg |
| Scripts | User-agent: *<br>Disallow: /plugins/*.js |
| utm-tags | User-agent: *<br>Disallow: /*utm= |
| utm-tags for Yandex | Clean-param: utm_source&utm_medium&utm_campaign |
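One possible way to combine the utm rules from the table is shown below: the wildcard Disallow is understood by crawlers that support the * pattern, while Clean-param is a Yandex-specific directive that other crawlers simply ignore (the parameter names are the common defaults and may differ on your site):
Code:
User-agent: *
# hide URLs that contain tracking parameters
Disallow: /*utm=
# Yandex only: treat utm parameters as insignificant instead of indexing duplicates
Clean-param: utm_source&utm_medium&utm_campaign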
How to close a site through meta tags?
A good alternative to the robots.txt file is the robots meta tag. Insert it into the site’s source code, for example in the index.html file, inside the <head> container. In the name attribute, indicate the crawlers that must not index the site: use "robots" to close the site from all crawlers, or the name of a specific crawler to close it from that one only.
Option 1.
Code:
<meta name="robots" content="noindex, nofollow" />
Option 2.
Code:
<meta name="robots" content="none" />
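If the site should be closed from one specific crawler rather than from all of them, the same tag can carry that crawler’s name instead of "robots"; for example, for Google’s main crawler (googlebot):
Code:
<meta name="googlebot" content="noindex, nofollow" />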
The "content" attribute takes the following values:
- none - indexing is not allowed (equivalent to noindex, nofollow);
- noindex - indexing of the content is not allowed;
- nofollow - indexing of links is not allowed;
- follow - indexing of links is allowed;
- index - indexing of the content is allowed;
- all - indexing of the content and links is allowed.
Thus, you can disallow indexing of the content but still allow indexing of the links. To do this, set content="noindex, follow": the links on such a page will be indexed, but the text will not. Combine the values to cover different cases.
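As a sketch of the placement described above, a page that hides its text but lets crawlers follow its links might look like this (the title and content are placeholders):
Code:
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <!-- the text of this page will not be indexed, but its links will be followed -->
  <meta name="robots" content="noindex, follow" />
  <title>Promotions archive</title>
</head>
<body>
  ...
</body>
</html>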
If you decide to close the site from indexing using meta tags, there is no need to create robots.txt.
What kind of errors may occur?
Logical - the rules contradict each other. Detect logical errors by checking the robots.txt file in the Google Robots Testing Tool.
Syntactic - the rules are written incorrectly in the file.
The most common ones include:
- ignoring case sensitivity (paths in robots.txt are case-sensitive);
- writing in capital letters;
- listing all the rules on one line;
- omitting the blank line between rule groups;
- naming the crawler inside a directive instead of in the User-agent line;
- listing every file instead of closing the entire section or folder;
- omitting the mandatory Disallow directive.
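To illustrate two of the errors above, here is a hypothetical broken fragment and its corrected form (the crawler name and paths are invented for the example):
Code:
# Incorrect: the crawler is named inside the directive, two rules squeezed onto one line
Disallow: Googlebot /admin/ Disallow: /search/

# Correct: the crawler goes in the User-agent line, each rule on its own line
User-agent: Googlebot
Disallow: /admin/
Disallow: /search/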
Cheat sheet
There are two ways to block site indexing: create a robots.txt file and specify a Disallow directive for all crawlers, or write a ban in the robots meta tag inside the <head> tag of the index.html file.
Close service information, out-of-date information, scripts, session IDs, and utm-tags. Create a separate rule for each ban. Target all search robots via * or specify the name of a specific crawler. If you want to allow only one robot, disallow the site for all crawlers and then add an Allow rule for that one. And don’t forget to check the indexing of links with an online Google index checker.
Avoid logical and syntactic errors when creating a robots.txt file. Check the file using Yandex.Webmaster and the Google Robots Testing Tool.
Periodically check page indexing bans in bulk using Linkbox. This takes two steps: upload all the URLs and click "check links".