About the Robots tag

from http://www.robotstxt.org/meta.html

About the Robots <META> tag

In a nutshell

You can use a special HTML <META> tag to tell robots not to index the content of a page, and/or not scan it for links to follow.

For example:

<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>

There are two important considerations when using the robots <META> tag:

  • Robots can ignore your <META> tag. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention.
  • The NOFOLLOW directive only applies to links on this page. A robot may well find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrive at the page you wanted to keep it away from.

Don’t confuse this NOFOLLOW with the rel="nofollow" link attribute.

The details

Like the /robots.txt, the robots META tag is a de-facto standard. It originated from a “birds of a feather” meeting at a 1996 distributed indexing workshop, and was described in meeting notes.

The META tag is also described in the HTML 4.01 specification, Appendix B.4.1.

The rest of this page gives an overview of how to use the robots <META> tags in your pages, with some simple recipes. To learn more see also the FAQ.

How to write a Robots Meta Tag

Where to put it

Like any <META> tag it should be placed in the HEAD section of an HTML page, as in the example above. You should put it in every page on your site, because a robot can encounter a deep link to any page on your site.

What to put into it

The “NAME” attribute must be “ROBOTS”.

Valid values for the “CONTENT” attribute are: “INDEX”, “NOINDEX”, “FOLLOW”, “NOFOLLOW”. Multiple comma-separated values are allowed, but obviously only some combinations make sense. If there is no robots <META> tag, the default is “INDEX,FOLLOW”, so there’s no need to spell that out. That leaves:

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
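
As a rough illustration of how a cooperating robot might apply these rules, the Python sketch below reads the robots <META> tag from a page and reports whether indexing and link-following are allowed. The names RobotsMetaParser and robots_policy are made up for this example. It only looks at the four values listed above, and it does not check that the tag actually sits inside HEAD.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the CONTENT values of every <META NAME="ROBOTS"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> and
        # <meta> are treated alike.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # CONTENT is a comma-separated list; whitespace and case are ignored.
        for value in (attr.get("content") or "").split(","):
            value = value.strip().upper()
            if value:
                self.directives.add(value)


def robots_policy(html):
    """Return (may_index, may_follow) for a page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # With no tag, or no contrary value, the default is INDEX, FOLLOW.
    may_index = "NOINDEX" not in parser.directives
    may_follow = "NOFOLLOW" not in parser.directives
    return may_index, may_follow
```

Called on the example page at the top of this article, robots_policy returns (False, False); on a page with no robots <META> tag it returns (True, True).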

Useful HTML Meta Tags

Each entry below gives the tag name, one or more examples, and a description.
Author
  Example: <META NAME="AUTHOR" CONTENT="Tex Texin">
  Description: The author’s name.
cache-control
  Example: <META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
  Description: HTTP 1.1. Allowed values = PUBLIC | PRIVATE | NO-CACHE | NO-STORE.
    • Public – may be cached in public shared caches
    • Private – may only be cached in a private cache
    • No-cache – may be stored, but must be revalidated with the origin server before each reuse
    • No-store – must not be stored by any cache
  The directive CACHE-CONTROL:NO-CACHE indicates cached information should not be used and instead requests should be forwarded to the origin server. This directive has the same semantics as PRAGMA:NO-CACHE.
  Clients SHOULD include both PRAGMA:NO-CACHE and CACHE-CONTROL:NO-CACHE when a no-cache request is sent to a server not known to be HTTP/1.1 compliant.
  Also see EXPIRES.
  Note: It may be better to specify cache commands in HTTP headers than in META statements, because headers can influence not only the browser but also proxies and other intermediaries that may cache information.
Content-Language
  Example: <META HTTP-EQUIV="CONTENT-LANGUAGE" CONTENT="en-US,fr">
  Description: Declares the primary natural language(s) of the document. May be used by search engines to categorize by language.
CONTENT-TYPE
  Example: <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=UTF-8">
  Description: The HTTP content type may be extended to give the character set. It is recommended to always use this tag and to specify the charset.
Copyright
  Example: <META NAME="COPYRIGHT" CONTENT="&copy; 2004 Tex Texin">
  Description: A copyright statement.
DESCRIPTION
  Example: <META NAME="DESCRIPTION" CONTENT="…summary of web page…">
  Description: The text can be used when printing a summary of the document. The text should not contain any formatting information. Used by some search engines to describe your document. Particularly important if your document has very little text, is a frameset, or has extensive scripts at the top.
EXPIRES
  Example: <META HTTP-EQUIV="EXPIRES" CONTENT="Mon, 22 Jul 2002 11:12:01 GMT">
  Description: The date and time after which the document should be considered expired. An illegal EXPIRES date, e.g. “0”, is interpreted as “now”. Setting EXPIRES to 0 may thus be used to force a modification check at each visit.
  Web robots may delete expired documents from a search engine, or schedule a revisit.
  HTTP 1.1 (RFC 2068) specifies that all HTTP date/time stamps MUST be generated in Greenwich Mean Time (GMT) and in RFC 1123 format.

    RFC 1123 format = wkday "," SP date SP time SP "GMT"

    wkday = (Mon, Tue, Wed, Thu, Fri, Sat, Sun)
    date  = 2DIGIT SP month SP 4DIGIT ; day month year (e.g., 02 Jun 1982)
    time  = 2DIGIT ":" 2DIGIT ":" 2DIGIT ; 00:00:00 – 23:59:59
    month = (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)

Keywords
  Example: <META NAME="KEYWORDS" CONTENT="sex, drugs, rock & roll">
  Description: The keywords are used by some search engines to index your document in addition to words from the title and document body. Typically used for synonyms and alternates of title words. Consider adding frequent misspellings, e.g. “heirarchy” alongside “hierarchy”.
PRAGMA NO-CACHE
  Example: <META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
  Description: This directive indicates cached information should not be used and instead requests should be forwarded to the origin server. This directive has the same semantics as the CACHE-CONTROL:NO-CACHE directive and is provided for backwards compatibility with HTTP/1.0.
  Clients SHOULD include both PRAGMA:NO-CACHE and CACHE-CONTROL:NO-CACHE when a no-cache request is sent to a server not known to be HTTP/1.1 compliant.
  HTTP/1.1 clients SHOULD NOT send the PRAGMA request-header. HTTP/1.1 caches SHOULD treat “PRAGMA:NO-CACHE” as if the client had sent “CACHE-CONTROL:NO-CACHE”.
  Also see EXPIRES.
Refresh
  Example: <META HTTP-EQUIV="REFRESH" CONTENT="15;URL=http://www.I18nGuy.com/index.html">
  Description: Specifies a delay in seconds before the browser automatically reloads the document. Optionally, specifies an alternative URL to load, making this command useful for redirecting browsers to other pages.
ROBOTS
  Examples:
    <META NAME="ROBOTS" CONTENT="ALL">
    <META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
    <META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
    <META NAME="ROBOTS" CONTENT="NONE">
  Description: CONTENT="ALL | NONE | NOINDEX | INDEX | NOFOLLOW | FOLLOW | NOARCHIVE"
  default = empty = "ALL"
  "NONE" = "NOINDEX, NOFOLLOW"
  The CONTENT field is a comma-separated list:
    • INDEX: search engine robots should include this page.
    • FOLLOW: robots should follow links from this page to other pages.
    • NOINDEX: links can be explored, although the page is not indexed.
    • NOFOLLOW: the page can be indexed, but no links are explored.
    • NONE: robots can ignore the page.
    • NOARCHIVE: Google uses this to prevent archiving of the page. See http://www.google.com/bot.html
GOOGLEBOT
  Example: <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
  Description: In addition to the ROBOTS META command above, Google supports a GOOGLEBOT command. With it, you can tell Google that you do not want the page archived, but allow other search engines to do so. If you specify this command, Google will not save the page and the page will be unavailable via its cache.
  See Google’s FAQ.
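
The RFC 1123 date format that the EXPIRES entry calls for is easy to get wrong by hand. One way to generate it, sketched here in Python, is the standard library's email.utils.format_datetime with usegmt=True. The helper name expires_meta is hypothetical, and the sketch assumes it is given a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def expires_meta(when):
    """Build an EXPIRES <META> tag for an aware datetime (hypothetical helper)."""
    # Convert to UTC first: format_datetime(usegmt=True) only accepts a
    # datetime whose tzinfo is UTC, and then writes the zone as "GMT".
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return f'<META HTTP-EQUIV="EXPIRES" CONTENT="{stamp}">'


print(expires_meta(datetime(2002, 7, 22, 11, 12, 1, tzinfo=timezone.utc)))
# prints <META HTTP-EQUIV="EXPIRES" CONTENT="Mon, 22 Jul 2002 11:12:01 GMT">
```

Because the conversion to UTC happens inside the helper, a datetime given in any other time zone comes out as the equivalent GMT stamp.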