Meta tags in HTML

Meta tags

Meta tags are some of the tags you put in the header of the page. On occasion you can read that meta tags aren't necessary for the page, but that is a misunderstanding. You can make a page with content that is readable on a screen without meta tags, but whether a page can be displayed and whether it is handled well by browsers and search engines are two entirely different things.

There are three general types of meta tags. There is the type that looks like ordinary tags, with a start and an end tag, and there are the META HTTP-EQUIV tags and the META NAME tags. The basic architecture of the latter two is:

<META NAME=" " CONTENT=" ">
<META HTTP-EQUIV=" " CONTENT=" ">

Both tags have two attributes, and both attributes must have a value. NAME or HTTP-EQUIV defines the type, e.g. language tag, keyword tag, author tag, etc., and CONTENT holds the value.

The difference between the two types is that HTTP-EQUIV tags concern the way the browser reads the page when opening it, e.g. language, character set, etc., while NAME tags are the ones the spiders use for indexing, e.g. keywords, author, publisher, etc. In practice, it doesn't work quite like that, but almost.

Mandatory tags

The most important meta tag is one of the tags that look like ordinary tags, only placed in the header section. That is the TITLE tag:

<TITLE>Insert the title here!</TITLE>

The TITLE tag is the piece of text shown as the link on the search engines' result pages, i.e. the first thing from the web site the reader sees, and the part that has to convince the reader to go to the page. The search engines usually show up to around 65 characters, so it is a good idea to keep the title shorter than that. Because it is the first thing presented from the page, the search engines attach great importance to the content of the TITLE tag, so this is one of the places to give content and wording some serious consideration.

The next meta tag is really two meta tags, and they are real meta tags. The two fields are called "description" and "subject", and they do exactly the same thing. The only reason there are two is that, in the old days, some search engines used one tag while other search engines used the other. The syntax is:

<META NAME="description" CONTENT="A description of the page content">
<META NAME="subject" CONTENT="A description of the page content">

Search engines consider these tags to be quite significant. Today it is, for all intents and purposes, only "description" that is used for indexing, and you only use both tags if you really want to cover all bases. In some cases, the content of these tags is shown in the search results along with the TITLE tag, instead of parts of the text on the page.
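
As a rough sketch of how the mandatory tags sit together in the header, the fragment below uses a made-up shop page. The title and description texts are placeholders chosen for illustration, not recommendations, and the title is kept well under the roughly 65 characters mentioned above.

```html
<HEAD>
<!-- Hypothetical page: the texts below are placeholders -->
<TITLE>Handmade gizmos and gadgets | Example Shop</TITLE>
<META NAME="description" CONTENT="Handmade gizmos, gadgets and doohickies from a small workshop.">
<!-- Only needed if you want to cover all bases -->
<META NAME="subject" CONTENT="Handmade gizmos, gadgets and doohickies from a small workshop.">
</HEAD>
```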

Recommended tags

A couple of tags that aren't mandatory, but are an extremely good idea to use, are language, charset and robots.

Language
Language is where you tell which language is used on the page. This matters for how search engines react to your site. The Danish Google favours sites in Danish, and if your site is in another language, Google offers to translate the page. This is why you get different results depending on whether you search with google.dk, google.de or google.com. Helping search engines and visitors is always a good idea, so even though it isn't mandatory to state which language is used, it is a very good idea to have it in order.

The tag for language differs depending on whether you use HTML 4.01 or HTML 5.

For HTML 4.01 you can choose between two versions:

<META NAME="language" CONTENT="english">
<META HTTP-EQUIV="content-language" CONTENT="en">

For HTML 5, language has been changed to an attribute of the HTML tag:

<HTML LANG="en">

Because browsers are becoming increasingly sophisticated and are available in languages other than English, the HTTP-EQUIV version is the better choice for HTML 4.01. It tells the search engine which language is used, and when the page is opened, the language of the browser's own functions (e.g. buttons) is not changed; those normally follow the language of the browser or operating system.

The language codes for the various languages can be found here.

Charset
In connection with the language, there is also the character set (charset) you can specify. Here you can do some actual damage if you don't pay attention, because the way browsers handle the charset depends on the file format. Starting with the file format:

Depending on the editor you use, you can choose between one of the following file formats:

ANSI
UTF-8
UTF-8 without BOM
UCS-2 Big Endian
UCS-2 Little Endian

Where you find these options also depends on the editor. In something like Notepad for Windows, you choose the file format under "Save as". In an editor like Notepad++, you find it under "Encoding" in the menu bar. For ordinary web sites, you use ANSI or UTF-8. Here things get a little tricky, because the file format UTF-8 doesn't work with a charset like windows-1252, and the file format ANSI doesn't work with the charset UTF-8. Do note that UTF-8 is both a charset and a file format.

Some years ago, when browsers weren't very sophisticated, it was always a good idea to define the charset to be used on the page. It was especially special characters like the Danish æ, ø and å that the browsers couldn't figure out how to display, so they inserted other characters instead (often a black square or a question mark). To avoid this, you could specify the charset, e.g. windows-1252, which worked well for North European characters. Here you should be aware that there is a difference between HTML 4.01 and HTML 5:

For HTML 4.01 it looks like this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=windows-1252">

For HTML 5 it looks like this:
<META CHARSET="windows-1252">

The Americans like the charset named UTF-8, and in several places you can read that with that one you are always covered. That is not the case unless you also use the file format UTF-8! In some places you can read that you are supposed to use ISO-8859-1 to be covered with regard to the North European characters. This is both true and false. The page will show the right characters, so it works, but browsers today have a built-in routine that assumes that what you really mean is the charset windows-1252, so they use that charset instead when they read the code. Since that is the case, you might as well spare the browser that extra step and choose windows-1252... and this is where you have to remember to use the file format ANSI, because otherwise the browser can't show characters like æ, ø and å the right way.
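
The pairing rule described above can be sketched with the HTML 4.01 syntax. This assumes that "ANSI" in your editor means windows-1252, which is the usual case on a Western European Windows installation but is an assumption here. A page would contain only one of the two tags.

```html
<!-- File saved as ANSI (assumed to be windows-1252): declare windows-1252 -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=windows-1252">

<!-- File saved as UTF-8: declare UTF-8 -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=utf-8">
```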

In practice, the easiest, and by far the most effective, solution is to use UTF-8 as both file format and charset.
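
A minimal sketch of such a page in HTML 5 could look like the following. It assumes the file itself is saved in the file format UTF-8; the DOCTYPE line, the title and the body text are illustrative additions and not part of the tags discussed above.

```html
<!DOCTYPE html>
<HTML LANG="en">
<HEAD>
<!-- Must match the file format: this file is assumed to be saved as UTF-8 -->
<META CHARSET="utf-8">
<TITLE>Testing æ, ø and å</TITLE>
</HEAD>
<BODY>
<P>If file format and charset match, æ, ø and å are shown correctly.</P>
</BODY>
</HTML>
```

If the characters in the title or the paragraph come out garbled, the first thing to check is whether the editor really saved the file as UTF-8.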

Robots
The third meta tag that is a good idea to use is "robots". The spiders that index pages for the search engines follow the links on the pages. If you have a page you don't want indexed, or a page whose links the spider shouldn't follow, you can use:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

You can of course also use just one of them, NOINDEX or NOFOLLOW.
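
Written out, the two single-value variants would look roughly like this. A page would carry only one robots tag; both are shown together here purely for comparison.

```html
<!-- Keep the page out of the index, but let the spiders follow its links -->
<META NAME="ROBOTS" CONTENT="NOINDEX">

<!-- Let the page be indexed, but do not follow the links on it -->
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
```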

You want to use these tags on pages pointing to other sites, e.g. pages with relevant links, and on pages that can be spammed, e.g. guest books, blogs and fora. The reason is that if you link to pages that are considered dishonest in some way, or that don't exist, you get punished by your pages not being shown in search results.

NOFOLLOW tells the spiders not to follow the links, whereby you avoid problems if something damaging appears at the other end of a link. The other reason is spammers putting in links to their own sites. NOFOLLOW helps obstruct their destructive behaviour.

NOINDEX is, as the name indicates, an instruction not to index the page, i.e. you can't find the page when searching. Its use is a double-edged sword: it prevents spam from being indexed, thus obstructing the damaging behaviour of the spammers, but it also means that the parts you want people to find won't appear in searches either. In guest books, NOINDEX is beneficial, but for fora and blogs you have to consider which of the two effects you want.

NOINDEX and NOFOLLOW can be switched on and off according to need, so if you decide that you chose wrong, you can change it without any problems.

Redundant tags

As the web has developed since it became commonly used in 1995, some meta tags that used to be necessary, or at least a good idea, no longer have any relevance. Others seemed to be a good idea, but never were.

Note: Several of the redundant tags are available in the form called micro tagging, and there they are both relevant and effective.

Back in the early days of the web, keywords was one of the most important fields to fill out. Because it was abused extensively for cheating, by writing a lot of irrelevant keywords to lure in visitors, this field hasn't been used much since around 2000. Some of the minor search engines are said to use keywords, but the way the world looks in 2016, it makes no difference whether you have keywords or not. The syntax, if someone should feel like including them, is:

<META NAME="keywords" CONTENT="knick-knacks, gizmos, gadgets, doohickies">

In principle you don't need to use commas to separate the keywords, but it makes them easier to work with.

Author and publisher are two tags that would appear to be a good idea, but never have been:

<META NAME="author" CONTENT="The name of the author">
<META NAME="publisher" CONTENT="The name of the publisher">

The option is there, but the search engines don't use them for anything and don't index the content. If you want it to be something you can search for, write it on the page itself, i.e. in the BODY tag.

Date and time are yet another two things that intuitively sound like a good idea to have in your meta tags. They are not. If it is something you actually want to use for anything, it goes into the BODY tag, and if it is for keeping track of updates of the page content, then this can be read in the file attributes, along with file size, etc. Should you want to use the tag, the syntax is:

<META NAME="date" CONTENT="1994-11-06T08:49:37+00:00">

Advanced tags

After a little training in web design, you start creating and using what are called Cascading Style Sheets, or CSS, and external JavaScripts. CSS has to be uploaded along with the page, and JavaScripts can be uploaded along with the page. The syntax for the style sheet stylesheet.css and the script JavaScripts.js is:

<LINK REL="stylesheet" HREF="stylesheet.css" TYPE="text/css">
<SCRIPT TYPE="text/javascript" SRC="JavaScripts.js"></SCRIPT>

These are for trained web designers and will be explained in the sections about CSS and JavaScript.