Sunday, October 24, 2010

Helpful Web Page HTML Meta and Link Tags

Web pages can include extra HTML tags that your site visitors don't necessarily see, but that help search engines like Google and Bing, other web sites like Facebook, and applications like the iPhone browser know more about your page. This blog entry describes the most common tags and some other useful ones. All these tags, except for JavaScript, belong in the "head" HTML section of your web pages. Note that some of these are "meta" tags with "name" and "content" attributes, and some are "link" tags with "rel" and "href" attributes.

Basic Meta Tags

Title - The Title tag contains text that shows up at the top of the browser when someone visits your page. Search engines also use your title as the main link to your page when it shows up in search results. An example Title tag looks like this:

<title>This is where your page title goes</title>

Description - A description tag can contain more text about your site. Visitors don't see this information when visiting your page, but search engines generally show it just below the page title in search results. This is a good place to include additional keywords and a call to action. An example Description tag looks like this:

<meta name="description" content="Click on this site for more information on what you're searching for." />

Keywords - Though most search engines ignore the contents of the Keywords tag, including it may be helpful if your site has its own search engine. Use it to list additional words, separated by commas, that visitors might search for but that don't necessarily appear on your page. Here's an example Keywords tag:

<meta name="keywords" content="web page, editing, authoring, header, meta tags, HTML, CSS, JavaScript, JS" />

Robots - This tag lets you tell search engines whether or not to index or cache your pages. If you don't want search engines to store a copy of your page, use a tag like this:

<meta name="robots" content="noarchive" />

Canonical - If your page can be reached at several different URLs, this tag lets you tell search engines which link you prefer them to use. See my other blog post for more information about this tag. Here's an example:

<link rel="canonical" href="http://www.yourdomain.com/yourpage.html" />

Including Site CSS and JavaScript Libraries

It's good practice for your site to use a common CSS stylesheet and JavaScript file that the browser can cache and reuse. Browsers will store these files so your visitors don't have to download them again with every page. Moving common formatting and scripts out of each page makes your pages smaller, which means they'll load faster.

Stylesheet - This tag tells the browser where to find the general formatting for your site. Here's an example Stylesheet tag:

<link rel="stylesheet" type="text/css" href="/sitestylesheet.css" />

JavaScript - This tag tells the browser where to get your site's general script library containing functions that are reused on several pages. This tag belongs near the top of the "body" section of your web page. Here's an example JavaScript tag:

<script type="text/javascript" src="/sitejavascript.js"></script>

Browser Icons, and Apple iPhone and iPad Icons

These tags let you tell browsers and other applications which icon to use to represent your site when a visitor bookmarks your page.

Shortcut Icon - Web browsers will show a small icon, generally 16x16 pixels, in the address bar and bookmarks. To tell the browser where to find the icon, include the "shortcut icon" tag. Here's an example:

<link rel="shortcut icon" href="/favicon.ico" />

Apple Touch Icon - Apple iPhones and iPads will show a 57x57 pixel icon when a visitor saves a bookmark for your site to the home screen. Here's how to tell the device where to find your icon:

<link rel="apple-touch-icon" href="/logo57x57.png" />

Viewport - When the iPhone browser displays your page on its small screen, it can't always figure out just how wide or narrow to show it. Use the "viewport" tag to tell the iPhone Safari browser exactly how wide to display your page. Here's an example that sets the width to 650 pixels:

<meta name="viewport" content="width=650" />

Facebook

If you've ever shared a link on Facebook, you probably noticed that Facebook sometimes shows an irrelevant image from the page. To let Facebook know which image you prefer, include the "medium" and "image_src" tags.

Medium - This tag helps Facebook know how to display a shared link to your site. Depending on your page's content, you can specify news, blog, image, video, audio, or mult for the "content" attribute. Here's an example "medium" tag for textual content:

<meta name="medium" content="news" />

Image_Src - This tag tells Facebook which image to use when someone shares a link to your page. If you would like to let the Facebook user select from more than one image, you can include this tag any number of times with different "href" image URLs. Here's an example:

<link rel="image_src" href="http://www.yourdomain.com/image1.png" />
<link rel="image_src" href="http://www.yourdomain.com/image2.png" />

RSS Feeds

Alternate - If your site has an RSS feed, most browsers will show an RSS icon near the address bar that visitors can click to subscribe to your RSS feed. To tell the browser where your RSS feed is, include a link to it like the one below. If your site has more than one RSS feed, you can include more than one "alternate" link.

<link rel="alternate" type="application/rss+xml" title="Your Feed Name" href="http://www.yourdomain.com/feed1.rss" />
<link rel="alternate" type="application/rss+xml" title="Your Feed Name" href="http://www.yourdomain.com/feed2.rss" />

Advanced Topic: Site Search

Modern browsers like Firefox or Microsoft Internet Explorer version 7 and higher let users add custom search providers, so they can search your site from the browser's search box even when they're not on your site. If your site has its own search feature, you can set up a small XML file that tells the browser where to find your search.

First set up an XML file, generally named "opensearch.xml", like the example below and upload it to your server.

<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>Site or Search Name</ShortName>
<LongName>Your Site or Search Engine Name</LongName>
<Description>A longer description of your site or search engine.</Description>
<Image type="image/vnd.microsoft.icon" height="16" width="16">http://www.yoursite.com/favicon.ico</Image>
<Url type="application/opensearchdescription+xml" rel="self" template="http://www.yourdomain.com/opensearch.xml" />
<Url type="text/html" rel="results" template="http://www.yourdomain.com/yoursearchscript?yourquerytag={searchTerms}" />
<Query role="example" searchTerms="Example" />
</OpenSearchDescription>

If your search engine supports real-time search-as-you-type suggestions, add a line like this to your XML file:

<Url type="application/x-suggestions+json" template="http://www.yourdomain.com/yoursuggestionscript?yourquerytag={searchTerms}" />

Then include a "search" tag like the one below in the "head" section of all your pages, with a link to the search XML file you uploaded:

<link rel="search" type="application/opensearchdescription+xml" href="http://www.yourdomain.com/opensearch.xml" title="Your Site or Search Engine Name" />
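
To make the "{searchTerms}" placeholder in the XML file more concrete, here is a minimal Python sketch of the substitution a browser performs before it requests your search page. The function name expand_template is made up for illustration, the template is the placeholder URL from the example above, and real browsers may encode some characters differently (for instance, spaces as "+").

```python
from urllib.parse import quote

# Placeholder template copied from the opensearch.xml example above.
RESULTS_TEMPLATE = "http://www.yourdomain.com/yoursearchscript?yourquerytag={searchTerms}"


def expand_template(template, search_terms):
    """Return the URL to request for the given search text (illustrative only)."""
    # Percent-encode the visitor's text so spaces, "&", and similar
    # characters can't break the query string, then drop it into the template.
    return template.replace("{searchTerms}", quote(search_terms, safe=""))


print(expand_template(RESULTS_TEMPLATE, "meta tags & links"))
```

Your search script then receives the visitor's text in the "yourquerytag" query parameter, just as if they had typed it into the search box on your site.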

Thursday, May 6, 2010

Make Search Engines Use Your Keywords with the Canonical Tag

It used to be that search engines would index keywords listed in web page meta keywords header tags like this:
<meta name="keywords" content="Schools Out Forever Maximum Ride" />

However, so many sites overloaded that tag with spam that search engines started ignoring it entirely. The challenge for site owners became where else to put keywords that search engines would still see. People noticed that Google not only looked for keywords in text, but also in domain names and URLs. So the trick became how to get keywords into your URLs.

Usually a URL maps one-to-one to a file name on the web server (or, on database-driven sites, to an ID in a query string). So webmasters could include keywords in file and directory names, but that gets tedious because generally anything between / characters is also a physical sub-directory, and it just doesn't work for database-driven sites. Using physical file and directory names would mean your web servers would have files in tons of individual sub-directories that would become impossible to maintain.

The good thing is there's no law that says a URL has to exactly equal a physical file name. So one solution is to set up your web server to rewrite URLs to come up with the real file name.

For example, all these URLs render the same content:
http://www.amazon.com/dp/0446618896
http://www.amazon.com/Schools-Out-Forever-Maximum-Ride/dp/0446618896
http://www.amazon.com/asdf-asdf-asdf-asdf-asdf/dp/0446618896

But how do you tell Google what your preferred URL is, since it could find any of those URLs? That's where the canonical tag comes in.

If you look at the source code for the pages at any of those URLs and find the canonical tag, you'll see that they all use the same value, no matter what the actual URL was:
<link rel="canonical" href="http://www.amazon.com/Schools-Out-Forever-Maximum-Ride/dp/0446618896" />
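
If you'd rather not dig through page source by hand, a small script can pull out the canonical tag for you. This is a rough Python sketch using only the standard library. The names CanonicalFinder and find_canonical are made up for illustration, and sample_page is a stand-in for the HTML you would actually download from one of those URLs.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical page source standing in for a downloaded page.
sample_page = """<html><head>
<title>Example</title>
<link rel="canonical" href="http://www.amazon.com/Schools-Out-Forever-Maximum-Ride/dp/0446618896" />
</head><body></body></html>"""

print(find_canonical(sample_page))
```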

So Google should generally link to http://www.amazon.com/Schools-Out-Forever-Maximum-Ride/dp/0446618896 from its index, no matter what URL its spider really found the page at.

Amazon probably makes all those URLs render the same page by using web server URL rewriting to ignore anything between "http://www.amazon.com/" and "/dp/0446618896" and simply serve whatever content is at location 0446618896 (or in their case, whatever's in the database with that ID). URL rewriting is an arcane topic, but it should be familiar to system administrators who manage web servers.
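
The idea behind that kind of rewriting can be sketched in a few lines of Python. This is only a guess at the general technique, not Amazon's actual setup: the pattern, the PRODUCT_PATH name, and the product_id_from_path function are all assumptions made for illustration. On a real site the same rule would usually live in the web server's rewrite configuration.

```python
import re

# Assumed URL shape: an optional keyword segment, then "/dp/" and an ID.
# Whatever appears in the keyword segment is simply ignored.
PRODUCT_PATH = re.compile(r"^/(?:[^/]+/)?dp/(?P<product_id>[0-9A-Za-z]+)/?$")


def product_id_from_path(path):
    """Return the ID from a product URL path, or None if the path doesn't match."""
    match = PRODUCT_PATH.match(path)
    return match.group("product_id") if match else None


# All three paths from the example URLs resolve to the same ID.
for path in (
    "/dp/0446618896",
    "/Schools-Out-Forever-Maximum-Ride/dp/0446618896",
    "/asdf-asdf-asdf-asdf-asdf/dp/0446618896",
):
    print(path, "->", product_id_from_path(path))
```

Once the server has the ID, it can look up the content in the database and render the same page no matter which keywords were in the URL.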

Since Amazon can then include any keywords in their URLs, the other thing they probably do is ensure consistency in how they link to each product. So no matter where they have their links (sitemaps, site search, product listing pages, etc.), they always use a single preferred canonical URL.
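
One way to get that consistency on your own site is to build every link and every canonical tag with the same helper function. Here's a small Python sketch of the idea. The helper names (slugify, canonical_url, canonical_tag), the placeholder domain, and the "/dp/" URL layout are assumptions for illustration, not a description of how Amazon does it.

```python
import re
from html import escape

BASE_URL = "http://www.yourdomain.com"  # placeholder domain


def slugify(title):
    """Turn a title into hyphen-separated keywords for use in a URL."""
    # Drop apostrophes so "School's" becomes "Schools", then keep only
    # runs of letters and digits and join them with hyphens.
    words = re.findall(r"[A-Za-z0-9]+", title.replace("'", ""))
    return "-".join(words)


def canonical_url(title, product_id):
    """Build the single preferred URL for a product."""
    return f"{BASE_URL}/{slugify(title)}/dp/{product_id}"


def canonical_tag(title, product_id):
    """Build the canonical link tag for the page's head section."""
    href = escape(canonical_url(title, product_id), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_tag("School's Out Forever (Maximum Ride)", "0446618896"))
```

If sitemaps, search results, and product listings all call the same helper, they can't drift apart and end up linking to different versions of the same page.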