Showing posts with label seo. Show all posts

Thursday, August 1, 2013

July Bots Are In

The July stats for web crawler (indexing bot) traffic are in. Here are the top bots for a small site I run.

Top 20 Obvious Bots


These bots are nice enough to include "bot", "spider", or "crawl" in their user agent string, or access the robots.txt file. Here are the top 20, representing 89% of obvious bot hits:

  1. 18% - Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)
  2. 13% - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  3. 12% - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
  4. 7% - Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
  5. 5% - Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
  6. 5% - Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)
  7. 5% - Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)
  8. 4% - Twitterbot/1.0
  9. 3% - Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
  10. 3% - Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)
  11. 2% - ShowyouBot (http://showyou.com/crawler)
  12. 2% - Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)
  13. 2% - Aboundex/0.3 (http://www.aboundex.com/crawler/)
  14. 2% - Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
  15. 2% - Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)
  16. 2% - msnbot/2.0b (+http://search.msn.com/msnbot.htm)
  17. 1% - Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)
  18. 1% - Mozilla/5.0 (compatible; SearchmetricsBot; http://www.searchmetrics.com/en/searchmetrics-bot/)
  19. 1% - Mozilla/5.0 (compatible; Dow Jones Searchbot)
  20. 1% - Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
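The "obvious bot" test above is easy to automate. Here's a minimal Python sketch. It assumes your access log has already been parsed into (user agent, requested path) pairs. The function names are made up for illustration, and this isn't necessarily how the stats above were produced.

```python
from collections import Counter

# Marker words that self-identified bots put in their user agent strings.
# Matching is case-insensitive, so "Googlebot" and "bingbot" both hit "bot".
OBVIOUS_BOT_MARKERS = ("bot", "spider", "crawl")

def is_obvious_bot(user_agent: str, requested_path: str = "") -> bool:
    """True if the user agent contains a marker word or the hit was for robots.txt."""
    ua = user_agent.lower()
    if any(marker in ua for marker in OBVIOUS_BOT_MARKERS):
        return True
    return requested_path == "/robots.txt"

def top_bots(hits, n=20):
    """hits: iterable of (user_agent, path) pairs.

    Returns up to n (user_agent, percent) pairs, where percent is that agent's
    rounded share of all obvious bot hits.
    """
    counts = Counter(ua for ua, path in hits if is_obvious_bot(ua, path))
    total = sum(counts.values())
    return [(ua, round(100 * c / total)) for ua, c in counts.most_common(n)]
```

Because each share is rounded separately, the percentages for a top-20 list may not add up exactly to the combined total.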

Top 20 Developer Packages or Proprietary Bots


These bots are built on developer packages or proprietary code, but don't specifically identify themselves as bots. The top 20 represent 92% of hits from these bots.

  1. 19% - checks.panopta.com
  2. 16% - NING/1.0
  3. 13% - UnwindFetchor/1.0 (+http://www.gnip.com/)
  4. 10% - FeedBurner/1.0 (http://www.FeedBurner.com)
  5. 7% - JS-Kit URL Resolver, http://js-kit.com/
  6. 5% - UniversalFeedParser/5.0.1 +http://feedparser.org/
  7. 4% - PycURL/7.19.5
  8. 3% - Java/1.6.0_26
  9. 3% - TwitterFeed 3
  10. 2% - HTMLParser/2.0
  11. 2% - Ruby
  12. 2% - Mozilla/5.0 (Digg/1.0; support@digg.com)
  13. 1% - Java/1.7.0_21
  14. 1% - Crowsnest/0.5 (+http://www.crowsnest.tv/)
  15. 1% - curl/7.24.0
  16. 1% - Opera/7.11 (Windows NT 5.1; U) [en]
  17. 1% - MetaURI API/2.0 +metauri.com
  18. 1% - Jakarta Commons-HttpClient/3.1
  19. 1% - InAGist URL Resolver (http://inagist.com)
  20. 1% - Mozilla/5.0
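Many of these hits can be spotted by the library name at the start of the user agent. Here's a rough Python sketch. The prefix list is a hypothetical one built from the table above, not a complete catalog of HTTP libraries.

```python
# Hypothetical prefixes taken from the list above; extend as new ones appear.
LIBRARY_PREFIXES = (
    "PycURL/", "curl/", "Java/", "Jakarta Commons-HttpClient/",
    "UniversalFeedParser/", "HTMLParser/", "Ruby",
)

def library_name(user_agent: str):
    """Return the library a user agent starts with (without the slash), or None."""
    for prefix in LIBRARY_PREFIXES:
        if user_agent.startswith(prefix):
            return prefix.rstrip("/")
    return None
```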

Plus these other notables:

  1. Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; subscribers; feed-id=)
  2. Mozilla/5.0 (compatible; Embedly/0.2; +http://support.embed.ly/)

Top 20 Sneaky Bots


These bots either don't identify themselves, mask their identity behind a common real browser's user agent, or send no user agent at all. I identify them by hits from the same or similar IP addresses, a complete lack of referring URLs, or too many hits from the same IP address.

  1. From IP 168.62.192.113 (Microsoft) with user agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.163 Safari/535.19".
  2. More coming soon
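Here's one way to combine those signals in Python, sketched under a few assumptions. Log entries are (IP, referrer) pairs. "Similar" IPv4 addresses means the same /24 prefix. An address block is flagged when it made more than a threshold number of hits and none of them carried a referrer. The threshold of 100 is an arbitrary placeholder, not a recommendation.

```python
from collections import defaultdict

def subnet(ip: str) -> str:
    """Collapse an IPv4 address to its /24 prefix, e.g. 168.62.192.113 -> 168.62.192."""
    return ".".join(ip.split(".")[:3])

def suspicious_subnets(hits, max_hits=100):
    """hits: iterable of (ip, referrer) pairs from an access log."""
    counts = defaultdict(int)
    referred = set()
    for ip, referrer in hits:
        net = subnet(ip)
        counts[net] += 1
        # Apache access logs write "-" when there was no referrer.
        if referrer and referrer != "-":
            referred.add(net)
    return sorted(net for net, n in counts.items()
                  if n > max_hits and net not in referred)
```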


Saturday, August 11, 2012

More SEO and Social META tags for Twitter and Google

Here are some more HTML META tags I just discovered:

Twitter Cards

Twitter Cards (announced in June 2012) are extended tweets that can show more than just 140 characters. Here are META tags to add to your page's HEAD section to help Twitter pick the right parts of your site's content.

The twitter:card tag can be summary (for a news article, blog post, or text-based page), photo (for an image or picture), or player (for video).

<meta name="twitter:card" content="summary" />

The twitter:site and twitter:site:id tags identify your Twitter username and your numeric Twitter user ID. Most people don't know their Twitter user ID; if you don't, you can omit that META tag.

<meta name="twitter:site" content="@YourTwitterScreenName" />
<meta name="twitter:site:id" content="1234567890" />

The twitter:url tag is for a link to your page.

<meta name="twitter:url" content="http://www.YourSite.com/path/pagename.html" />

Other tags are pretty self-explanatory:

<meta name="twitter:title" content="Your Page Title" />

<meta name="twitter:description" content="Your page description." />

<meta name="twitter:image" content="http://www.YourSite.com/image.jpg" />

Google+ OpenGraph

OpenGraph (og) META tags help Google Plus pull the right information from your page, in case anyone ever shares it on Google+. These are pretty self-explanatory.

<meta property="og:title" content="Your Page Title" />

<meta property="og:image" content="http://www.YourSite.com/image.jpg" />

<meta property="og:description" content="Your page description." />

<meta property="og:site_name" content="Name of Your Site"/>
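If your pages are generated by a script, you can build both sets of tags from the same page information. Here's a minimal Python sketch, where `social_meta_tags` is a hypothetical helper. It shows the one real difference between the two tag families: Twitter Card tags use a `name` attribute, while OpenGraph tags use `property`. Values are HTML-escaped so quotes or ampersands in a title can't break the tag.

```python
from html import escape

def social_meta_tags(title, description, url, image, site_name,
                     twitter_site, card="summary"):
    """Return Twitter Card and OpenGraph META tags as one HTML string."""
    twitter = {
        "twitter:card": card,  # "summary", "photo", or "player"
        "twitter:site": twitter_site,
        "twitter:url": url,
        "twitter:title": title,
        "twitter:description": description,
        "twitter:image": image,
    }
    opengraph = {
        "og:title": title,
        "og:image": image,
        "og:description": description,
        "og:site_name": site_name,
    }
    lines = [f'<meta name="{k}" content="{escape(v)}" />' for k, v in twitter.items()]
    lines += [f'<meta property="{k}" content="{escape(v)}" />' for k, v in opengraph.items()]
    return "\n".join(lines)
```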

Google Search Thumbnails

Google also uses a thumbnail META tag to present small images next to its search results from your site.

<meta name="thumbnail" content="http://www.YourSite.com/image.jpg" />

Sunday, October 24, 2010

Helpful Web Page HTML Meta and Link Tags

Web pages can include extra HTML tags that your site visitors don't necessarily see, but that help search engines like Google and Bing, other web sites like Facebook, and applications like the iPhone browser know more about your page. This blog entry describes the most common tags and some other useful ones. All these tags, except for the JavaScript tag, belong in the "head" section of your web pages. Note that some of these are "meta" tags with "name" and "content" attributes, and some are "link" tags with "rel" and "href" attributes.

Basic Meta Tags

Title - The Title tag contains text that shows up at the top of the browser when someone visits your page. Search engines also use your title as the main link to your page when it shows up in search results. An example Title tag looks like this:

<title>This is where your page title goes</title>

Description - A description tag can contain more text about your site. Visitors don't see this information when visiting your page, but search engines generally show it just below the page title in search results. This is a good place to include additional keywords and a call to action. An example Description tag looks like this:

<meta name="description" content="Click on this site for more information on what you're searching for." />

Keywords - Though most search engines ignore the contents of the Keywords tag, including it may be helpful if your site has its own search engine. Use it to list additional words, not necessarily on your page, that visitors might search for. Here's an example Keywords tag:

<meta name="keywords" content="web page editing authoring header meta tags HTML CSS JavaScript JS" />

Robots - This tag lets you tell search engines whether or not to index or cache your pages. If you don't want search engines to store a copy of your page, use a tag like this:

<meta name="robots" content="noarchive" />

Canonical - If your page can be reached at several different URLs, this tag lets you tell search engines which link you prefer them to use. See my other blog post for more information about this tag. Here's an example:

<link rel="canonical" href="http://www.yourdomain.com/yourpage.html" />
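To see your page roughly the way a crawler does, you can pull these tags back out of the HTML. Here's a minimal sketch using Python's standard `html.parser` module. `HeadTagReader` is a hypothetical class, and real search engines certainly parse pages in their own ways.

```python
from html.parser import HTMLParser

class HeadTagReader(HTMLParser):
    """Collect a page's <title>, named META tags, and <link rel> hrefs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}    # META name (lower-cased) -> content
        self.links = {}   # link rel (lower-cased) -> href; last one wins
        self._in_title = False

    # Self-closing tags like <meta ... /> also arrive here, because
    # HTMLParser's default handle_startendtag calls handle_starttag.
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name"):
            self.meta[a["name"].lower()] = a.get("content") or ""
        elif tag == "link" and a.get("rel"):
            self.links[a["rel"].lower()] = a.get("href") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
```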

Including Site CSS and JavaScript Libraries

It's good practice for your site to use a common CSS stylesheet and JavaScript file that the browser can cache and reuse. Browsers store these files so your visitor doesn't have to download them again with every page. Moving common formatting and scripts out of each page also makes each page smaller, so it loads faster.

Stylesheet - This tag tells the browser where to find the general formatting for your site. Here's an example Stylesheet tag:

<link rel="stylesheet" type="text/css" href="/sitestylesheet.css" />

JavaScript - This tag tells the browser where to get your site's general script library containing functions that are reused on several pages. This tag belongs near the top of the "body" section of your web page. Here's an example JavaScript tag:

<script type="text/javascript" src="/sitejavascript.js"></script>

Browser Icons, and Apple iPhone and iPad Icons

These tags let you tell browsers and other applications which icon to use to represent your site when a visitor bookmarks your page.

Shortcut Icon - Web browsers will show a small icon, generally 16x16 pixels, in the address bar and bookmarks. To tell the browser where to find the icon, include the "shortcut icon" tag. Here's an example:

<link rel="shortcut icon" href="/favicon.ico" />

Apple Touch Icon - Apple iPhones and iPads will show a 57x57 icon when a visitor bookmarks your site. Here's how to tell Apple where to find your icon:

<link rel="apple-touch-icon" href="/logo57x57.png" />

Viewport - When the iPhone browser displays your page on its small screen, it can't always figure out just how wide or narrow to show it. Use the "viewport" tag to tell the iPhone Safari browser exactly how wide to display your page.

<meta name="viewport" content="width=650" />

Facebook

If you've ever shared a link on Facebook, you probably noticed that Facebook sometimes shows an irrelevant image from the page. To let Facebook know which image you prefer, include "medium" and "image_src" tags.

Medium - This tag helps Facebook know how to display a shared link to your site. You can set the "content" attribute to news, blog, image, video, audio, or mult, depending on your page's content. Here's an example "medium" tag for textual content.

<meta name="medium" content="news" />

Image_Src - This tag tells Facebook which image to use when someone shares a link to your page. If you would like to let the Facebook user select from more than one image, you can include this tag any number of times with different "href" image URLs. Here's an example:

<link rel="image_src" href="http://www.yourdomain.com/image1.png" />
<link rel="image_src" href="http://www.yourdomain.com/image2.png" />

RSS Feeds

Alternate - If your site has an RSS feed, most browsers will show an RSS icon near the address bar that visitors can click to subscribe to your RSS feed. To tell the browser where your RSS feed is, include a link to it like the one below. If your site has more than one RSS feed, you can include more than one "alternate" link.

<link rel="alternate" type="application/rss+xml" title="Your Feed Name" href="http://www.yourdomain.com/feed1.rss" />
<link rel="alternate" type="application/rss+xml" title="Your Feed Name" href="http://www.yourdomain.com/feed2.rss" />
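Browsers find these feeds by scanning the page for "alternate" links with an RSS type. Here's a minimal Python sketch of that lookup using the standard `html.parser` module. `FeedFinder` is a hypothetical name, and browsers' actual detection rules may be broader (for example, Atom feeds).

```python
from html.parser import HTMLParser

class FeedFinder(HTMLParser):
    """Collect RSS feeds a page advertises with <link rel="alternate">."""

    def __init__(self):
        super().__init__()
        self.feeds = []  # (title, href) pairs in page order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel_values = (a.get("rel") or "").lower().split()
        if (tag == "link" and "alternate" in rel_values
                and a.get("type") == "application/rss+xml"):
            self.feeds.append((a.get("title"), a.get("href")))
```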

Advanced Topic: Site Search

Modern browsers like Firefox or Microsoft Internet Explorer version 7 and higher let users add custom searches to search your site even when they're not on your site. If your site has its own search feature, you can set up a small XML file that tells the browser where to find your search.

First set up an XML file, generally named "opensearch.xml", like the example below and upload it to your server.

<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<!-- ShortName must be 16 characters or fewer -->
<ShortName>Site Search</ShortName>
<LongName>Your Site or Search Engine Name</LongName>
<Description>A longer description of your site or search engine.</Description>
<Image type="image/vnd.microsoft.icon" height="16" width="16">http://www.yourdomain.com/favicon.ico</Image>
<Url type="application/opensearchdescription+xml" rel="self" template="http://www.yourdomain.com/opensearch.xml" />
<!-- The browser replaces {searchTerms} with the visitor's query -->
<Url type="text/html" rel="results" template="http://www.yourdomain.com/yoursearchscript?yourquerytag={searchTerms}" />
<Query role="example" searchTerms="Example" />
</OpenSearchDescription>
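When a visitor searches, the browser substitutes the query into the results `template`. Here's a minimal Python sketch of that substitution, assuming the terms are percent-encoded first so spaces and ampersands can't break the URL. It mimics the behavior and isn't any browser's actual code.

```python
from urllib.parse import quote

def fill_search_template(template: str, terms: str) -> str:
    """Substitute percent-encoded search terms for {searchTerms} in a Url template."""
    return template.replace("{searchTerms}", quote(terms, safe=""))
```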

If your search engine supports real-time search-as-you-type suggestions, add a line like this to your XML file:

<Url type="application/x-suggestions+json" template="http://www.yourdomain.com/yoursuggestionscript?yourquerytag={searchTerms}" />
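The suggestion script answers with JSON. In the OpenSearch suggestions format, that JSON is an array whose first element is the query the visitor typed and whose second element is a list of completions. Here's a minimal Python sketch, with prefix matching against a hypothetical in-memory list standing in for your real search index.

```python
import json

def suggestion_response(query, candidates, limit=10):
    """Return a suggestions reply: [query, [completion, ...]] as JSON text."""
    q = query.lower()
    matches = [c for c in candidates if c.lower().startswith(q)][:limit]
    return json.dumps([query, matches])
```

Serve the result with the `application/x-suggestions+json` content type that matches the `Url` line above.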

Then include a "search" link tag like the one below in the head section of all your pages, pointing to the search XML file you uploaded:

<link rel="search" type="application/opensearchdescription+xml" href="http://www.yourdomain.com/opensearch.xml" title="Your Site or Search Engine Name" />

Thursday, May 6, 2010

Make Search Engines Use Your Keywords with the Canonical Tag

It used to be that search engines would index keywords listed in web page meta keywords header tags like this:
<meta name="keywords" content="Schools Out Forever Maximum Ride" />

However, so many sites overloaded that tag with spam that search engines started ignoring it entirely. The challenge for site owners became where else to put keywords that search engines would still see. People noticed that Google not only looked for keywords in text, but also in domain names and URLs. So the trick became how to get keywords into your URLs.

Usually a URL maps one-to-one to a file name on the web server (or database-driven sites may use IDs in query strings). So webmasters could include keywords in file and directory names, but that gets tedious because anything between / characters is generally also a physical sub-directory, and it just doesn't work for database-driven sites. Using physical file and directory names would leave your web servers with files spread across tons of individual sub-directories that would become impossible to maintain.

The good thing is there's no law that says a URL has to exactly match a physical file name. So one solution is to set up your web server to rewrite incoming URLs into the real file names.

For example, all these URLs render the same content:
http://www.amazon.com/dp/0446618896
http://www.amazon.com/Schools-Out-Forever-Maximum-Ride/dp/0446618896
http://www.amazon.com/asdf-asdf-asdf-asdf-asdf/dp/0446618896

But how do you tell Google what your preferred URL is, since it could find any of those URLs? That's where the canonical tag comes in.

If you look at the source code for the pages at any of those URLs and find the canonical tag, you'll see that they all use the same value, no matter what the actual URL was:
<link rel="canonical" href="http://www.amazon.com/Schools-Out-Forever-Maximum-Ride/dp/0446618896" />

So Google should generally link to http://www.amazon.com/Schools-Out-Forever-Maximum-Ride/dp/0446618896 from its index, no matter what URL its spider really found the page at.

The trick Amazon uses to make all those pages render the same thing probably relies on web server URL rewriting to ignore anything between "http://www.amazon.com/" and "/dp/0446618896" and simply serve whatever content is at location 0446618896 (or in their case, whatever's in the database with that ID). URL rewriting is an arcane topic, but it should be familiar to system administrators who manage web servers.
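Here's a minimal Python sketch of that idea. It's a guess at how such a site could work, not Amazon's actual implementation. The URL pattern, the 10-character ID format, and the `PREFERRED_SLUG` lookup table (standing in for a product database) are all hypothetical. Whatever slug the visitor arrives with is ignored, and every variant produces the same canonical tag.

```python
import re

# Hypothetical pattern: an optional keyword slug, then /dp/ and a 10-character ID.
PRODUCT_PATH = re.compile(r"^/(?:[^/]+/)?dp/([0-9A-Z]{10})/?$")

# Hypothetical stand-in for the product database: ID -> preferred keyword slug.
PREFERRED_SLUG = {"0446618896": "Schools-Out-Forever-Maximum-Ride"}

def canonical_tag(path):
    """Ignore the slug in the requested path and return the canonical link tag."""
    match = PRODUCT_PATH.match(path)
    if not match or match.group(1) not in PREFERRED_SLUG:
        return None
    product_id = match.group(1)
    href = f"http://www.amazon.com/{PREFERRED_SLUG[product_id]}/dp/{product_id}"
    return f'<link rel="canonical" href="{href}" />'
```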

Since Amazon can then include any keywords in their URLs, the other thing they probably do is ensure consistency in how they link each product. So no matter where they have their links (sitemaps, site search, product listing pages, etc.), they always use a single preferred canonical URL.

Thursday, August 21, 2008

Yahoo! Opens up Buzz

Yahoo's Buzz social bookmarking service was built as a competitor to Digg, and even to Yahoo's own Del.icio.us social bookmarking service. For the first 6 months of its life, Buzz only worked with links from top publishers Yahoo! allowed into its system. Just this week, Yahoo! opened Buzz up to the whole internet. Now you can submit links from any web site.

If you want to add a Buzz link to your site, Yahoo! offers simple JavaScript badges you can embed on your page to let your visitors Buzz the page for you.

Though Yahoo! doesn't document how to build your own Buzz links, here's how to do it with a plain URL. The basic syntax of the URL is:
http://buzz.yahoo.com/submit/?submitUrl=PAGEURL&submitHeadline=HEADLINE&submitSummary=DESCRIPTION&submitCategory=CATEGORY&submitAssetType=TYPE

Each of the values is described here:


  • submitUrl - The page URL you want to submit (with special characters escaped)
  • submitHeadline - The title of the page you're submitting (with spaces converted to + or %20)
  • submitSummary - The description of the page (with spaces converted to + or %20)
  • submitCategory - The category you want Yahoo! Buzz to list your page in. Possible categories are: business, entertainment, health, images, lifestyle, politics, science, sports, travel, usnews, video, or world_news
  • submitAssetType - The media type of your link. Possible types are: article (which they display as "Text"), image, or video.



Here's an example link to Buzz up this blog entry.
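If you generate pages with a script, you can build these links automatically. Here's a minimal Python sketch based only on the URL syntax and parameter values described above; `buzz_link` is a hypothetical helper. The standard `urlencode` function does the escaping: it percent-escapes special characters and converts spaces to +.

```python
from urllib.parse import urlencode

BUZZ_SUBMIT_URL = "http://buzz.yahoo.com/submit/"
BUZZ_CATEGORIES = {
    "business", "entertainment", "health", "images", "lifestyle", "politics",
    "science", "sports", "travel", "usnews", "video", "world_news",
}

def buzz_link(page_url, headline, summary, category, asset_type="article"):
    """Build a Buzz submit URL from the parameters listed above."""
    if category not in BUZZ_CATEGORIES:
        raise ValueError(f"unknown Buzz category: {category}")
    if asset_type not in ("article", "image", "video"):
        raise ValueError(f"unknown asset type: {asset_type}")
    query = urlencode({
        "submitUrl": page_url,
        "submitHeadline": headline,
        "submitSummary": summary,
        "submitCategory": category,
        "submitAssetType": asset_type,
    })
    return BUZZ_SUBMIT_URL + "?" + query
```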

Thursday, May 8, 2008

Search Engine Optimization (SEO) Tips and Tricks

Here are some SEO (search engine optimization) tips on how to get your band or music site to show up higher in organic searches (i.e. non-paid search results):

1. Think Like a Searcher

What words will your visitors be searching on when you want your site to appear in the results?

Pick keywords (and their synonyms) that your target audience would use, and use them in the content of your site. Focus on moderately specific to very specific terms so fewer other results compete with your site.

Don't use your keywords too many times on a page or a search engine may consider it keyword spam.

2. HTML Elements

Use your keywords in your page TITLE, meta descriptions, in H1 tags, in bold (or strong or em) font, and as hyperlinked text. Use different appropriate TITLE tags on different pages.

Use your keywords in your image file names, directory paths, ALT attributes and TITLE attributes. Use keywords in anchor TITLE attributes.

Most modern search engines ignore the meta keywords tag, which is notorious for spam, but that doesn't mean you shouldn't use it.

If your site has versions of essentially the same page at different URLs (like different sort orders or skins), put a link rel="canonical" tag in each version's page header pointing to a single primary URL. Search engines will then credit the page rank "votes" from links to all the versions to the single canonical URL. Concentrating those votes on one URL, instead of splitting them across duplicates, helps the canonical page appear higher in search results.

Also add other useful link and meta tags that various services, like Facebook and the iPhone, may recognize.

3. Keywords in URLs

Use your keywords in your page and image URLs if possible (in the example below, both URLs go to the same page). Keywords may be in the actual file names on your server, or you could use URL rewriting to convert URLs with keywords into server file names.

http://www.amazon.com/Harry-Potter-Deathly-Hallows-Book/dp/0545010225
http://www.amazon.com/These-Words-Get-Indexed-Too/dp/0545010225

Google prefers keywords in URLs to be in lower case and separated by dashes (-), rather than spaces (which get rendered as "+" or "%20") or underscores (_).
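Here's a minimal Python sketch that turns a page title into that lower-case, dash-separated form. `keyword_slug` is a hypothetical helper. This version keeps only ASCII letters and digits, so accented or non-Latin characters are dropped rather than transliterated.

```python
import re

def keyword_slug(title: str) -> str:
    """Lower-case the title and join its words with dashes."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

For example, `keyword_slug("Harry Potter and the Deathly Hallows (Book 7)")` gives `harry-potter-and-the-deathly-hallows-book-7`.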

4. Links to Your Site

Google gives your site extra credit for other sites that link to yours. Get other sites, blogs, and discussion forums to link to your site, preferably from hyperlinked keywords. So provide easy-to-copy deep-links or permalinks on each of your pages.

Syndicate your content (e.g. with RSS feeds) so links back to your site appear on other sites.

Provide common social bookmarking and sharing links (e.g. Facebook, Twitter, Delicious.com, Digg, Buzz, etc.).

Find content-appropriate Wikipedia articles to link to your site as a reference.

Many search engines honor rel="nofollow" on links. If a site adds that attribute to its links, don't bother putting your links there - search engines may still follow them to find your site, but they won't count the link as a "vote" for your page in their page rank.
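To check whether a site marks its links that way, look at the `rel` attribute on its `<a>` tags. Here's a minimal Python sketch using the standard `html.parser` module, where `NofollowCounter` is a hypothetical name. It treats `rel` as a space-separated, case-insensitive list of values.

```python
from html.parser import HTMLParser

class NofollowCounter(HTMLParser):
    """Count <a href> links with and without rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag != "a" or not a.get("href"):
            return  # skip non-links and named anchors
        rel_values = (a.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow += 1
        else:
            self.followed += 1
```

If most links on a page land in `nofollow`, a link from that page probably won't count as a vote.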

5. On-Site Links

Hyperlink keywords on your own site to other pages on your site, or to your site search. This is why many sites now display "tag clouds."

6. Long Tail Text

Have lots and lots of pages with lots and lots of words. Not every page will get more than 1 or 2 hits ever, but in aggregate the search traffic adds up.

Use HTML rather than PDF or word processor files. Though some search engines will index document formats, some don't. Besides, proprietary document files are harder for visitors to access than HTML anyway.

7. Text Instead of Flash, Images, or AJAX

Render keywords on your site as text, not in images or Flash. Search engines generally can't read the text inside those files.

Search engines don't generally execute JavaScript, so any text available only from AJAX calls is not visible to spiders.

8. Traffic Reports

Monitor your traffic logs or reports. Your Home page may not actually be your site's front door. Make sure search entry pages are relevant to the search terms, and help users who first enter your site from a deeper page find other pages on your site.

Use your traffic reports to see what top search terms people use to find your page or site and update your pages to optimize for the keywords people are actually using to find your site - even if they're slightly different from what you first thought. If Google decides your site is the best place for keywords you didn't expect, optimize for those words. Ride the wave, don't fight it.
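If your reports don't break out search terms, you can pull them from the referring URLs in your logs. Here's a minimal Python sketch. It assumes the search engine passes the query in a `q` parameter, as Google's and Bing's search URLs do; other engines may use a different parameter name. `search_terms` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_terms(referrers, param="q"):
    """Tally search phrases from referring URLs, most common first."""
    terms = Counter()
    for ref in referrers:
        values = parse_qs(urlparse(ref).query).get(param)
        if values:
            terms[values[0].strip().lower()] += 1
    return terms.most_common()
```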

9. Spelling Variations

Include some common typos of your brand names and keywords so people who make the same typo in their search see your site in the results. (But see also tip #14.)

10. Make Dynamic URLs Look Static

Make site search look like regular pages with URL rewriting. Search engine spiders won't always follow dynamic or script-directory links, or links with query string parameters, because those could lead to an infinite number of pages. But if you make your dynamic pages look static to the spider, it will index them.

For example, make a search URL that is really http://www.mysite.com/cgi-bin/search.cgi?term=lyrics look like this: http://www.mysite.com/search/lyrics. On Apache (with mod_rewrite enabled), you can do that in your .htaccess file:

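# Enable mod_rewrite for this directory
RewriteEngine On
# Map search/lyrics to /cgi-bin/search.cgi?term=lyrics
RewriteRule ^search/(.*)$ /cgi-bin/search.cgi?term=$1 [L]

The ^ and $ anchor the pattern so it only matches paths that start with search/ (in an .htaccess file, Apache strips the leading / before matching, so /search/lyrics is matched as search/lyrics). The (.*) part means "take anything that appears here and put it in variable $1". The $1 part means "take whatever was in the first parentheses and put it here." The [L] flag means that if this rule matches, the server skips the remaining rewrite rules. Put [L] on each rule that should end processing when it matches.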
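To see what the anchored pattern ^search/(.*)$ captures, here's a small Python simulation. Python's `re` module only stands in for Apache's regex engine here, which behaves the same way for a simple pattern like this one. The anchors keep it from matching a path like research/lyrics.

```python
import re

# The anchored rule, matched against the path with its leading "/" removed,
# the way Apache matches rules in an .htaccess file.
SEARCH_RULE = re.compile(r"^search/(.*)$")

def rewrite(path: str) -> str:
    """Return the internal URL the rule would produce, or the path unchanged."""
    m = SEARCH_RULE.match(path)
    if m:
        return "/cgi-bin/search.cgi?term=" + m.group(1)  # group(1) plays the role of $1
    return path
```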

11. Avoid Frames

Don't use frames! Most search engines can't (or just won't) navigate to sub-frames. And if a searcher clicks through directly to one of your sub-frames, your site probably won't display properly.

12. Sitemaps

Implement a Google Sitemap, an XML file listing the pages on your site so search engine spiders can find them.
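Here's a minimal Python sketch that writes such a file with the standard `xml.etree.ElementTree` module, using the sitemaps.org 0.9 namespace. `build_sitemap` is a hypothetical helper. The page list is assumed to come from your own site, with optional last-modified dates in YYYY-MM-DD form.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (url, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and <
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```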

13. Avoid Link Farms

Don't get your link on link-only pages or parked domains. Some search engines penalize sites that appear on these spam pages.

14. Don't be Devious

Use your keywords in real content. Don't put keywords in tiny fonts or in the same color as the page background. Some search engines penalize sites that use non-visible text.