ALL THE INFORMATION ABOUT SEO NEWS WITH SEO SERVICES.

Monday, June 23, 2008

Webmaster Tools

Our suite of webmaster tools provides you with a free and easy way to make your site more Google-friendly. They can show you Google’s view of your site, help you diagnose problems, and let you share info with us to help improve your site’s visibility.

Getting Google’s view of your site, and diagnosing potential problems
The first step to increasing your site’s visibility on Google is learning how our robots crawl and index your site.

  • Crawl info: You can make sure we have access to your site, and see when Googlebot last visited. You can also view URLs that we’ve had trouble crawling and why we couldn't crawl them. This way, you can fix any problems preventing us from indexing all of your pages.
  • Robots.txt file validation: See if we’re having trouble with your file, and test out changes to that file before you change it on your server.
  • Website content: View top content from your site and see the words that other sites use to link to it.

Seeing how your site performs
A second step is learning what drives traffic to your site.

  • Top queries: Find the top queries that drive traffic to your site and where your site is included in the top search results. This will let you learn how users are finding your site.
  • Indexing information: See how your site is indexed and which of your pages are included in the index. If we find violations in your site, we’ll give you the opportunity to fix the problems and request reinclusion of your site.

Sharing info with Google about your site
Since no one knows more about your site than you do, you can also share this info with Google and improve your crawlability.

  • Submit a Sitemap file: Tell us all about your pages by submitting a Sitemap file; help us learn which pages are most important to you and how often those pages change.
  • Specify your preferred domain: Tell us which URL to use when indexing your site; we’ll do our best to index the version you prefer.

What is a Sitemap file and why should I have one?

In general, there are two types of sitemaps. The first type of sitemap is an HTML page listing the pages of your site - often by section - and is meant to help users find the information they need.

XML Sitemaps - usually called Sitemaps, with a capital S - are a way for you to give Google information about your site. This is the type of Sitemap we'll be discussing in this article.

In its simplest terms, a Sitemap is a list of the pages on your website. Creating and submitting a Sitemap helps make sure that Google knows about all the pages on your site, including URLs that may not be discoverable by Google's normal crawling process.

Sitemaps are particularly helpful if:

  • Your site has dynamic content.
  • Your site has pages that aren't easily discovered by Googlebot during the crawl process - for example, pages featuring rich AJAX or Flash.
  • Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
  • Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.

You can also use a Sitemap to provide Google with additional information about your pages, including:

  • How often the pages on your site change. For example, you might update your product page daily, but update your About Me page only once every few months.
  • The date each page was last modified.
  • The relative importance of pages on your site. For example, your home page might have a relative importance of 1.0, category pages have an importance of 0.8, and individual blog entries or product pages have an importance of 0.5. This priority only indicates the importance of a particular URL relative to other URLs on your site, and doesn't impact the ranking of your pages in search results.

Sitemaps provide additional information about your site to Google, complementing our normal methods of crawling the web. We expect they will help us crawl more of your site and in a more timely fashion, but we can't guarantee that URLs from your Sitemap will be added to the Google index. Sites are never penalized for submitting Sitemaps.

Google adheres to Sitemap Protocol 0.9 as defined by sitemaps.org. The Sitemap Protocol is a dialect of XML for summarizing Sitemap information that is relevant to web crawlers. Sitemaps created for Google using Sitemap Protocol 0.9 are therefore compatible with other search engines that adopt the standards of sitemaps.org.

While a standard Sitemap works for most sites, you can also create and submit specialized Sitemaps for certain types of content. These Sitemap formats are specific to Google and are not used by other search engines. They're a good way to give Google detailed information about specific content types. For example, publishers can use News Sitemaps to give Google information that can appear in Google News search results, such as publication date, keywords, and stock ticker symbol. Sitemap formats include:

What are Sitemaps?

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
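
As a rough illustration of the XML file described above, the following Python sketch builds a small Sitemap with the standard library. The function name build_sitemap and the example.com pages are made up for this example; the tag names (urlset, url, loc, lastmod, changefreq, priority) and the namespace are those of Sitemap Protocol 0.9. The sketch leaves out the XML declaration that a finished file would normally start with.

```python
import xml.etree.ElementTree as ET

# Namespace of Sitemap Protocol 0.9 as published on sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return a Sitemap XML string for a list of page dictionaries.

    Each dictionary needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional metadata.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
example_pages = [
    {"loc": "http://www.example.com/", "lastmod": "2008-06-23",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/about.html",
     "changefreq": "monthly", "priority": 0.5},
]
print(build_sitemap(example_pages))
```

Only loc is required for each URL; the other three fields are hints that a crawler may or may not use.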

Monday, June 16, 2008

Google Sitemap Generator

What is "XML sitemap"?


By placing a formatted XML file containing a site map on your web server, you enable search engine crawlers (like Google's) to find out which pages are present and which have recently changed, and to crawl your site accordingly.


What is "Change frequency"?

This value indicates how frequently the content at a particular URL is likely to change.


What is "Last Modified"?

The time the URL was last modified. This information allows crawlers to avoid recrawling documents that haven't changed.
You can let the generator set this field from your server's response headers, or you can specify your own date and time.

What is "Priority"?

The priority of a particular URL relative to other pages on the same site. The value for this tag is a number between 0.0 and 1.0, where 0.0 identifies the lowest priority page(s) on your site and 1.0 identifies the highest priority page(s) on your site.
The default priority of a page is 0.5.
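
To make the three fields above a little more concrete, here is a small Python sketch that checks the optional values of one URL entry. The function name check_url_entry is made up for this example. The list of change frequency values is the one given in the sitemaps.org protocol, and the 0.0 to 1.0 range and the 0.5 default come from the description of "Priority" above.

```python
# Change frequency values listed in the sitemaps.org protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}
DEFAULT_PRIORITY = 0.5  # used when a page gives no priority


def check_url_entry(changefreq=None, priority=None):
    """Return a list of problems found in one URL entry's optional fields."""
    problems = []
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append("unknown changefreq: %r" % changefreq)
    value = DEFAULT_PRIORITY if priority is None else float(priority)
    if not 0.0 <= value <= 1.0:
        problems.append("priority out of range: %r" % value)
    return problems


print(check_url_entry("daily", 0.8))
print(check_url_entry("sometimes", 1.5))
```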

Sitemap generation

Here are 4 simple steps to get it done

1. Enter your full website URL and some optional parameters in the form below.

2. Press the 'Start' button and wait until the site is completely crawled (progress will be indicated).

3. You will see the generated sitemap details page, including number of pages, broken links list, XML file content and link to a compressed sitemap. Download the sitemap file using this link and put it into the "public_html/" folder of your site.

4. Go to your Google Webmaster account and add your sitemap URL.
Please check About Sitemaps for more details.



Starting URL

Please enter the full HTTP address of your site; only the links within the starting directory will be included. Note that, for instance, "domain.com" and "www.domain.com" are not the same.


A maximum of 500 pages will be indexed in the sitemap.

Need to index more? Check our Standalone version of Google sitemap generator with unlimited number of pages for crawler.


Where to get your RSS Feed Icons …

Need a standard feed icon or a feed icon PSD so you can manipulate how the icon looks?

Mozilla Foundation has made RSS feed icons available to help feed links become more recognizable and standardized. Their guidelines and FAQ provide full information on the icon and how it should be used.

Though the icons are available on the Mozilla site via the links above, the best place to get the icons is feedicons.com. At feedicons.com there is a zip file with the icons in a number of different file formats, including Photoshop and Illustrator formats.

Where to submit your RSS feeds …

Have a blog or other site that outputs an RSS feed? Want more exposure for your site or feed? Masternewmedia has compiled and regularly updates a list of websites where RSS feeds can be submitted. Each link has been tested and there is information on the website and how it works, as well as a direct link to the feed submission page.

List of RSS Submission Sites

There is also a feed of these sites which updates as the list is updated, so you can submit to new sites as they are added …

Feed of RSS Submission Sites

Massive List of RSS Resources

Mashable has put together a huge list of RSS Tools which includes:

  • Windows RSS Readers
  • OS X RSS Readers
  • Linux RSS Readers
  • Web-based RSS Readers
  • Mobile RSS Readers
  • RSS-to-email tools
  • Feed Validators
  • RSS-related Firefox plugins
  • RSS plugins for Wordpress
  • RSS Managers
  • Tools for combining or mixing RSS feeds
  • RSS ping tools
  • RSS feed directories

WhatIsRSS.com now has a blog ...

We have always wanted to keep this resource brief and to the point, but we realise there is a lot more that can be communicated about using RSS. Our RSS Blog was launched 26 July 2007 to extend and complement the information provided here. If you are interested in learning more about RSS, go there now and subscribe! It will be updated over time with information on using RSS and will feature tools to help you use RSS in new and better ways.

What do I need to do to read an RSS Feed? RSS Feed Readers and News Aggregators

Feed Reader or News Aggregator software allows you to grab the RSS feeds from various sites and display them for you to read and use.

A variety of RSS Readers are available for different platforms. Some popular feed readers include Amphetadesk (Windows, Linux, Mac), FeedReader (Windows), and NewsGator (Windows - integrates with Outlook). There are also a number of web-based feed readers available. My Yahoo, Bloglines, and Google Reader are popular web-based feed readers.

Once you have your Feed Reader, it is a matter of finding sites that syndicate content and adding their RSS feed to the list of feeds your Feed Reader checks. Many sites display a small icon with the acronyms RSS, XML, or RDF to let you know a feed is available.
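
For the curious, what a feed reader does with a feed can be sketched in a few lines of Python. The sample feed below is hand-written and entirely made up (the example.com site and its two stories are not real), and list_items is a name chosen just for this example. The sketch assumes an RSS 2.0 feed, where each entry is an item element inside channel.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document; the site and stories are made up.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/first.html</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/second.html</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append((item.findtext("title"), item.findtext("link")))
    return items


for title, link in list_items(SAMPLE_FEED):
    print(title, "-", link)
```

A real reader would download the feed from each subscribed site at intervals and remember which items it has already shown; this sketch covers only the parsing step.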

RSS Primer: One Page Quick Introduction to RSS

What is RSS?

RSS (Rich Site Summary) is a format for delivering regularly changing web content. Many news-related sites, weblogs and other online publishers syndicate their content as an RSS Feed to whoever wants it.

Why RSS? Benefits and Reasons for using RSS

RSS solves a problem for people who regularly use the web. It allows you to easily stay informed by retrieving the latest content from the sites you are interested in. You save time by not needing to visit each site individually. You ensure your privacy, by not needing to join each site's email newsletter. The number of sites offering RSS feeds is growing rapidly and includes big names like Yahoo News.

Friday, June 13, 2008

Robots.txt Checker

Robots.txt files (often erroneously called robot.txt, in singular) are created by webmasters to mark (disallow) files and directories of a web site that search engine spiders (and other types of robots) should not access.

This robots.txt checker is a "validator" that analyzes the syntax of a robots.txt file to see if its format is valid as established by the Robots Exclusion Standard (please read the documentation and the tutorial to learn the basics) or if it contains errors.

1 Simple usage: How to check your robots.txt file format? Just insert the full URL (Example: http://www.yourdomain.com/robots.txt) of the robots.txt file you want to analyze and hit Enter
2 Powerful: The checker finds syntax errors, "logic" errors, mistyped words and it gives you useful optimization tips
3 Accurate: The validation process takes into account both Robots Exclusion Standard rules and spider-specific (Google, Inktomi, etc.) extensions (including the new "Sitemap" command).
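
To give an idea of what such a checker looks for, here is a minimal Python sketch of a syntax check. It is not the code of the validator described here; the function name check_robots_txt and the exact set of accepted field names are assumptions made for this example (Allow, Sitemap and Crawl-delay are extensions rather than part of the original standard).

```python
# Field names this sketch accepts; the last three are common extensions.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def check_robots_txt(text):
    """Return a list of (line_number, message) tuples for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, "unknown field %r" % field))
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("disallow", "allow"):
            if not seen_user_agent:
                problems.append((number, "rule appears before any User-agent line"))
            if " " in value:
                problems.append((number, "more than one path on a single line"))
    return problems


sample = "User-agent: *\nDisalow: /stats/\nDisallow: /a/ /b/\n"
for number, message in check_robots_txt(sample):
    print("line", number, ":", message)
```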

This robots.txt analyzer is provided by Motoricerca, a non-profit Italian guide to web site optimization and search engine positioning.



robots.txt Checker

Test The Syntax Of Your Robots File

This tester allows you to check your robots.txt file for syntax errors. There are 2 methods for putting your robots.txt contents into this script:

Method 1: If you already have a robots.txt file on your server, enter the URL of that file and this script will retrieve the content and test it.

Method 2: You can paste the contents of your robots.txt file into the text box below. This is an ideal way to test modifications before putting them on your server.

If you don't have a robots.txt file yet, use our simple robots.txt creator to make one.

Using a robots.txt File

This is a useful file that keeps search engines from indexing pages you do not want spidered. Why would you not want a page indexed by a search engine? Perhaps you want to display a page that shows an example of spamming the search engines. This type of page might include an example of repeated keywords, hidden tags with keywords, and other things that could get a page or an entire site banned from a search engine.

The robots.txt file is a good way to prevent this page from getting indexed. However, not every site can use it. The only robots.txt file that the spiders will read is the one at the top html directory of your server. This means you can only use it if you run your own domain. The spiders will look for the file in a location similar to these below:

http://www.pageresource.com/robots.txt
http://www.javascriptcity.com/robots.txt
http://www.mysite.com/robots.txt

Any other location of the robots.txt file will not be read by a search engine spider, so the file locations below will not be worthwhile:

http://www.pageresource.com/html/robots.txt
http://members.someplace.com/you/robots.txt
http://someisp.net/~you/robots.txt
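
The rule above (a spider only looks at the top of the host, whatever page it started from) can be sketched in Python like this. The function name robots_txt_url is made up for the example; it simply keeps the scheme and host name of any page URL and replaces the rest with /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the only robots.txt location a spider would check for page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; discard the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.pageresource.com/html/index.html"))
print(robots_txt_url("http://someisp.net/~you/page.html"))
```

Note that for the second address the result is the ISP's own robots.txt, not a file inside the ~you directory, which is why the locations listed above are never read.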

Now, if you have your own domain, you can see where to place the file. So let's take a look at exactly what needs to go into the robots.txt file to make the spider see what you want done.

If you want to exclude all the search engine spiders from your entire domain, you would write just the following into the robots.txt file:

User-agent: *
Disallow: /

If you want to exclude all the spiders from a certain directory within your site, you would write the following:

User-agent: *
Disallow: /aboutme/

If you want to do this for multiple directories, you add on more Disallow lines:

User-agent: *
Disallow: /aboutme/
Disallow: /stats/

If you want to exclude certain files, then type in the rest of the path to the files you want to exclude:

User-agent: *
Disallow: /aboutme/album.html
Disallow: /stats/refer.htm

If you are curious, here is what I used to keep an article from getting indexed:

User-agent: *
Disallow: /zine/article002.htm

If you want to keep a specific search engine spider from indexing your site, do this:

User-agent: Robot_Name
Disallow: /

You'll need to know the name of the search engine spider or robot, and place it where Robot_Name is above. You can find these names from the web sites of the various search engines.

So, if you need to exclude something from search engine indexing, this is the most effective tool recognized by the search engines. Use it to keep the spiders out of any part of your web you want them to avoid.
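
If you want to see how a well-behaved robot would read rules like the ones above, Python's standard urllib.robotparser module can be used as a quick test. This is only a sketch: the helper name make_parser and the robot name "ExampleBot" are made up, and the rules are a mix of the directory and file examples shown earlier.

```python
from urllib.robotparser import RobotFileParser

# One directory rule and one file rule, as in the examples above.
RULES = """\
User-agent: *
Disallow: /aboutme/
Disallow: /stats/refer.htm
"""


def make_parser(robots_txt):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(RULES)
for path in ("/index.html", "/aboutme/album.html",
             "/stats/refer.htm", "/stats/index.htm"):
    # "ExampleBot" is a hypothetical robot name.
    print(path, parser.can_fetch("ExampleBot", path))
```

Keep in mind that this only tells you what an obedient spider would do; nothing here forces a robot to follow the file.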

Why is a Robots.txt File Important?

What is the purpose of a robots.txt file?

  1. It Can Avoid Wastage of Server Resources

    At the date of this writing, as far as I know, many of the search engine spiders do not bother to index the scripts on your site (such as your CGI or PHP scripts). However, there are those that do, including one of the major players, Google.

    Robots or spiders that index scripts will actually call your scripts just as a browser would, complete with all the special characters. If your site is like mine, where the scripts are solely meant for the use of humans and serve no practical use for a search engine (why should a search engine need to invoke my site-navigation script? - it can just crawl the direct links), you may want to block spiders from the directories that contain your scripts. For example, I block spiders from my CGI-BIN directory. Hopefully, this will reduce the load on the web server that occurs when scripts are executed by removing unnecessary executions.

    Of course there are the occasional ill-behaved robots that hit your server at high speed. Such spiders can actually bring down your server or at the very least slow it down for the real users who are trying to access it. If you know of any such spiders, you might want to exclude them too. You can do this with a robots.txt file. Unfortunately though, ill-behaved spiders often ignore robots.txt files as well.

  2. It Can Save Your Bandwidth

    If you look at your website's web logs, you will undoubtedly find many requests for the robots.txt file by various search engine spiders. If, like me, you have a customized 404 document (which loads each time a visitor tries to retrieve a page that does not exist on your site), you will find that the robot winds up requesting that document instead if you don't have a robots.txt file. My site has a fairly large 404 document, with the result that the spiders wind up loading it repeatedly throughout the day, adding to my already large bandwidth problems. In such a case, having a small robots.txt file may save you some bandwidth (yeah, I know, it's not that much).

    Some spiders may also request files which you feel they should not. For example, one search engine requests graphic files (".gif" files) on my sites. Since I see little reason why I should let it index the graphics on my site, waste my bandwidth, and possibly infringe my copyright, I ban it (and in fact all spiders) from my graphic files directory in my robots.txt file.

  3. It Removes Clutter from your Web Statistics

    I don't know about you, but one of the things I check from my web statistics is the list of URLs that visitors tried to access, but met with a 404 File Not Found Error. Often this tells me if I made a spelling error in one of the internal links on one of my sites (yes, I know - I should have checked all links in the first place, but mistakes do happen).

    If you don't have a robots.txt file, you can be sure that /robots.txt is going to feature in your web statistics 404 report, adding clutter and perhaps unnecessarily distracting your attention from the real bad URLs that need your attention.

  4. Refusing a Robot

    Sometimes you don't want a particular spider to index your site for some reason or other. Perhaps the robot is ill-behaved and spiders your site at such a high speed that it takes down your entire server. Or perhaps you would prefer not to have the images on your site indexed in an image search engine. With a robots.txt file, you can exclude certain spiders from indexing your site, provided the spider obeys the rules in that file.

How to Set Up a Robots.txt File

Writing a robots.txt file could not be easier. It's just an ASCII text file that you place at the root of your domain. For example, if your domain is www.yourdomain.com, you will place the file at www.yourdomain.com/robots.txt.

The file basically lists the names of spiders on one line, followed by the list of directories or files it is not allowed to access on subsequent lines, with each directory or file on a separate line. It is possible to use the wildcard character "*" instead of naming specific spiders. When you do so, all spiders are assumed to be named. Note that the robots.txt file is a robots exclusion file (with emphasis on the "exclusion") - there is no way to tell spiders to include any file or directory.

Take the following robots.txt file for example:

User-agent: *
Disallow: /cgi-bin/

The above two lines, when inserted into a robots.txt file, inform all robots (since the wildcard asterisk "*" character was used) that they are not allowed to access anything in the cgi-bin directory and its descendants. That is, they are not allowed to access /cgi-bin/whatever.cgi or even a file or script in a subdirectory of cgi-bin, such as /cgi-bin/anything/whichever.cgi.

If you have a particular robot in mind, such as the Google image search robot, which collects images on your site for the Google Image search engine, you may include lines like the following:

User-agent: Googlebot-Image
Disallow: /

This means that the Google image search robot, "Googlebot-Image", should not try to access any file in the root directory "/" and all its subdirectories. This effectively means that it is banned from the entirety of your website.

You can have multiple Disallow lines for each user agent (ie, for each spider). Here is an example of a longer robots.txt file:

User-agent: *
Disallow: /images/
Disallow: /cgi-bin/

User-agent: Googlebot-Image
Disallow: /

The first block of text disallows all spiders from the images directory and the cgi-bin directory. The second block disallows the Googlebot-Image spider from every directory.
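
As a hedged illustration of how the two blocks interact, the sketch below feeds the same file to Python's standard urllib.robotparser module. The helper name allowed and the robot name "SomeOtherBot" are made up for this example. It shows how this particular parser reads the file; individual spiders may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# The longer robots.txt example from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/

User-agent: Googlebot-Image
Disallow: /
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if this parser would let the named robot fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "SomeOtherBot" is a hypothetical robot with no block of its own,
# so it falls back to the "*" block.
print(allowed("Googlebot-Image", "/index.html"))
print(allowed("SomeOtherBot", "/index.html"))
print(allowed("SomeOtherBot", "/images/logo.gif"))
```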

It is possible to exclude a spider from indexing a particular file. For example, if you don't want Google's image search robot to index a particular picture, say, mymugshot.jpg, you can add the following:

User-agent: Googlebot-Image
Disallow: /images/mymugshot.jpg

Remember to add the trailing slash ("/") if you are indicating a directory. If you simply add

User-agent: *
Disallow: /privatedata

the robots will be disallowed from accessing privatedata.html as well as privatedataandstuff.html as well as the directory tree beginning from /privatedata/ (and so on). In other words, there is an implied wildcard character following whatever you list in the Disallow line.
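
The "implied wildcard" is really just a prefix match, which can be sketched in Python in a couple of lines. The function name is_disallowed is made up for this example, and the sketch ignores User-agent selection entirely; it only compares a path against a list of Disallow values.

```python
def is_disallowed(path, disallow_rules):
    """Return True if path starts with any non-empty Disallow value."""
    # An empty Disallow value excludes nothing, so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


for path in ("/privatedata.html", "/privatedataandstuff.html",
             "/privatedata/file.html"):
    print(path,
          is_disallowed(path, ["/privatedata"]),    # no trailing slash
          is_disallowed(path, ["/privatedata/"]))   # with trailing slash
```

With the trailing slash, only paths inside the directory match, which is usually what you want when you mean a directory.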

Where Do You Get the Name of the Robots?

If you have a particular spider in mind which you want to block, you have to find out its name. To do this, the best way is to check out the website of the search engine. Respectable engines will usually have a page somewhere that gives you details on how you can prevent their spiders from accessing certain files or directories.

Common Mistakes in Robots.txt

Here are some mistakes commonly made by those new to writing robots.txt rules.

  1. It's Not Guaranteed to Work

    As mentioned earlier, although the robots.txt format is listed in a document called "A Standard for Robots Exclusion", not all spiders and robots actually bother to heed it. Listing something in your robots.txt is no guarantee that it will be excluded. If you really need to protect something, you should use a .htaccess file to password-protect the directory (if you are running your site on an Apache server).

  2. Don't List Your Secret Directories

    Anyone can access your robots file, not just robots. For example, typing http://www.google.com/robots.txt will get you Google's own robots.txt file. I notice that some new webmasters seem to think that they can list their secret directories in their robots.txt file to prevent that directory from being accessed. Far from it. Listing a directory in a robots.txt file often attracts attention to the directory. In fact, some spiders (like certain spammers' email harvesting robots) make it a point to check the robots.txt for excluded directories to spider.

  3. Only One Directory/File per Disallow line

    Don't try to be smart and put multiple directories on your Disallow line. This will probably not work the way you think, since the Robots Exclusion Standard only provides for one directory per Disallow statement.

It's Worth It

Even if you want all your directories to be accessed by spiders, a simple robots file with the following may be useful:

User-agent: *
Disallow:

With no file or directory listed in the Disallow line, you're implying that every directory on your site may be accessed. At the very least, this file will save you a few bytes of bandwidth each time a spider visits your site (or more if your 404 file is large); and it will also remove Robots.txt from your web statistics bad referral links report.

Copyright 2001-2008 by Christopher Heng. All rights reserved.
Get more free tips and articles like this, on web design, promotion, revenue and scripting, from http://www.thesitewizard.com/



Do Not Reprint Without Permission

This article is copyrighted. Please do not reproduce this article in whole or part, in any form, without obtaining my written permission.

How to Set Up a robots.txt to Control Search Engine Spiders

http://www.thesitewizard.com/archive/robotstxt.shtml
by Christopher Heng, thesitewizard.com

When I first started writing my first website, I did not really think that I would ever have any reason to create a robots.txt file. After all, did I not want search engine robots to spider and thus index every document in my site? Yet today, all my sites, including thesitewizard.com, have a robots.txt file in their root directory. This article explains why you might also want to include a robots.txt file on your sites, how you can do so, and notes some common mistakes made by new webmasters with regard to the robots.txt file.

For those new to the robots.txt file, it is merely a text file implementing what is known as the Standard for Robot Exclusion. The file is placed in the main directory of a website and advises spiders and other robots which directories or files they should not access. The file is purely advisory - not all spiders bother to read it, let alone heed it. However, most, if not all, of the spiders sent by the major search engines to index your site will read it and take cognizance of the rules contained within the file.

Robots Text File - robots.txt

The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. For those spiders that obey the file, it provides a map for what they can, and cannot index. The file must reside in the root directory of your web. The URL path (web address) of your robots.txt file should look like this...

/robots.txt

The Robots text file open in Notepad might look like this:

[Screen shot: example robots.txt file]

Definition of the above robots.txt file:

User-agent: *
The asterisk (*) or wildcard represents a special value and means any robot.

Disallow:
The Disallow: line without a / (forward slash) tells the robots that they can index the entire site.

An empty value indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record; to allow everything, leave its value empty (without the / forward slash), as shown above.

The presence of an empty "/robots.txt" file has no explicit associated semantics; it will be treated as if it was not present, i.e. all robots will consider themselves welcome.

The Disallow: line without the trailing slash (/) tells all robots to index everything. If you have a line that looks like this:

Disallow: /private/

It tells the robot that it cannot index the contents of that /private/ directory.

Summarizing the Robots Exclusion Protocol - robots.txt file

To allow all robots complete access:

User-agent: *
Disallow:

To exclude all robots from the server:

User-agent: *
Disallow: /

To exclude all robots from parts of a server:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

To exclude a single robot from the server:

User-agent: Named Bot
Disallow: /

To exclude a single robot from parts of a server:

User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

Note: The asterisk (*) or wildcard in the User-agent field is a special value meaning "any Robot" and therefore is the only one needed until you fully understand how to set up different User-agents.

If you want to Disallow: a particular file within the directory, your Disallow: line might look like this one:

Disallow: /private/top-secret-stuff.htm

Keep in mind that using the above example excludes that specified page (top-secret-stuff.htm) but will not exclude the entire /private/ directory.

You should validate your robots.txt file using its full URI on your server. The robots.txt file always resides at the root level of your web.

Here are a few good online references for information on the Robots Exclusion Protocol.
  1. WebmasterWorld - robots.txt Validation
  2. WebmasterWorld - robots.txt Forum
  3. The Web Robots Pages - Robots Exclusion Protocol
  4. Three Easy Ways to Reduce robots.txt Code Bloat
  5. The Web Robots Pages - Database of Web Robots

A New Concept in Marketing - The Bot Blog

Brett Tabke - WebmasterWorld - The Bot Blog
Robots.txt - Where no blog has gone before. Brett may have started a new marketing medium, blogging via your robots.txt file. Only a true geek would appreciate The Bot Blog!

Thursday, June 12, 2008

On-Page Factors

As Search Engine Optimisation experts, Rupiz Media deals in perfecting on-page factors that influence the rankings of your web pages. Catering to a range of services in this regard, we provide viable solutions to web businesses of all kinds.

As part of on-page optimisation, we take care of the following aspects of a website:

Keyword Research
We undertake extensive research of the web to enlist the most important keywords and phrases for your website. These keywords eventually form the base of your search engine marketing campaign.

Content Review & Optimisation
We review the content of your site and suggest changes wherever need be. These changes include judicious sprinkling of your web pages with essential keywords, so as to foster the process of search engine indexing. Along with that, we also ensure that the content is written in simple, marketing-oriented language that can attract visitors.

Competitor Analysis
Rupiz Media studies the online marketplace for you. We scan the websites of your competitors in order to analyse the reasons for their success or failure in various optimisation techniques. This helps us formulate strategies that are advantageous for your business.

Meta Tags Optimisation
A part of your site's HTML programming, meta tags are distinctive to each and every web page. We optimise the Title, Description, Keyword, and other meta tags in a way that they comprise your main keywords and assist in faster indexing of your pages.

URL Optimisation
We optimise the URLs of your web pages in order to make them more conducive for indexing. This process also involves the use of pertinent keywords that can attract search engine spiders.

Image Optimisation
In case your website has images, we optimise their Alt Tags and package them in a way that they do not impede the process of optimisation.

Site Navigation
When it comes to structuring and internal linking of your website, we give due consideration to search engine friendliness to render an eye-catching navigational constitution to your pages.

More On-Page Factors

On-Page factors are related directly to the content and structure of the website. This normally consists of pages written in the HyperText Markup Language but also applies to other document formats that are indexed by search engines, for example Microsoft Word or PDF formats. On-page optimization involves modifying keyword frequency in the URL, Title, Headings, Hypertext Links and Body text. It may also involve reducing redundant HTML code (aka cruft) produced by Web page authoring tools and restructuring the site to produce better linked and more focussed page content.

Many search engines now discount the weight given to on-page factors because they give too much scope for abuse by SEO experts. In theory the visible parts of a web-page are less prone to manipulation as they have to make sense to readers. However doorway pages with redirections and clever use of style sheets enable different content to be served to search engines and end users.

Each page should target between two and four keywords directly related to its contents. If you feel the need for more keywords, consider splitting your content into separate pages. The Uniform Resource Locator (URL) should contain keywords separated by hyphens without being too long; around 128 characters is probably a sensible upper limit for the entire URL. The Title tag should contain the keywords with no stop words, arranged so that the title still makes sense.
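The URL advice above can be sketched in a few lines of Python. The `slugify` and `url_within_limit` helpers are hypothetical names introduced for illustration, and the 128-character limit is simply the rule of thumb quoted above, not a documented search engine limit.

```python
import re

# Rule-of-thumb upper limit for the whole URL, taken from the text above.
MAX_URL_LENGTH = 128


def slugify(title: str) -> str:
    """Turn a page title into lowercase keywords separated by hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def url_within_limit(url: str, limit: int = MAX_URL_LENGTH) -> bool:
    """Check that the entire URL stays within the suggested length."""
    return len(url) <= limit


if __name__ == "__main__":
    url = "http://www.example.com/" + slugify("On-Page Factors: Keyword Research!")
    print(url, url_within_limit(url))
```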

On Page Optimization

The Title tag should be the first tag in the Head section of the page. There is evidence that search engines give more weight to factors higher up the page. The content should be properly structured with the use of Heading (H1, H2, H3 etc.) tags containing relevant keywords. Search engines will only index a limited amount of text in HTML tags, and using too many keywords will dilute the focus. Don't spam any of these tags; this won't be effective and could result in a penalty.
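Whether the Title tag really comes first in the Head section is easy to check mechanically. Here is a minimal sketch using Python's standard `html.parser`; the `HeadOrderChecker` class and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class HeadOrderChecker(HTMLParser):
    """Record the order of the tags that open inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            self.head_tags.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def title_is_first_in_head(html: str) -> bool:
    """Return True if <title> is the first tag opened inside <head>."""
    checker = HeadOrderChecker()
    checker.feed(html)
    return bool(checker.head_tags) and checker.head_tags[0] == "title"


if __name__ == "__main__":
    page = ("<html><head><title>SEO news</title>"
            "<meta name='description' content='x'></head><body></body></html>")
    print(title_is_first_in_head(page))
```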

Many website designers spend a lot of time creating Keyword and Description meta tags. Although these may be read by search engines, for example the description tag is used by Yahoo! to provide a short description of the site in the Search Engine Results Pages, they are not used for ranking pages.

Personally I don't bother with them as they bulk out pages for little real benefit. Google, Yahoo! and MSN Search will all use the text they find on the page as a description, so make sure your first header and sentence describe the contents. However, some search engine watchers say that the new Microsoft search engine, currently in beta tests, puts some weight on meta tags. There is also evidence to suggest that search engines give more prominence to text earlier in the page, and some engines will only index a limited amount of body text, so making the first paragraph punchy is a good idea.

Image alternate-text tags (ALT tags) are only indexed where the image is part of a hyperlink. However ALT tags are useful for non-graphical browsing and should be employed correctly.

For example, an image tag might carry the ALT text "Description of Image".
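The ALT-text advice above can be spot-checked with a short script. This is a minimal sketch using Python's standard `html.parser`; the `AltTextChecker` class and the sample file names are hypothetical.

```python
from html.parser import HTMLParser


class AltTextChecker(HTMLParser):
    """Collect the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Treat a missing, valueless or whitespace-only alt as missing.
        if alt is None or not alt.strip():
            self.missing_alt.append(attributes.get("src") or "")


def images_missing_alt(html: str) -> list:
    """Return the src values of images without alternate text."""
    checker = AltTextChecker()
    checker.feed(html)
    return checker.missing_alt


if __name__ == "__main__":
    sample = ('<img src="logo.gif" alt="Description of Image">'
              '<img src="spacer.gif">')
    print(images_missing_alt(sample))
```

Purely decorative images are sometimes given an empty alt on purpose, so treat the result as a list to review rather than a list of errors.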

Comments are not indexed. Use bold/strong/italic attributes where appropriate.

Write natural copy aimed at the end user and not search engines. Don't worry too much about keyword density for the contents, but take the opportunity to include keywords combined in different phrases and orders and create anchor text to related internal pages. Keep the number of links on a page to fewer than 50, and preferably fewer still, and don't repeat identical outbound links. Theme-related pages should be at the same level in the site hierarchy and be linked through the site's menu structure and site map. At least one page at the same level should link back to the home page so that search engines that have traversed a deep link can index the rest of the website.
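The two link rules above (fewer than 50 links, no repeated outbound links) can be checked automatically. The following is a minimal sketch; `LinkCollector`, `link_report` and the `site_host` parameter are hypothetical names, and a link is treated as outbound simply when its host differs from `site_host`.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def link_report(html: str, site_host: str, max_links: int = 50) -> dict:
    """Count links and list outbound URLs that appear more than once."""
    collector = LinkCollector()
    collector.feed(html)
    # Relative links have an empty host, so they count as internal.
    outbound = [h for h in collector.hrefs
                if urlparse(h).netloc not in ("", site_host)]
    repeated = sorted(url for url, n in Counter(outbound).items() if n > 1)
    return {
        "total_links": len(collector.hrefs),
        "too_many": len(collector.hrefs) >= max_links,
        "repeated_outbound": repeated,
    }


if __name__ == "__main__":
    page = ('<a href="/about.html">About</a>'
            '<a href="http://other.example.org/">One</a>'
            '<a href="http://other.example.org/">Two</a>')
    print(link_report(page, "www.example.com"))
```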

For any other document format, e.g. PowerPoint, Adobe PDF etc., make sure you at least have a descriptive document title. Try to avoid formats that search engines find hard to understand; even where a search engine can index a format, it will carry less information than plain old HTML. Avoid using images to replace text, except occasionally in hyperlinks. Avoid formats such as Flash, Shockwave and sitemaps where there is no alternative text. Avoid HTML Frames, which some search engines find hard to navigate; use Style Sheets (CSS) instead. Style Sheets should also be used to reduce the amount of formatting within documents. Keep pages to less than 100 kilobytes and preferably not much more than a screenful of text. Where JavaScript or Flash menus are used, include plain-text links at the bottom of the page. These will ensure all search engines index the rest of your website.

Another factor directly under the control of the website owner is the amount of content. Large websites generally rank better than small websites for a number of reasons. Search engines also like fresh content and will spider it more frequently. A regularly updated news page, even a blog, can provide deep links to the rest of the website.

"ON PAGE FACTORS" ---

1. Alleged POSITIVE ON-Page SEO Google Ranking Factors (38)
(Keeping in mind the converse, of course, that when violated, some of these factors immediately jump into the NEGATIVE On-Page Ranking Factors domain.)

The term "Keyword" below refers to the "Keyword Phrase", which can be one word or more.
Green rows confirmed by Google patent - updated 08-10-06
Columns for each entry below, in order: Note or Patent Claim # / Factor # / POSITIVE ON-Page SEO Factor / Brief Note

50


KEYWORDS

Google patent - Topic extraction
For keyword selection,
try Overture - Google Ad Words - Google Trends

HOT

1

Keyword in URL

First word is best, second is second best, etc.

HOT

2

Keyword in Domain name

Same as in page-name-with-hyphens



Keywords - Header


HOT

3

Keyword in Title tag

Keyword in Title tag - close to beginning
Title tag 10 - 60 characters, no special characters.

-

4

Keyword in Description meta tag

Shows theme - less than 200 chars.
Google no longer "relies" upon this tag, but will often use it.

-

5

Keyword in Keyword metatag

Shows theme - less than 10 words.
Every word in this tag MUST appear somewhere in the body text. If not, it can be penalized for irrelevance.
No single word should appear more than twice.
If not, it may be considered spam. Google purportedly no longer uses this tag, but others do.



Keywords - Body


-

6

Keyword density in body text

5 - 20% - (all keywords/ total words)
Some report topic sensitivity - the keyword spamming threshold % varies with the topic.

-

7

Individual keyword density

1 - 6% - (each keyword/ total words)

HOT

8

Keyword in H1, H2 and H3

Use Hx font style tags appropriately

-

9

Keyword font size

"Strong is treated the same as bold, italic is treated the same as emphasis" . . . Matt Cutts July 2006

-

10

Keyword proximity (for 2+ keywords)

Directly adjacent is best

-

11

Keyword phrase order

Does word order in the page match word order in the query?
Try to anticipate query, and match word order.

-

12

Keyword prominence (how early in page/tag)

Can be important at top of page, in bold, in large font



Keywords - Other


-

13

Keyword in alt text

Should describe graphic - Do NOT fill with spam
(Was part of Google Florida OOP - tripped a threshold - may still be in effect to some degree as a red flag, when summed with all other on-page optimization - total page optimization score - TPOS).

-

14

Keyword in links to site pages (anchor text)

Links out anchor text use keyword?



NAVIGATION - INTERNAL LINKS


SITE

15

To internal pages- keywords?

Link should contain keywords.
The filename "linked to" should contain the keywords.
Use hyphenated filenames, but not long ones - two or three hyphens only.

SITE

16

All Internal links valid?

Validate all links to all pages on site.
Use a free link checker. I like this one.

SITE

17

Efficient - tree-like structure

TRY FOR two clicks to any page - no page deeper than 4 clicks

SITE

18

Intra-site linking

Appropriate links between lower-level pages

54


NAVIGATION - OUTGOING LINKS


55

19

To external pages- keywords?

Google patent - Link only to good sites. Do not link to link farms. CAREFUL - Links can and do go bad, resulting in site demotion. Unfortunately, you must devote the time necessary to police your outgoing links - they are your responsibility.

56

20

Outgoing link Anchor Text

Google patent - Should be on topic, descriptive

61, 62

21

Link stability over time

Google patent - Avoid "Link Churn"

-

22

All External links valid?

Validate all links periodically.

-

23

Less than 100 links out total

Google says limit to 100,
but readily accepts 2-3 times that number. ref 2k



OTHER ON-Page Factors


-

24

Domain Name Extension
Top Level Domain - TLD

.gov sites seem to be the highest status
.edu sites seem to be given a high status
.org sites seem to be given a high status
.com sites excel in encompassing all the spam/ crud sites, resulting in the need for the highest scrutiny/ action by Google.
Perhaps one would do well with the new .info domain class. - Nope. Spammers jumped all over it - no safe haven there. Not so much, now - .info sites can rank highly.

-

25

File Size

Try not to exceed 100K page size (however, some subject matter, such as this page, requires larger file sizes).
Smaller files are preferred (<40K).


26

Hyphens in URL

Preferred method for indicating a space, where there can be no actual space
One or two= excellent for separating keywords (i.e., pet-smart, pets-mart)
Four or more= BAD, starts to look spammy
Ten = Spammer for sure, demotion probable?

6, 7
12, 13

27

Freshness of Pages

Google patent - Changes over time
Newer the better - if news, retail or auction!
Google likes fresh pages. So do I.

8, 9

28

Freshness - Amount of Content Change

New pages - Ratio of old pages to new pages

27

29

Freshness of Links

Google patent - May be good or bad
Excellent for high-trust sites
May not be so good for newer, low-trust sites

-

30

Frequency of Updates

Frequent updates = frequent spidering = newer cache

-

31

Page Theming

Page exhibit theme? General consistency?

-

32

Keyword stemming

Stem, stems, stemmed, stemmer,
stemming, stemmist, stemification

-

33

Applied Semantics

Synonyms, CIRCA white paper

-

34

LSI

Latent Semantic Indexing - Speculation, no proof

-

35

URL length

Keep it minimized - use somewhat less than the 2,000 characters allowed by IE - less than 100 is good, less is even better



OTHER ON-SITE Factors


5

36

Site Size - Google likes big sites

Larger sites are presumed to be better funded, better organized, better constructed, and therefore better sites. Google likes LARGE sites, for various reasons, not all positive. This has resulted in the advent of machine-generated 10,000-page spam sites - size for the sake of size. Google has caught on and dumped millions of pages, or made them supplemental.

4

37

Site Age

Google patent - Old is best. Old is Golden.

3

38

Age of page vs. age of site

Age of page vs. age of other pages on site
Newer pages on an older site will get faster recognition.


Note: For ALL the POSITIVE On-Page factors listed above, PAGE RANK can OVERRIDE them all. So can Google-Bombing.
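The keyword density rows in the table above (factors 6 and 7) give their formulas as "all keywords / total words" and "each keyword / total words". A minimal sketch of that arithmetic follows. The `keyword_density` helper is a hypothetical name, and it only handles single-word keywords, although the table notes that a "keyword" may be a multi-word phrase.

```python
import re


def tokenize(text: str) -> list:
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, keywords: list) -> tuple:
    """Return (per-keyword density in %, overall density in %).

    Assumption: each keyword is a single word.
    """
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {}, 0.0
    counts = {k.lower(): 0 for k in keywords}
    for word in words:
        if word in counts:
            counts[word] += 1
    per_keyword = {k: 100.0 * c / total for k, c in counts.items()}
    overall = 100.0 * sum(counts.values()) / total
    return per_keyword, overall


if __name__ == "__main__":
    copy = "seo tips for seo beginners and seo experts with tips"
    print(keyword_density(copy, ["seo", "tips"]))
```

The percentages in the table are the original author's estimates, so the output is best compared against them loosely rather than treated as a pass or fail threshold.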


2. Alleged Negative ON-Page SEO Google Ranking Factors (24)
Columns for each entry below, in order: Note / Factor # / NEGATIVE ON-Page SEO Factor / Brief Note

BAD
39

Text presented in graphics form only
No ACTUAL body text on the page

Text represented graphically is invisible to search engines.

BAD
40

Affiliate site?

The Florida update went after affiliates with a vengeance - flower and travel affiliates were hit hard - cookie-cutter sites with massive inter-linking, but little unique content. Subsequent updates have also targeted affiliates.

BAD
41

Over optimization penalty (OOP)

Penalty for over-compliance with well-established, accepted web optimization practices. Too high keyword repetition (keyword stuffing) may get you the OOP. Overuse of H1 tags has been mentioned. Meta-tag stuffing.

BAD
42

Link to a bad neighborhood

Don't link to link farms, FFAs (Free For All's)
Also, don't forget to check the Google status of EVERYONE you link to periodically. A site may go "bad", and you can end up being penalized, even though you did nothing. For instance, some failed real estate sites have been switched to p0rn by unscrupulous webmasters, for the traffic. This is not good for you, if you are linking to the originally legitimate URL.

BAD
43

Redirect thru refresh metatags

Don't immediately send your visitor to another page other than the one he/ she clicked on, using meta refresh.

BAD
44

Vile language - ethnic slur

Including the George Carlin 7 bad words you can't say on TV, plus the 150 or so that followed. Don't shoot yourself right straight in the foot. Also, avoid combinations of normal words, which when used together, become something else entirely - such as the word juice, and the word l0ve. See why I wrote that zero? I don't even want to get a proximity penalty, either. Paranoia, or caution? You decide. I always want to try to put my "best foot forward".

BAD
45

Poison words

The word "Links" in a title tag has been suggested to be a bad idea. Here is my list of Poison Words for Adsense. This penalty has been loosened - many of these words now appear in normal context, with no problems. But watch your step.

BAD
46

Excessive cross-linking

- within the same C block (IP=xxx.xxx.CCC.xxx)
If you have many sites (>10, author's guess) with the same web host, prolific cross-linking can indicate more of a single entity, and less of democratic web voting. Easy to spot, easy to penalize.
"This does not apply to a small number of sites" .. (this author guesses the number 10, JAWG) . . . "hosted on a local server". . Matt Cutts July 2006

BAD
47

Stealing images/ text blocks from another domain

Copyright violation - Google responds strongly
if you are reported. ref egol
File Google DMCA

BAD
48

Keyword stuffing threshold

In body, meta tags, alt text, etc. = demotion

??
49

Keyword dilution

Targeting too many unrelated keywords on a page, which would detract from theming, and reduce the importance of your REALLY important keywords.

??

50

Page edit - can reduce consistency

Google patent -
Google is now switching between a "newer" cache, and several "older" caches, frequently drawing from BOTH at the same time.
This was possibly implemented to frustrate SERP manipulators. Did your last edit substantially alter your keywords, or theme? Expect noticeable SERP bouncing.

6 - 7

51

Frequency of Content Change

Google patent - Too frequent = bad

32, 33

52

Freshness of Anchor Text

Google patent - Too frequent = bad

??
53

Dynamic Pages

Problematic - know pitfalls - shorten URLs, reduce variables (". . no more than 2 or 3", M.Cutts July 2006), lose the session IDs

??
54

Excessive Javascript

Don't use for redirects, or hiding links

??
55

Flash page - NOT

Most (all-?) SE spiders can't read Flash content
Provide an HTML alternative, or experience lower SERP positioning.

??
56

Use of Frames

Spidering Problems with Frames - STILL

-

57

Robot exclusion "no index" tag

Intentional self-exclusion

-

58

Single pixel links

A red flag - one reason only - a sneaky link.

-

59

Invisible text

OK - No penalty - Google advises against this.
All over the place - but nothing is ever done. (The text is the same color as the background, and hence cannot be seen by the viewer, but can be visible to the search engine spiders.) I believe Google does penalize for hidden text, since it is an attempt to manipulate rank. Although they don't catch everyone.

-

60

Gateway, doorway page

(I see changes here - not only does the doorway page disappear, but the main page gets pushed down, as well - this is a welcome fix.)

OK - No penalty - Google advises against this.
Google used to reward these pages.
Multiple entrance pages in the top ten SERPs - I see it daily. There they are at #2, with their twin at #5 - 6 months now. Reported numerous times.

-

61

Duplicate content (YOURS)
Duplicate content (THEIRS) below (Hijack)

OK - No penalty - Google advises against this.
Google picks one (usually the oldest), and shoves it to the top, and pushes the second choice down. This has been a big issue with stolen content - the thief usurps your former position with YOUR OWN content.

-

62

HTML code violations
(The big G does not even use DOCTYPE declarations, required for W3C validation.)

Doesn't matter - Google advises against this.
Unless of course, the page is totally FUBAR.
Simple HTML verification is NOT required (but advised, since it could contribute to your page quality factor - PQF).

-


Since the above 4 items are so controversial, I would like to add this comment:
There are many things that Google would LIKE to have webmasters do, but that they simply cannot control, due to logistical considerations. Their only alternative is to foment fear and doubt by implying that any violation of their "suggestions" will result in swift and fierce demotion.
(This is somewhat dated - G is fixing these things.)

IN GENERAL, this works pretty well to keep webmasters in line. The fallacy of this is that attentive webmasters can readily observe continuing, blatant exceptions to these official pronouncements.

There are many anecdotes about Google "taking care" of a problem. Google states that they do not provide hand-tweaked "boosts", but are silent about hand-tweaked demotions. They occur, for sure. To believe otherwise is naive. Wouldn't YOU swat the most obnoxious flies? I would.

It is becoming easier to determine the best thing to do. Try to avoid any Google penalties or demotions.

-

119

Phrase-based ranking, filters, penalties

Feb. 2007 - Google patent granted. Do not use phrases that have been associated and correlated with known spamming techniques, or you will be penalized. What phrases? Ahh, you tell me.
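The "Excessive cross-linking" entry above (factor 46) turns on whether sites share a C block, i.e. the first three octets of their IP address. Here is a minimal sketch of that check using Python's standard `ipaddress` module. The `group_by_c_block` helper, the site names and the IP addresses are illustrative assumptions, and a C block is treated as a /24 network.

```python
import ipaddress
from collections import defaultdict


def c_block(ip: str) -> str:
    """Return the /24 network (the "C block") that an IPv4 address belongs to."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def group_by_c_block(site_ips: dict) -> dict:
    """Map each C block that hosts more than one site to its sorted site list."""
    groups = defaultdict(list)
    for site, ip in site_ips.items():
        groups[c_block(ip)].append(site)
    return {block: sorted(sites)
            for block, sites in groups.items() if len(sites) > 1}


if __name__ == "__main__":
    # Hypothetical sites on documentation-only IP ranges.
    sites = {
        "a.example": "192.0.2.10",
        "b.example": "192.0.2.77",
        "c.example": "198.51.100.5",
    }
    print(group_by_c_block(sites))
```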


3. Alleged POSITIVE OFF-Page SEO Google Ranking Factors (43)
Columns for each entry below, in order: Note or Patent Claim # / Factor # / POSITIVE OFF-Page SEO Factor / Brief Note



INCOMING LINKS :


HOT

63

Page Rank

Based on the Number and Quality of links to you
Google link reporting continues to display just a SMALL fraction of your actual backlinks, and they are NOT just greater than PR4 - they are mixed.

-

64

Total incoming links ("backlinks")

Historically, FAST counted best (www.alltheweb.com).
No more - Yahoo (parent) broke it.

In Yahoo search, type in:
linksite:www.domain-name.com
linkdomain:www.domainname.com

Try MSN -
http://beta.search.msn.com
Use link:www.domainname.com

Current TYPICAL Backlink Reporting Ratios -
Google - 30 links
MSN - 1,000 links
Yahoo - 3,000 links

-

65

Incoming links from high-ranking pages

In 2004, Google used to count (report) the links from all PR4+ pages that linked to you. In 2005-2006, Google reported only a small fraction of the links, in what seemed like an almost random manner. In Feb. 2007, Google markedly upgraded (increased) the number of links that they report.

-

66

Acceleration of link popularity
(". . . used to be a good thing" ... Martha)

Google patent
Link acquisition speed boost - speculative
Too fast = artificial? Cause of -30 penalty?
Sandbox penalty imposed if new site?



FOR EACH INCOMING LINK :


-

67

Page rank of the referring page

Based on the quality of links to you

HOT

68

Anchor text of
inbound link to you

Contains keyword, key phrase?
#1 result in SERP does NOT EVEN need to have the keyword(s) on the page, ANYWHERE!!! What does that tell you? (Enables Google-bombing - search for "miserable failure")


69

Age of link

Google patent - Old = Good.


70

Frequency of change of anchor text

Google patent - Not good. Why would you do that?


71

Popularity of referring page

Popularity = desirability, respect

-

72

# of outgoing links on referrer page

Fewer is better - makes yours more important

-

73

Position of link on referrer page

Early in HTML is best

-

74

Keyword density on referring page

For search keyword(s)

-

75

HTML title of referrer page

Same subject/ theme?

28

76

Link from "Expert" site?

Google patent - Big time boost (Hilltop Algorithm)
Recently reported to give a big boost !

-

77

Referrer page - Same theme

From the same or related theme? BETTER

-

78

Referrer page - Different theme

From different or unrelated theme? WORSE

-

79

Image map link?

Problematic?

-

80

Javascript link?

Problematic- attempt to hide link?



DIRECTORIES :


HOT
81

Site listed in DMOZ Directory?

The "Secret Hand" DMOZ Issues
1. Legitimate sites CAN'T GET IN
2. No Accountability
3. Corrupt Editors
4. Competitive Sites Barred
5. Dirty Tricks Employed
6. Rude dmoz editors

Flawed concept - communism doesn't work
Free editing? Nothing is free.
DMOZ Sucks Discussions
DMOZ Problems Discussions

The Google Directory is produced by an unknown, ungoverned, unpoliced, ill-intentioned, retaliatory, monopoly enterprise, consisting of profiteering power-ego editors feathering their own nests - the ODP. AOL is making millions, and needs to police its run-amok entity. Enough already!

This is a tough one.
Google's directory comes STRAIGHT from the DMOZ directory. You should try to get into dmoz.
But you can't.
Be careful whom you approach with the old spondulix -
Formal DMOZ Bribe Instructions.
It is almost impossible to get into DMOZ. This site cannot get in, after waiting over 2 YEARS (33 months). Not even in the lowest, most insignificant category, "Personal Pages". I guess I just don't "measure up" to the other 20,000+ sites in the personal category.
I'm not the suck-up type - I kissed them off long ago. What a waste of time!

UPDATE: This page (not site) finally got indexed in June 2007, thanks to a legitimate editor. No money was paid.

Google needs to DO SOMETHING about populating its own directory with the skewed, incomplete, poorly determined results from the dysfunctional Open Directory Project - the ODP!
Absolute Power Corrupts Absolutely

-

82

DMOZ category?

Theme fit category?
General or geographic category? Both are possible, and acceptable.

HOT
83

Site listed in Yahoo Directory?

Big boost - You can get in by paying $299 each year.
Many swear it is worth it - many swear it isn't.

-

84

Site listed in LookSmart Directory?

Boost? Another great vote for your site.


85

Site listed in inktomi?

Inktomi has been absorbed internally by Yahoo.

-

86

Site listed in other directories (About, etc.)

Directory listing boost (If other RESPECTED directories link to you, this must be positive.)

-

87

Expert site? (Hilltop or Condensed Hilltop)

Large-sized site, quality incoming links

HOT

88

Site Age - Old shows stability

Google patent
Boost for long-established sites, new pages indexed easily
The opposite of the sand box.

-

89

Site Age - Very New Boost

Temporary boost for very new sites - I estimate that this boost lasts from 1 week to 3 weeks - Yahoo does it too.

-

90

Site Directory - Tree Structure

Influences SERPs - logical, consistent, conventional

-

91

Site Map and more site map

Complete - keywords in anchor text

-

92

Site Size

Previously, many pages preferred - conferred authority upon site, thus page. Bigger sites = better SERPs
Now, fewer pages preferred, due to proliferation of computer-generated pages. Google has been dropping pages like crazy.

-

93

Site Theming

Site exhibit theme? Use many related terms?
Have you used a keyword suggestion tool?
A thesaurus?



PAGE METRICS - USER BEHAVIOR:

Currently implemented through the Google tool bar?

34, 35

94

Page traffic

Google patent - # of visitors, trend

15,16,21

95

Page Selection Rate - CTR

Google patent - How often is a page clicked on?

36, 37

96

Time spent on page

Google patent - Relatively long time = indicates relevance hit

45, 46

97

Did user Bookmark page?

Google patent - Bookmark = Good

47

98

Bookmark add/ removal frequency

Google patent - Recent = Good?


99

How they left, where they went

Back button, link clicked, etc.



SITE METRICS - USER BEHAVIOR :

Currently implemented through the Google tool bar?

34, 35

100

Site Traffic

Google patent - # of visitors, increasing trend = good


101

Referrer

Authoritative referrer?


102

Keyword

Keyword searches used to find you

-

103

Time spent on domain

Relatively long time = indicates relevance hit
Add brownie points.

38


DOMAIN OWNER BEHAVIOR :


40

104

Domain Registration Time

Google patent - Domain Expiration Date
Register for 5 years, Google knows you are serious.
Register for 1 year, is it a throw-away domain?

39

105

Are associated sites legitimate?

Google patent - No spam, ownership, etc.
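Several entries in the table above (Page Rank, the rank of the referring page, and the number of outgoing links on the referrer page) follow from the way link popularity is shared out. The sketch below uses the simplified, publicly described PageRank iteration with the commonly cited damping value of 0.85. It is an illustration under those assumptions, not Google's actual ranking code, and the three-page link graph is made up.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank by repeated iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page's rank is split evenly across its outgoing links,
                # so fewer outgoing links means a bigger share per link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In this toy graph, page c collects links from both a and b and ends up with the highest score, while b, which nobody links to, ends up lowest.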


4. Alleged NEGATIVE OFF-Page SEO Google Ranking Factors (13)
Columns for each entry below, in order: Note or Patent Claim # / Factor # / NEGATIVE OFF-Page SEO Factor / Brief Note

-

120
(added)

Traffic Buying

Have you paid a company for web traffic? It is probably low quality traffic, with a zero conversion rate. Some providers of traffic for traffic's sake may be considered "bad neighborhoods". Can Google discount your traffic (for true popularity), because they know it's mostly phony?
Have you read about Traffic Power?

22-29

106

Temporal Link Analysis

In a nutshell, old links are valued, new links are not.
This is intended to thwart rapid incoming link accumulation, accomplished through the tactic of link buying.
Just one of the sandbox factors.

18

107

Change of Meanings

Query meaning changes over time, due to current events

BAD
108

Zero links to you

You MUST have at least 1 (one) incoming link (back link) from some website somewhere, that Google is aware of, to REMAIN in the index.

BAD

109

Link-buying

(Very good IF you don't get caught,
but don't do it -
when caught, the penalty isn't worth it.)

Google patent - Google hates link-buying, because it corrupts their PR model in the worst way possible.
1. Does your page have links it really doesn't merit?
2. Did you get tons of links in a short time period?
3. Do you have links from high-PR, unrelated sites?

41, 42

110

Prior Site Ranking

Google patent - High = Good

BAD
111

Cloaking

Google promises to Ban! (Presenting one webpage to the search engine spider, and another webpage to everybody else.)

??
112

Links from bad neighborhoods, affiliates

Google says that incoming links from bad sites can't hurt you, because you can't control them. Ideally, this would be true.
However, some speculate otherwise, esp., when other associated factors are thrown into the mix, such as web rings.

BAD
113

Penalties - resulting from
Domain Hijacking
(work with Google to fix)

Should result in IMPRISONMENT, forthwith!
Grand Theft, mandatory minimum sentence.
The criminal COPIES your entire website, and HOSTS it elsewhere, with . . . a few changes.

-

114

Penalty - Google TOS violation

WMG is the worst offender - gobbles up tons of Google server time by nervous Nellie webmasters. Google even mentions them by name. I think that Google will spank you when you cross the threshold, of say, 100 queries per day for the same term, from the same IP. Google can block your IP. Get a Google API.

??
115

Server Reliability - S/B >99.9%

What is your uptime? Ever notice a daily time when your server is unavailable, like about 1:30 AM? How diligent must Googlebot be? This is the worst reason to get dropped - you just aren't there! An ISP maintenance interruption can cause delisting.

-

116

No more room
Pages being dropped from large sites

The 2^32 problem - Google has hit the 4.3 Gigabyte address space wall. Bull! Google now has over 8 Gigs of indexed pages.
Thousands of pages are disappearing from various huge websites, but I think that it is G just cleaning house, by dumping computer-generated pages.

HOT

117

Rank Manipulation by
Competitor Attack

(1. Content theft causing you to get a duplicate content penalty, even though your content is the original - Google has problems tracking original authorship. People are still stealing my content, but nobody trumps me (in Google) with my own content - hats off to Google.)

Examples -
Site-Wide Link Attack
and
302 Redirect Attack
and
Hijacker Attack

Impossible by Google definition (except for a few nasty tricks, like making your competition appear to be link spammers)
Ideally, there SHOULD be nothing that your competition can do to directly hurt your rankings.

However, an astute observer noticed that Google changed their website to read :
Old verbiage = "There is nothing a competitor can do to harm your ranking ..."
New verbiage = "There is ALMOST nothing a competitor can do ..."
An obvious concession that Google thinks that at least some dirty tricks work!

Of course, there will always be new ones!

-

118

Bouncing Ball Algorithm






At least 2, and often 3 identifiable Google Search Algos are currently in use, alternating pseudo-randomly through the data centers.
G has moved to a daily dance. Multiple changing factors are applied daily. GOOD LUCK NOW on trying to figure things out!

IN ADDITION, some of the above factors are being "tweaked" daily. Not only are the "weights" of the factors changed, but the formula itself changes. Change is the only constant.

An algo change can boost or demote your site. I put this in the negative factors section, because your position is never secure, unless of course, you are huge (PR=7 or greater). If you simply cannot achieve top position, your only alternative to first page SERP exposure may be Google Ad Words (you pay for exposure).

Today, I searched for an extremely competitive "2-word term", and I found that NOT ONE of the top ten Google SERPs had even one of the words on the page.
YOWSA!
Today's theory - when it doesn't matter, anybody can get #1 in a second, if they know the on-page rules. BUT, after a certain "commercial competitive level", the "semantic analysis" algo kicks in, and less becomes more. The keyword density rules are flipped upon their noggins. I think that we are witnessing the evolution of search engine anti-seo sophistication, right before our very eyes. Fun stuff.