Monday, June 23, 2008

Webmaster Tools

Our suite of webmaster tools provides you with a free and easy way to make your site more Google-friendly. They can show you Google’s view of your site, help you diagnose problems, and let you share info with us to help improve your site’s visibility.

Getting Google’s view of your site, and diagnosing potential problems
The first step to increasing your site’s visibility on Google is learning how our robots crawl and index your site.

Crawl info: You can make sure we have access to your site, and see when Googlebot last visited. You can also view URLs that we’ve had trouble crawling and why we couldn't crawl them. This way, you can fix any problems preventing us from indexing all of your pages.
Robots.txt file validation: See if we’re having trouble with your file, and test out changes to that file before you change it on your server.
Website content: View top content from your site and see the words that other sites use to link to it.

Seeing how your site performs
A second step is learning what drives traffic to your site.

Top queries: Find the top queries that drive traffic to your site and where your site is included in the top search results. This will let you learn how users are finding your site.
Indexing information: See how your site is indexed and which of your pages are included in the index. If we find violations in your site, we’ll give you the opportunity to fix the problems and request reinclusion of your site.

Sharing info with Google about your site
Since no one knows more about your site than you do, you can also share this info with Google and improve your site's crawlability.

Submit a Sitemap file: Tell us all about your pages by submitting a Sitemap file; help us learn which pages are most important to you and how often those pages change.
Specify your preferred domain: Tell us which URL to use when indexing your site; we’ll do our best to index the version you prefer.

What is a Sitemap file and why should I have one?

In general, there are two types of sitemaps. The first type of sitemap is an HTML page listing the pages of your site - often by section - and is meant to help users find the information they need.

XML Sitemaps - usually called Sitemaps, with a capital S - are a way for you to give Google information about your site. This is the type of Sitemap we'll be discussing in this article.

In its simplest terms, a Sitemap is a list of the pages on your website. Creating and submitting a Sitemap helps make sure that Google knows about all the pages on your site, including URLs that may not be discoverable by Google's normal crawling process.

Sitemaps are particularly helpful if:

Your site has dynamic content.
Your site has pages that aren't easily discovered by Googlebot during the crawl process - for example, pages featuring rich AJAX or Flash.
Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.

You can also use a Sitemap to provide Google with additional information about your pages, including:

How often the pages on your site change. For example, you might update your product page daily, but update your About Me page only once every few months.
The date each page was last modified.
The relative importance of pages on your site. For example, your home page might have a relative importance of 1.0, category pages have an importance of 0.8, and individual blog entries or product pages have an importance of 0.5. This priority only indicates the importance of a particular URL relative to other URLs on your site, and doesn't impact the ranking of your pages in search results.

Sitemaps provide additional information about your site to Google, complementing our normal methods of crawling the web. We expect they will help us crawl more of your site and in a more timely fashion, but we can't guarantee that URLs from your Sitemap will be added to the Google index. Sites are never penalized for submitting Sitemaps.

Google adheres to Sitemap Protocol 0.9 as defined by sitemaps.org. The Sitemap Protocol is a dialect of XML for summarizing Sitemap information that is relevant to web crawlers. Sitemaps created for Google using Sitemap Protocol 0.9 are therefore compatible with other search engines that adopt the standards of sitemaps.org.
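
As a rough illustration of what such a file contains, here is a minimal Python sketch that writes a Sitemap with the optional last-modified, change-frequency and priority values described above. The example.com URLs, the dates and the build_sitemap helper are made up for this example; only the element names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org Sitemap Protocol 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return Sitemap XML text for a list of page dictionaries.

    Each dictionary needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional and are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, used only to show the shape of the output.
pages = [
    {"loc": "http://www.example.com/", "lastmod": "2008-06-23",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/about.html",
     "changefreq": "monthly", "priority": 0.5},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

A real Sitemap file would normally also start with an XML declaration and be saved as UTF-8; the sketch leaves that out to stay short.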

While a standard Sitemap works for most sites, you can also create and submit specialized Sitemaps for certain types of content. These Sitemap formats are specific to Google and are not used by other search engines. They're a good way to give Google detailed information about specific content types. For example, publishers can use News Sitemaps to give Google information that can appear in Google News search results, such as publication date, keywords, and stock ticker symbol. Sitemap formats include:

What are Sitemaps?

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.

Monday, June 16, 2008

Google Sitemap Generator

What is "XML sitemap"?

By placing a formatted XML file containing a site map on your web server, you enable search engine crawlers (like Google's) to find out which pages are present and which have recently changed, and to crawl your site accordingly.

What is "Change frequency"?

This value indicates how frequently the content at a particular URL is likely to change.

What is "Last Modified"?

The time the URL was last modified. This information allows crawlers to avoid recrawling documents that haven't changed.
You can let the generator set this field from your server's response headers, or specify your own date and time.

What is "Priority"?

The priority of a particular URL relative to other pages on the same site. The value for this tag is a number between 0.0 and 1.0, where 0.0 identifies the lowest priority page(s) on your site and 1.0 identifies the highest priority page(s) on your site.
The default priority of a page is 0.5.
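
If you build or edit a Sitemap by hand, a quick sanity check of these values can catch typos. The sketch below is only an illustration: the check_entry helper is hypothetical, and the list of change frequency words is taken from the sitemaps.org protocol rather than from the text above.

```python
# Change frequency values listed in the sitemaps.org protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}
DEFAULT_PRIORITY = 0.5  # used when a page gives no priority


def check_entry(changefreq=None, priority=None):
    """Return a list of problems found in one Sitemap entry."""
    problems = []
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append("unknown change frequency: " + changefreq)
    value = DEFAULT_PRIORITY if priority is None else priority
    if not 0.0 <= value <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems


print(check_entry(changefreq="daily", priority=0.8))
print(check_entry(changefreq="fortnightly", priority=1.5))
```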

Sitemap generation

Here are 4 simple steps to get it done

1. Enter your full website URL and some optional parameters in the form below.

2. Press the 'Start' button and wait until the site is completely crawled (the progress will be indicated).

3. You will see the generated sitemap details page, including number of pages, broken links list, XML file content and link to a compressed sitemap. Download the sitemap file using this link and put it into the "public_html/" folder of your site.

4. Go to your Google Webmaster account and add your sitemap URL.
Please check About Sitemaps for more details.

Starting URL

Please enter the full http address for your site. Only the links within the starting directory will be included. Note that the host name must match exactly: for instance, "domain.com" and "www.domain.com" are not the same.
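
The generator's own rules are not published here, so the following Python sketch is only a guess at how such a scope check might work. It assumes a link is in scope when its host matches the starting URL exactly and its path sits under the starting directory; the in_scope function and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(start_url, link):
    """Guess whether a link falls inside the starting directory.

    Assumption: a starting path without a trailing slash is treated
    as a file, so its parent directory becomes the starting directory.
    """
    start = urlsplit(start_url)
    target = urlsplit(link)
    # "domain.com" and "www.domain.com" count as different sites.
    if start.netloc.lower() != target.netloc.lower():
        return False
    start_dir = start.path.rsplit("/", 1)[0] + "/"
    target_path = target.path or "/"
    return target_path.startswith(start_dir)


start = "http://www.domain.com/blog/"
print(in_scope(start, "http://www.domain.com/blog/post.html"))
print(in_scope(start, "http://domain.com/blog/post.html"))
print(in_scope(start, "http://www.domain.com/about.html"))
```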

Maximum 500 pages will be indexed in sitemap

Need to index more? Check our Standalone version of Google sitemap generator with unlimited number of pages for crawler.

Where to get your RSS Feed Icons …

Need a standard feed icon or a feed icon PSD so you can manipulate how the icon looks?

Mozilla Foundation has made RSS feed icons available to help feed links become more recognizable and standardized. Their guidelines and FAQ provide full information on the icon and how it should be used.

Though the icons are available on the Mozilla site via the links above, the best place to get the icons is feedicons.com. At feedicons.com there is a zip file with the icons in a number of different file formats, including Photoshop and Illustrator formats.

Where to submit your RSS feeds …

Have a blog or other site that outputs an RSS feed? Want more exposure for your site or feed? Masternewmedia has compiled and regularly updates a list of websites where RSS feeds can be submitted. Each link has been tested and there is information on the website and how it works, as well as a direct link to the feed submission page.

List of RSS Submission Sites

There is also a feed of these sites which updates as the list is updated, so you can submit to new sites as they are added …

Feed of RSS Submission Sites

Massive List of RSS Resources

Mashable has put together a huge list of RSS Tools which includes:

Windows RSS Readers
OS X RSS Readers
Linux RSS Readers
Web-based RSS Readers
Mobile RSS Readers
RSS-to-email tools
Feed Validators
RSS-related Firefox plugins
RSS plugins for Wordpress
RSS Managers
Tools for combining or mixing RSS feeds
RSS ping tools
RSS feed directories

WhatIsRSS.com now has a blog ...

We have always wanted to keep this resource brief and to the point, but we realise there is a lot more that can be communicated about using RSS. Our RSS Blog was launched 26 July 2007 to extend and complement the information provided here. If you are interested in learning more about RSS go there now and subscribe! It will be updated over time with information on using RSS and will feature tools to help you use RSS in new and better ways.

What do I need to do to read an RSS Feed? RSS Feed Readers and News Aggregators

Feed Reader or News Aggregator software allows you to grab the RSS feeds from various sites and display them for you to read and use.

A variety of RSS Readers are available for different platforms. Some popular feed readers include Amphetadesk (Windows, Linux, Mac), FeedReader (Windows), and NewsGator (Windows - integrates with Outlook). There are also a number of web-based feed readers available. My Yahoo, Bloglines, and Google Reader are popular web-based feed readers.

Once you have your Feed Reader, it is a matter of finding sites that syndicate content and adding their RSS feed to the list of feeds your Feed Reader checks. Many sites display a small icon with the acronyms RSS, XML, or RDF to let you know a feed is available.
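
Under the hood, a feed is just an XML file listing recent items. As a rough sketch of what a feed reader does with it, the Python below pulls the title and link out of each item. The feed text is a made-up RSS 2.0 sample and the list_items helper is hypothetical; a real reader would first download the feed from the site.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs offline.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>http://www.example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://www.example.com/first.html</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://www.example.com/second.html</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in list_items(SAMPLE_FEED):
    print(title, "-", link)
```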

RSS Primer: One Page Quick Introduction to RSS

What is RSS?

RSS (Rich Site Summary) is a format for delivering regularly changing web content. Many news-related sites, weblogs and other online publishers syndicate their content as an RSS Feed to whoever wants it.

Why RSS? Benefits and Reasons for using RSS

RSS solves a problem for people who regularly use the web. It allows you to easily stay informed by retrieving the latest content from the sites you are interested in. You save time by not needing to visit each site individually. You ensure your privacy, by not needing to join each site's email newsletter. The number of sites offering RSS feeds is growing rapidly and includes big names like Yahoo News.

Friday, June 13, 2008

Robots.txt Checker

Robots.txt files (often erroneously called robot.txt, in singular) are created by webmasters to mark (disallow) files and directories of a web site that search engine spiders (and other types of robots) should not access.

This robots.txt checker is a "validator" that analyzes the syntax of a robots.txt file to see if its format is valid as established by Robot Exclusion Standard (please read the documentation and the tutorial to learn the basics) or if it contains errors.

	Simple usage: How to check your robots.txt file format? Just insert the full URL (Example: http://www.yourdomain.com/robots.txt) of the robots.txt file you want to analyze and hit Enter
	Powerful: The checker finds syntax errors, "logic" errors, mistyped words and it gives you useful optimization tips
	Accurate: The validation process takes into account both Robots Exclusion Standard rules and spider-specific (Google, Inktomi, etc.) extensions (including the new "Sitemap" command).
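
To give a feel for what a syntax check involves, here is a very small Python sketch. It is not the checker described above and covers far fewer cases: it only looks for a missing colon, an unknown field name, a Disallow line that comes before any User-agent line, and more than one path on a Disallow line. The check_robots_txt function and its messages are invented for this example.

```python
# Field names this sketch recognises; real checkers know more.
KNOWN_FIELDS = {"user-agent", "disallow", "sitemap"}


def check_robots_txt(text):
    """Return (line number, message) pairs for simple syntax problems."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_agent = True
        elif field == "disallow":
            if not seen_agent:
                problems.append((number, "Disallow before any User-agent"))
            elif len(value.split()) > 1:
                problems.append((number, "more than one path on a Disallow line"))
    return problems


print(check_robots_txt("User-agent: *\nDisalow: /cgi-bin/\n"))
```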

This robots.txt analyzer is provided by Motoricerca, a non-profit Italian guide to web site optimization and search engine positioning.

Enter The Content Of The robots.txt To Check:

You can paste the contents of your robots.txt file here to test it without putting the file on your server:

Enter The URL Of The robots.txt To Check:

The URL Must End In '.txt' for this script to work

robots.txt Checker

Test The Syntax Of Your Robots File

This tester allows you to check your robots.txt file for syntax errors. There are 2 methods for putting your robots.txt contents into this script:

Method 1: If you already have a robots.txt file on your server, enter the URL of that file and this script will retrieve the content and test it.

Method 2: You can paste the contents of your robots.txt file into the text box below. This is an ideal way to test modifications before putting them on your server.

If you don't have a robots.txt file yet, use our simple robots.txt creator to make one.

Using a robots.txt File

This is a useful file that keeps search engines from indexing pages you do not want spidered. Why would you not want a page indexed by a search engine? Perhaps you want to display a page that shows an example of spamming the search engines. This type of page might include an example of repeated keywords, hidden tags with keywords, and other things that could get a page or an entire site banned from a search engine.

The robots.txt file is a good way to prevent this page from getting indexed. However, not every site can use it. The only robots.txt file that the spiders will read is the one at the top html directory of your server. This means you can only use it if you run your own domain. The spiders will look for the file in a location similar to these below:

http://www.pageresource.com/robots.txt
http://www.javascriptcity.com/robots.txt
http://www.mysite.com/robots.txt

Any other location of the robots.txt file will not be read by a search engine spider, so the file locations below will not be worthwhile:

http://www.pageresource.com/html/robots.txt
http://members.someplace.com/you/robots.txt
http://someisp.net/~you/robots.txt
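
Put another way, the spider throws away the path of whatever page it is looking at and asks for /robots.txt on the same host. A small Python sketch of that rule (the robots_txt_url function name is made up for this example):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the only robots.txt location a spider checks for this page."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.pageresource.com/html/index.html"))
print(robots_txt_url("http://someisp.net/~you/page.html"))
```

For a page under http://someisp.net/~you/ the result is still the robots.txt at the top of someisp.net, which is why a file in your own subdirectory is never read.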

Now, if you have your own domain, you can see where to place the file. So let's take a look at exactly what needs to go into the robots.txt file to make the spider see what you want done.

If you want to exclude all the search engine spiders from your entire domain, you would write just the following into the robots.txt file:

User-agent: *
Disallow: /

If you want to exclude all the spiders from a certain directory within your site, you would write the following:

User-agent: *
Disallow: /aboutme/

If you want to do this for multiple directories, you add on more Disallow lines:

User-agent: *
Disallow: /aboutme/
Disallow: /stats/

If you want to exclude certain files, then type in the rest of the path to the files you want to exclude:

User-agent: *
Disallow: /aboutme/album.html
Disallow: /stats/refer.htm

If you are curious, here is what I used to keep an article from getting indexed:

User-agent: *
Disallow: /zine/article002.htm

If you want to keep a specific search engine spider from indexing your site, do this:

User-agent: Robot_Name
Disallow: /

You'll need to know the name of the search engine spider or robot, and place it where Robot_Name is above. You can find these names from the web sites of the various search engines.

So, if you need to exclude something from search engine indexing, this is the most effective tool recognized by the search engines. Use it to keep the spiders out of any part of your web you want them to avoid.

Why is a Robots.txt File Important?

What is the purpose of a robots.txt file?

It Can Avoid Wastage of Server Resources

At the date of this writing, as far as I know, many of the search engine spiders do not bother to index the scripts on your site (such as your CGI or PHP scripts). However, there are those that do, including one of the major players, Google.

Robots or spiders that do index scripts will call your scripts just as a browser would, complete with all the special characters. If your site is like mine, where the scripts are solely meant for the use of humans and serve no practical use for a search engine (why should a search engine need to invoke my site-navigation script? - it can just crawl the direct links), you may want to block spiders from the directories that contain your scripts. For example, I block spiders from my CGI-BIN directory. Hopefully, this will reduce the load on the web server by removing unnecessary script executions.

Of course there are the occasional ill-behaved robots that hit your server at high speed. Such spiders can actually bring down your server or at the very least slow it down for the real users who are trying to access it. If you know of any such spiders, you might want to exclude them too. You can do this with a robots.txt file. Unfortunately though, ill-behaved spiders often ignore robots.txt files as well.

It Can Save Your Bandwidth

If you look at your website's web logs, you will undoubtedly find many requests for the robots.txt file by various search engine spiders. If, like me, you have a customized 404 document (which loads each time a visitor tries to retrieve a page that does not exist on your site), you will find that the robot winds up requesting that document instead if you don't have a robots.txt file. My site has a fairly large 404 document, with the result that the spiders wind up loading it repeatedly throughout the day, adding to my already large bandwidth problems. In such a case, having a small robots.txt file may save you some bandwidth (yeah, I know, it's not that much).

Some spiders may also request files which you feel they should not. For example, one search engine requests graphic files (".gif" files) on my sites. Since I see little reason why I should let it index the graphics on my site, waste my bandwidth, and possibly infringe my copyright, I ban it (and in fact all spiders) from my graphic files directory in my robots.txt file.

It Removes Clutter from your Web Statistics

I don't know about you, but one of the things I check from my web statistics is the list of URLs that visitors tried to access, but met with a 404 File Not Found Error. Often this tells me if I made a spelling error in one of the internal links on one of my sites (yes, I know - I should have checked all links in the first place, but mistakes do happen).

If you don't have a robots.txt file, you can be sure that /robots.txt is going to feature in your web statistics 404 report, adding clutter and perhaps unnecessarily distracting your attention from the real bad URLs that need your attention.

Refusing a Robot

Sometimes you don't want a particular spider to index your site for some reason or other. Perhaps the robot is ill-behaved and spiders your site at such a high speed that it takes down your entire server. Or perhaps you would rather not have the images on your site indexed in an image search engine. With a robots.txt file, you can exclude certain spiders from indexing your site, provided the spider obeys the rules in that file.

How to Set Up a Robots.txt File

Writing a robots.txt file could not be easier. It's just an ASCII text file that you place at the root of your domain. For example, if your domain is www.yourdomain.com, you will place the file at www.yourdomain.com/robots.txt.

The file basically lists the names of spiders on one line, followed by the list of directories or files it is not allowed to access on subsequent lines, with each directory or file on a separate line. It is possible to use the wildcard character "*" instead of naming specific spiders. When you do so, all spiders are assumed to be named. Note that the robots.txt file is a robots exclusion file (with emphasis on the "exclusion") - there is no way to tell spiders to include any file or directory.

Take the following robots.txt file for example:

User-agent: *
Disallow: /cgi-bin/

The above two lines, when inserted into a robots.txt file, inform all robots (since the wildcard asterisk "*" character was used) that they are not allowed to access anything in the cgi-bin directory and its descendants. That is, they are not allowed to access /cgi-bin/whatever.cgi or even a file or script in a subdirectory of cgi-bin, such as /cgi-bin/anything/whichever.cgi.

If you have a particular robot in mind, such as the Google image search robot, which collects images on your site for the Google Image search engine, you may include lines like the following:

User-agent: Googlebot-Image
Disallow: /

This means that the Google image search robot, "Googlebot-Image", should not try to access any file in the root directory "/" and all its subdirectories. This effectively means that it is banned from your entire website.

You can have multiple Disallow lines for each user agent (ie, for each spider). Here is an example of a longer robots.txt file:

User-agent: *
Disallow: /images/
Disallow: /cgi-bin/

User-agent: Googlebot-Image
Disallow: /

The first block of text disallows all spiders from the images directory and the cgi-bin directory. The second block disallows the Googlebot-Image spider from every directory.
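
If you want to try rules like these before uploading them, Python's standard urllib.robotparser module can read robots.txt text and answer "may this robot fetch this URL?". The sketch below feeds it the longer example above. The "ExampleBot" name and the URLs are made up, and the answers reflect how this one parser reads the file, which may differ in details from any particular search engine.

```python
from urllib.robotparser import RobotFileParser

# The longer robots.txt example from the text above.
RULES = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/

User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

SITE = "http://www.yourdomain.com"  # hypothetical site

# "ExampleBot" is a made-up spider that falls under the "*" block.
print(parser.can_fetch("ExampleBot", SITE + "/index.html"))
print(parser.can_fetch("ExampleBot", SITE + "/images/logo.gif"))
# Googlebot-Image has its own block, which bans it from everything.
print(parser.can_fetch("Googlebot-Image", SITE + "/index.html"))
```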

It is possible to exclude a spider from indexing a particular file. For example, if you don't want Google's image search robot to index a particular picture, say, mymugshot.jpg, you can add the following:

User-agent: Googlebot-Image
Disallow: /images/mymugshot.jpg

Remember to add the trailing slash ("/") if you are indicating a directory. If you simply add

User-agent: *
Disallow: /privatedata

the robots will be disallowed from accessing privatedata.html as well as privatedataandstuff.html as well as the directory tree beginning from /privatedata/ (and so on). In other words, there is an implied wildcard character following whatever you list in the Disallow line.
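
The implied wildcard amounts to a simple "starts with" test on the path. Here is a minimal Python sketch of that idea; the is_disallowed function is made up for illustration and ignores everything else a real spider does, such as choosing the right User-agent block.

```python
def is_disallowed(path, disallow_rules):
    """True if the path starts with any non-empty Disallow value."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


without_slash = ["/privatedata"]
with_slash = ["/privatedata/"]

for path in ("/privatedata.html",
             "/privatedataandstuff.html",
             "/privatedata/report.html"):
    print(path,
          is_disallowed(path, without_slash),
          is_disallowed(path, with_slash))
```

With the trailing slash, only the last path is blocked; without it, all three are.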

Where Do You Get the Name of the Robots?

If you have a particular spider in mind which you want to block, you have to find out its name. To do this, the best way is to check out the website of the search engine. Respectable engines will usually have a page somewhere that gives you details on how you can prevent their spiders from accessing certain files or directories.

Common Mistakes in Robots.txt

Here are some mistakes commonly made by those new to writing robots.txt rules.

It's Not Guaranteed to Work

As mentioned earlier, although the robots.txt format is listed in a document called "A Standard for Robots Exclusion", not all spiders and robots actually bother to heed it. Listing something in your robots.txt is no guarantee that it will be excluded. If you really need to protect something, you should use a .htaccess file to password-protect the directory (if you are running your site on an Apache server).

Don't List Your Secret Directories

Anyone can access your robots file, not just robots. For example, typing http://www.google.com/robots.txt will get you Google's own robots.txt file. I notice that some new webmasters seem to think that they can list their secret directories in their robots.txt file to prevent that directory from being accessed. Far from it. Listing a directory in a robots.txt file often attracts attention to the directory. In fact, some spiders (like certain spammers' email harvesting robots) make it a point to check the robots.txt for excluded directories to spider.

Only One Directory/File per Disallow line

Don't try to be smart and put multiple directories on your Disallow line. This will probably not work the way you think, since the Robots Exclusion Standard only provides for one directory per Disallow statement.

It's Worth It

Even if you want all your directories to be accessed by spiders, a simple robots file with the following may be useful:

User-agent: *
Disallow:

With no file or directory listed in the Disallow line, you're implying that every directory on your site may be accessed. At the very least, this file will save you a few bytes of bandwidth each time a spider visits your site (or more if your 404 file is large); and it will also remove Robots.txt from your web statistics bad referral links report.
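
To see the difference one slash makes, the Python sketch below runs both versions through the standard urllib.robotparser module. The "ExampleBot" name, the URL and the can_fetch_page helper are invented for the example, and the result shows only how this particular parser reads the two files.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow value
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash


def can_fetch_page(robots_txt,
                   url="http://www.yourdomain.com/any/page.html"):
    """Parse robots.txt text and ask whether a made-up bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ExampleBot", url)


print(can_fetch_page(ALLOW_ALL))
print(can_fetch_page(BLOCK_ALL))
```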

Copyright 2001-2008 by Christopher Heng. All rights reserved.
Get more free tips and articles like this, on web design, promotion, revenue and scripting, from http://www.thesitewizard.com/

Do Not Reprint Without Permission

This article is copyrighted. Please do not reproduce this article in whole or part, in any form, without obtaining my written permission.

How to Set Up a robots.txt to Control Search Engine Spiders

http://www.thesitewizard.com/archive/robotstxt.shtml
by Christopher Heng, thesitewizard.com

When I first started writing my first website, I did not really think that I would ever have any reason to create a robots.txt file. After all, did I not want search engine robots to spider and thus index every document in my site? Yet today, all my sites, including thesitewizard.com, have a robots.txt file in their root directory. This article explains why you might also want to include a robots.txt file on your sites, how you can do so, and notes some common mistakes made by new webmasters with regard to the robots.txt file.

For those new to the robots.txt file, it is merely a text file implementing what is known as the Standard for Robot Exclusion. Placed in the main directory of a website, the file advises spiders and other robots which directories or files they should not access. The file is purely advisory - not all spiders bother to read it, let alone heed it. However, most, if not all, of the spiders sent by the major search engines to index your site will read it and take cognizance of the rules contained within the file.

Robots Text File - robots.txt

The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. For those spiders that obey the file, it provides a map of what they can and cannot index. The file must reside in the root directory of your web. The URL path (web address) of your robots.txt file should look like this...

/robots.txt

The Robots text file open in Notepad might look like this:

[Screen shot: Robots Text File]

Definition of the above robots.txt file:

User-agent: *
The asterisk (*) or wildcard represents a special value and means any robot.

Disallow:
The Disallow: line without a / (forward slash) tells the robots that they can index the entire site.

An empty value indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record; to allow everything, leave its value empty (no forward slash), as shown above.

The presence of an empty "/robots.txt" file has no explicit associated semantics; it will be treated as if it was not present, i.e. all robots will consider themselves welcome.

The Disallow: line without the trailing slash (/) tells all robots to index everything. If you have a line that looks like this:

Disallow: /private/

It tells the robot that it cannot index the contents of that /private/ directory.

Summarizing the Robots Exclusion Protocol - robots.txt file

To allow all robots complete access:

User-agent: *
Disallow:

To exclude all robots from the server:

User-agent: *
Disallow: /

To exclude all robots from parts of a server:

User-agent: *
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

To exclude a single robot from the server:

User-agent: Named Bot
Disallow: /

To exclude a single robot from parts of a server:

User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/

Note: The asterisk (*) or wildcard in the User-agent field is a special value meaning "any Robot" and therefore is the only one needed until you fully understand how to set up different User-agents.

If you want to Disallow: a particular file within the directory, your Disallow: line might look like this one:

Disallow: /private/top-secret-stuff.htm

Keep in mind that using the above example excludes that specified page (top-secret-stuff.htm) but will not exclude the entire /private/ directory.

You should validate your robots.txt file. Enter the full URI to the robots.txt file on your server. The robots.txt file always resides at the root level of your web.

Here are a few good online references for information on the Robots Exclusion Protocol.

A New Concept in Marketing - The Bot Blog

Brett Tabke - WebmasterWorld - The Bot Blog
Robots.txt - Where no blog has gone before. Brett may have started a new marketing medium, blogging via your robots.txt file. Only a true geek would appreciate The Bot Blog!

Thursday, June 12, 2008

On-Page Factors

As Search Engine Optimisation experts, Rupiz Media deals in perfecting on-page factors that influence the rankings of your web pages. Catering to a range of services in this regard, we provide viable solutions to web businesses of all kinds.

As part of on-page optimisation, we take care of the following aspects of a website:

Keyword Research
We undertake extensive research of the web to enlist the most important keywords and phrases for your website. These keywords eventually form the base of your search engine marketing campaign.

Content Review & Optimisation
We review the content of your site and suggest changes wherever need be. These changes include judicious sprinkling of your web pages with essential keywords, so as to foster the process of search engine indexing. Along with that, we also ensure that the content is written in simple, marketing-oriented language that can attract visitors.

Competitor Analysis
Rupiz Media studies the online marketplace for you. We scan the websites of your competitors in order to analyse the reasons for their success or failure in various optimisation techniques. This helps us formulate strategies that are advantageous for your business.

Meta Tags Optimisation
A part of your site's HTML programming, meta tags are distinctive to each and every web page. We optimise the Title, Description, Keyword, and other meta tags in a way that they comprise your main keywords and assist in faster indexing of your pages.

URL Optimisation
We optimise the URLs of your web pages in order to make them more conducive for indexing. This process also involves the use of pertinent keywords that can attract search engine spiders.

Image Optimisation
If your website has images, we optimise their Alt Tags and package them so that they do not impede the process of optimisation.

Site Navigation
When structuring your website and its internal links, we give due consideration to search engine friendliness, so that your pages get a clear and attractive navigation structure.

More On-Page Factors

On-Page factors are related directly to the content and structure of the website. This normally consists of pages written in the HyperText Markup Language but also applies to other document formats that are indexed by search engines, for example Microsoft Word or PDF formats. On-page optimization involves modifying keyword frequency in the URL, Title, Headings, Hypertext Links and Body text. It may also involve reducing redundant HTML codes (aka cruft) produced by Web page authoring tools and restructuring the site to produce better linked and focussed page content.

Many search engines now discount the weight given to on-page factors because they give too much scope for abuse by SEO experts. In theory the visible parts of a web-page are less prone to manipulation as they have to make sense to readers. However doorway pages with redirections and clever use of style sheets enable different content to be served to search engines and end users.

Each page should target between two and four keywords directly related to its contents. If you feel the need for more keywords, consider splitting your content into separate pages. The Uniform Resource Locator (URL) should contain keywords separated by hyphens without being too long; around 128 characters is probably a sensible upper limit for the entire URL. The Title tag should contain the keywords, with no stop words, arranged so that they still make sense.
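The URL advice above can be checked mechanically. The following is a minimal Python sketch, not a definitive tool: the function name `check_url`, the example URL and the keyword list are all hypothetical, and the 128-character limit is simply the figure suggested in the text.

```python
from urllib.parse import urlparse

MAX_URL_LENGTH = 128  # upper limit suggested in the text above (a rule of thumb)


def check_url(url, keywords):
    """Report whether a URL is short enough and which keywords its path contains."""
    path = urlparse(url).path.lower()
    # Split the path into words on slashes, hyphens and dots.
    words = set()
    for segment in path.split("/"):
        for part in segment.replace(".", "-").split("-"):
            if part:
                words.add(part)
    return {
        "length_ok": len(url) <= MAX_URL_LENGTH,
        "keywords_found": [k for k in keywords if k.lower() in words],
        "uses_underscores": "_" in path,  # hyphens are the recommended separator
    }


# Hypothetical example URL and keywords.
report = check_url(
    "http://www.example.com/blue-widgets/cheap-blue-widgets.html",
    ["blue", "widgets", "cheap"],
)
print(report)
```

Only the path is inspected for keywords here; a keyword in the domain name would need a separate check on the host part of the URL.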

On Page Optimization

The Title tag should be the first tag in the Head section of the page, since there is evidence that search engines give more weight to factors higher up the page. The content should be properly structured with Heading (H1, H2, H3 etc.) tags containing relevant keywords. Search engines will only index a limited amount of text in HTML tags, and using too many keywords will dilute the focus. Don't spam any of these tags; it won't be effective and could result in a penalty.

Many website designers spend a lot of time creating Keyword and Description meta tags. Although these may be read by search engines, for example the description tag is used by Yahoo! to provide a short description of the site in the Search Engine Results Pages, they are not used for ranking pages.

Personally I don't bother with them as they bulk out pages for little real benefit. Google, Yahoo! and MSN Search will all use the text they find on the page as a description, so make sure your first header and sentence describe the contents. However, some search engine watchers say that the new Microsoft search engine, currently in beta tests, puts some weight on meta tags. There is also evidence to suggest that search engines give more prominence to text earlier in the page, and some engines will only index a limited amount of body text, so making the first paragraph punchy is a good idea.

Image alternate-text tags (ALT tags) are only indexed where the image is part of a hyperlink. However ALT tags are useful for non-graphical browsing and should be employed correctly.

For example: alt="Description of Image"
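To find images that lack alternate text, a page can be scanned with Python's standard `html.parser` module. This is a rough sketch under stated assumptions: the class name `AltTextChecker` and the sample markup are hypothetical, and the claim that only linked images have their ALT text indexed is taken from the text above, not verified here.

```python
from html.parser import HTMLParser


class AltTextChecker(HTMLParser):
    """Collect images with no ALT text, and linked images that do have it."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []    # src of images without usable ALT text
        self.linked_images = []  # (src, alt) of images inside a hyperlink
        self._link_depth = 0     # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._link_depth += 1
        elif tag == "img":
            src = attrs.get("src", "")
            alt = (attrs.get("alt") or "").strip()
            if not alt:
                self.missing_alt.append(src)
            elif self._link_depth > 0:
                self.linked_images.append((src, alt))

    def handle_endtag(self, tag):
        if tag == "a" and self._link_depth > 0:
            self._link_depth -= 1


# Hypothetical sample markup.
sample_html = (
    '<a href="/"><img src="logo.gif" alt="Description of Image"></a>'
    '<img src="spacer.gif">'
)
checker = AltTextChecker()
checker.feed(sample_html)
print("Missing ALT text:", checker.missing_alt)
print("Linked images with ALT text:", checker.linked_images)
```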

Comments are not indexed. Use bold/strong/italic attributes where appropriate.

Write natural copy aimed at the end user and not at search engines. Don't worry too much about keyword density for the contents, but take the opportunity to include keywords combined in different phrases and orders, and create anchor text to related internal pages. Keep the number of links below 50, preferably well below, and don't repeat identical outbound links. Theme-related pages should be at the same level in the site hierarchy and be linked through the site's menu structure and site map. At least one page at the same level should link back to the home page so that search engines that have traversed a deep link can index the rest of the website.
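The link guidance above (fewer than 50 links, no repeated outbound links) can be audited with a short script. The sketch below uses only the Python standard library; `LinkCollector`, `audit_links`, the sample page and the base URL are hypothetical names chosen for illustration, and the limit of 50 is the figure from the text.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MAX_LINKS = 50  # limit suggested in the text above


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def audit_links(html, base_url):
    """Count links and list outbound URLs that appear more than once."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    site_host = urlparse(base_url).netloc
    outbound = [
        u for u in collector.links
        if urlparse(u).scheme in ("http", "https") and urlparse(u).netloc != site_host
    ]
    repeated = sorted(u for u, n in Counter(outbound).items() if n > 1)
    return {
        "total_links": len(collector.links),
        "under_limit": len(collector.links) < MAX_LINKS,
        "repeated_outbound": repeated,
    }


# Hypothetical sample page.
page = (
    '<a href="/about.html">About</a>'
    '<a href="http://other.example.org/">Partner</a>'
    '<a href="http://other.example.org/">Partner again</a>'
)
result = audit_links(page, "http://www.example.com/index.html")
print(result)
```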

For any other document format, e.g. PowerPoint or Adobe PDF, make sure you at least have a descriptive document title. Try to avoid formats that search engines find hard to understand; even where a search engine can index a format, it will carry less information than plain old HTML. Avoid using images to replace text, except occasionally in hyperlinks. Avoid formats such as Flash, Shockwave and sitemaps where there is no alternative text. Avoid HTML Frames, which some search engines find hard to navigate; use Style Sheets (CSS) instead. Style Sheets should also be used to reduce the amount of formatting within documents. Keep pages to less than 100 kilobytes and preferably not much more than a screenful of text. Where JavaScript or Flash menus are used, include plain-text links at the bottom of the page. These will ensure all search engines index the rest of your website.

Another factor directly under the control of the website owner is the amount of content. Large websites generally rank better than small websites for a number of reasons. Search engines also like fresh content and will spider this more frequently. A regularly updated news page, even a blog, can provide deep links to the rest of the website.

"ON PAGE FACTORS" ---

1. Alleged POSITIVE ON-Page SEO Google Ranking Factors (38)
(Keeping in mind the converse, of course, that when violated, some of these factors
immediately jump into the NEGATIVE On-Page Ranking Factors domain.)

The term "Keyword" below refers to the "Keyword Phrase", which can be one word or more.
Green rows confirmed by Google patent - updated 08-10-06

Note - Patent Claim #	Factor #	POSITIVE ON-Page SEO Factors	Brief Note
50		KEYWORDS	Google patent - Topic extraction For keyword selection, try Overture - Google Ad Words - Google Trends
HOT	1	Keyword in URL	First word is best, second is second best, etc.
HOT	2	Keyword in Domain name	Same as in page-name-with-hyphens
		Keywords - Header
HOT	3	Keyword in Title tag	Keyword in Title tag - close to beginning Title tag 10 - 60 characters, no special characters.
-	4	Keyword in Description meta tag	Shows theme - less than 200 chars. Google no longer "relies" upon this tag, but will often use it.
-	5	Keyword in Keyword metatag	Shows theme - less than 10 words. Every word in this tag MUST appear somewhere in the body text. If not, it can be penalized for irrelevance. No single word should appear more than twice. If not, it may be considered spam. Google purportedly no longer uses this tag, but others do.
		Keywords - Body
-	6	Keyword density in body text	5 - 20% - (all keywords/ total words) Some report topic sensitivity - the keyword spamming threshold % varies with the topic.
-	7	Individual keyword density	1 - 6% - (each keyword/ total words)
HOT	8	Keyword in H1, H2 and H3	Use Hx font style tags appropriately
-	9	Keyword font size	"Strong is treated the same as bold, italic is treated the same as emphasis" . . . Matt Cutts July 2006
-	10	Keyword proximity (for 2+ keywords)	Directly adjacent is best
-	11	Keyword phrase order	Does word order in the page match word order in the query? Try to anticipate query, and match word order.
-	12	Keyword prominence (how early in page/tag)	Can be important at top of page, in bold, in large font
		Keywords - Other
-	13	Keyword in alt text	Should describe graphic - Do NOT fill with spam (Was part of Google Florida OOP - tripped a threshold - may still be in effect to some degree as a red flag, when summed with all other on-page optimization - total page optimization score - TPOS).
-	14	Keyword in links to site pages (anchor text)	Links out anchor text use keyword?
		NAVIGATION - INTERNAL LINKS
SITE	15	To internal pages- keywords?	Link should contain keywords. The filename "linked to" should contain the keywords. Use hyphenated filenames, but not long ones - two or three hyphens only.
SITE	16	All Internal links valid?	Validate all links to all pages on site. Use a free link checker. I like this one.
SITE	17	Efficient - tree-like structure	TRY FOR two clicks to any page - no page deeper than 4 clicks
SITE	18	Intra-site linking	Appropriate links between lower-level pages
54		NAVIGATION - OUTGOING LINKS
55	19	To external pages- keywords?	Google patent - Link only to good sites. Do not link to link farms. CAREFUL - Links can and do go bad, resulting in site demotion. Unfortunately, you must devote the time necessary to police your outgoing links - they are your responsibility.
56	20	Outgoing link Anchor Text	Google patent - Should be on topic, descriptive
61, 62	21	Link stability over time	Google patent - Avoid "Link Churn"
-	22	All External links valid?	Validate all links periodically.
-	23	Less than 100 links out total	Google says limit to 100, but readily accepts 2-3 times that number. ref 2k
		OTHER ON-Page Factors
-	24	Domain Name Extension Top Level Domain - TLD	.gov sites seem to be the highest status .edu sites seem to be given a high status .org sites seem to be given a high status .com sites excel in encompassing all the spam/ crud sites, resulting in the need for the highest scrutiny/ action by Google. Perhaps one would do well with the new .info domain class. - ~~Nope. Spammers jumped all over it - no safe haven there.~~ Not so much, now - .info sites can rank highly.
-	25	File Size	Try not to exceed 100K page size (however, some subject matter, such as this page, requires larger file sizes). Smaller files are preferred (<40K).
	26	Hyphens in URL	Preferred method for indicating a space, where there can be no actual space One or two= excellent for separating keywords (i.e., pet-smart, pets-mart) Four or more= BAD, starts to look spammy Ten = Spammer for sure, demotion probable?
6, 7 12, 13	27	Freshness of Pages	Google patent - Changes over time Newer the better - if news, retail or auction! Google likes fresh pages. So do I.
8, 9	28	Freshness - Amount of Content Change	New pages - Ratio of old pages to new pages
27	29	Freshness of Links	Google patent - May be good or bad Excellent for high-trust sites May not be so good for newer, low-trust sites
-	30	Frequency of Updates	Frequent updates = frequent spidering = newer cache
-	31	Page Theming	Page exhibit theme? General consistency?
-	32	Keyword stemming	Stem, stems, stemmed, stemmer, stemming, stemmist, stemification
-	33	Applied Semantics	Synonyms, CIRCA white paper
-	34	LSI	Latent Semantic Indexing - Speculation, no proof
-	35	URL length	Keep it minimized - use somewhat less than the 2,000 characters allowed by IE - less than 100 is good, less is even better
		OTHER ON-SITE Factors
5	36	Site Size - Google likes big sites	Larger sites are presumed to be better funded, better organized, better constructed, and therefore better sites. Google likes LARGE sites, for various reasons, not all positive. This has resulted in the advent of machine-generated 10,000-page spam sites - size for the sake of size. Google has caught on and dumped millions of pages, or made them supplemental.
4	37	Site Age	Google patent - Old is best. Old is Golden.
3	38	Age of page vs. age of site	Age of page vs. age of other pages on site Newer pages on an older site will get faster recognition.
	Note: For ALL the POSITIVE On-Page factors listed above, PAGE RANK can OVERRIDE them all. So can Google-Bombing.
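
Factors 6 and 7 in the table above define keyword density as keyword occurrences divided by total words. A minimal Python sketch of that arithmetic follows. The function name and sample text are hypothetical, and counting a multi-word phrase as one hit per word in the phrase is an assumption made here, since the table does not say how phrases are counted.

```python
import re


def keyword_densities(text, keywords):
    """Return the density of each keyword phrase as a percentage of total words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    densities = {}
    for keyword in keywords:
        phrase = keyword.lower().split()
        n = len(phrase)
        # Count positions where the whole phrase occurs as consecutive words.
        hits = sum(1 for i in range(total - n + 1) if words[i:i + n] == phrase)
        # Assumption: a phrase of n words contributes n words per occurrence.
        densities[keyword] = 100.0 * hits * n / total if total else 0.0
    return densities


# Hypothetical sample body text (10 words).
sample = "Blue widgets for sale. Our blue widgets are the best."
densities = keyword_densities(sample, ["blue", "widgets"])
total_density = sum(densities.values())  # factor 6: all keywords / total words

for keyword, pct in densities.items():
    print(f"{keyword}: {pct:.1f}%")  # factor 7: each keyword / total words
print(f"all keywords: {total_density:.1f}%")
```

The percentage ranges quoted in the table are the author's estimates, so the output is best read as a comparison between pages, not as a pass or fail test.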

2. Alleged Negative ON-Page SEO Google Ranking Factors (24)

Note	Factor #	NEGATIVE ON-Page SEO Factors	Brief Note
BAD	39	Text presented in graphics form only No ACTUAL body text on the page	Text represented graphically is invisible to search engines.
BAD	40	Affiliate site?	The Florida update went after affiliates with a vengeance - flower and travel affiliates were hit hard - cookie-cutter sites with massive inter-linking, but little unique content. Subsequent updates have also targeted affiliates.
BAD	41	Over optimization penalty (OOP)	Penalty for over-compliance with well-established, accepted web optimization practices. Too high keyword repetition (keyword stuffing) may get you the OOP. Overuse of H1 tags has been mentioned. Meta-tag stuffing.
BAD	42	Link to a bad neighborhood	Don't link to link farms, FFAs (Free For All's) Also, don't forget to check the Google status of EVERYONE you link to periodically. A site may go "bad", and you can end up being penalized, even though you did nothing. For instance, some failed real estate sites have been switched to p0rn by unscrupulous webmasters, for the traffic. This is not good for you, if you are linking to the originally legitimate URL.
BAD	43	Redirect thru refresh metatags	Don't immediately send your visitor to another page other than the one he/ she clicked on, using meta refresh.
BAD	44	Vile language - ethnic slur	Including the George Carlin 7 bad words you can't say on TV, plus the 150 or so that followed. Don't shoot yourself right straight in the foot. Also, avoid combinations of normal words, which when used together, become something else entirely - such as the word juice, and the word l0ve. See why I wrote that zero? I don't even want to get a proximity penalty, either. Paranoia, or caution? You decide. I always want to try to put my "best foot forward".
BAD	45	Poison words	The word "Links" in a title tag has been suggested to be a bad idea. Here is my list of Poison Words for Adsense. This penalty has been loosened - many of these words now appear in normal context, with no problems. But watch your step.
BAD	46	Excessive cross-linking	- within the same C block (IP=xxx.xxx.CCC.xxx) If you have many sites (>10, author's guess) with the same web host, prolific cross-linking can indicate more of a single entity, and less of democratic web voting. Easy to spot, easy to penalize. "This does not apply to a small number of sites" .. (this author guesses the number 10, JAWG) . . . "hosted on a local server". . Matt Cutts July 2006
BAD	47	Stealing images/ text blocks from another domain	Copyright violation - Google responds strongly if you are reported. ref egol File Google DMCA
BAD	48	Keyword stuffing threshold	In body, meta tags, alt text, etc. = demotion
??	49	Keyword dilution	Targeting too many unrelated keywords on a page, which would detract from theming, and reduce the importance of your REALLY important keywords.
??	50	Page edit - can reduce consistency	Google patent - Google is now switching between a "newer" cache, and several "older" caches, frequently drawing from BOTH at the same time. This was possibly implemented to frustrate SERP manipulators. Did your last edit substantially alter your keywords, or theme? Expect noticeable SERP bouncing.
6 - 7	51	Frequency of Content Change	Google patent - Too frequent = bad
32, 33	52	Freshness of Anchor Text	Google patent - Too frequent = bad
??	53	Dynamic Pages	Problematic - know pitfalls - shorten URLs, reduce variables (". . no more than 2 or 3", M.Cutts July 2006), lose the session IDs
??	54	Excessive Javascript	Don't use for redirects, or hiding links
??	55	Flash page - NOT	Most (all-?) SE spiders can't read Flash content Provide an HTML alternative, or experience lower SERP positioning.
??	56	Use of Frames	Spidering Problems with Frames - STILL
-	57	Robot exclusion "no index" tag	Intentional self-exclusion
-	58	Single pixel links	A red flag - one reason only - a sneaky link.
-	59	Invisible text	OK - No penalty - Google advises against this. ~~All over the place - but nothing is ever done~~. (The text is the same color as the background, and hence cannot be seen by the viewer, but can be visible to the search engine spiders.) I believe Google does penalize for hidden text, since it is an attempt to manipulate rank. Although they don't catch everyone.
-	60	Gateway, doorway page (I see changes here - not only does the doorway page disappear, but the main page gets pushed down, as well - this is a welcome fix.)	~~OK - No~~ penalty - Google advises against this. Google used to reward these pages. ~~Multiple entrance pages in the top ten SERPs - I see it daily. There they are at #2, with their twin at #5 - 6 months now. Reported numerous times.~~
-	61	Duplicate content (YOURS) ~~Duplicate content (THEIRS) below (Highjack)~~	OK - No penalty - Google advises against this. Google picks one (usually the oldest), and shoves it to the top, and pushes the second choice down. This has been a big issue with stolen content - the thief usurps your former position with YOUR OWN content.
-	62	HTML code violations (The big G does not even use DOCTYPE declarations, required for W3C validation.)	Doesn't matter - Google advises against this. Unless of course, the page is totally FUBAR. Simple HTML verification is NOT required (but advised, since it could contribute to your page quality factor - PQF).
-		Since the above 4 items are so controversial, I would like to add this comment: There are many things that Google would LIKE to have webmasters do, but that they simply cannot control, due to logistical considerations. Their only alternative is to foment fear and doubt by implying that any violation of their "suggestions" will result in swift and fierce demotion. (This is somewhat dated - G is fixing these things.)	IN GENERAL, this works pretty well to keep webmasters in line. The fallacy of this is that attentive webmasters can readily observe continuing, blatant exceptions to these official pronouncements. There are many anecdotes about Google "taking care" of a problem. Google states that they do not provide hand-tweaked "boosts", but are silent about hand-tweaked demotions. They occur, for sure. To believe otherwise is naive. Wouldn't YOU swat the most obnoxious flies? I would. It is becoming easier to determine the best thing to do. Try to avoid any Google penalties or demotions.
-	119	Phrase-based ranking, filters, penalties	Feb. 2007 - Google patent granted. Do not use phrases that have been associated and correlated with known spamming techniques, or you will be penalized. What phrases? Ahh, you tell me.
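
Factor 46 above refers to sites that share a C block, meaning the first three octets of their IPv4 addresses are identical. A small Python sketch of how such groups could be spotted follows. The site names and addresses are hypothetical (documentation address ranges), and treating a C block as a /24 network is the usual reading of the xxx.xxx.CCC.xxx notation, not something the table spells out.

```python
import ipaddress
from collections import defaultdict


def group_by_c_block(site_ips):
    """Group sites whose IPv4 addresses share the same first three octets (/24)."""
    blocks = defaultdict(list)
    for site, ip in site_ips.items():
        # strict=False lets us pass a host address and get its enclosing /24 network.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        blocks[str(network)].append(site)
    # Only blocks holding more than one site are interesting for cross-linking.
    return {block: sorted(sites) for block, sites in blocks.items() if len(sites) > 1}


# Hypothetical sites and addresses.
sites = {
    "site-a.example": "192.0.2.10",
    "site-b.example": "192.0.2.77",
    "site-c.example": "198.51.100.5",
}
shared = group_by_c_block(sites)
print(shared)
```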

3. Alleged POSITIVE OFF-Page SEO Google Ranking Factors (43)

Note	Factor #	POSITIVE OFF-Page SEO Factors	Brief Note
		INCOMING LINKS :
HOT	63	Page Rank	Based on the Number and Quality of links to you Google link reporting continues to display just a SMALL fraction of your actual backlinks, and they are NOT just greater than PR4 - they are mixed.
-	64	Total incoming links ("backlinks")	Historically, FAST counted best (www.alltheweb.com). No more - Yahoo (parent) broke it. In Yahoo search, type in: linksite:www.domain-name.com linkdomain:www.domainname.com Try MSN - http://beta.search.msn.com Use link:www.domainname.com Current TYPICAL Backlink Reporting Ratios - Google - 30 links MSN - 1,000 links Yahoo - 3,000 links
-	65	Incoming links from high-ranking pages	In 2004, Google used to count (report) the links from all PR4+ pages that linked to you. In 2005-2006, Google reported only a small fraction of the links, in what seemed like an almost random manner. In Feb. 2007, Google markedly upgraded (increased) the number of links that they report.
-	66	Acceleration of link popularity (". . . used to be a good thing" ... Martha)	Google patent Link acquisition speed boost - speculative Too fast = artificial? Cause of -30 penalty? Sandbox penalty imposed if new site?
		FOR EACH INCOMING LINK :
-	67	Page rank of the referring page	Based on the quality of links to you
HOT	68	Anchor text of inbound link to you	Contains keyword, key phrase? #1 result in SERP does NOT EVEN need to have the keyword(s) on the page, ANYWHERE!!! What does that tell you? (Enables Google-bombing - search for "miserable failure")
	69	Age of link	Google patent - Old = Good.
	70	Frequency of change of anchor text	Google patent - Not good. Why would you do that?
	71	Popularity of referring page	Popularity = desirability, respect
-	72	# of outgoing links on referrer page	Fewer is better - makes yours more important
-	73	Position of link on referrer page	Early in HTML is best
-	74	Keyword density on referring page	For search keyword(s)
-	75	HTML title of referrer page	Same subject/ theme?
28	76	Link from "Expert" site?	Google patent - Big time boost (Hilltop Algorithm) Recently reported to give a big boost !
-	77	Referrer page - Same theme	From the same or related theme? BETTER
-	78	Referrer page - Different theme	From different or unrelated theme? WORSE
-	79	Image map link?	Problematic?
-	80	Javascript link?	Problematic- attempt to hide link?
		DIRECTORIES :
HOT	81	Site listed in DMOZ Directory? The "Secret Hand" DMOZ Issues 1. Legitimate sites CAN'T GET IN 2. No Accountability 3. Corrupt Editors 4. Competitive Sites Barred 5. Dirty Tricks Employed 6. Rude dmoz editors Flawed concept - communism doesn't work Free editing? Nothing is free. DMOZ Sucks Discussions DMOZ Problems Discussions The Google Directory is produced by an unknown, ungoverned, unpoliced, ill-intentioned, retaliatory, monopoly enterprise, consisting of profiteering power-ego editors feathering their own nests - the ODP. AOL is making millions, and needs to police its run-amok entity. Enough already!	This is a tough one. Google's directory comes STRAIGHT from the DMOZ directory. You should try to get into dmoz. But you can't. Be careful whom you approach with the old spondulix - Formal DMOZ Bribe Instructions. It is almost impossible to get into DMOZ. ~~This site cannot get in, after waiting over 2 YEARS (33 months). Not even in the lowest, most insignificant category, "Personal Pages".~~ ~~I guess I just don't "measure up" to the other 20,000+ sites in the personal category.~~ I'm not the suck-up type - I kissed them off long ago. What a waste of time! UPDATE: This page (not site) finally got indexed in June 2007, thanks to a legitimate editor. No money was paid. Google needs to DO SOMETHING about populating its own directory with the skewed, incomplete, poorly determined results from the dysfunctional Open Directory Project - the ODP! Absolute Power Corrupts Absolutely
-	82	DMOZ category?	Theme fit category? General or geographic category? Both are possible, and acceptable.
HOT	83	Site listed in Yahoo Directory?	Big boost - You can get in by paying $299 each year. Many swear it is worth it - many swear it isn't.
-	84	Site listed in LookSmart Directory?	Boost? Another great vote for your site.
	85	Site listed in inktomi?	Inktomi has been absorbed internally by Yahoo.
-	86	Site listed in other directories (About, etc.)	Directory listing boost (If other RESPECTED directories link to you, this must be positive.)
-	87	Expert site? (Hilltop or Condensed Hilltop)	Large-sized site, quality incoming links
HOT	88	Site Age - Old shows stability	Google patent Boost for long-established sites, new pages indexed easily The opposite of the sand box.
-	89	Site Age - Very New Boost	Temporary boost for very new sites - I estimate that this boost lasts from 1 week to 3 weeks - Yahoo does it too.
-	90	Site Directory - Tree Structure	Influences SERPs - logical, consistent, conventional
-	91	Site Map and more site map	Complete - keywords in anchor text
-	92	Site Size	Previously, many pages preferred - conferred authority upon site, thus page. Bigger sites = better SERPs Now, fewer pages preferred, due to proliferation of computer-generated pages. Google has been dropping pages like crazy.
-	93	Site Theming	Site exhibit theme? Use many related terms? Have you used a keyword suggestion tool? A thesaurus?
		PAGE METRICS - USER BEHAVIOR:	Currently implemented through the Google tool bar?
34, 35	94	Page traffic	Google patent - # of visitors, trend
15,16,21	95	Page Selection Rate - CTR	Google patent - How often is a page clicked on?
36, 37	96	Time spent on page	Google patent - Relatively long time = indicates relevance hit
45, 46	97	Did user Bookmark page?	Google patent - Bookmark = Good
47	98	Bookmark add/ removal frequency	Google patent - Recent = Good?
	99	How they left, where they went	Back button, link clicked, etc.
		SITE METRICS - USER BEHAVIOR :	Currently implemented through the Google tool bar?
34, 35	100	Site Traffic	Google patent - # of visitors, increasing trend = good
	101	Referrer	Authoritative referrer?
	102	Keyword	Keyword searches used to find you
-	103	Time spent on domain	Relatively long time = indicates relevance hit Add brownie points.
38		DOMAIN OWNER BEHAVIOR :
40	104	Domain Registration Time	Google patent - Domain Expiration Date Register for 5 years, Google knows you are serious. Register for 1 year, is it a throw-away domain?
39	105	Are associated sites legitimate?	Google patent - No spam, ownership, etc.

4. Alleged NEGATIVE OFF-Page SEO Google Ranking Factors (13)

Note	Factor #	NEGATIVE OFF-Page SEO Factors	Brief Note
-	120 (added)	Traffic Buying	Have you paid a company for web traffic? It is probably low quality traffic, with a zero conversion rate. Some providers of traffic for traffic's sake may be considered "bad neighborhoods". Can Google discount your traffic (for true popularity), because they know it's mostly phony? Have you read about Traffic Power?
22-29	106	Temporal Link Analysis	In a nut shell, old links are valued, new links are not. This is intended to thwart rapid incoming link accumulation, accomplished through the tactic of link buying. Just one of the sandbox factors.
18	107	Change of Meanings	Query meaning changes over time, due to current events
BAD	108	Zero links to you	You MUST have at least 1 (one) incoming link (back link) from some website somewhere, that Google is aware of, to REMAIN in the index.
BAD	109	Link-buying (Very good IF you don't get caught, but don't do it - when caught, the penalty isn't worth it.)	Google patent - Google hates link-buying, because it corrupts their PR model in the worst way possible. 1. Does your page have links it really doesn't merit? 2. Did you get tons of links in a short time period? 3. Do you have links from high-PR, unrelated sites?
41, 42	110	Prior Site Ranking	Google patent - High = Good
BAD	111	Cloaking	Google promises to Ban! (Presenting one webpage to the search engine spider, and another webpage to everybody else.)
??	112	Links from bad neighborhoods, affiliates	Google says that incoming links from bad sites can't hurt you, because you can't control them. Ideally, this would be true. However, some speculate otherwise, esp., when other associated factors are thrown into the mix, such as web rings.
BAD	113	Penalties - resulting from Domain Hijacking (work with Google to fix)	Should result in IMPRISONMENT, forthwith! Grand Theft, mandatory minimum sentence. The criminal COPIES your entire website, and HOSTS it elsewhere, with . . . a few changes.
-	114	Penalty - Google TOS violation	WMG is the worst offender - gobbles up tons of Google server time by nervous Nellie webmasters. Google even mentions them by name. I think that Google will spank you when you cross the threshold, of say, 100 queries per day for the same term, from the same IP. Google can block your IP. Get a Google API.
??	115	Server Reliability - S/B >99.9%	What is your uptime? Ever notice a daily time when your server is unavailable, like about 1:30 AM? How diligent must Googlebot be? This is the worst reason to get dropped - you just aren't there! An ISP maintenance interruption can cause delisting..
-	116	~~No more room~~ Pages being dropped from large sites	~~The 2³² problem - Google has hit the~~ ~~4.3 Gigabyte address space wall~~. Bull! Google now has over 8 Gigs of indexed pages. Thousands of pages are disappearing from various huge websites, but I think that it is G just cleaning house, by dumping computer-generated pages.
HOT	117	Rank Manipulation by Competitor Attack (1. Content theft causing you to get a duplicate content penalty, even though your content is the original - ~~Google has problems tracking original authorship~~. People are still stealing my content, but nobody trumps me (in Google) with my own content - hats off to Google.) Examples - Site-Wide Link Attack and 302 Redirect Attack and Hijacker Attack	Impossible by Google definition (except for a few nasty tricks, like making your competition appear to be link spammers) Ideally, there SHOULD be nothing that your competition can do to directly hurt your rankings. However, an astute observer noticed that Google changed their website to read : Old verbiage = "There is nothing a competitor can do to harm your ranking ..." New verbiage = "There is ALMOST nothing a competitor can do ..." An obvious concession that Google thinks that at least some dirty tricks work! Of course, there will always be new ones!
-	118	Bouncing Ball Algorithm	~~At least 2, and often 3 identifiable Google Search Algos are currently in use, alternating pseudo-randomly through the data centers.~~ G has moved to a daily dance. Multiple changing factors are applied daily. GOOD LUCK NOW on trying to figure things out! IN ADDITION, some of the above factors are being "tweaked" daily. Not only are the "weights" of the factors changed, but the formula itself changes. Change is the only constant. An algo change can boost or demote your site. I put this in the negative factors section, because your position is never secure, unless of course, you are huge (PR=7 or greater). If you simply cannot achieve top position, your only alternative to first page SERP exposure may be Google Ad Words (you pay for exposure). Today, I searched for an extremely competitive "2-word term", and I found that NOT ONE of the top ten Google SERPs had even one of the words on the page. YOWSA! Today's theory - when it doesn't matter, anybody can get #1 in a second, if they know the on-page rules. BUT, after a certain "commercial competitive level", the "semantic analysis" algo kicks in, and less becomes more. The keyword density rules are flipped upon their noggins. I think that we are witnessing the evolution of search engine anti-seo sophistication, right before our very eyes. Fun stuff.
