Our suite of webmaster tools provides you with a free and easy way to make your site more Google-friendly. They can show you Google’s view of your site, help you diagnose problems, and let you share info with us to help improve your site’s visibility.
Getting Google’s view of your site, and diagnosing potential problems
The first step to increasing your site’s visibility on Google is learning how our robots crawl and index your site.
Crawl info: You can make sure we have access to your site, and see when Googlebot last visited. You can also view URLs that we’ve had trouble crawling and why we couldn't crawl them. This way, you can fix any problems preventing us from indexing all of your pages.
Robots.txt file validation: See if we’re having trouble with your file, and test out changes to that file before you change it on your server.
Website content: View top content from your site and see the words that other sites use to link to it.
Seeing how your site performs
A second step is learning what drives traffic to your site.
Top queries: Find the top queries that drive traffic to your site and where your site is included in the top search results. This will let you learn how users are finding your site.
Indexing information: See how your site is indexed and which of your pages are included in the index. If we find violations in your site, we’ll give you the opportunity to fix the problems and request reinclusion of your site.
Sharing info with Google about your site
Since no one knows more about your site than you do, you can also share this info with Google and improve your crawlability.
Submit a Sitemap file: Tell us all about your pages by submitting a Sitemap file; help us learn which pages are most important to you and how often those pages change.
Specify your preferred domain: Tell us which URL to use when indexing your site; we’ll do our best to index the version you prefer.
In general, there are two types of sitemaps. The first type of sitemap is an HTML page listing the pages of your site - often by section - and is meant to help users find the information they need.
XML Sitemaps - usually called Sitemaps, with a capital S - are a way for you to give Google information about your site. This is the type of Sitemap we'll be discussing in this article.
In its simplest terms, a Sitemap is a list of the pages on your website. Creating and submitting a Sitemap helps make sure that Google knows about all the pages on your site, including URLs that may not be discoverable by Google's normal crawling process.
Sitemaps are particularly helpful if:
Your site has dynamic content.
Your site has pages that aren't easily discovered by Googlebot during the crawl process - for example, pages featuring rich AJAX or Flash.
Your site is new and has few links to it. (Googlebot crawls the web by following links from one page to another, so if your site isn't well linked, it may be hard for us to discover it.)
Your site has a large archive of content pages that are not well linked to each other, or are not linked at all.
You can also use a Sitemap to provide Google with additional information about your pages, including:
How often the pages on your site change. For example, you might update your product page daily, but update your About Me page only once every few months.
The date each page was last modified.
The relative importance of pages on your site. For example, your home page might have a relative importance of 1.0, category pages have an importance of 0.8, and individual blog entries or product pages have an importance of 0.5. This priority only indicates the importance of a particular URL relative to other URLs on your site, and doesn't impact the ranking of your pages in search results.
Sitemaps provide additional information about your site to Google, complementing our normal methods of crawling the web. We expect they will help us crawl more of your site and in a more timely fashion, but we can't guarantee that URLs from your Sitemap will be added to the Google index. Sites are never penalized for submitting Sitemaps.
Google adheres to Sitemap Protocol 0.9 as defined by sitemaps.org. The Sitemap Protocol is a dialect of XML for summarizing Sitemap information that is relevant to web crawlers. Sitemaps created for Google using Sitemap Protocol 0.9 are therefore compatible with other search engines that adopt the standards of sitemaps.org.
While a standard Sitemap works for most sites, you can also create and submit specialized Sitemaps for certain types of content. These Sitemap formats are specific to Google and are not used by other search engines. They're a good way to give Google detailed information about specific content types. For example, publishers can use News Sitemaps to give Google information that can appear in Google News search results, such as publication date, keywords, and stock ticker symbol. Sitemap formats include:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
By placing a formatted XML Sitemap file on your web server, you enable search engine crawlers (like Google's) to find out which pages are present and which have recently changed, and to crawl your site accordingly.
What is "Change frequency"?
This value indicates how frequently the content at a particular URL is likely to change.
What is "Last Modified"?
The time the URL was last modified. This information allows crawlers to avoid recrawling documents that haven't changed. You can let the generator set this field from your server's response headers or specify your own date and time.
What is "Priority"?
The priority of a particular URL relative to other pages on the same site. The value for this tag is a number between 0.0 and 1.0, where 0.0 identifies the lowest priority page(s) on your site and 1.0 identifies the highest priority page(s) on your site. The default priority of a page is 0.5.
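To make the three fields above concrete, here is a minimal sketch of building a Sitemap with Python's standard library. The element names and namespace follow Sitemap Protocol 0.9; the function name `build_sitemap`, the example.com URLs, and the dates are assumptions made up for illustration, and a production file would normally also carry an XML declaration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by Sitemap Protocol 0.9 (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return Sitemap XML (as a string) for a list of page dictionaries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # lastmod, changefreq and priority are optional in the protocol.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages; the priorities echo the examples given in the text.
pages = [
    {"loc": "http://www.example.com/", "lastmod": "2008-01-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/about.html",
     "changefreq": "monthly", "priority": 0.5},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Leaving out the optional fields for a page is fine; a crawler that supports Sitemaps then falls back to its own defaults, such as the 0.5 default priority described above.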
1. Enter your full website URL and some optional parameters in the form below.
2. Press the 'Start' button and wait until the site is completely crawled (the progress will be indicated).
3. You will see the generated sitemap details page, including number of pages, broken links list, XML file content and link to a compressed sitemap. Download the sitemap file using this link and put it into the "public_html/" folder of your site.
Please enter the full http address for your site; only the links within the starting directory will be included. Note that "domain.com" and "www.domain.com" are not the same.
Need a standard feed icon or a feed icon PSD so you can manipulate how the icon looks?
Mozilla Foundation has made RSS feed icons available to help feed links become more recognizable and standardized. Their guidelines and FAQ provide full information on the icon and how it should be used.
Though the icons are available on the Mozilla site via the links above, the best place to get the icons is feedicons.com. At feedicons.com there is a zip file with the icons in a number of different file formats, including Photoshop and Illustrator formats.
Have a blog or other site that outputs an RSS feed? Want more exposure for your site or feed? Masternewmedia has compiled and regularly updates a list of websites where RSS feeds can be submitted. Each link has been tested and there is information on the website and how it works, as well as a direct link to the feed submission page.
We have always wanted to keep this resource brief and to the point, but we realise there is a lot more that can be communicated about using RSS. Our RSS Blog was launched 26 July 2007 to extend and complement the information provided here. If you are interested in learning more about RSS go there now and subscribe! It will be updated over time with information on using RSS and will feature tools to help you use RSS in new and better ways.
Feed Reader or News Aggregator software allows you to grab the RSS feeds from various sites and display them for you to read and use.
A variety of RSS Readers are available for different platforms. Some popular feed readers include Amphetadesk (Windows, Linux, Mac), FeedReader (Windows), and NewsGator (Windows - integrates with Outlook). There are also a number of web-based feed readers available. My Yahoo, Bloglines, and Google Reader are popular web-based feed readers.
Once you have your Feed Reader, it is a matter of finding sites that syndicate content and adding their RSS feed to the list of feeds your Feed Reader checks. Many sites display a small icon with the acronyms RSS, XML, or RDF to let you know a feed is available.
RSS (Rich Site Summary) is a format for delivering regularly changing web content. Many news-related sites, weblogs and other online publishers syndicate their content as an RSS Feed to whoever wants it.
Why RSS? Benefits and Reasons for using RSS
RSS solves a problem for people who regularly use the web. It allows you to easily stay informed by retrieving the latest content from the sites you are interested in. You save time by not needing to visit each site individually. You ensure your privacy, by not needing to join each site's email newsletter. The number of sites offering RSS feeds is growing rapidly and includes big names like Yahoo News.
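As a rough illustration of what a feed reader does once it has retrieved a feed, the sketch below parses a small RSS 2.0 document and lists its items. The feed content, the example.com URLs, and the function name `list_items` are all made up for this example; a real reader would first download the feed from the site and handle many more fields.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/second</link>
    </item>
  </channel>
</rss>"""

def list_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in list_items(SAMPLE_FEED):
    print(title, "-", link)
```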
Robots.txt files (often erroneously called robot.txt, in singular) are created by webmasters to mark (disallow) files and directories of a web site that search engine spiders (and other types of robots) should not access.
This robots.txt checker is a "validator" that analyzes the syntax of a robots.txt file to see if its format is valid as established by Robot Exclusion Standard (please read the documentation and the tutorial to learn the basics) or if it contains errors.
Simple usage: How to check your robots.txt file format? Just insert the full URL (Example: http://www.yourdomain.com/robots.txt) of the robots.txt file you want to analyze and hit Enter
Powerful: The checker finds syntax errors, "logic" errors, mistyped words and it gives you useful optimization tips
Accurate: The validation process takes into account both Robots Exclusion Standard rules and spider-specific (Google, Inktomi, etc.) extensions (including the new "Sitemap" command).
This robots.txt analyzer is provided by Motoricerca, a non-profit Italian guide to web site optimization and search engine positioning.
Robots.txt full URL (Example: http://www.domain.com/robots.txt):
This tester allows you to check your robots.txt file for syntax errors. There are 2 methods for putting your robots.txt contents into this script:
Method 1: If you already have a robots.txt file on your server, enter the URL of that file and this script will retrieve the content and test it.
Method 2: You can paste the contents of your robots.txt file into the text box below. This is an ideal way to test modifications before putting them on your server.
This is a useful file that keeps search engines from indexing pages you do not want spidered. Why would you not want a page indexed by a search engine? Perhaps you want to display a page that shows an example of spamming the search engines. This type of page might include an example of repeated keywords, hidden tags with keywords, and other things that could get a page or an entire site banned from a search engine.
The robots.txt file is a good way to prevent this page from getting indexed. However, not every site can use it. The only robots.txt file that the spiders will read is the one at the top html directory of your server. This means you can only use it if you run your own domain. The spiders will look for the file in a location similar to these below:
Now, if you have your own domain- you can see where to place the file. So let's take a look at exactly what needs to go into the robots.txt file to make the spider see what you want done.
If you want to exclude all the search engine spiders from your entire domain, you would write just the following into the robots.txt file:
User-agent: *
Disallow: /
If you want to exclude all the spiders from a certain directory within your site, you would write the following:
User-agent: *
Disallow: /aboutme/
If you want to do this for multiple directories, you add on more Disallow lines:
If you are curious, here is what I used to keep an article from getting indexed:
User-agent: *
Disallow: /zine/article002.htm
If you want to keep a specific search engine spider from indexing your site, do this:
User-agent: Robot_Name
Disallow: /
You'll need to know the name of the search engine spider or robot, and place it where Robot_Name is above. You can find these names from the web sites of the various search engines.
So, if you need to exclude something from search engine indexing, this is the most effective tool recognized by the search engines- so use it to keep the spiders out of any part of your web you want them to avoid.
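One way to check that rules like these behave as intended is to feed them to a parser before uploading the file. The sketch below uses Python's standard `urllib.robotparser` module; the helper name `make_parser`, the spider names, the `/stats/` directory, and the example.com URLs are assumptions for illustration, and real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

def make_parser(robots_txt):
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# Exclude every spider from two directories (one Disallow line each).
block_directories = make_parser(
    "User-agent: *\n"
    "Disallow: /aboutme/\n"
    "Disallow: /stats/\n"
)

# Exclude a single named spider from the whole site.
block_one_robot = make_parser(
    "User-agent: Robot_Name\n"
    "Disallow: /\n"
)

site = "http://www.example.com"
aboutme_ok = block_directories.can_fetch("AnySpider", site + "/aboutme/me.htm")
stats_ok = block_directories.can_fetch("AnySpider", site + "/stats/today.htm")
index_ok = block_directories.can_fetch("AnySpider", site + "/index.htm")
named_ok = block_one_robot.can_fetch("Robot_Name", site + "/index.htm")
other_ok = block_one_robot.can_fetch("OtherSpider", site + "/index.htm")

print(aboutme_ok, stats_ok, index_ok)  # False False True
print(named_ok, other_ok)              # False True
```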
At the date of this writing, as far as I know, many of the search engine spiders do not bother to index the scripts on your site (such as your CGI or PHP scripts). However, there are those that do, including one of the major players, Google.
Robots or spiders that index scripts will call your scripts just as a browser would, complete with all the special characters. If your site is like mine, where the scripts are solely meant for the use of humans and serve no practical use for a search engine (why should a search engine need to invoke my site-navigation script? - it can just crawl the direct links), you may want to block spiders from the directories that contain your scripts. For example, I block spiders from my CGI-BIN directory. Hopefully, removing these unnecessary executions will reduce the load on the web server that occurs when scripts are run.
Of course there are the occasional ill-behaved robots that hit your server at high speed. Such spiders can actually bring down your server or at the very least slow it down for the real users who are trying to access it. If you know of any such spiders, you might want to exclude them too. You can do this with a robots.txt file. Unfortunately though, ill-behaved spiders often ignore robots.txt files as well.
It Can Save Your Bandwidth
If you look at your website's web logs, you will undoubtedly find many requests for the robots.txt file by various search engine spiders. If, like me, you have a customized 404 document (which loads each time a visitor tries to retrieve a page that does not exist on your site) and you don't have a robots.txt file, you will find that the robot winds up requesting that document instead. My site has a fairly large 404 document, with the result that the spiders wind up loading it repeatedly throughout the day, adding to my already large bandwidth problems. In such a case, having a small robots.txt file may save you some bandwidth (yeah, I know, it's not that much).
Some spiders may also request files which you feel they should not. For example, one search engine requests graphic files (".gif" files) on my sites. Since I see little reason why I should let it index the graphics on my site, waste my bandwidth, and possibly infringe my copyright, I ban it (and in fact all spiders) from my graphic files directory in my robots.txt file.
It Removes Clutter from your Web Statistics
I don't know about you, but one of the things I check from my web statistics is the list of URLs that visitors tried to access, but met with a 404 File Not Found Error. Often this tells me if I made a spelling error in one of the internal links on one of my sites (yes, I know - I should have checked all links in the first place, but mistakes do happen).
If you don't have a robots.txt file, you can be sure that /robots.txt is going to feature in your web statistics 404 report, adding clutter and perhaps unnecessarily distracting your attention from the real bad URLs that need your attention.
Refusing a Robot
Sometimes you don't want a particular spider to index your site for some reason or other. Perhaps the robot is ill-behaved and spiders your site at such a high speed that it takes down your entire server. Or perhaps you don't want the images on your site indexed by an image search engine. With a robots.txt file, you can exclude certain spiders from indexing your site, provided the spider obeys the rules in that file.
How to Set Up a Robots.txt File
Writing a robots.txt file could not be easier. It's just an ASCII text file that you place at the root of your domain. For example, if your domain is www.yourdomain.com, you will place the file at www.yourdomain.com/robots.txt.
The file basically lists the names of spiders on one line, followed by the list of directories or files it is not allowed to access on subsequent lines, with each directory or file on a separate line. It is possible to use the wildcard character "*" instead of naming specific spiders. When you do so, all spiders are assumed to be named. Note that the robots.txt file is a robots exclusion file (with emphasis on the "exclusion") - there is no way to tell spiders to include any file or directory.
Take the following robots.txt file for example:
User-agent: *
Disallow: /cgi-bin/
The above two lines, when inserted into a robots.txt file, inform all robots (since the wildcard asterisk "*" character was used) that they are not allowed to access anything in the cgi-bin directory and its descendants. That is, they are not allowed to access cgi-bin/whatever.cgi or even a file or script in a subdirectory of cgi-bin, such as /cgi-bin/anything/whichever.cgi.
If you have a particular robot in mind, such as the Google image search robot, which collects images on your site for the Google Image search engine, you may include lines like the following:
User-agent: Googlebot-Image
Disallow: /
This means that the Google image search robot, "Googlebot-Image", should not try to access any file in the root directory "/" and all its subdirectories. This effectively means that it is banned from your entire website.
You can have multiple Disallow lines for each user agent (ie, for each spider). Here is an example of a longer robots.txt file:
The first block of text disallows all spiders from the images directory and the cgi-bin directory. The second block of code disallows the psbot spider from every directory.
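A file matching that description can be written out and checked with Python's standard `urllib.robotparser` module. This is only a sketch: the exact paths `/images/` and `/cgi-bin/` are assumed from the description above, and the spider name `SomeSpider` and the example.com URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Two records separated by a blank line: one for all spiders, one for psbot.
ROBOTS_TXT = """User-agent: *
Disallow: /images/
Disallow: /cgi-bin/

User-agent: psbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "http://www.example.com"
# An ordinary spider is kept out of the two listed directories only.
spider_images = parser.can_fetch("SomeSpider", site + "/images/logo.gif")
spider_cgi = parser.can_fetch("SomeSpider", site + "/cgi-bin/search.cgi")
spider_index = parser.can_fetch("SomeSpider", site + "/index.html")
# psbot matches its own record and is kept out of everything.
psbot_index = parser.can_fetch("psbot", site + "/index.html")

print(spider_images, spider_cgi, spider_index, psbot_index)
```

A spider that has its own record uses that record rather than the wildcard one, which is why psbot is refused even for pages that other spiders may fetch.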
It is possible to exclude a spider from indexing a particular file. For example, if you don't want Google's image search robot to index a particular picture, say, mymugshot.jpg, you can add the following:
Remember to add the trailing slash ("/") if you are indicating a directory. If you simply add
User-agent: *
Disallow: /privatedata
the robots will be disallowed from accessing privatedata.html as well as privatedataandstuff.html as well as the directory tree beginning from /privatedata/ (and so on). In other words, there is an implied wildcard character following whatever you list in the Disallow line.
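The effect of the trailing slash can be seen by comparing the two forms side by side. The sketch below uses Python's standard `urllib.robotparser`, which applies the same prefix matching described here; the helper name `allowed`, the spider name, and the example.com host are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, path):
    """Return True if a generic spider may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AnySpider", "http://www.example.com" + path)

WITHOUT_SLASH = "User-agent: *\nDisallow: /privatedata\n"
WITH_SLASH = "User-agent: *\nDisallow: /privatedata/\n"

paths = ["/privatedata.html", "/privatedataandstuff.html", "/privatedata/file.html"]

# Without the slash, the rule is a plain prefix and catches all three paths.
without_slash = [allowed(WITHOUT_SLASH, p) for p in paths]
# With the slash, only paths inside the directory are caught.
with_slash = [allowed(WITH_SLASH, p) for p in paths]

print(without_slash)  # [False, False, False]
print(with_slash)     # [True, True, False]
```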
Where Do You Get the Name of the Robots?
If you have a particular spider in mind which you want to block, you have to find out its name. To do this, the best way is to check out the website of the search engine. Respectable engines will usually have a page somewhere that gives you details on how you can prevent their spiders from accessing certain files or directories.
Common Mistakes in Robots.txt
Here are some mistakes commonly made by those new to writing robots.txt rules.
It's Not Guaranteed to Work
As mentioned earlier, although the robots.txt format is listed in a document called "A Standard for Robots Exclusion", not all spiders and robots actually bother to heed it. Listing something in your robots.txt is no guarantee that it will be excluded. If you really need to protect something, you should use a .htaccess file to password-protect the directory (if you are running your site on an Apache server).
Don't List Your Secret Directories
Anyone can access your robots file, not just robots. For example, typing http://www.google.com/robots.txt will get you Google's own robots.txt file. I notice that some new webmasters seem to think that they can list their secret directories in their robots.txt file to prevent that directory from being accessed. Far from it. Listing a directory in a robots.txt file often attracts attention to the directory. In fact, some spiders (like certain spammers' email harvesting robots) make it a point to check the robots.txt for excluded directories to spider.
Only One Directory/File per Disallow line
Don't try to be smart and put multiple directories on your Disallow line. This will probably not work the way you think, since the Robots Exclusion Standard only provides for one directory per Disallow statement.
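To see why this matters, the sketch below compares a file that crams two directories onto one Disallow line with a file that lists them separately, using Python's standard `urllib.robotparser`. The directory names, spider name, and example.com URLs are made up, and other parsers may handle the malformed line differently; the point is only that the one-line form cannot be relied on.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, path):
    """Return True if a generic spider may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AnySpider", "http://www.example.com" + path)

# Two directories on one line: treated as a single odd-looking path.
ONE_LINE = "User-agent: *\nDisallow: /cgi-bin/ /images/\n"
# One directory per line, as the standard provides for.
SEPARATE_LINES = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /images/\n"

one_line_result = allowed(ONE_LINE, "/cgi-bin/test.cgi")
separate_result = allowed(SEPARATE_LINES, "/cgi-bin/test.cgi")

print(one_line_result)   # True: the script directory is not protected
print(separate_result)   # False: the script directory is excluded
```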
It's Worth It
Even if you want all your directories to be accessed by spiders, a simple robots file with the following may be useful:
User-agent: *
Disallow:
With no file or directory listed in the Disallow line, you're implying that every directory on your site may be accessed. At the very least, this file will save you a few bytes of bandwidth each time a spider visits your site (or more if your 404 file is large); and it will also remove Robots.txt from your web statistics bad referral links report.
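As a quick check of that claim, the sketch below parses the allow-everything file with Python's standard `urllib.robotparser` and confirms that nothing is excluded. The spider name and example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means that nothing is excluded.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

site = "http://www.example.com"
home_allowed = parser.can_fetch("AnySpider", site + "/")
script_allowed = parser.can_fetch("AnySpider", site + "/cgi-bin/search.cgi")

print(home_allowed, script_allowed)  # True True
```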
Copyright 2001-2008 by Christopher Heng. All rights reserved. Get more free tips and articles like this, on web design, promotion, revenue and scripting, from http://www.thesitewizard.com/
When I first started writing my first website, I did not really think that I would ever have any reason to create a robots.txt file. After all, did I not want search engine robots to spider and thus index every document in my site? Yet today, all my sites, including thesitewizard.com, have a robots.txt file in their root directory. This article explains why you might also want to include a robots.txt file on your sites, how you can do so, and notes some common mistakes made by new webmasters with regard to the robots.txt file.
For those new to the robots.txt file, it is merely a text file implementing what is known as the Standard for Robot Exclusion. The file is placed in the main directory of a website that advises spiders and other robots which directories or files they should not access. The file is purely advisory - not all spiders bother to read it let alone heed it. However, most, if not all, the spiders sent by the major search engines to index your site will read it and take cognizance of the rules contained within the file.
The robots.txt file is a set of instructions for visiting robots (spiders) that index the content of your web site pages. For those spiders that obey the file, it provides a map for what they can, and cannot index. The file must reside in the root directory of your web. The URL path (web address) of your robots.txt file should look like this...
/robots.txt
The Robots text file open in Notepad might look like this:
[Screenshot: an empty robots.txt file]
Definition of the above robots.txt file:
User-agent: *
The asterisk (*), or wildcard, is a special value that means any robot.
Disallow:
The Disallow: line without a / (forward slash) tells the robots that they can index the entire site.
An empty value indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record; here it appears without the / (forward slash), as shown above.
The presence of an empty "/robots.txt" file has no explicit associated semantics; it will be treated as if it was not present, i.e. all robots will consider themselves welcome.
The Disallow: line without the trailing slash (/) tells all robots to index everything. If you have a line that looks like this:
Disallow: /private/
It tells the robot that it cannot index the contents of that /private/ directory.
Summarizing the Robots Exclusion Protocol - robots.txt file
User-agent: Named Bot
Disallow: /private/
Disallow: /images-saved/
Disallow: /images-working/
Note: The asterisk (*) or wildcard in the User-agent field is a special value meaning "any Robot" and therefore is the only one needed until you fully understand how to set up different User-agents.
If you want to Disallow: a particular file within the directory, your Disallow: line might look like this one:
Disallow: /private/top-secret-stuff.htm
Keep in mind that using the above example excludes that specified page (top-secret-stuff.htm) but will not exclude the entire /private/ directory.
You should validate your robots.txt file. Enter the full URI to the robots.txt file on your server. The robots.txt file always resides at the root level of your web.
Here are a few good online references for information on the Robots Exclusion Protocol.
Brett Tabke - WebmasterWorld - The Bot Blog Robots.txt - Where no blog has gone before. Brett may have started a new marketing medium, blogging via your robots.txt file. Only a true geek would appreciate The Bot Blog!
As Search Engine Optimisation experts, Rupiz Media deals in perfecting on-page factors that influence the rankings of your web pages. Catering to a range of services in this regard, we provide viable solutions to web businesses of all kinds.
As part of on-page optimisation, we take care of the following aspects of a website:
Keyword Research: We undertake extensive research of the web to enlist the most important keywords and phrases for your website. These keywords eventually form the base of your search engine marketing campaign.
Content Review & Optimisation: We review the content of your site and suggest changes wherever need be. These changes include judicious sprinkling of your web pages with essential keywords, so as to foster the process of search engine indexing. Along with that, we also ensure that the content is written in simple, marketing-oriented language that can attract visitors.
Competitor Analysis: Rupiz Media studies the online marketplace for you. We scan the websites of your competitors in order to analyse the reasons for their success or failure in various optimisation techniques. This helps us formulate strategies that are advantageous for your business.
Meta Tags Optimisation: A part of your site's HTML programming, meta tags are distinctive to each and every web page. We optimise the Title, Description, Keyword, and other meta tags in a way that they comprise your main keywords and assist in faster indexing of your pages.
URL Optimisation: We optimise the URLs of your web pages in order to make them more conducive for indexing. This process also involves the use of pertinent keywords that can attract search engine spiders.
Image Optimisation: In case your website has images, we optimise their Alt Tags and package them in a way that they do not impede the process of optimisation.
Site Navigation: When it comes to structuring and internal linking of your website, we give due consideration to search engine friendliness to render an eye-catching navigational constitution to your pages.
On-Page factors are related directly to the content and structure of the website. This normally consists of pages written in the HyperText Markup Language but also applies to other document formats that are indexed by search engines, for example Microsoft Word or PDF formats. On-page optimization involves modifying keyword frequency in the URL, Title, Headings, Hypertext Links and Body text. It may also involve reducing redundant HTML codes (aka cruft) produced by Web page authoring tools and restructuring the site to produce better linked and focussed page content.
Many search engines now discount the weight given to on-page factors because they give too much scope for abuse by SEO experts. In theory the visible parts of a web-page are less prone to manipulation as they have to make sense to readers. However doorway pages with redirections and clever use of style sheets enable different content to be served to search engines and end users.
Each page should target between two and four keywords directly related to the contents. If you feel the need for more keywords then consider splitting your content into separate pages. The Uniform Resource Locator (URL) should contain keywords, separated by hyphens, without being too long; around 128 characters is probably a sensible upper limit for the entire URL. The Title tag should contain the keywords with no stop words but arranged to make sense.
On Page Optimization
The Title tag should be the first tag in the Head section of the page. There is evidence that search engines give more weight to factors higher up the page. The content should be properly structured with the use of Heading (H1, H2, H3 etc) tags containing relevant keywords. Search engines will only index a limited amount of text in HTML tags and using too many keywords will dilute the focus. Don't spam any of these tags; this won't be effective and could result in a penalty.
Many website designers spend a lot of time creating Keyword and Description meta tags. Although these may be read by search engines, for example the description tag is used by Yahoo! to provide a short description of the site in the Search Engine Results Pages, they are not used for ranking pages.
Personally I don't bother with them as they bulk out pages for little real benefit. Google, Yahoo! and MSN Search will all use the text they find on the page as a description, so make sure your first header and sentence describe the contents. However, some search engine watchers say that the new Microsoft search engine, currently in beta tests, puts some weight on meta tags. There is also evidence to suggest that search engines give more prominence to text earlier in the page, and some engines will only index a limited amount of body text, so making the first paragraph punchy is a good idea.
Image alternate-text tags (ALT tags) are only indexed where the image is part of a hyperlink. However ALT tags are useful for non-graphical browsing and should be employed correctly.
Comments are not indexed. Use bold/strong/italic attributes where appropriate.
Write natural copy aimed at the end user and not search engines. Don't worry too much about keyword density for the contents but take the opportunity to include keywords combined in different phrases and orders and create anchor text to related internal pages. Keep the number of links to fewer than 50, and probably less and don't repeat identical outbound-links. Theme related pages should be at the same level in the site hierarchy and be linked through the site's menu structure and site map. At least one page at the same level should link back to the home page so that search engines that have traversed a deep-link can index the rest of the website.
For any other document format, e.g. PowerPoint, Adobe PDF etc make sure you at least have a descriptive document title. Try to avoid formats that search engines find hard to understand, even where a search engine can index a format it will carry less information than plain old HTML. Avoid using images to replace text, except occasionally in hyperlinks. Avoid formats such as flash, shockwave and sitemaps where there is no alternative text. Avoid HTML Frames which some search engines find hard to navigate, use Style Sheets (CSS) instead. Style Sheets should also be used to reduce the amount of formatting within documents. Keep pages to less than 100 kilobytes and preferably not much more than a screen full of text. Where Javascript or Flash menus are used include plain-text links at the bottom of the page. These will ensure all search engines index the rest of your website.
Another factor directly under the control of the website is the amount of content. Large websites generally rank better than small websites for a number of reasons. Search engines also like fresh content and will spider it more frequently. A regularly updated news page, or even a blog, can provide deep links to the rest of the website.
1. Alleged POSITIVE ON-Page SEO Google Ranking Factors (38) (Keeping in mind the converse, of course, that when violated, some of these factors immediately jump into the NEGATIVE On-Page Ranking Factors domain.) The term "Keyword" below refers to the "Keyword Phrase", which can be one word or more. Green rows confirmed by Google patent - updated 08-10-06
Keyword in Title tag - close to beginning Title tag 10 - 60 characters, no special characters.
-
4
Keyword in Description meta tag
Shows theme - less than 200 chars. Google no longer "relies" upon this tag, but will often use it.
-
5
Keyword in Keyword metatag
Shows theme - less than 10 words. Every word in this tag MUST appear somewhere in the body text. If not, it can be penalized for irrelevance. No single word should appear more than twice; otherwise, it may be considered spam. Google purportedly no longer uses this tag, but others do.
Keywords - Body
-
6
Keyword density in body text
5 - 20% - (all keywords/ total words) Some report topic sensitivity - the keyword spamming threshold % varies with the topic.
-
7
Individual keyword density
1 - 6% - (each keyword/ total words)
HOT
8
Keyword in H1, H2 and H3
Use Hx font style tags appropriately
-
9
Keyword font size
"Strong is treated the same as bold, italic is treated the same as emphasis" . . . Matt Cutts July 2006
-
10
Keyword proximity (for 2+ keywords)
Directly adjacent is best
-
11
Keyword phrase order
Does word order in the page match word order in the query? Try to anticipate query, and match word order.
-
12
Keyword prominence (how early in page/tag)
Can be important at top of page, in bold, in large font
Keywords - Other
-
13
Keyword in alt text
Should describe graphic - Do NOT fill with spam (Was part of Google Florida OOP - tripped a threshold - may still be in effect to some degree as a red flag, when summed with all other on-page optimization - total page optimization score - TPOS).
-
14
Keyword in links to site pages (anchor text)
Links out anchor text use keyword?
NAVIGATION - INTERNAL LINKS
SITE
15
To internal pages- keywords?
Link should contain keywords. The filename "linked to" should contain the keywords. Use hyphenated filenames, but not long ones - two or three hyphens only.
SITE
16
All Internal links valid?
Validate all links to all pages on site. Use a free link checker. I like this one.
SITE
17
Efficient - tree-like structure
TRY FOR two clicks to any page - no page deeper than 4 clicks
SITE
18
Intra-site linking
Appropriate links between lower-level pages
54
NAVIGATION - OUTGOING LINKS
55
19
To external pages- keywords?
Google patent - Link only to good sites. Do not link to link farms. CAREFUL - Links can and do go bad, resulting in site demotion. Unfortunately, you must devote the time necessary to police your outgoing links - they are your responsibility.
56
20
Outgoing link Anchor Text
Google patent - Should be on topic, descriptive
61, 62
21
Link stability over time
Google patent - Avoid "Link Churn"
-
22
All External links valid?
Validate all links periodically.
-
23
Less than 100 links out total
Google says limit to 100, but readily accepts 2-3 times that number. ref 2k
OTHER ON-Page Factors
-
24
Domain Name Extension Top Level Domain - TLD
.gov sites seem to have the highest status; .edu and .org sites seem to be given a high status. .com sites excel in encompassing all the spam/crud sites, resulting in the need for the highest scrutiny/action by Google. Perhaps one would do well with the new .info domain class? Nope - spammers jumped all over it, so no safe haven there. Not so much now - .info sites can rank highly.
-
25
File Size
Try not to exceed 100K page size (however, some subject matter, such as this page, requires larger file sizes). Smaller files are preferred (<40K).
26
Hyphens in URL
Preferred method for indicating a space, where there can be no actual space. One or two = excellent for separating keywords (i.e., pet-smart, pets-mart). Four or more = BAD, starts to look spammy. Ten = spammer for sure, demotion probable?
6, 7 12, 13
27
Freshness of Pages
Google patent - Changes over time. Newer the better - if news, retail or auction! Google likes fresh pages. So do I.
8, 9
28
Freshness - Amount of Content Change
New pages - Ratio of old pages to new pages
27
29
Freshness of Links
Google patent - May be good or bad. Excellent for high-trust sites. May not be so good for newer, low-trust sites.
Keep it minimized - use somewhat less than the 2,000 characters allowed by IE - less than 100 is good, less is even better
OTHER ON-SITE Factors
5
36
Site Size - Google likes big sites
Larger sites are presumed to be better funded, better organized, better constructed, and therefore better sites. Google likes LARGE sites, for various reasons, not all positive. This has resulted in the advent of machine-generated 10,000-page spam sites - size for the sake of size. Google has caught on and dumped millions of pages, or made them supplemental.
4
37
Site Age
Google patent - Old is best. Old is Golden.
3
38
Age of page vs. age of site
Age of page vs. age of other pages on site. Newer pages on an older site will get faster recognition.
Note: For ALL the POSITIVE On-Page factors listed above, PAGE RANK can OVERRIDE them all. So can Google-Bombing.
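The keyword density figures in factors 6 and 7 (all keywords / total words, and each keyword / total words) can be computed directly. The sketch below is one possible reading of those definitions, not a known Google formula: it assumes density means the number of occurrences of a keyword or phrase divided by the total word count, and `keyword_density` is a hypothetical function name.

```python
import re


def keyword_density(text, keywords):
    """Return ({keyword: percent}, combined_percent) for the given text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    per_keyword = {}
    for keyword in keywords:
        phrase = keyword.lower().split()
        n = len(phrase)
        # Count every position where the whole phrase occurs in order.
        hits = sum(1 for i in range(total - n + 1) if words[i:i + n] == phrase)
        per_keyword[keyword] = 100.0 * hits / total if total else 0.0
    return per_keyword, sum(per_keyword.values())


# Deliberately keyword-heavy sample copy (12 words in total).
copy = "Red widgets for sale. Our red widgets ship free. Buy widgets today."
per_keyword, combined = keyword_density(copy, ["red", "widgets"])
for keyword, percent in per_keyword.items():
    print(f"{keyword}: {percent:.1f}%")
print(f"combined: {combined:.1f}%")
```

Both individual figures and the combined figure for this sample land far above the 1 - 6% and 5 - 20% ranges quoted in the table, which is the kind of result the table suggests toning down.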
2. Alleged Negative ON-Page SEO Google Ranking Factors (24)
Note
Factor #
NEGATIVE ON-Page SEO Factors
Brief Note
BAD
39
Text presented in graphics form only - no ACTUAL body text on the page
Text represented graphically is invisible to search engines.
BAD
40
Affiliate site?
The Florida update went after affiliates with a vengeance - flower and travel affiliates were hit hard - cookie-cutter sites with massive inter-linking, but little unique content. Subsequent updates have also targeted affiliates.
Penalty for over-compliance with well-established, accepted web optimization practices. Too high keyword repetition (keyword stuffing) may get you the OOP. Overuse of H1 tags has been mentioned. Meta-tag stuffing.
BAD
42
Link to a bad neighborhood
Don't link to link farms or FFAs (Free For Alls). Also, don't forget to periodically check the Google status of EVERYONE you link to. A site may go "bad", and you can end up being penalized, even though you did nothing. For instance, some failed real estate sites have been switched to p0rn by unscrupulous webmasters, for the traffic. This is not good for you, if you are linking to the originally legitimate URL.
BAD
43
Redirect thru refresh metatags
Don't immediately send your visitor to another page other than the one he/ she clicked on, using meta refresh.
BAD
44
Vile language - ethnic slur
Including the George Carlin 7 bad words you can't say on TV, plus the 150 or so that followed. Don't shoot yourself right straight in the foot. Also, avoid combinations of normal words, which when used together, become something else entirely - such as the word juice, and the word l0ve. See why I wrote that zero? I don't even want to get a proximity penalty, either. Paranoia, or caution? You decide. I always want to try to put my "best foot forward".
The word "Links" in a title tag has been suggested to be a bad idea. Here is my list of Poison Words for Adsense. This penalty has been loosened - many of these words now appear in normal context, with no problems. But watch your step.
- within the same C block (IP=xxx.xxx.CCC.xxx) If you have many sites (>10, author's guess) with the same web host, prolific cross-linking can indicate more of a single entity, and less of democratic web voting. Easy to spot, easy to penalize. "This does not apply to a small number of sites" .. (this author guesses the number 10, JAWG) . . . "hosted on a local server". . Matt Cutts July 2006
BAD
47
Stealing images/ text blocks from another domain
Copyright violation - Google responds strongly if you are reported. ref egol File Google DMCA
Targeting too many unrelated keywords on a page, which would detract from theming, and reduce the importance of your REALLY important keywords.
??
50
Page edit - can reduce consistency
Google patent - Google is now switching between a "newer" cache, and several "older" caches, frequently drawing from BOTH at the same time. This was possibly implemented to frustrate SERP manipulators. Did your last edit substantially alter your keywords, or theme? Expect noticeable SERP bouncing.
6 - 7
51
Frequency of Content Change
Google patent - Too frequent = bad
32, 33
52
Freshness of Anchor Text
Google patent - Too frequent = bad
??
53
Dynamic Pages
Problematic - know pitfalls - shorten URLs, reduce variables (". . no more than 2 or 3", M.Cutts July 2006), lose the session IDs
OK - No penalty - Google advises against this. All over the place - but nothing is ever done. (The text is the same color as the background, and hence cannot be seen by the viewer, but can be visible to the search engine spiders.) I believe Google does penalize for hidden text, since it is an attempt to manipulate rank. Although they don't catch everyone.
-
60
Gateway, doorway page (I see changes here - not only does the doorway page disappear, but the main page gets pushed down, as well - this is a welcome fix.)
OK - No penalty - Google advises against this. Google used to reward these pages. Multiple entrance pages in the top ten SERPs - I see it daily. There they are at #2, with their twin at #5 - 6 months now. Reported numerous times.
OK - No penalty - Google advises against this. Google picks one (usually the oldest), and shoves it to the top, and pushes the second choice down. This has been a big issue with stolen content - the thief usurps your former position with YOUR OWN content.
-
62
HTML code violations (The big G does not even use DOCTYPE declarations, required for W3C validation.)
Doesn't matter - Google advises against this. Unless of course, the page is totally FUBAR. Simple HTML verification is NOT required (but advised, since it could contribute to your page quality factor - PQF).
-
Since the above 4 items are so controversial, I would like to add this comment: There are many things that Google would LIKE to have webmasters do, but that they simply cannot control, due to logistical considerations. Their only alternative is to foment fear and doubt by implying that any violation of their "suggestions" will result in swift and fierce demotion. (This is somewhat dated - G is fixing these things.)
IN GENERAL, this works pretty well to keep webmasters in line. The fallacy of this is that attentive webmasters can readily observe continuing, blatant exceptions to these official pronouncements.
There are many anecdotes about Google "taking care" of a problem. Google states that they do not provide hand-tweaked "boosts", but are silent about hand-tweaked demotions. They occur, for sure. To believe otherwise is naive. Wouldn't YOU swat the most obnoxious flies? I would.
It is becoming easier to determine the best thing to do. Try to avoid any Google penalties or demotions.
Feb. 2007 - Google patent granted. Do not use phrases that have been associated and correlated with known spamming techniques, or you will be penalized. What phrases? Ahh, you tell me.
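Factor 43 above (redirects through refresh meta tags) is one of the few negative factors that can be detected from the markup alone. The following is a minimal sketch using the standard library parser; the class and function names are hypothetical, and it only looks at `<meta http-equiv="refresh">` tags, not at JavaScript or server-side redirects.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Record (delay_seconds, target_url) for each meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        delay_part, _, rest = attr.get("content", "").partition(";")
        try:
            delay = float(delay_part.strip())
        except ValueError:
            return  # malformed content attribute, ignore it
        rest = rest.strip()
        target = rest[4:].strip(" '\"") if rest.lower().startswith("url=") else None
        self.refreshes.append((delay, target))


def find_immediate_redirects(html, max_delay=0):
    """Return refreshes that send the visitor to another URL right away."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return [(d, t) for d, t in finder.refreshes if t and d <= max_delay]


# Hypothetical pages: one redirects at once, one merely reloads itself.
redirecting = '<head><meta http-equiv="refresh" content="0; url=http://example.com/other.html"></head>'
reloading = '<head><meta http-equiv="Refresh" content="300"></head>'

print(find_immediate_redirects(redirecting))
print(find_immediate_redirects(reloading))
```

A page that only reloads itself on a timer is not flagged here, since the table's complaint is about sending the visitor to a different page.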
3. Alleged POSITIVE OFF-Page SEO Google Ranking Factors (43)
Note
Factor #
POSITIVE OFF-Page SEO Factors
Brief Note
INCOMING LINKS :
HOT
63
Page Rank
Based on the Number and Quality of links to you. Google link reporting continues to display just a SMALL fraction of your actual backlinks, and they are NOT just greater than PR4 - they are mixed.
-
64
Total incoming links ("backlinks")
Historically, FAST counted best (www.alltheweb.com). No more - Yahoo (parent) broke it.
Current TYPICAL Backlink Reporting Ratios - Google: 30 links; MSN: 1,000 links; Yahoo: 3,000 links
-
65
Incoming links from high-ranking pages
In 2004, Google used to count (report) the links from all PR4+ pages that linked to you. In 2005-2006, Google reported only a small fraction of the links, in what seemed like an almost random manner. In Feb. 2007, Google markedly upgraded (increased) the number of links that they report.
-
66
Acceleration of link popularity (". . . used to be a good thing" ... Martha)
Google patent - Link acquisition speed boost - speculative. Too fast = artificial? Cause of -30 penalty? Sandbox penalty imposed if new site?
FOR EACH INCOMING LINK :
-
67
Page rank of the referring page
Based on the quality of links to you
HOT
68
Anchor text of inbound link to you
Contains keyword, key phrase? #1 result in SERP does NOT EVEN need to have the keyword(s) on the page, ANYWHERE!!! What does that tell you? (Enables Google-bombing - search for "miserable failure")
69
Age of link
Google patent - Old = Good.
70
Frequency of change of anchor text
Google patent - Not good. Why would you do that?
71
Popularity of referring page
Popularity = desirability, respect
-
72
# of outgoing links on referrer page
Fewer is better - makes yours more important
-
73
Position of link on referrer page
Early in HTML is best
-
74
Keyword density on referring page
For search keyword(s)
-
75
HTML title of referrer page
Same subject/ theme?
28
76
Link from "Expert" site?
Google patent - Big time boost (Hilltop Algorithm). Recently reported to give a big boost!
The Google Directory is produced by an unknown, ungoverned, unpoliced, ill-intentioned, retaliatory, monopoly enterprise, consisting of profiteering power-ego editors feathering their own nests - the ODP. AOL is making millions, and needs to police its run-amok entity. Enough already!
This is a tough one. Google's directory comes STRAIGHT from the DMOZ directory. You should try to get into dmoz. But you can't. Be careful whom you approach with the old spondulix - Formal DMOZ Bribe Instructions. It is almost impossible to get into DMOZ. This site cannot get in, after waiting over 2 YEARS (33 months). Not even in the lowest, most insignificant category, "Personal Pages". I guess I just don't "measure up" to the other 20,000+ sites in the personal category. I'm not the suck-up type - I kissed them off long ago. What a waste of time!
UPDATE: This page (not site) finally got indexed in June 2007, thanks to a legitimate editor. No money was paid. Google needs to DO SOMETHING about populating its own directory with the skewed, incomplete, poorly determined results from the dysfunctional Open Directory Project - the ODP! Absolute Power Corrupts Absolutely
-
82
DMOZ category?
Theme fit category? General or geographic category? Both are possible, and acceptable.
Previously, many pages preferred - conferred authority upon site, thus page. Bigger sites = better SERPs. Now, fewer pages preferred, due to proliferation of computer-generated pages. Google has been dropping pages like crazy.
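Two of the incoming-link rows above (factor 63, Page Rank, and factor 72, number of outgoing links on the referrer page) follow from the originally published PageRank formula, in which each page divides its rank equally among the pages it links to. The sketch below implements that textbook formula with the commonly cited damping factor of 0.85; it is an illustration under those assumptions, not a description of what Google runs today, and the page names are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the classic PageRank formula over a {page: [targets]} graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages with no outgoing links spread their rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# hub_a links only to "you"; hub_b splits its vote four ways.
graph = {
    "hub_a": ["you"],
    "hub_b": ["rival", "x1", "x2", "x3"],
}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the two hubs have identical rank, yet "you" ends up ahead of "rival" because hub_a's vote is not diluted across other links, which is the point the table makes with "Fewer is better".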
4. Alleged NEGATIVE OFF-Page SEO Google Ranking Factors (13)
Note
Factor #
NEGATIVE OFF-Page SEO Factors
Brief Note
-
120 (added)
Traffic Buying
Have you paid a company for web traffic? It is probably low quality traffic, with a zero conversion rate. Some providers of traffic for traffic's sake may be considered "bad neighborhoods". Can Google discount your traffic (for true popularity), because they know it's mostly phony? Have you read about Traffic Power?
In a nut shell, old links are valued, new links are not. This is intended to thwart rapid incoming link accumulation, accomplished through the tactic of link buying. Just one of the sandbox factors.
18
107
Change of Meanings
Query meaning changes over time, due to current events
BAD
108
Zero links to you
You MUST have at least 1 (one) incoming link (back link) from some website somewhere, that Google is aware of, to REMAIN in the index.
BAD
109
Link-buying
(Very good IF you don't get caught, but don't do it - when caught, the penalty isn't worth it.)
Google patent - Google hates link-buying, because it corrupts their PR model in the worst way possible. 1. Does your page have links it really doesn't merit? 2. Did you get tons of links in a short time period? 3. Do you have links from high-PR, unrelated sites?
41, 42
110
Prior Site Ranking
Google patent - High = Good
BAD
111
Cloaking
Google promises to Ban! (Presenting one webpage to the search engine spider, and another webpage to everybody else.)
??
112
Links from bad neighborhoods, affiliates
Google says that incoming links from bad sites can't hurt you, because you can't control them. Ideally, this would be true. However, some speculate otherwise, esp., when other associated factors are thrown into the mix, such as web rings.
BAD
113
Penalties - resulting from Domain Hijacking (work with Google to fix)
Should result in IMPRISONMENT, forthwith! Grand Theft, mandatory minimum sentence. The criminal COPIES your entire website, and HOSTS it elsewhere, with . . . a few changes.
-
114
Penalty - Google TOS violation
WMG is the worst offender - gobbles up tons of Google server time by nervous Nellie webmasters. Google even mentions them by name. I think that Google will spank you when you cross the threshold, of say, 100 queries per day for the same term, from the same IP. Google can block your IP. Get a Google API.
??
115
Server Reliability - should be >99.9%
What is your uptime? Ever notice a daily time when your server is unavailable, like about 1:30 AM? How diligent must Googlebot be? This is the worst reason to get dropped - you just aren't there! An ISP maintenance interruption can cause delisting.
-
116
No more room Pages being dropped from large sites
The 2^32 problem - the claim that Google has hit the wall of a 32-bit address space (2^32, about 4.3 billion entries). Bull! Google now has over 8 billion indexed pages. Thousands of pages are disappearing from various huge websites, but I think that it is G just cleaning house, by dumping computer-generated pages.
HOT
117
Rank Manipulation by Competitor Attack
(1. Content theft causing you to get a duplicate content penalty, even though your content is the original - Google has problems tracking original authorship. People are still stealing my content, but nobody trumps me (in Google) with my own content - hats off to Google.)
Impossible by Google definition (except for a few nasty tricks, like making your competition appear to be link spammers) Ideally, there SHOULD be nothing that your competition can do to directly hurt your rankings.
However, an astute observer noticed that Google changed their website to read : Old verbiage = "There is nothing a competitor can do to harm your ranking ..." New verbiage = "There is ALMOST nothing a competitor can do ..." An obvious concession that Google thinks that at least some dirty tricks work!
Of course, there will always be new ones!
-
118
Bouncing Ball Algorithm
At least 2, and often 3 identifiable Google Search Algos are currently in use, alternating pseudo-randomly through the data centers. G has moved to a daily dance. Multiple changing factors are applied daily. GOOD LUCK NOW on trying to figure things out!
IN ADDITION, some of the above factors are being "tweaked" daily. Not only are the "weights" of the factors changed, but the formula itself changes. Change is the only constant.
An algo change can boost or demote your site. I put this in the negative factors section, because your position is never secure, unless of course, you are huge (PR=7 or greater). If you simply cannot achieve top position, your only alternative to first page SERP exposure may be Google Ad Words (you pay for exposure).
Today, I searched for an extremely competitive "2-word term", and I found that NOT ONE of the top ten Google SERPs had even one of the words on the page. YOWSA! Today's theory - when it doesn't matter, anybody can get #1 in a second, if they know the on-page rules. BUT, after a certain "commercial competitive level", the "semantic analysis" algo kicks in, and less becomes more. The keyword density rules are flipped upon their noggins. I think that we are witnessing the evolution of search engine anti-seo sophistication, right before our very eyes. Fun stuff.
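To put the server reliability target from factor 115 (better than 99.9% uptime) into concrete terms, the allowed downtime can be worked out with simple arithmetic. This is a small sketch; `allowed_downtime` is a hypothetical helper, and it assumes a 24-hour day and a 365-day year.

```python
def allowed_downtime(uptime_percent):
    """Convert an uptime percentage into the downtime it permits."""
    down_fraction = 1.0 - uptime_percent / 100.0
    return {
        "seconds_per_day": down_fraction * 24 * 60 * 60,
        "hours_per_year": down_fraction * 365 * 24,
    }


budget = allowed_downtime(99.9)
print(f"{budget['seconds_per_day']:.1f} seconds per day")
print(f"{budget['hours_per_year']:.2f} hours per year")
```

At 99.9% the budget is under a minute and a half per day, so a regular nightly maintenance window of even a few minutes already misses the target.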