Lesson 6

Site Architecture; Making Your Web Site Easy for Search Engines to Index

Website Architecture

Now that you know the importance of Keywords, Links, Domain Names, and Competitive Analysis, it's time to understand how your Web site should be set up so that search engines will find it and list it within their index. After all, the search engines can't rank your pages at the top of the search results if they don't know about them. By the way, indexing is what search engine spiders are doing when they crawl a Web site to collect data about your Web pages. They store that data in a database called an index.

Processing and storing a Web page is referred to as indexing. Therefore, you must ensure your Web pages are as easy for search engines to access and index as possible. Believe it or not, many Web sites are inadvertently configured to be difficult for search engines to access. And some Web sites are actually blocking search engines entirely while their owners wonder why the site isn't doing better in the search rankings!

Little do they know it's because of the indexing roadblocks they've inadvertently placed in the way.

So, pay close attention because here’s where you will learn:

  1. How to avoid the common mistakes that keep Web pages from being indexed, and
  2. How to ensure that all of your important pages get indexed properly to give them a good chance to rank at the top of the search results.

Remember though, making your site search engine friendly, by itself, won't propel you to the top of the rankings. A search engine friendly site is more about avoiding the mistakes that will prevent you from getting indexed or that will damage your search engine rankings. To achieve a top ranking, you must also understand the critical role that keywords, inbound links, overall site strength, and similar factors play, and use that understanding to your advantage when designing your site.

There are always two important points to remember about search engines and how they relate to your Web site:

  1. The first point is: the quality of your site counts. Search engines make their money through advertising. Showing ads to their users is their profit model, and the more users they have, the more money they make. The way a search engine gets more users is by providing the best search results. This means that, if your site is the most useful site to customers in your keyword category, then search engines want to rank you at or near the top of the search results. Indeed, their revenue stream depends on it.
  2. The second point to remember is: search engine spiders are really just computer programs. More precisely, search engines run a program called a spider that:
  • visits your Web site,
  • reads the text and links on your Web pages,
  • then decides what to do with your Web pages based on the information it reads.

This activity is called crawling your Web site. So, search engine spiders are computer programs that crawl Web pages. And, if you've ever used a computer, you know that computer programs break sometimes, especially if you overtax them. You may have noticed that your own computer starts to slow down and may even crash if you have too many applications open. It's the same with a search engine spider.

If your Web site is laid out in a confusing and disorganized fashion, or if the links between your pages are difficult for a search engine spider to find, your site is not going to be crawled as efficiently as you would like. This means that some of your pages will get missed. It also means your site won't be crawled very often and your listings won't be fresh. That puts you at a disadvantage when it comes to getting new pages indexed; and if your pages don't make it into the index, they certainly can't be ranked highly.

Remember, there are billions of Web pages on the Internet, and search engines have to make the most of their available time and resources to crawl all those pages. It's your job to make sure crawling your pages is quick and easy for the search engine spiders. Otherwise, you risk having your Web pages ignored by the search engines.

Remember, this lesson is focused on making your site spider-friendly. The tactics and strategies covered here won't rocket you to the top of the search engines (you'll need to use incoming links and keyword strategies to do that), but they will help you avoid the mistakes that can nuke your rankings by locking you inside the starting gate. In other words, if it's difficult for search engine spiders to crawl your site, you'll be handicapped in the ranking race regardless of all your other good efforts.

Keep Your URLs Simple

Search engine spiders find the pages on your Web site by following links. They work in similar fashion to the way you use your browser—only much more quickly. They download a page, scan it for links, and store those links in a list. Once they’re done scanning the page, they grab the first link from the list and repeat the steps until they’ve followed all of the links one by one. Of course, this is a simplified explanation, but it essentially defines the process of how a search engine finds Web pages. Many Web sites, especially e-commerce sites, use dynamically generated URLs.

These are Web addresses that look something like: 

http://domain.com/product-id5-cust8&+8=7?gtf.

These dynamic URLs are automatically generated by pulling variables out of a database to match the product specifications a customer is looking for. Dynamically generated URLs usually contain lots of non-numerical, non-alphabetical characters, such as ?, &, +, and =.

For example, a site that sells Hawaiian muumuus might have a page with the following dynamically generated URL:

http://yoursite.com/index.php?item=muumuu&color=blue&size=large

This is opposed to a static-looking URL, which is a bit easier on the eyes:

http://yoursite.com/muumuu/blue/large

Although most search engine spiders are capable of crawling these long and confusing, dynamically generated URLs, it is best if you can avoid using them at all. When all else is equal, a Web site with short, static-looking URLs is likely to get more pages indexed by the search engines than a comparable site that produces dynamically generated (DG) Web pages. Dynamic URLs are also often a source of duplicate content, which can affect your site's ranking.

Many content management systems (CMS) that use dynamic URLs generate the same content at multiple different URLs. It is important to avoid getting those duplicate pages indexed if possible. You may need to make use of the canonical tag to tell the search engines which URL to index and which ones to ignore. For more information on the use of the canonical tag, read: Are you using the Canonical Tag Correctly?

However, there are times when the advantages of DG pages outweigh the SEO drawbacks. So, IF your site absolutely must rely on dynamically pulling content from a database to create its URLs, it's still possible to have your URLs appear static by using a tool like mod_rewrite. mod_rewrite is a tool that rewrites a dynamic URL as a static URL on the fly. This is commonly done for SEO purposes to make pages easier for spiders to navigate. When you are ready to apply this Advanced SEO tactic, be sure to study our extensive mod_rewrite

Advanced SEO Tutorial:

Getting Your Dynamic Sites Completely Indexed with mod_rewrite

By the way, if the tutorial mentioned above seems a bit too complicated for you, then share it with your web or tech people and have them handle the details of turning your dynamic URLs into search engine (and people) friendly web addresses. This is something that you definitely want to get right! And, even though it's a little complicated, it's worth the effort, as it magically renders your complex, dynamic, ugly-looking URLs into simple links that search engine spiders and PEOPLE just love to follow. In essence, you get to eat your cake and have it too.
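To make the idea concrete, here is a minimal sketch of the kind of rule that tutorial covers, assuming an Apache server with mod_rewrite enabled and reusing the muumuu URLs from earlier (this is an illustration, not a drop-in rule for your site):

# .htaccess - show visitors and spiders the clean URL /muumuu/blue/large
# while quietly serving the dynamic index.php page behind the scenes
RewriteEngine On
RewriteRule ^muumuu/([^/]+)/([^/]+)/?$ index.php?item=muumuu&color=$1&size=$2 [L,QSA]

With a rule like this in place, the short, static-looking URL is what appears in your links and in the search results, while your script still receives its query-string variables.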

Meta Tags: Do They Matter?


Meta tags are non-displayed text written into the source code of your HTML document, intended to describe your page to the search engines for the purpose of cataloging its content. A considerable amount of undeserved attention has been given to Meta tags, and an enduring myth has evolved in the process.

  • Meta Tag Myth: Meta tags are what propel pages to the top of the search engines (wrong!).
  • Meta Tag Reality: Meta tags, while potentially useful for describing your Web page contents to a web browser or search engine, have no appreciable effect on actual search engine rankings whatsoever. None!

There is a semi-absurd ongoing debate as to whether or not Meta tags should be included in your HTML document at all. Let's put that debate to rest once and for all. Surprisingly, the answer is a resounding YES, they should! Here's why.

While it is true that Meta tags will not help your rankings, it is also true that the Meta description tag should absolutely be included in every Web page document that you want described on the search engine results page. That's because the Meta description tag is used by many search engines as the summary description for your page when your page is listed in the search results. The contents of the Meta description tag often serve as the sub-headline and the sales description for your link! (Remember: your <title> tag is the headline that is displayed in the search results.) The Meta description tag, when displayed in the search engine results, helps the searcher decide whether or not your page is relevant to their search. It's what compels a real person to click your link. After all, that's the reason for being listed by the search engine in the first place!

If you omit the Meta description tag, then the search engine is likely to fabricate a description for your site based on arbitrary text gleaned from somewhere on your page. Here's an example of a terrible, yet real-life, search engine results description we found when searching for Hawaii scuba diving:

  • Link Title: Scuba Diving Maui Hawaii
  • Summary description: click to go home

Now, we're pretty sure that this company didn't really want "click to go home" used as their page description, but that's what they got because they failed to use a Meta description tag. Another possibility is that the search engine will omit the summary description entirely if it fails to find anything useful within your page to use as a summary. In either case, a potential site visitor is less motivated to click your link if you fail to properly utilize the Meta description tag.

Hence, in every case where you want a description for your link within the search engine results, be certain to include a relevant and enticing Meta description tag.

The following example illustrates the HTML source code located at the very beginning of a very basic Web page. Below you can see the Meta description tag and its contents:

<html><head>
<title>Cell Phone Accessories</title>
<meta name="description" content="The latest in cell phone accessories at the lowest prices for every known brand of cell phone on the planet!">
<meta name="keywords" content="cell phones, Leather Cases, Cellphone holders, Antennas, antennaes, chargers, batteries, face plates, flashing batteries, hands free head phones, headphones, range extenders, bateries"> </head>

The only other Meta tag that you may hear discussed is the keyword meta tag, which is no longer observed or used by major search engines:

<meta name="keywords" content="cell phones, Leather Cases, Cellphone holders, Antennas, antennaes, chargers, batteries, face plates, flashing batteries, hands free head phones, headphones, range extenders, bateries">

We do not recommend wasting any time at all on the meta keyword tag. The only time you should be concerned about it is if you're working on an old web site that has spam-laden meta keyword tags. If you encounter this, remove them, as they can still be seen as a technique used by spammers. Oh, and by the way, NONE of the other Meta tags have any effect on search engine rankings whatsoever! They never have and probably never will, no matter what you've heard!

How to Customize the Way Your Listings Appear in Google

Google now provides a way for you to customize how your listings appear in the search results. Previously, you were limited to just titles and descriptions, but now it's possible to get star ratings, product images, prices, business addresses, and more included with your search results listing. For example, take a look at this cafe listing from Yelp.com:

[Screenshot: a Yelp cafe listing in Google's search results, showing star rating, number of reviews, and average price range]

You can see star ratings, number of user reviews and average price range for a meal. Google is displaying this data using a feature they call Rich Snippets—information extracted from indexed pages that have special tags embedded into their HTML code. Those tags come in two forms, microformats and RDFa.

While this might sound complicated, these formats are about as easy to master as regular HTML. And, although developers haven't yet settled on a standard, the fact remains that you can use either microformats or RDFa (we find microformats a little easier). Then you simply denote certain data on your pages by wrapping it in tags with descriptive class attributes. For example, to create the listing above, you would wrap portions of your page's data in tags that describe that data, as seen below.
<div class="hreview">
<span class="name">Cafe Cakes</span>
<span class="rating">4</span> out of 5.
<span class="count">28</span> reviews.
<span class="pricerange">$</span>
</div>

Wrapping everything in a hreview div tag lets Google know it’s a review. Then you use name, rating, count and pricerange span tags to add the other information. So far, we’re seeing these Rich Snippet listings just for restaurants and cafes, but Google is working on rolling them out to more categories.

Google provides examples and tutorials on Rich Snippets for the following:

  • Reviews
  • People
  • Products
  • Businesses and organizations

Currently, business directories and other sites based upon reviewing and categorizing other businesses stand to gain the most from having Rich Snippets added to their pages. However, as Google expands this program it's likely to become relevant to many other types of Web sites as well. In general, listings that are enhanced with Rich Snippets can expect to increase their click-through rate, so we highly recommend them.

Be Careful with Session IDs and Dynamic URLs

Session IDs are unique identifiers, often embedded in URLs, that allow a Web site to track a customer from page to page. For example, when shopping at an ecommerce site, session IDs are used to keep track of the items in your shopping cart. For search engine spiders, however, session IDs can cause a problem because they can inadvertently create a huge number of links for the spider to crawl. The danger is that the spider might repeatedly index what is essentially the same Web page over and over. The spider can get trapped in a loop as each newly crawled page dynamically generates even more links for it to follow. This is called a 'spider trap.'

Here’s how a system that uses session IDs can give the appearance of generating an endless number of pages within a single site. For example, a link with session ID tracking that looks like…

http://www.yoursite.com/shop.cgi?id=dkom2354kle03i

…is served to the spider when it first downloads one of your Web pages. That page is then processed, but when the spider returns to the site to download more pages, it finds another URL that looks like:

http://www.yoursite.com/shop.cgi?id=hj545jkf93jf4k

It's actually the same page, only with a different tracking session ID variable. But to the spider it looks like a brand new URL, so the spider can get trapped downloading the same page over and over again. This problem can also result in duplicate content getting indexed by the engine, which can lead to a reduction in ranking. Although Google is constantly striving to improve its ability to crawl session IDs, we recommend you avoid using them whenever possible. When you must use them, you should avoid giving search engine spiders access to them. The best plan is to not use session IDs until you actually need to track the state of your customer, such as when they add items to their shopping cart. You can also store your session IDs in cookies instead of your URLs; most web applications can be configured to store user state in cookies. And, once again, if this sounds complicated, then have your web or tech people handle this element of your Web site architecture. What YOU need to know is that the more dynamic variables you include in your URLs, the harder it will be for search engines to index your pages. Strive to keep your URLs simple and free of dynamic elements.
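When duplicate URLs like these can't be avoided entirely, the canonical tag mentioned earlier is the standard way to tell search engines which version of the page to index. A minimal sketch, assuming the clean product URL from earlier in this lesson is the version you want indexed:

<head>
<link rel="canonical" href="http://yoursite.com/muumuu/blue/large">
</head>

Placed in the head section of every variation of the page (session-ID versions included), this tag asks the engines to consolidate those variations under the one preferred URL.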

Sitemaps: What, Why, and How

First, let's start with the simple fact that there are two different types of sitemaps. According to Bing, the difference between Sitemap and sitemap is:

  • Sitemap, the capitalized version, refers to the XML-based files created specifically for the search engine crawlers. This version of the Sitemap provides the crawlers with the "most important pages and directories within their sites for crawling and indexing."
  • sitemap, the lowercase version, is an HTML-based file that is for both the Web site user and the MSNbot. It's essentially a simple, clean, and organized list of all the pages on your Web site.

An HTML sitemap is an on-site Web page that links to all the other pages on your Web site. It ensures that any spider crawling your site can easily and quickly find and index all of your site's Web pages. This type of sitemap is for spiders first and foremost, but it can also be useful for Web site visitors. By linking your homepage to your HTML sitemap, you ensure that each page on your site is only one click away from your sitemap and only two clicks away from your homepage. This is the optimum Web site structure in terms of making Web pages easy for the search engine spiders to find.

As you now know, search engine spiders find new pages by following links from pages that are already in their index. Thus, if you want a spider to crawl a new Web page, it needs to find a link from a page that is already indexed in the search engine. However, unless you have a very small site, linking to every page on your site from your homepage would look messy and unprofessional to your customers. The HTML sitemap enables you to accomplish this objective cleanly and professionally. Your sitemap provides a list of links that Google's spider can easily follow, which helps you get new pages indexed without cluttering your home page with links. That's assuming, of course, that Google has already indexed your sitemap. So, by placing a link on your home page to the sitemap, and then links from the sitemap to the rest of your important pages, you make all of your site's pages easy to find and index.

We’ve mentioned here the importance of linking to your HTML sitemap from your home page, but, for good measure, you should also place a link to your HTML sitemap on every single page on your site. That way, even if a search engine can’t reach your homepage for some reason, it can still easily access your sitemap and find all your other pages. By the way, this Web site architectural element should be considered standard operating procedure as the search engines themselves actually recommend you use a Sitemap to ensure that all of your Web pages get indexed.
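In practice, an HTML sitemap is nothing fancier than a plain page of links. Here is a minimal sketch, assuming a small site (the page names are made up for illustration):

<html>
<head><title>Site Map</title></head>
<body>
<h1>Site Map</h1>
<ul>
<li><a href="/index.html">Home</a></li>
<li><a href="/products.html">Products</a></li>
<li><a href="/about-us.html">About Us</a></li>
<li><a href="/contact.html">Contact</a></li>
</ul>
</body>
</html>

Link to this page from your home page (and ideally from every page), and every page it lists is then just two clicks from your homepage.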

HTML sitemaps for Large Sites

If you have a large site, you may be wondering whether it’s better to create one large HTML sitemap or several smaller ones. There are a couple of factors to consider:
First, the degree to which search engines will index pages and follow links on a page is largely determined by the quality of links pointing to that page (or to the site the page is on). If a site doesn't have many incoming links, then it's a good idea to make pages smaller and put fewer links on them. A good rule of thumb is to keep your pages under 101k of HTML code, and to put no more than 100 links on a page. Popular sites can easily get more of their pages indexed, but to be safe, use 101k of HTML code (don't count the file size of images) and 100 links as the upper limit. Therefore, if your entire site is fewer than 100 pages, and you can create an HTML sitemap page smaller than 101k, then it's beneficial to use only one sitemap that points a search engine spider to the rest of your site.

There is an advantage to having only one HTML sitemap placed within the root domain: it enables the search engine spider to find and index all of your pages without having to traverse your site any deeper than two links beyond your home page. That's one link from your home page to your sitemap, then one more from your sitemap to every other page on your site. This makes it easiest for the spiders to find every page on your site.

However, once an HTML sitemap approaches 100 or so links, or its file size approaches 101k (excluding images), it's time to start splitting it up into smaller sitemaps. We'd suggest linking to each sitemap from all your pages. Five sitemaps, for example, would require five links from each page instead of one. The end result is that a spider would still only need to follow a maximum of two links beyond your homepage to reach all your pages. If, for some reason, it isn't practical to place all five links to your five HTML sitemaps on the home page, then we'd suggest a single link on the home page that points to a master sitemap which, in turn, contains the five links to the five smaller sitemaps. This would require the search engine spider to travel three links deep into your site to locate and index all of your pages, which is still quite good. And again, link to your master HTML sitemap from all your pages, not just your home page.

Finally, we recommend that you avoid forcing a spider to crawl any deeper than three links beyond your home page to locate the rest of your pages. Using the site structure outlined above should allow you to easily accomplish that objective.

XML Sitemaps; How to Get Your Difficult-To-Index Pages Fully Listed


You should carefully note the difference between an onsite sitemap (HTML sitemap) and an XML Sitemap. Your Web site should utilize both, as both are an important part of helping your site get, and stay, indexed by search engines. The regular HTML sitemap (as explained in the previous two sections) is simply an on-site Web page that links to all the other pages on your Web site. It ensures that any spider crawling your site can easily and quickly find and index all of your site's Web pages. On the other hand, the XML Sitemap (aka a Google Sitemap, although it's used by Yahoo and Microsoft as well) is a special file that provides search engines with specific directives about what pages to crawl and how often. Search engines are not required to strictly obey these directives, but they do tend to use them as guidelines. This type of Sitemap is especially useful for very large sites that want to get all their pages listed. A great example of a large site that NEEDS a good XML Sitemap is an eCommerce site that wants to get its entire list of product pages indexed and listed in the search results.

Please note that neither the HTML sitemap nor the XML Sitemap plays any role in where your pages will rank. Both are simply vehicles for getting your Web pages indexed most efficiently. Where your pages rank depends on your incoming links and other optimization factors. Bing also subscribes to the XML Sitemap protocol; you can submit your XML Sitemap there in the same format that you use for Google by using Bing's Webmaster Tools service. For more on the Sitemaps protocol and how it can help your pages get (and stay) indexed by the top three search engines, be sure to visit Sitemaps.org: http://www.sitemaps.org
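For reference, here is what a minimal XML Sitemap in the sitemaps.org format looks like (the URLs and dates are illustrative); most CMS and e-commerce platforms can generate one of these for you automatically:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite.com/</loc>
    <lastmod>2012-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://yoursite.com/muumuu/blue/large</loc>
    <changefreq>monthly</changefreq>
  </url>
</urlset>

You then submit the file through Google's or Bing's Webmaster Tools so the engines know where to find it.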

Regardless, a Google XML Sitemap is really no replacement for clean and crawlable URLs, so a tool like mod_rewrite still comes in handy if you are attempting to simplify your dynamic URLs. If this sounds complicated, give it to your Web or tech people, who will probably tell you this is actually pretty simple once you do it. An important note: if your site is already ranked in the search engines, be very careful about changing your URLs. Carelessly modifying your URLs after your pages have already been indexed and ranked is one of the worst SEO mistakes you can make! If you fail to effectively tell the search engine where to find the new location of a page, the search engine will assume the page has disappeared and will drop it from its index. Not good. So, if you do change any URL, you must redirect the old URL to its new location. Visitors and search engines that are looking for the old URL will then be automatically sent to the new URL, preserving your search rankings while accommodating your site visitors.

Of course, in a perfect world, you'll never need to move a Web page or Web site. However, if you must, then this tutorial is critical to your success. Without it you risk causing grave damage to your Web site's rankings, especially if your pages are already doing well in the search results. You could easily lose all of your rankings if you get this critical procedure wrong. You have been warned! Note that if you're using mod_rewrite to rewrite your URLs, the 301 redirect can be added to your mod_rewrite code.
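As a simple illustration of what a redirect looks like, here is a hedged sketch for an Apache .htaccess file using made-up page names; the second form shows the mod_rewrite equivalent with the 301 flag mentioned above:

# Permanently (301) redirect the old URL to its new location
Redirect 301 /old-page.html http://www.yoursite.com/new-page.html

# The same redirect expressed as a mod_rewrite rule
RewriteEngine On
RewriteRule ^old-page\.html$ http://www.yoursite.com/new-page.html [R=301,L]

Either way, the key is that the redirect returns a 301 (permanent) status code, which tells the search engines to transfer the old URL's listing to the new address.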

How to Use Robots.txt for More Targeted Web page Indexing


Robots.txt is a tricky-sounding name for a simple text file that is placed in the root directory of your Web site. Its purpose is to provide crawling directions to search engine spiders that are specific to your site. In other words, your robots.txt file tells search engine spiders which pages NOT to index. A common misconception is that a robots.txt file can somehow be used to encourage search engines to crawl a site. Not true! Most pages are eagerly spidered by search engines without requiring additional encouragement. As you are probably noticing by now, an important part of SEO is identifying the elements that cause indexing difficulties for the spiders and eliminating them. So, why would you ever want to tell a search engine NOT to index some of your pages? Well, because search engine spiders function with limited time and resources when indexing sites. Therefore, your site will be better served by focusing on getting your important content, product listings, and sales pages indexed.

Case in point: chances are good that you do NOT want a search engine to index your shopping cart. There is typically no benefit to you when your shopping cart checkout pages show up in the search engine results. Therefore, you would use your robots.txt file to make sure search engines don't waste time indexing your shopping cart. That way they are more likely to spend their time on your site indexing your more important sales or informational content pages. Other pages you'll want to keep search engine spiders away from include anything in your cgi-bin folder, as well as directories that contain images or other sensitive company data. Whenever there isn't any benefit to having a Web page (or image) displayed in the search results, you should forbid the spiders from indexing it by placing the appropriate command within your robots.txt file.

That will not only help focus the search engine’s resources on your important pages, but will also provide the useful side benefit of protecting your site from hackers who may otherwise use search engine results to acquire sensitive information about your company or site. Search engine spiders are typically voracious about indexing anything they can find on the web, including sensitive areas like password files, so you must be careful. The robots.txt file can help you layer in some of the protection you need.

By the way, there's one more issue to be aware of that relates to the robots.txt file. A surprising number of sites have inadvertently set up their robots.txt files to prevent search engine spiders from crawling any portion of their Web site (oops!). For example, the following two lines, when added to your robots.txt file, are enough to keep all major search engines from ever crawling your site. In other words, the following command tells ALL search engine spiders to simply go away:

User-agent: *
Disallow: /

This has been an area of confusion for some people. They use the wrong command and then wonder why they can't find their site listed in the search engines. So, be very careful this doesn't happen to you! If you decide that you want to block a specific search engine spider, you should put the name of that spider on the User-agent line, NOT the asterisk. The asterisk (*) is a wildcard meaning "all." The Disallow line is where you put the directory that should not be indexed. The forward slash (/) indicates the root directory, in other words your entire site. As you can see, the robots.txt directive above is a total shut-out of all search engines from your entire site. On the other hand, entries like this:

User-agent: *
Disallow: /cgi-bin/

…should (we say "should" because it's technically optional for search engines to obey robots.txt directives) prevent all URLs in the /cgi-bin/ directory from being crawled. Keep in mind that these directives are case sensitive. If you want the spiders to crawl every Web page they can find on your site, there is no need for a robots.txt file at all. The only time you actually need to use robots.txt is when you want to restrict the crawler from some portion of your site. Google's Webmaster Tools provides a report that will tell you exactly which URLs Google has attempted to crawl on your site but was restricted from crawling by your robots.txt file.

You can access Google Webmaster Tools at: http://www.google.com/webmasters/sitemaps

To see this report, go to Google’s Webmaster Tools (sign up and register your site if you haven’t already), click the Diagnostic tab, then click the Web crawl link. Finally, click the report that says URLs restricted by robots.txt to see what pages Google is not indexing due to commands in your robots.txt file. Google’s Webmaster Tools also offers a special robots.txt debugger which allows you to test specific URLs to see if your robots.txt file allows or blocks spider access. If you’re having problems getting pages indexed, be sure to test those pages against your robots.txt file using Google’s tool and see if you have a statement blocking Google’s spider. If you do, Google will show you what line in your robots.txt the blocking statement is on.
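Putting those pieces together, here is a sketch of a typical robots.txt; the directory names are examples only, not a recommendation for your specific site:

# Let all spiders crawl the site, but keep them out of the areas
# that have no search value
User-agent: *
Disallow: /cart/
Disallow: /cgi-bin/
Disallow: /images/

# Block one specific spider entirely (substitute the bot's real name)
User-agent: BadBot
Disallow: /

Remember that a missing robots.txt file simply means "crawl everything," so only create one when you actually have something to exclude.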

Be Careful with Using Frames, JavaScript, and Flash


There are a lot of myths and misunderstandings about the way search engines handle Frames, JavaScript, and Flash pages. The fact is that Web pages using these formats can be optimized for search engines only in theory; each presents its own unique challenges and difficulties. As a general rule, they're best avoided, since pages that don't use them are much easier to optimize for search engines. However, if you find that you must use them, or if you're optimizing a site that's already built around these technologies, here's what you need to know to minimize their disadvantages.

How Pros Use Frames and Still Rank at the Top

Ever since we can remember, the use of the frame (also used with frameset) tag has been a thorn in the side of SEOs. Frames are tricky to work with, and if not implemented properly they can kill your site's chance of being crawled correctly and therefore ranking well in the engines. Because of this we've taken the stance that you shouldn't use the frame tag unless you absolutely have to. The alternative we've suggested is the iframe tag, which is similar in nature and has been applied without issue. However, these two tags are different from each other, and we've decided to take a little time to re-evaluate them and their use within your SEO campaign.

First off, frames are an HTML element that pulls content in from another URL into the URL of your choice. It's like copying everything on a page to mirror it on another page. Sometimes there are solid reasons to use frames on your Web site. Perhaps you have a legacy site that will take too much time and energy to change over. Perhaps you're doing it for your affiliate campaigns. Regardless of why, if you simply must use the frame or iframe tags on your Web site, here are some guidelines to assist you in overcoming the potential disadvantages.

How Spiders Index Frames

These days Google and Bing are very good about correctly indexing content within <frameset></frameset> and <iframe></iframe> tags. However, the very important thing for you to understand is how the engines interpret that content. They crawl and index the non-framed content on the site as one page, and then crawl and index the framed content as an entirely separate page. That means they will not associate the framed content with the main page it's being presented on. Think of the framed content like a large image: the alt attribute of the img tag tells search engines, and your visitors when the image doesn't load, what the image is. Similarly, there is a special tag called the noframes tag, which is designed to tell users and search engines what the framed content is when frames are disabled or unsupported. Basic use is something like this…

<noframes>Put your keyword-rich frame describing content here.</noframes>

When using the <noframes> tag, ideally you want it as high up on the page as possible, containing text and/or links that describe the framed page (written using keywords pertinent to your site). This content will then be readable by the search engine spiders as well as by people whose browsers do not support frames.

Note: It is very important that the <noframes> tag is placed outside of the <frameset> or <iframe> tags so the search engines can find it.

We also feel it's important to remind you that we do not use frames at all. We feel that frames do not add anything to search engine findability, nor do they add to product sellability. We have even seen problems caused by the framed content getting indexed directly. When this happens, users land on a page that may lack navigation or other necessary elements, making the page completely dysfunctional.

Understanding the frame tag compared to the iframe tag

We put together a sample template of a standard page using the frame tag to help show which areas will be indexed and which won't:

<html>
<head>
<title>Title text here will be read by spiders, regardless of frames</title>
<meta name="description" content="Text here will be read by spiders that read meta tags, regardless of frames. It is important to note that you should include a META tag description summary of all the frames on this particular page.">

Also note that many search engines index the ALT attributes in the <IMG> tag. If your site mainly consists of graphics (bad idea!), you can also use the HTML ALT attribute to describe your page.

<meta name="keywords" content="Text inside the keywords tag is typically ignored by most search engines regardless of frames">
</head>
<body>
<noframes>

Text here will be read by most search engine spiders just as if you were not using frames — this is the place to load descriptive text including your keywords. Make sure that the text is in a readable format, as it will likely get used in the search results for the snippet or description text.

</noframes>
<frameset cols="25%,50%,25%">
<frame src="frame1.html" />
<frame src="frame2.html" />
<frame src="frame3.html" />

Text here will also typically get indexed by most engines, but results may vary. The content within the three files referenced above (frame1.html, frame2.html, and frame3.html) will not be indexed along with this page.

</frameset>
<noframes>Text here will also typically be read and indexed by spiders.</noframes>
</body>
</html>

Now, we'll take a moment to explain and show an example of the iframe tag.

The use of the <iframe> tag is increasingly common for embedding dynamic information and all sorts of widgets onto a site. Facebook's "Like" button widget is an excellent example that uses the <iframe> tag to do its magic. What many don't realize is that lots of these embedded iframe widgets typically don't generate a link back to the widget owner's site – which is one of the main reasons they created the widget in the first place! However, if you set the code up as below, with indexable content within the iframe tag, that content will get indexed (including any links in that text).

<iframe src="http://www.facebook.com/plugins/like.php" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:150px; height:50px;" allowTransparency="true">Content and links here will get indexed by most engines, as it is visible text on the page. Anything that is pulled in using the iframe tag will not get indexed with the page. So if you want your iframe-powered widgets to generate a link back to your site, make sure to include that code in this area.</iframe>

In summary, the frame tag and frame based Web sites are something to avoid whenever possible, even though there are ways to get some content indexed. The iframe tag on the other hand, when used correctly, can be a good method of link building. Just be aware that the content pulled in by the iframe tag is not going to get indexed as if it were static html code on the page. All in all, the take-away from this topic is to avoid using frames on Web pages that you want indexed in the search engines.

1. Understanding JavaScript

While it is true that Google can now find and follow JavaScript links, it is also true that Yahoo and Bing cannot. Therefore, you should think twice about creating JavaScript links. If you feel they are absolutely essential to your Web site's overall design scheme, then pay close attention here to ensure you are setting them up properly. JavaScript links typically use a function called an onclick() event to open links when clicked. The onclick() event then calls JavaScript code which tells the browser what page to open. That code can either be on the same page, or it can be embedded in a separate file. Currently, if the code called by the onclick() event is on the same page, Google will process the code, crawl the URL listed, and pass anchor text and PageRank to that URL. However, if the code is in a separate file, then Google does not process it.

Here are some examples of code that Google can understand, with links that will pass both anchor text and PageRank:

<div onclick="document.location.href='http://foo.com/'">
<tr onclick="myfunction('index.html')"><a href="#" onclick="myfunction()">new page</a>
<a href="javascript:void(0)" onclick="window.open('welcome.html')">open new window</a>

Remember that, even though Google can process these links, JavaScript is not the ideal format for your links. Neither Yahoo nor Bing can read and process these JavaScript links. In addition, JavaScript links generally fail to display properly on mobile devices and screen readers.

You should also be aware that using JavaScript to cloak (i.e., hide) links for PageRank sculpting, or to prevent paid links from being indexed, now requires that you move your code to an external file if you want to prevent Google from finding those links. Or, you can simply add the nofollow attribute to links that you'd like hidden from Google in order to facilitate PageRank sculpting.

Another option is to put the script contents in a remote.js file. Here’s how: In your .html page, reference the remote.js file like this:

<script language="JavaScript" src="javascript/remotefile.js" type="text/javascript"></script>

Then place your JavaScript code in the remote file (i.e., remotefile.js). The bottom line is that there are plenty of other ways to spruce up your pages. Besides, studies have shown that complex JavaScript and Frames in general tend to actually reduce sales conversions.
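If you do keep JavaScript in your links, one widely used safeguard is to put a real, crawlable URL in the href and let the onclick add the scripted behavior on top. A minimal sketch (the function name is a placeholder, not part of any library):

<!-- openFancyWindow() is hypothetical; the href still works if JavaScript is unavailable -->
<a href="welcome.html" onclick="openFancyWindow('welcome.html'); return false;">Welcome</a>

Spiders that ignore the JavaScript still see an ordinary link to follow, and visitors without JavaScript still reach the page.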

2. Macromedia Flash

These are those animated, eye-pleasing, motion-picture-style, high-end Web pages, and they cannot be easily optimized for search engines. So you put your site at a ranking disadvantage if your site's architecture relies heavily on Flash. Even though there are work-arounds and exceptions, in general, search engines have difficulty indexing Flash in any meaningful way. As you know, your keywords provide one of the most valuable elements for search engines to determine what your Web pages are about. However, it's difficult-to-impossible for search engines to reliably extract keywords from Flash files. This means that any part of your Web page that uses Flash will generally NOT lend itself to top rankings. However, you can still use some Flash on your pages as long as you observe these guidelines:

  1. Don't make your entire page one big Flash file. Make sure your page has abundant indexable content outside your Flash file.
  2. If you're just using Flash to animate part of your page, and the rest of your page is in normal HTML and contains your keywords, then search engines will know what your page is about by reading that HTML (even though they'll likely ignore the Flash). However, if most of your page is embedded in a Flash file, then it will be very difficult for a search engine to know what your page is about. That puts your Web page at a serious ranking disadvantage.
  3. Use the <noembed> tag. This is a good approach to take if you simply must create all-Flash pages. Flash programmers know that any link to a Flash file must be enclosed in an <embed> tag. HTML also contains a <noembed> tag; this is where you should put the HTML version of whatever Flash you're using on that page (see the sketch below). Not only does this give the search engine something to read, but it also provides an alternative for those users who don't have Flash installed in their browsers.

Although Google is getting a little bit better at indexing Flash pages, they still don't do it well. So don't count on Flash pages to put your site on equal footing with non-Flash pages. You'll be at a disadvantage, even with Google. Sure, there are sites like Oprah.com that use heavy amounts of Flash and do quite well in the search rankings. But that's generally due to brand recognition and the accumulation of tons of links that propel them to the top of the rankings in spite of how unfriendly their site architecture might actually be to search engines. If you've got an Oprah-sized brand, and you really want that animated homepage, then by all means take the Flash route.
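Here is a minimal sketch of that <noembed> arrangement; the file name and fallback copy are placeholders:

<embed src="homepage-animation.swf" width="600" height="400">
<noembed>
<h1>Hawaiian Muumuus in Every Color and Size</h1>
<p>An HTML version of the Flash content, including your keywords and
<a href="/muumuu/blue/large">crawlable links</a> to your product pages.</p>
</noembed>

The spiders read the text and links inside <noembed>, and visitors without the Flash plugin see it in place of the animation.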

The bottom line is that Flash pages will always be disadvantaged in the rankings. But, if you must use them, then use the methodology outlined above to make sure the keywords you want indexed can be found outside your Flash files.

Lesson 6 Review

In this lesson you learned:

  1. The importance of designing search-friendly pages using the correct site architecture that, ideally, places every page on your site no more than two links deep, and three links max.
  2. How and why to keep your URLs simple.
  3. The importance of managing your Session IDs and Dynamic URLs in ways that do not confuse the search engine spiders.
  4. The ONLY correct way to configure your Meta tags, and the dangerous Meta tag myths that inexplicably continue to survive.
  5. How to control the way your Web page displays in the listings with Rich Snippets.
  6. All about Sitemaps; both onsite HTML sitemaps and XML Sitemaps that help get your Web pages more efficiently indexed.
  7. How a simple robots.txt file can be used to keep spiders from indexing your unimportant, or potentially confusing Web pages. This enables the spiders to focus on indexing only your important pages.
  8. The pros and cons of using Frames, JavaScript, and Flash as any major component of your Web site's architecture.

By now you are rockin'! You've come a long way, baby, and just for fun, the next lesson, THE FINAL lesson, gives you a peek at the pinnacle of SEO from the perspective of the SEO expert.
