A "findable" web site is one that search engines can easily scan to find content to add to their search indexes. Since most visitors today find web pages by using a search engine, the more easily search engines can index your content, the more easily visitors will find it. A default Drupal installation is moderately findable, but you can improve the site by installing a few more modules.
Table of Contents
- Search Engine Optimization (SEO)
- List pages in an XML sitemap
- List pages in an HTML sitemap
- Encourage incoming links from social networks
- Use your site name in page titles
- Use different page title formats for different parts of your site
- Use meaningful words in URLs
- Probable myths
Search Engine Optimization (SEO)
Search engine findability is part of the larger topic of Search Engine Optimization (SEO), which strives to tune web content to be more search engine friendly. Unfortunately, a lot of SEO advice out there is arcane nonsense at best, and an irresponsible attempt to trick people and search engines at worst. Let's avoid that stuff.
Improving search engine findability is straightforward and mostly just common sense. Even better, making your site work better for search engines often makes it work better for visitors too. And, after all, visitors are who you should really care about, not search engines.
List pages in an XML sitemap
Most search engines can read an XML sitemap that lists the pages you’d like a search engine to find. For each page, the sitemap notes when the page was last modified, how frequently it's likely to change, and how important you think it is for a search engine to index the page. In the sitemap format, each page gets a <url> entry whose <loc>, <lastmod>, <changefreq>, and <priority> elements hold the page address and those three hints.
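To make the shape of a sitemap entry concrete, here is a minimal Python sketch that builds one entry and wraps it in a sitemap. It is only an illustration, not what the XML Sitemap module does internally: the function names, the example.com address, the date, and the "weekly" and 0.8 values are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_entry(loc, lastmod, changefreq, priority):
    """Build one <url> element of an XML sitemap."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc                      # page address
    ET.SubElement(url, "lastmod").text = lastmod              # date of last change
    ET.SubElement(url, "changefreq").text = changefreq        # e.g. "weekly"
    ET.SubElement(url, "priority").text = f"{priority:.1f}"   # 0.0 to 1.0
    return url


def build_sitemap(entries):
    """Wrap the entries in a <urlset> root and return the XML as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    urlset.extend(entries)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page on a placeholder domain.
entry = sitemap_entry("http://example.com/tech-toys", "2008-01-15", "weekly", 0.8)
sitemap_xml = build_sitemap([entry])
print(sitemap_xml)
```

A real sitemap simply repeats the <url> element once per page inside the same <urlset>.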
While you can create this file yourself using any text editor, it's much easier to use the Drupal XML Sitemap module.
List pages in an HTML sitemap
The XML sitemap discussed above is a definitive list of pages at your site. It’s great for search engines that support it (Google, Yahoo!, Ask, and Windows Live right now), but there are other search engines. These other engines rely on crawling your site and following its links to find your pages. To give these engines an easy crawl path, create an HTML sitemap web page that lists everything. Site visitors also can use the sitemap to quickly jump to pages.
You can create a sitemap page in Drupal using the Views module to list all node pages. But for a more complete way that also lists pages for menu items, taxonomy terms, RSS feeds, and more, use the Site_map module.
Encourage incoming links from social networks
Incoming links from other sites give visitors and search engines a path to your site. Once there, they can follow your internal links to interesting content. There are many spamdexing and paid-link schemes of dubious value and ethics that try to create lots of bogus links pointing to your site. But really, the only valid incoming link is one that occurs naturally because somebody actually likes your site's content.
You can make it easier for people to add incoming links from social network sites like Digg, Technorati, del.icio.us, and others by installing the Service links module. The module adds links to these sites at the bottom of your pages. When a visitor clicks on one, they add a vote for your page at the social networking site (where they have to log in). The more votes your page gets, the higher it ranks there and the more likely visitors and search engines will see it and follow a link to your site. Essentially this is a social-networking-style take on Google's famous PageRank algorithm, which ranks pages by how many other pages link to them.
One caveat: some social networking sites mark outgoing links with "nofollow", which tells search engines not to follow the link to your site. In such cases, even if your page gets lots of votes, search engines won't find it any more easily. But visitors will, so this is still a good feature to add to your site.
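If you're curious whether a particular site marks its outgoing links this way, you can look for rel="nofollow" in the page's HTML. The following Python sketch does that with the standard library's HTML parser. The class name and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_values))


# Hypothetical snippet of a social networking page.
sample_html = (
    '<a href="http://example.com/tech-toys" rel="external nofollow">Tech toys</a>'
    '<a href="http://example.com/about">About</a>'
)
collector = LinkCollector()
collector.feed(sample_html)
for href, nofollow in collector.links:
    print(href, "nofollow" if nofollow else "followed")
```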
Use your site name in page titles
Search engines and site visitors guess at page topics by quickly looking at the words you use. The page title has the first words they see, so use good ones.
There are two parts to a Drupal page title: the node title and the site name, often separated by a vertical bar (e.g. "Today's Blog | Example.com"). The site name defaults to your domain name, but domain names today are often acronyms (such as "W3C.org"), concatenated words (such as "ToysRUs.com"), or abbreviations (such as "WebDevIndex.com"). Search engines can't reliably figure out what acronyms mean, split apart concatenated words, or expand abbreviations. Site visitors may have a hard time too.
A cryptic domain name in the page title is a missed opportunity to tell visitors clearly who you are, and tell search engines what the acronyms, abbreviations, and concatenated words mean. Expanding "W3C.org" to "World Wide Web Consortium" puts important keywords in the page title and it is much more readable for visitors.
To do this in Drupal, change your site configuration and set your site name to the spelled-out version of your domain name.
There is a great deal of debate about how important word choice is for search engine ranking. Numerous services offer to tell you the "best" words to use... for a price. But really, the right words to use are the ones that best describe your site and your content for visitors, not search engines. Being tricky might get you more search engine traffic, but if you've used sneaky inaccurate words, visitors will be annoyed and leave. You'll have accomplished nothing.
Use different page title formats for different parts of your site
If you want more control over Drupal page titles, you can install the Page Title module. This module enables you to use different page title formats for different parts of your site, including the front page, book pages, blog pages, etc. For instance, it is common to change the front page title from "Home | Example.com" to something friendlier like "Example.com - The Leading Site Used In Examples!". Blog posting pages might add the author's name to the title and book pages might add the book title. You can even change the title's format on a per-node basis.
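The idea of one title format per part of the site can be sketched in a few lines of Python. This is only an illustration of the concept. The section names, the placeholder names, and the patterns themselves are hypothetical and are not the Page Title module's own tokens or settings.

```python
# Hypothetical per-section title patterns; keys and placeholders are
# illustrative only.
TITLE_PATTERNS = {
    "front": "{site_name} - {slogan}",
    "blog": "{title} by {author} | {site_name}",
    "book": "{title} - {book} | {site_name}",
    "default": "{title} | {site_name}",
}


def page_title(section, **fields):
    """Format a page title using the pattern for the given site section.

    Sections without their own pattern fall back to the default one.
    """
    pattern = TITLE_PATTERNS.get(section, TITLE_PATTERNS["default"])
    return pattern.format(**fields)


print(page_title("front", site_name="Example.com",
                 slogan="The Leading Site Used In Examples!"))
print(page_title("blog", title="Today's Blog", author="Pat",
                 site_name="Example.com"))
print(page_title("forum", title="Hello", site_name="Example.com"))
```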
Use meaningful words in URLs
Above I recommended spelling out your domain name so that it is readable as separate words by search engines and visitors. Do the same with your URLs, for the same reasons. A URL like "http://yoursite/techtoys" is more readable as "http://yoursite/tech-toys" or "http://yoursite/technology-toys".
You can control URLs by using the built-in URL aliasing feature of Drupal. This lets you create any URL you like for each of your nodes. However, this is tedious. A better way is to install the PathAuto module to automatically generate URLs based upon a node's content type, page title, and other page attributes.
When including a page title in a URL, you can select whether spaces should be replaced by underscores or dashes. There is some debate about whether this matters. The module's default choice is underscores. At one point in its history, Google treated dashes, but not underscores, as word separators in URLs. Today's search engines are smarter, but there are still differences in search engine results when entering "search-engine", "search_engine", and "search engine".
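Turning a page title into a URL segment is easy to picture with a small Python sketch. This is not the module's actual algorithm, just a rough approximation of the idea. The function name and the cleanup rules (lowercase everything, drop punctuation) are my own assumptions, and the dash default is simply a choice made for this example.

```python
import re


def title_to_alias(title, separator="-"):
    """Turn a page title into a URL path segment.

    Lowercases the title, drops characters other than ASCII letters,
    digits, and whitespace, then joins the remaining words with
    `separator`.
    """
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return separator.join(cleaned.split())


print(title_to_alias("Technology Toys"))        # technology-toys
print(title_to_alias("Technology Toys", "_"))   # technology_toys
print(title_to_alias("Today's Blog!"))          # todays-blog
```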
Probable myths
The exact algorithms used by search engines are closely guarded secrets. This creates fertile ground for imagination and leads to a tremendous amount of arcane nonsense "advice" about how to improve your site's ranking. While only the search engine developers know for sure what is or is not good advice, be skeptical. Here are a few bits of commonly found advice about Drupal sites that are probably myths:
Myth: Use "clean URLs" because search engines don't search pages with queries
A URL query includes a "?" followed by information used to look up a page in a server-side database. The myth says that search engines won't crawl URLs with queries, so they'll never find pages retrieved from a database. Since the default Drupal installation uses URL queries to get nodes, search engines supposedly won't crawl a Drupal site. The Drupal fix is to enable "Clean URLs" on the Administer > Site configuration > Clean URLs page.
If this myth were true, no site (Drupal or otherwise) that used queries in URLs would be indexed. But look at this URL for an iPod at Walmart: "http://www.walmart.com/catalog/product.do?product_id=9193101". Every product at the site starts with the same URL, differing only in the product ID after the "?". Does it seem likely that Walmart would configure their site to be unsearchable? In fact, most web sites today use database queries to find and show their content. Does it seem likely that Google would exclude all of these from its index? Of course not. To prove it, search at Google with "iPod site:walmart.com" and you'll get a list of iPod pages... all with queries in the URLs.
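If the parts of a URL with a query are unfamiliar, this short Python sketch splits the Walmart URL above, and a Drupal-style "?q=" URL on a placeholder domain, into the path and the query parameters. The helper name is made up for the example.

```python
from urllib.parse import urlsplit, parse_qs


def split_query(url):
    """Return the path and the query parameters of a URL."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


walmart = split_query("http://www.walmart.com/catalog/product.do?product_id=9193101")
print(walmart)  # ('/catalog/product.do', {'product_id': ['9193101']})

# Drupal without clean URLs puts the page path in a "q" parameter.
drupal = split_query("http://example.com/?q=node/1234")
print(drupal)   # ('/', {'q': ['node/1234']})
```

Everything before the "?" is the same for every product. Only the parameter value after it changes from page to page.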
So, enabling Drupal's "Clean URLs" feature is not necessary for search engine findability. It's still a good idea, though. It makes URLs easier to read for site visitors. Better yet, install the PathAuto module to automatically create even more readable URLs based upon the content type and node title.
Myth: Avoid the appearance of duplicate content or search engines will ban you
One popular, and unethical, web marketing scheme creates lots of different domain names that all point to the same content. Perhaps the page headers change, but the content is still the same. The idea is that these sites will occupy so many slots at the top of search engine listings that competing sites will be pushed away.
Duplicate content can happen legitimately too. It is common for the same site to respond to multiple domain names with and without a leading "www", with abbreviations, and with common domain name misspellings. For example, all of the following lead to Barnes & Noble:
- barnesandnoble.com
- www.barnesandnoble.com
- barnesandnobel.com
- www.barnesandnobel.com
- barnesnoble.com
- www.barnesnoble.com
- barnesnobel.com
- www.barnesnobel.com
- barnsnoble.com
- www.barnsnobel.com
- bn.com
- www.bn.com
This myth says that duplicate content is a sign of unethical web marketing and it'll get your site banned from search engines. If this myth were true, Barnes & Noble would be banned. And none of the duplicate-content marketing scheme web sites would ever show up in search engine listings. But Barnes & Noble isn't banned and those duplicate-content sites show up all the time. So, clearly this is a myth.
If you're still worried, install the Global Redirect module to ensure that each Drupal page has one and only one valid URL. The module issues HTTP redirects to that one URL for all of the other URLs Drupal supports for the same page (such as "/node/1234", "/node/1234/", "?q=node/1234", "blog/1234", etc.).
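The "one valid URL per page" idea can be sketched in Python as follows. This is a simplified illustration, not the Global Redirect module's actual logic. The alias table, the function names, and the choice of a permanent (301) redirect are assumptions made for the example, and the sketch ignores any query parameters other than "q".

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical alias table: internal Drupal-style path -> preferred URL path.
ALIASES = {"node/1234": "blog/tech-toys"}


def canonical_path(url):
    """Return the single preferred path for a requested URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # "?q=node/1234" style requests carry the page path in the q parameter.
    internal = query["q"][0] if "q" in query else parts.path
    internal = internal.strip("/")  # treat "/node/1234/" like "/node/1234"
    return "/" + ALIASES.get(internal, internal)


def redirect_for(url):
    """Return (301, location) if the request is not canonical, else None."""
    parts = urlsplit(url)
    target = canonical_path(url)
    if parts.path == target and not parts.query:
        return None
    return (301, target)


for requested in (
    "http://example.com/node/1234",
    "http://example.com/node/1234/",
    "http://example.com/?q=node/1234",
    "http://example.com/blog/tech-toys",
):
    print(requested, "->", redirect_for(requested))
```

The first three requests are sent to "/blog/tech-toys", while the last one is already the preferred URL and needs no redirect.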
Myth: Add keywords in <meta> elements so that search engines know what your page is about
The <meta> element can carry a keywords list and a description (set through its name and content attributes) that are visible to search engines, but not to site visitors. The idea was to let page authors clearly state the topic of their page so that search engines would categorize it well.
The myth says you should add this information to every page and that search engines value it highly. But <meta> elements were easily and heavily abused by unethical marketers stuffing them with keywords. So today, search engines do not use <meta> elements for search engine ranking.
Adding <meta> elements to your pages may still be useful. Google and Yahoo! both may use them to help generate the brief abstracts shown next to URLs in search results. You can set site-wide <meta> elements by editing your theme's templates, but a better way is to install the Meta tags module (formerly known as Nodewords). This lets you edit <meta> elements for the site and for every node. This won't help search engine ranking, but it could make search results pages easier to read.
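For reference, here is what the description and keywords elements look like when generated for a page. This minimal Python sketch only shows the markup. The function name and the sample text are made up, and it says nothing about how the Meta tags module builds its output.

```python
from html import escape


def meta_tags(description, keywords):
    """Render description and keywords <meta> elements for a page head."""
    tags = [
        ("description", description),
        ("keywords", ", ".join(keywords)),
    ]
    # Escape the content so quotes and ampersands can't break the attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(content, quote=True)}">'
        for name, content in tags
    )


print(meta_tags("Reviews of technology toys & gadgets", ["toys", "technology"]))
```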