How to improve search engine findability for a Drupal web site

Topics: Drupal
Technologies: Drupal 5+

A "findable" web site is one that search engines can easily scan to find content to add to their search indexes. Since most visitors today find web pages by using a search engine, the more easily search engines can index your content, the more easily visitors will find it. A default Drupal installation is moderately findable, but you can improve the site by installing a few more modules.

Search Engine Optimization (SEO)

Search engine findability is part of the larger topic of Search Engine Optimization (SEO), which strives to make web content friendlier to search engines. Unfortunately, a lot of SEO advice out there is arcane nonsense at best, and an irresponsible attempt to trick people and search engines at worst. Let's avoid that stuff.

Improving search engine findability is straightforward and mostly just common sense. Even better, making your site work better for search engines often makes it work better for visitors too. And, after all, visitors are who you should really care about, not search engines.

List pages in an XML sitemap

Most search engines can read an XML sitemap that lists the pages you’d like a search engine to find. For each page, the sitemap notes when the page was last modified, how frequently it is likely to change, and how important you think it is for a search engine to index the page. Here’s a sample sitemap entry:

<url>
  <loc>http://www.example.com/</loc>
  <lastmod>2005-01-01</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>

While you can create this file yourself using any text editor, it's much easier to use the Drupal XML Sitemap module.
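If you do want to build the file yourself, a short script is less error-prone than a text editor. The following is a minimal sketch in Python, not what the XML Sitemap module does internally. The `build_sitemap` function and the page dictionary layout are made up for illustration; the namespace URL is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML text for a list of page dicts (hypothetical layout)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Each key maps directly onto a child element of <url>;
        # keys that are absent are simply left out of the entry.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")

pages = [
    {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
]
print(build_sitemap(pages))
```

Only `loc` is required for each entry; the other three elements are optional hints.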

Install the XML Sitemap module

  • Install: Download the module, then enable it on the Administer > Site building > Modules page.
  • Configure: Change module settings on the Administer > Site configuration > XML Sitemap page:

    Priority settings: every page in the sitemap will have an indexing priority between 0.0 and 1.0. You can set site-wide priorities here, and individual page priorities at the bottom of each node's edit page in the new XML Sitemap settings section.

    Search engine settings: an XML sitemap is only useful if you tell search engines about it. In this section, tell the module to notify search engines automatically each time the sitemap is updated when site content changes.

    Other settings: an XML sitemap can list every page at your site, but some pages may not be search engine-worthy. In this section, check off whether you want to include taxonomy term and user profile pages.

  • Test: View the XML sitemap at http://yoursite/sitemap.xml. Browsers show the file in a tabular form so that it's easy to see the list of pages and their indexing priorities.
  • Configure your site: The robots.txt file in your site's top-level folder tells search engines how to handle your site. Edit the file and add this line anywhere to point to your new sitemap:
    Sitemap: http://yoursite/sitemap.xml
  • Tell search engines: This module will ping search engines automatically when the sitemap changes. But to be sure they get the file, create a free webmaster account at the search engines that use sitemaps and register the sitemap: Google Webmaster Tools, Yahoo! Site Explorer.
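To double-check the robots.txt step above, you can scan the file for its Sitemap line. This is only a rough sketch: `sitemap_urls` is a hypothetical helper, and it assumes you have already read the robots.txt contents into a string.

```python
def sitemap_urls(robots_txt):
    """Return the URLs named on 'Sitemap:' lines in robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls

example = """User-agent: *
Disallow: /admin/
Sitemap: http://yoursite/sitemap.xml
"""
print(sitemap_urls(example))  # ['http://yoursite/sitemap.xml']
```

An empty result means the Sitemap line is missing or misspelled.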

List pages in an HTML sitemap

The XML sitemap discussed above is a definitive list of pages at your site. It’s great for search engines that support it (Google, Yahoo!, Ask, and Windows Live right now), but there are other search engines. These other engines rely on crawling your site and following its links to find your pages. To give these engines an easy crawl path, create an HTML sitemap web page that lists everything. Site visitors also can use the sitemap to quickly jump to pages.

You can create a sitemap page in Drupal using the Views module to list all node pages. But for a more complete way that also lists pages for menu items, taxonomy terms, RSS feeds, and more, use the Site_map module.

Install the Site_map module

  • Install: Download the module and enable it on the Administer > Site building > Modules page.
  • Configure: Change the module settings on the Administer > Site configuration > Site map page. Select which types of content to include in the sitemap.
  • Test: View the HTML sitemap at http://yoursite/sitemap.
  • Tell search engines and visitors: Add a link to the sitemap in your site's page footer. In this way, no matter which page a search engine or visitor arrives at, the sitemap is immediately available. You can edit the page footer on the Administer > Site configuration > Site information page.
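As a quick sanity check on the finished sitemap page, you can confirm that it links to the pages you care about. The sketch below uses Python's standard HTML parser; `missing_from_sitemap` and the sample paths are invented for illustration, and the HTML is assumed to be already downloaded.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> element in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def missing_from_sitemap(sitemap_html, expected_paths):
    """Return the expected paths that the sitemap page does not link to."""
    collector = LinkCollector()
    collector.feed(sitemap_html)
    linked = set(collector.links)
    return [p for p in expected_paths if p not in linked]

page = ('<ul><li><a href="/about">About</a></li>'
        '<li><a href="/blog">Blog</a></li></ul>')
print(missing_from_sitemap(page, ["/about", "/blog", "/contact"]))  # ['/contact']
```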

Encourage incoming links from social networks

Incoming links from other sites give visitors and search engines a path to your site. Once there, they can follow your internal links to interesting content. There are many spamdexing and paid-link schemes of dubious value and ethics that try to add lots of bogus links to your site. But really, the only valid incoming link is one that occurs naturally because somebody actually likes your site's content.

You can make it easier for people to add incoming links from social network sites like Digg, Technorati, del.icio.us, and others by installing the Service links module. The module adds links to these sites at the bottom of your pages. When a visitor clicks on one, they add a vote for your page at the social networking site (where they have to log in). The more votes your page gets, the higher it ranks there and the more likely visitors and search engines will see it and follow a link to your site. Essentially this is a social networking-style update to Google's famous PageRank algorithm that ranks pages by popularity.

One caveat: some social networking sites mark outgoing links with "nofollow", which tells search engines to not follow the link to your site. In such cases, even if your page gets lots of votes, search engines won't find it any better. But visitors will, so this is still a good feature to add to your site.
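To see whether a particular social networking site marks its links this way, look at the rel attribute on the link back to your page. Here is a small sketch using Python's standard HTML parser; `check_links` is a hypothetical helper and the sample markup is made up.

```python
from html.parser import HTMLParser

class NofollowChecker(HTMLParser):
    """Record each link's href and whether it carries rel="nofollow"."""
    def __init__(self):
        super().__init__()
        self.results = []  # list of (href, is_nofollow) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        self.results.append((href, "nofollow" in rel_tokens))

def check_links(html):
    checker = NofollowChecker()
    checker.feed(html)
    return checker.results

sample = ('<a href="http://yoursite/post" rel="external nofollow">Post</a>'
          '<a href="http://yoursite/other">Other</a>')
print(check_links(sample))
```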

Install the Service Links module

  • Install: Download the module and enable it on the Administer > Site building > Modules page.
  • Configure: Change the module settings on the Administer > Site configuration > Service links page. Select which social networking sites to add links to, and for which types of content at your site. You can choose to display these links as text or icons.
  • Test: View one of your pages and scroll to the bottom to find the links. Click one to make sure it sends you to the social networking site.

Use your site name in page titles

Search engines and site visitors guess at page topics by quickly looking at the words you use. The page title has the first words they see, so use good ones.

There are two parts to a Drupal page title: the node title and the site name, often separated by a vertical bar (e.g. "Today's Blog | Example.com"). The site name defaults to your domain name, but domain names today are often acronyms (such as "W3C.org"), concatenated words (such as "ToysRUs.com"), or abbreviations (such as "WebDevIndex.com"). Search engines can't reliably figure out what acronyms mean, split apart concatenated words, or expand abbreviations. Site visitors may have a hard time too.

A cryptic domain name in the page title is a missed opportunity to tell visitors clearly who you are, and tell search engines what the acronyms, abbreviations, and concatenated words mean. Expanding "W3C.org" to "World Wide Web Consortium" puts important keywords in the page title and it is much more readable for visitors.

To do this in Drupal, change your site configuration and set your site name to the spelled-out version of your domain name.

Create a readable site name in the page title

  • Install: The site name configuration is part of every Drupal installation.
  • Configure: Change the site name at the top of the Administer > Site configuration > Site information page.
  • Test: Save the change and look at your browser's window titlebar. It should show your new site name.
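If you would rather test this with a script than by eye, you can pull the title out of a page and split it at the separator. The sketch below assumes the " | " separator described above; `split_title` is a hypothetical helper, and your theme may format titles differently.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Capture the text inside the page's <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def split_title(html, separator=" | "):
    """Return (node_title, site_name), split at the last separator.

    If the separator is absent, node_title is empty and the whole
    title is returned as site_name.
    """
    parser = TitleParser()
    parser.feed(html)
    node_title, _, site_name = parser.title.strip().rpartition(separator)
    return node_title, site_name

html = "<html><head><title>Today's Blog | World Wide Web Consortium</title></head></html>"
print(split_title(html))
```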

There is a great deal of debate about how important word choice is for search engine ranking. Numerous services offer to tell you the "best" words to use... for a price. But really, the right words to use are the ones that best describe your site and your content for visitors, not search engines. Being tricky might get you more search engine traffic, but if you've used sneaky inaccurate words, visitors will be annoyed and leave. You'll have accomplished nothing.

Use different page title formats for different parts of your site

If you want more control over Drupal page titles, you can install the Page Title module. This module enables you to use different page title formats for different parts of your site, including the front page, book pages, blog pages, etc. For instance, it is common to change the front page title from "Home | Example.com" to something friendlier like "Example.com - The Leading Site Used In Examples!". Blog posting pages might add the author's name to the title and book pages might add the book title. You can even change the title's format on a per-node basis.

Create custom page title formats

  • Install: Download the module and enable it on the Administer > Site building > Modules page. For Drupal 5, you'll need to add a bit of code to your theme's template.php file, but for Drupal 6 there's nothing to add. See the Page Title Installation Instructions for more on this.
  • Configure: Change the module settings on the Administer > Site configuration > Page title page. Set a default page title format like "[page-title] | [site-name]", then override this for specific content types. You can also change this per node on the node's edit page.
  • Test: Save your changes and look at your browser's window titlebar. It should show your new title format.
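The title formats above are patterns with bracketed tokens that get replaced per page. The following sketch shows the general idea only; it is not the Page Title module's code, and `render_title` is a made-up name.

```python
import re

def render_title(pattern, tokens):
    """Replace each [token] in the pattern with its value."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown tokens visible so typos are easy to spot.
        return tokens.get(name, match.group(0))
    return re.sub(r"\[([a-z-]+)\]", substitute, pattern)

tokens = {"page-title": "Today's Blog", "site-name": "Example.com"}

# Default format for most pages.
print(render_title("[page-title] | [site-name]", tokens))
# Override for the front page.
print(render_title("[site-name] - The Leading Site Used In Examples!", tokens))
```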

Use meaningful words in URLs

Above I recommended spelling out your site name so that it is readable as separate words by search engines and visitors. Do the same with your URLs, for the same reasons. A URL like "http://yoursite/techtoys" is more readable as "http://yoursite/tech-toys" or "http://yoursite/technology-toys".

You can control URLs by using the built-in URL aliasing feature of Drupal. This lets you create any URL you like for each of your nodes. However, this is tedious. A better way is to install the PathAuto module to automatically generate URLs based upon a node's content type, page title, and other page attributes.

Create custom page URLs

  • Install: Download the module and enable it on the Administer > Site building > Modules page.
  • Configure: Change the module settings on the Administer > Site configuration > Pathauto page. Set a default URL format like "[title]" to use the title as the URL, or "[type]/[title]" to add the content type first. There are more variations possible. Normally the module creates URL aliases only when nodes are created. If you already have content and need aliases created for it, you can tell the module to auto-generate new aliases by checking off items in the General settings section.
  • Test: Save your changes and create a new node. Submit the node, and look at the URL. Instead of "node/1234" it'll use the format you set above.

When including a page title in a URL, you can select whether spaces should be replaced by underscores or dashes. There is some debate about whether this matters. The module's default choice is underscores. At one point in Google history, Google used dashes to split URLs into words, but not underscores. Today's search engines are smarter, but there are still differences in search engine results when entering "search-engine", "search_engine", and "search engine".
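To make the alias patterns and the separator choice concrete, here is a rough Python sketch of how a pattern such as "[type]/[title]" could be turned into a URL. It is an illustration under simple assumptions (ASCII titles, lowercase output), not PathAuto's actual algorithm, and `make_alias` is a hypothetical name.

```python
import re

def make_alias(pattern, values, separator="-"):
    """Fill a pattern such as "[type]/[title]" with URL-safe words."""
    def slug(text):
        # Lowercase the text, keep only runs of letters and digits,
        # and join those words with the chosen separator.
        words = re.findall(r"[a-z0-9]+", text.lower())
        return separator.join(words)
    return re.sub(r"\[([a-z]+)\]",
                  lambda m: slug(values[m.group(1)]), pattern)

node = {"type": "Blog", "title": "Technology Toys!"}
print(make_alias("[type]/[title]", node))                 # blog/technology-toys
print(make_alias("[type]/[title]", node, separator="_"))  # blog/technology_toys
```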

Probable myths

The exact algorithms used by search engines are closely guarded secrets. This creates fertile ground for imagination and leads to a tremendous amount of arcane nonsense "advice" about how to improve your site's ranking. While only the search engine developers know for sure what is or is not good advice, be skeptical. Here are a few bits of commonly found advice about Drupal sites that are probably myths:

Myth: Use "clean URLs" because search engines don't search pages with queries

A URL query includes a "?" followed by information used to look up a page in a server-side database. The myth says that search engines won't crawl URLs with queries, so they'll never find pages retrieved from a database. Since the default Drupal installation uses URL queries to get nodes, search engines supposedly won't crawl a Drupal site. The Drupal fix is to enable "Clean URLs" on the Administer > Site configuration > Clean URLs page.

If this myth were true, no site (Drupal or otherwise) that used queries in URLs would be indexed. But look at this URL for an iPod at Walmart: "http://www.walmart.com/catalog/product.do?product_id=9193101". Every product at the site starts with the same URL, differing only in the product ID after the "?". Does it seem likely that Walmart would configure their site to be unsearchable? In fact, most web sites today use database queries to find and show their content. Does it seem likely that Google would exclude all of these from its index? Of course not. To prove it, search at Google with "iPod site:walmart.com" and you'll get a list of iPod pages... all with queries in the URLs.

So, enabling Drupal's "Clean URLs" feature is not necessary for search engine findability. It's still a good idea, though. It makes URLs easier to read for site visitors. Better yet, install the PathAuto module to automatically create even more readable URLs based upon the content type and node title.
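For reference, the difference between the two URL styles is mechanical: the path moves out of the "q" query parameter and into the URL path. The sketch below shows that mapping in Python. It assumes the site lives at the web root, and `to_clean_url` is a made-up helper, not part of Drupal.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def to_clean_url(url):
    """Rewrite a "?q=node/1234" style URL as "/node/1234"."""
    parts = urlsplit(url)
    path = None
    rest = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "q" and path is None:
            path = value              # the Drupal path to promote
        else:
            rest.append((key, value))  # keep any other parameters
    if path is None:
        return url                    # nothing to rewrite
    new_path = "/" + path.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, new_path,
                       urlencode(rest), parts.fragment))

print(to_clean_url("http://yoursite/?q=node/1234"))
print(to_clean_url("http://yoursite/index.php?q=node/1234&page=2"))
```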

Myth: Avoid the appearance of duplicate content or search engines will ban you

One popular, and unethical, web marketing scheme creates lots of different domain names that all point to the same content. Perhaps the page headers change, but the content is still the same. The idea is that these sites will occupy so many slots at the top of search engine listings that competing sites will be pushed away.

Duplicate content can happen legitimately too. It is common for the same site to respond to multiple domain names with and without a leading "www", with abbreviations, and with common domain name misspellings. For example, all of the following lead to Barnes & Noble:

barnesandnoble.com www.barnesandnoble.com
barnesandnobel.com www.barnesandnobel.com
barnesnoble.com www.barnesnoble.com
barnesnobel.com www.barnesnobel.com
barnsnoble.com www.barnsnobel.com
bn.com www.bn.com

This myth says that duplicate content is a sign of unethical web marketing and it'll get your site banned from search engines. If this myth were true, Barnes & Noble would be banned. And none of the duplicate-content marketing scheme web sites would ever show up in search engine listings. But Barnes & Noble isn't banned and those duplicate-content sites show up all the time. So, clearly this is a myth.

If you're still worried, install the Global Redirect module to ensure that each Drupal page has one and only one valid URL. The module issues HTTP redirects to that one URL for all of the other URLs Drupal supports for the same page (such as "/node/1234", "/node/1234/", "?q=node/1234", "blog/1234", etc.).
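The idea of redirecting every variant to a single URL can be sketched in a few lines of Python. This is a simplified illustration, not the Global Redirect module's logic: the alias table, the function names, and the choice of a 301 (permanent) status code are all assumptions.

```python
def canonical_path(request, aliases):
    """Map any supported variant of a path to its one preferred form."""
    # Normalise the "?q=" query form to a plain path.
    if "?q=" in request:
        request = "/" + request.split("?q=", 1)[1]
    # Drop trailing slashes, but keep the root path "/".
    path = request.rstrip("/") or "/"
    # Prefer the alias if one exists for this internal path.
    return aliases.get(path, path)

def redirect_for(request, aliases):
    """Return (301, target) if a redirect is needed, else None."""
    target = canonical_path(request, aliases)
    return None if target == request else (301, target)

# Hypothetical alias table: internal path -> preferred URL.
aliases = {"/node/1234": "/blog/tech-toys"}

for request in ("/node/1234", "/node/1234/", "/?q=node/1234", "/blog/tech-toys"):
    print(request, "->", redirect_for(request, aliases))
```

Only the last request is already canonical, so it is the only one served without a redirect.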

Myth: Add keywords in <meta> elements so that search engines know what your page is about

The HTML <meta> element can carry a keyword list and a description (using name="keywords" or name="description" together with a content attribute). This text is visible to search engines, but not site visitors. The idea was to let page authors clearly state the topic of their page so that search engines will categorize it well.

The myth says you should add this information to every page and that search engines value it highly. But <meta> elements were easily and heavily abused by unethical marketers stuffing them with keywords. So today, search engines do not use <meta> elements for search engine ranking.

Adding <meta> elements to your pages may still be useful. Google and Yahoo! both may use them to help generate the brief abstracts shown next to URLs in search results. You can set site-wide <meta> elements by editing your theme's templates, but a better way is to install the Meta tags module (formerly known as Nodewords). This lets you edit <meta> elements for the site and for every node. This won't help search engine ranking, but it could make search results pages easier to read.
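To check what a page actually exposes in its <meta> elements, you can read them back with a short script. The sketch below uses Python's standard HTML parser; `read_meta` is a hypothetical helper and the sample markup is invented.

```python
from html.parser import HTMLParser

class MetaParser(HTMLParser):
    """Collect the keywords and description <meta> values from a page."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

def read_meta(html):
    parser = MetaParser()
    parser.feed(html)
    return parser.meta

sample = ('<head><meta name="description" content="A short summary.">'
          '<meta name="keywords" content="drupal, sitemap" /></head>')
print(read_meta(sample))
```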

Nadeau software consulting