Addressing Duplicate Content & Canonical URLs
The first step to SEO isn't always optimizing your content for keywords.
In fact, one of the first things we do when taking on a new SEO project is to audit the technology and structure of the site. We start off by asking important questions about the site, such as:
- How long does the site take to load?
- Is the server overloaded? Does the site suffer from code bloat or rely on poorly scripted external asset calls?
- How is the content organized?
- Does the site lend itself to providing a user experience which encourages either a path to a goal, or easy exploration and content discovery?
- Are there technical problems which may add confusion for search engines?
- Are there malformed links which lead to the same content being available through multiple URLs?
Today let’s tackle a technical problem that can confuse search engine spiders: duplicate content and canonical URL issues.
We take pride in the websites we produce and work on, but we don't like to use client sites or even competitor sites as examples of SEO pride or problems. However, as a development shop we suffer from the same problem as the cobbler's kids, so I took a look under the hood of our own site to uncover some common technical issues.
The same content, but found at different URLs
When we launched our new site last year, I vaguely remember a change in the terminology used in one of our web tools because it would "have a better marketing impact." The change came rather late in the development process and centered on one of the tools in our CMS that has many configuration options.
During development, we renamed a section of our site from “Logos” to “Brands”. However, the terminology change didn't quite make its way through all sections of the site: in two places, internal links in the content still referenced “Logos”.
This morning, while looking through an updated XML sitemap - a tool we often use to point search engines to new content on our site - I realized we had a duplicate content issue. About 16 pages of content were reachable at two URLs each, making that part of the site's footprint look twice as large as it really is.
Duplicate content can have a negative effect on your site’s search profile. If the content comes from a different source, those pages will likely never show up in a search. If the duplicate content is from your own site, search engines will make their best guess as to which content has authority, and which doesn’t. We want to make sure we eliminate that issue from our site.
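One way to spot this kind of duplication is to scan the sitemap for URLs that collapse to the same page once the old and new terms are treated as equivalent. The sketch below is a minimal illustration rather than the tool we actually used: the SEGMENT_ALIASES mapping and the function names are assumptions made for the example, and it only understands the standard sitemaps.org urlset format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical mapping of retired path segments to their current names.
SEGMENT_ALIASES = {"brands": "logos"}


def normalized_path(url: str) -> str:
    """Return the URL path with retired segments swapped for current ones."""
    segments = urlsplit(url).path.strip("/").split("/")
    return "/" + "/".join(SEGMENT_ALIASES.get(s, s) for s in segments)


def find_duplicates(sitemap_xml: str) -> Dict[str, List[str]]:
    """Group sitemap URLs that share the same normalized path."""
    root = ET.fromstring(sitemap_xml)
    groups = defaultdict(list)
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        groups[normalized_path(url)].append(url)
    # Only paths reachable through more than one URL are duplicates.
    return {path: urls for path, urls in groups.items() if len(urls) > 1}
```

Feeding the sitemap text to find_duplicates returns one entry per duplicated page, each listing every URL that leads to it, which makes it easy to count how many pages are affected.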
Where to start fixing duplicate content?
First things first, I had to decide which term offers the most benefit. “Brands” and “Branding” are the terms we use to accurately describe what we create, but most people who are looking for a design firm to create a logo for their company are searching for “logos”. If you compare the query volume of “brands” vs. “logos” using Google's Keyword tool or another search tool, you'll see that searches for “logos” have about 10x the volume of searches for “brands”. Easy call: if you want your audience to find you, use their terminology.
How to fix duplicate content issues
The next step was to decide how to fix this. I could have simply made sure all links in the site pointed to the new structure, dropped a link rel="canonical" tag in place and called it a day. The link rel="canonical" tag (a link element in the page's head, not a meta tag) was introduced a few years ago to combat this same problem on poorly coded ecommerce sites. It basically says to the search engine spiders: "I don't care how the URL was formed that you used to find this page, this is the proper URL you should use."
Example: http://www.example.com/products/widgets/blue/large/rainproof and http://www.example.com/products/widgets/rainproof/large/blue might bring you to the exact same page of content. With the link rel="canonical" tag, you can tell the search engines that http://www.example.com/products/widgets/large/blue/rainproof is the official URL for that page, while acknowledging that the same content can be reached in many different ways.
In short, link rel="canonical" can be a godsend on large ecommerce sites, but for smaller sites it can also be a crutch. So in this case, I wanted to fix it properly by removing the malformed URLs, permanently redirecting them (a 301 redirect), and letting the search engines know that there has been a change.
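To make the canonical tag idea concrete, here is a small sketch that maps any ordering of the widget attributes in the example URLs to one official URL and builds the tag for the page's head. The preferred order in ATTRIBUTE_ORDER, the PREFIX value, and both function names are assumptions for illustration only; a real store would take these from its own catalog rules.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: path segments that come before the attribute segments.
PREFIX = ["products", "widgets"]
# Hypothetical preferred order for the attribute segments.
ATTRIBUTE_ORDER = ["large", "blue", "rainproof"]


def canonical_url(url: str) -> str:
    """Reorder attribute segments into the preferred order."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    head, attrs = segments[:len(PREFIX)], segments[len(PREFIX):]
    rank = {name: i for i, name in enumerate(ATTRIBUTE_ORDER)}
    # Known attributes sort by rank; unknown ones keep their order at the end.
    attrs.sort(key=lambda s: rank.get(s, len(rank)))
    path = "/" + "/".join(head + attrs)
    # Query string and fragment are dropped in this simplified sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the link element that belongs in the page's head."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_link_tag("http://www.example.com/products/widgets/blue/large/rainproof"))
# <link rel="canonical" href="http://www.example.com/products/widgets/large/blue/rainproof">
```

Every variant of the URL produces the same tag, so whichever path a spider followed to reach the page, it is told about a single preferred address.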
Here are the steps that I took:
- The URL needed to reference logos instead of brands.
  Old structure: http://www.corporate3design.com/work/3/55/brands/authos/
  New structure: http://www.corporate3design.com/work/3/55/logos/authos/
- I identified all differing links on the site and corrected them to the “logos” form of the URL.
- I created 301 redirects from the "brands" version of the URLs to the "logos" version.
- I vented my frustrations to a co-worker or two.
- I tested every single link to ensure the redirects were working properly and returning a 301 response code.
- I generated a new XML Sitemap and pinged our webmaster tools accounts with Google and Bing.
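The redirect and testing steps above can be sketched in a few lines. This is a rough illustration under stated assumptions, not how our redirects are actually configured (that normally lives in the web server or CMS): redirect_target and check_redirect are hypothetical helper names, and the check simply requests the old URL without following redirects and inspects the status code and Location header.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit


def redirect_target(path: str) -> Optional[str]:
    """Return the new path for an old "brands" URL, or None if none applies."""
    segments = path.split("/")
    if "brands" not in segments:
        return None
    # Swap only whole path segments, so "/brandsmith/" is left alone.
    return "/".join("logos" if s == "brands" else s for s in segments)


def check_redirect(old_url: str, timeout: float = 10.0) -> bool:
    """True if old_url answers with a 301 pointing at the expected new path."""
    parts = urlsplit(old_url)
    expected = redirect_target(parts.path)
    if expected is None:
        return False
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # http.client does not follow redirects, so the 301 itself is visible.
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        location = response.getheader("Location", "")
        return response.status == 301 and urlsplit(location).path == expected
    finally:
        conn.close()
```

Running check_redirect over every old URL gives a quick pass/fail list. Checking for 301 specifically matters because a 302 would tell search engines the move is only temporary.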
The end result of this should be clear in a few days. In theory, our site's footprint will shrink by about 16 pages, and the relative search strength of the two versions of each page should be passed to the URL structure we are defining. By itself, this likely won't have much of an impact on how well we are found for “logos”, but it certainly solidifies the foundation of our site.
Unsure about your company website's performance? Renew your focus on search engine optimization tactics to become a known competitor in your industry or local market.