Google On How It Chooses Canonical Webpages

Explore Google's methodology for selecting canonical webpages, ensuring accurate indexing and optimal search rankings.

In a Google Search Central video, Gary Illyes of Google covers the part of webpage indexing that deals with selecting canonicals. He explains what a canonical means to Google and gives a brief overview of webpage signals. He also mentions the “centerpiece” of a page and describes how Google handles duplicates, suggesting a fresh way to think about them.

What does a Canonical Webpage mean?

A canonical webpage is the preferred version of a webpage among multiple duplicates or variations. It serves as the authoritative version that search engines like Google use as the primary content for indexing and ranking. Canonicalization helps prevent duplicate content issues and helps direct search engine traffic to the desired version of the webpage. The canonical URL is typically specified with a rel="canonical" link element in the head of the webpage’s HTML.
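To make the rel="canonical" link element concrete, here is a minimal sketch that reads it out of a page using only Python's standard library. The sample HTML, the example.com URL, and the names `CanonicalLinkParser` and `find_canonical` are illustrative assumptions, not part of any Google tooling.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> element it sees."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical_urls.append(attributes["href"])


def find_canonical(html):
    """Return the first declared canonical URL, or None if the page declares none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical_urls[0] if parser.canonical_urls else None


# Hypothetical page used only for illustration.
sample_page = """
<html>
  <head>
    <title>Blue widget</title>
    <link rel="canonical" href="https://example.com/widgets/blue">
  </head>
  <body><p>Product details</p></body>
</html>
"""

print(find_canonical(sample_page))
```

A check like this only confirms what the publisher has declared. As the rest of the article explains, Google treats that declaration as one signal among several.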

The meaning of canonical varies with the viewpoint: the publisher’s, the SEO’s, or Google’s. Publishers typically designate what they perceive as the “original” webpage, while SEOs often focus on selecting the most “robust” version of a webpage for ranking purposes.

However, Google’s perspective on canonicalization differs significantly from that of publishers and SEOs, making it valuable to hear insights directly from a Google expert like Gary Illyes.

Insights From Gary Illyes Regarding Google’s Crawling Priorities

Google’s official documentation on canonicalization uses the term “deduplication” to describe the process of selecting a canonical and outlines five common reasons why a site might feature duplicate pages.

Five Causes of Duplicate Pages:

  • Regional Variants: For instance, content tailored for both the USA and the UK, accessible via distinct URLs but featuring essentially identical content in the same language.
  • Device Variants: For example, a webpage has both mobile and desktop versions.
  • Protocol Variants: Such as the HTTP and HTTPS versions of a website.
  • Site Functions: Like the outcomes of sorting and filtering functions on a category page.
  • Accidental Variants: For instance, inadvertently leaving the demo version of the site accessible to crawlers.

So far there are three perspectives on canonicals (the publisher’s, the SEO’s, and Google’s) and at least five causes of duplicate pages. Gary adds one more perspective on canonicals.

Signals play a crucial role in canonical selection. Illyes gives an additional definition of canonicals, this time from the indexing side, and explains the signals used to select them.

Gary elaborates:

“Google determines if the page is a duplicate of another already known page and which version should be kept in the index, the canonical version.

But in this context, the canonical version is the page from a group of duplicate pages that best represents the group according to the signals we’ve collected about each version.”

Gary then pauses to explain duplicate clustering before returning to signals.

He continues:

“For the most part, only canonical pages appear in Search results. But how do we know which page is canonical?

So once Google has the content of your page, or more specifically the main content or centerpiece of a page, it will group it with one or more pages featuring similar content, if any. This is duplicate clustering.”
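The clustering step Gary describes can be illustrated with a toy sketch. Google has not published how it compares main content, so the exact-match fingerprint below (a hash of whitespace-normalized, lowercased text) is purely an assumption that stands in for whatever similarity measure is really used. The function names and sample pages are also hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(main_content):
    """Hash the main content after collapsing whitespace and lowercasing.

    Assumption: exact matching after normalization stands in for a real
    similarity measure, which Google does not disclose.
    """
    normalized = re.sub(r"\s+", " ", main_content).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_duplicates(pages):
    """Group URLs whose main content shares a fingerprint.

    pages: dict mapping URL -> main content text.
    Returns a list of clusters, each a list of URLs.
    """
    clusters = defaultdict(list)
    for url, main_content in pages.items():
        clusters[content_fingerprint(main_content)].append(url)
    return list(clusters.values())


# Hypothetical pages: the first two differ only by protocol and whitespace.
pages = {
    "http://example.com/guide": "How to choose a canonical page.",
    "https://example.com/guide": "How to  choose a canonical page.\n",
    "https://example.com/contact": "Contact us by email.",
}

for cluster in cluster_duplicates(pages):
    print(cluster)
```

In this sketch the HTTP and HTTPS versions of the guide land in one cluster, while the contact page forms a cluster of its own.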

I’d like to take a moment to highlight that Gary refers to the primary content as the “centerpiece of a page,” which is intriguing considering Google’s Martin Splitt introduced a concept called the Centerpiece Annotation. Although Gary didn’t delve into the specifics of the Centerpiece Annotation, his insight sheds some light on it.

Illyes then explains what “signals” are:

“Then it compares a handful of signals it has already calculated for each page to select a canonical version.

Signals are pieces of information that the search engine collects about pages and websites, which are used for further processing.

Some signals are very straightforward, such as site owner annotations in HTML like rel="canonical", while others, like the importance of an individual page on the internet, are less straightforward.”
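As a rough illustration of comparing “a handful of signals” within a cluster, the sketch below scores each page and picks the highest scorer as the canonical. Everything specific here is invented for the example: the three signals, their weights, the scoring formula, and the sample URLs. Google does not disclose which signals it uses or how it combines them.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    url: str
    # How many pages in the cluster point rel="canonical" at this URL.
    declared_canonical_votes: int
    uses_https: bool
    # Hypothetical 0..1 stand-in for "importance of the page on the internet".
    importance: float


# Invented weights, for illustration only.
WEIGHTS = {"declared": 2.0, "https": 0.5, "importance": 1.0}


def score(page):
    """Combine the hypothetical signals into one number."""
    return (
        WEIGHTS["declared"] * page.declared_canonical_votes
        + WEIGHTS["https"] * page.uses_https
        + WEIGHTS["importance"] * page.importance
    )


def select_canonical(cluster):
    """Return (canonical_url, alternate_urls) for one duplicate cluster."""
    canonical = max(cluster, key=score)
    alternates = [page.url for page in cluster if page is not canonical]
    return canonical.url, alternates


cluster = [
    PageSignals("http://example.com/guide", 0, False, 0.4),
    PageSignals("https://example.com/guide", 2, True, 0.6),
    PageSignals("https://example.com/guide?ref=mail", 0, True, 0.1),
]

canonical_url, alternate_urls = select_canonical(cluster)
print("canonical:", canonical_url)
print("alternates:", alternate_urls)
```

The point of the sketch is the shape of the decision rather than the numbers: a publisher's rel="canonical" annotation counts toward the outcome, but it is weighed alongside signals the publisher does not control.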

Each duplicate cluster is consolidated under a single canonical page, as Gary explains. One page is selected from the cluster to represent it in the search results.

He goes on to say:

“Each of the duplicate clusters will have a single version of the content selected as canonical.

This version will represent the content in Search results for all the other versions.

The other versions in the cluster become alternate versions that may be served in different contexts, like if the user is searching for a very specific page from the cluster.”

The last part of that quote, about alternate versions, is particularly interesting for ecommerce sites. It matters because variant pages can help a site rank for multiple keyword variations.

Occasionally, the content management system (CMS) generates duplicate webpages to accommodate product variations, such as differing sizes or colors. These variations may influence the description. Google might opt to rank these variants in search results when a specific variant page closely aligns with a search query.

This matters because there may be a temptation to redirect or noindex variant webpages to keep them out of the index, for fear of a keyword cannibalization issue that does not exist. However, a noindex on variant pages can backfire. In some cases a variant page is better suited to rank for a more specific search query containing a color, size, or version number that differs from those on the canonical page.

Key Insights on Canonicals to Keep in Mind:

Gary’s discussion of canonicals contains a wealth of information, including side topics related to the main subject.

Here are key takeaways:

  • The main content is identified as the Centerpiece.
  • Google evaluates a “handful of signals” for each discovered page.
  • Signals are data used for “further processing” after a webpage is discovered.
  • Certain signals, such as hints and directives, are under the publisher’s control. The rel="canonical" annotation mentioned by Illyes is one example.
  • Other signals, such as the importance of a page on the internet, lie beyond the publisher’s control.
  • Some duplicate pages can function as alternate versions.
  • Alternate versions of webpages retain the potential to rank and prove beneficial for both Google and the publisher in terms of ranking objectives.
