Duplicate Content & SEO: Everything You Need to Know
What content is considered “duplicate”?
Content is duplicate when it appears on more than one URL. Duplicate content does not trigger a penalty on its own, but it can result in your content being filtered out of search results. That is because Google will choose one version of a piece of content as the canonical and not show the rest.
Why is duplicate content a problem?
When there are two or more versions of the same content, search engines like Google find it difficult to decipher which version is most relevant to a particular query. If the same or similar content appears on URL A and URL B, which version does Google show in search results? The search engine doesn’t want to show both because they would be redundant - it wants each result to be uniquely relevant. One version of the content will likely win out, while the others will only serve as dead weight on the site.
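If you’ve ever seen Google’s notice at the end of a results page saying that some entries very similar to those already displayed have been omitted, that’s a good example of Google “filtering” out content it sees as redundant, supplemental, and not the preferred version of that particular content resource.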
If the same content lives at more than one URL on your website, you don’t want to risk Google choosing your non-preferred version, so you should implement a rel=canonical tag to suggest to Google which one should be included in the index.
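As a minimal sketch, the tag goes in the head of each non-preferred URL and points at the version you want indexed. The example.com URLs below are placeholders for illustration only:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Running Shoes</title>
  <!-- This page is served at https://www.example.com/shoes?sort=price (non-preferred URL). -->
  <!-- The canonical link suggests that https://www.example.com/shoes is the version to index. -->
  <link rel="canonical" href="https://www.example.com/shoes">
</head>
<body>
  <h1>Running Shoes</h1>
</body>
</html>
```

Use an absolute URL in the href, and keep in mind that Google treats the tag as a hint rather than a directive, so it may still pick a different canonical.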
Fixing Internally Duplicate Content
You have three options for fixing internally duplicate content:
Keep one version of the content, delete the others and 301 redirect them to the version you kept - this option is good if you don’t need the content to be located in multiple places on your site.
Keep all versions of the content but add a rel=canonical tag pointing to the preferred version on all the non-preferred ones - this option is good if you need the page to be accessible at multiple locations on your site.
Rewrite content so that the versions previously sharing content are now all unique - this option could be used when you want all your pages to be indexed and have time to create unique content for each of them.
With either of the first two options, you’ll end up with one strong page rather than two or more weaker pages that Google has to decide between.
Addressing Cross-Domain Duplicate Content
When your site’s content is duplicative of another website’s content, should you delete that content, leave it, rewrite it, or something else?
Google’s stance on duplicate content: “Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”
If your site’s content has been blatantly scraped/stolen and used on another website, the site that copied it is violating Google’s quality guidelines and could actually get a manual penalty. If your site’s content has been taken and used without your permission, you have two options that I would recommend pursuing in this order:
Reach out to that site’s owner to ask that the content be removed, and if that isn’t effective…
File a DMCA notice (use only in extreme cases when you’re sure your content is the original, after you’ve already tried reaching out to the site owner, and when there’s a substantial amount of stolen content - the whole of a very important page or a scraping of multiple pages)
Google says it does a good job of choosing which version of the content to show in search results on its own, so before you take extreme measures, search Google for a portion of the copied page’s content (in quotation marks, to force an exact match) and see which domain comes up, yours or the scraper’s.
A note on syndication and duplicate content: Google does not consider legitimately licensed content to be “duplicate.” For example, news sites that republish articles from The Associated Press are not scraping. However, sites that syndicate really should use a rel=canonical tag to indicate the original source of the article.
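For syndicated articles, the canonical simply points across domains. A rough sketch, assuming a hypothetical partner site republishing a story from a hypothetical original publisher (both domains are made up for illustration):

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Syndicated Story</title>
  <!-- This copy is served at https://partner.example.net/news/story (the syndicating site). -->
  <!-- The canonical link credits the original publisher's URL as the preferred version. -->
  <link rel="canonical" href="https://original.example.com/news/story">
</head>
<body>
  <article>
    <h1>Syndicated Story</h1>
    <p>Republished article text goes here.</p>
  </article>
</body>
</html>
```

As with any canonical, this is a suggestion to Google, so the syndicated copy can still show up in results if Google decides otherwise.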
Citing Content & “Public Domain” Content
You can use the blockquote tag in the HTML of your content to cite the source you’re quoting rather than rewriting something that is best left as-is. A good use case for this might be using the blockquote tag to include laws/statutes word-for-word on a legal client’s page.
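Here is one way the statute example could be marked up. The statute name, its text, and the URL in the cite attribute are all placeholders, not a real source:

```html
<figure>
  <!-- The cite attribute holds the URL of the quoted source (hypothetical here). -->
  <blockquote cite="https://www.example.gov/statutes/section-123">
    <p>Text of the statute, reproduced word-for-word.</p>
  </blockquote>
  <!-- The visible attribution sits outside the blockquote, since it is not part of the quote. -->
  <figcaption>Source: <cite>Example State Code, Section 123</cite></figcaption>
</figure>
```

The cite attribute is not displayed to visitors, which is why the sketch also includes a visible attribution in the figcaption.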
Some other content can be considered “public domain” and not need to be rewritten or cited at all. A good example of this is a definition - no one owns it, and you can safely use it as-is within the content of your page.
Manufacturer-Required Duplicate Content
Businesses that sell, use, or recommend certain products may run into duplicate content issues because manufacturers can require those businesses to use specific product descriptions on their websites.
This also applies to award & association content (for example, Super Lawyers requires award recipients to use specified language when announcing the honor), disclaimers, and other legally-required language.
Is manufacturer-required duplicate content bad for my website? Typically, the manufacturer specs don’t make up the lion’s share of the web page, meaning there are other things that make the page unique, such as product reviews. In Google’s quality rater guidelines, one of their examples of a high-quality e-commerce page is a page that uses the manufacturer’s product specs. Google said it was good because it used this content, whereas most SEOs might be concerned that Google wouldn’t like it because it’s shared across other retailers that carry the product.
If you’re really concerned about it and you think it might be harming site performance, you have some options:
Bolster the ratio of unique to non-unique content: In the event that you have to use a particular product or association description, you could opt to create a lot of uniquely valuable content to support it. That way, the page doesn’t solely contain the duplicate content that can be found tons of other places on the web.
This option is good for duplicate content that has to live on important pages.
Use an iframe to display the non-unique content: A manufacturer may require you to display a certain product description on your website, but they probably don’t care whether that content is stored in your site’s HTML or is pulled from the original source.
It’s harder to control how content in iframes looks, so many site owners avoid this option. It could look something like this: <iframe src="https://domain.com/" height="200" width="300"></iframe>
NoIndex the pages: If you need the pages for the manufacturer but that content does not live on pages that you’re trying to rank, you could NoIndex the pages.
This option is good when the duplicate content lives on pages where ranking for a keyword is not desired.
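A minimal sketch of a noindexed page, using the standard robots meta tag. The page title and body text are placeholders:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Manufacturer Product Description</title>
  <!-- Asks search engines not to include this page in their index. -->
  <meta name="robots" content="noindex">
</head>
<body>
  <p>Required manufacturer description goes here.</p>
</body>
</html>
```

For the tag to work, crawlers need to be able to fetch the page, so don’t also block it in robots.txt.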