Duplicate content: how to spot and avoid it

IONOS editorial team, 2021-07-14, 9 mins

One of the most important principles of search engine optimization is unique content. This ensures a better ranking of a website among search results and makes for a more positive user experience. It’s the basis for a successful content strategy.

The opposite of unique content is known as duplicate content. The term describes text blocks or entire web pages that are duplicated across multiple URLs. Avoiding this type of content in favor of unique content is important for successful search engine optimization. Duplicated content negatively affects website ranking and usability.

What is duplicate content?

The term duplicate content refers to web pages or text passages that are duplicated across more than one URL.

A distinction is generally made between two types:

Internal duplicate content refers to content duplicated across a single domain.
External duplicate content is found across domains.

Both terms refer to pages or text blocks that are shared without modification. Where content isn’t 100 percent identical but almost identical, it’s referred to as near duplicate content.
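How close two texts are to being "near duplicates" can be estimated in several ways. The following is a minimal sketch of one common approach, comparing overlapping word sequences ("shingles") with the Jaccard index. The function names and the shingle size of three words are assumptions chosen for illustration, and search engines do not necessarily measure similarity this way.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "This red cotton shirt is soft, durable and machine washable."
    variant = "This blue cotton shirt is soft, durable and machine washable."
    print(jaccard_similarity(original, variant))
```

A value of 1.0 means the texts share all shingles. Which threshold counts as "near duplicate" is a judgment call that depends on your content.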

Common examples of duplicate content

Most website owners are aware of the negative effects of external duplicate content and therefore avoid it. Internal duplicate content, however, is a lot more common, and it often has technical causes. You can find more information on this in the section “Technical causes for duplicate content”.

Often, the same content can be found several times across multiple sub-pages of a domain. Online shops in particular struggle with this issue. When a product is assigned to several categories or is available in different variations, the product description is often largely the same across multiple pages. This is counted as duplicate content. PDFs containing product information are commonly underestimated. If their content matches that of a product landing page, they are counted as duplicate content. Another example is a company’s philosophy statement that may appear across several sub-pages.

Online shops are also often affected by instances of external duplicate content. When you purchase your products from a wholesaler, other retailers may be using the same product descriptions on their websites. In this case, identical content not only negatively affects the search engine ranking, but also the purchasing decision of potential customers. If a product presentation doesn’t differ between dealers, it’s the price that decides. It’s best to use unique product descriptions and regularly check whether other retailers have copied them.

External duplicate content is also created when you copy third-party content, even as part of a cooperation and with the author’s consent. Different country and language versions of your website across different domain names also pose a risk of external duplicate content.

Why is duplicate content problematic?

Search engines like Google use an algorithm to evaluate all potentially relevant websites and use certain criteria to determine the order in which the search results are displayed. The aim of this evaluation is to present the user with the most relevant content at the top of the ranking.

If the same content appears across several websites, search engines struggle to make a proper evaluation. This complicates the assignment of trust, relevance, and authority and, as a result, the creation of a ranking. Search engines therefore generally avoid indexing the same content multiple times and displaying it in the search results, as this offers no added value for users. As a result, pages that contain duplicate content tend to rank lower.

Because of the negative effects of duplicate content, so-called web scrapers are feared. This type of software copies websites one-to-one. Search engines like Google can now recognize scraper sites based on various parameters and distinguish them from “true” websites.

Around 25 to 35 percent of the content of all websites is duplicate content. Duplicate content isn’t necessarily always a bad thing.

[Video: how Google deals with duplicate content and what to avoid in any case]

How to spot duplicate content?

It’s advisable to check your website regularly for duplicate content. In many cases, it happens without the website operator’s knowledge, for example when new pages are created or when internal links are inconsistent. Below, we’ll show you the most common methods you can use to track down duplicate content.

Manually check your website

If your website consists of a limited number of sub-pages, it’s a good idea to check them manually. Pay particular attention to text sections that you use several times across your website. Typically, these are company statements/presentations or a call-to-action.

Do you suspect that a text block occurs more than once? In this case, a Google search can help. Enter the text in quotation marks in the search box and see whether several different URLs of your website appear in the search results.
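If you need to check many text blocks this way, the search URL can be built programmatically. This sketch only constructs the URL and does not send any request. The function name is hypothetical, and restricting the search to your own domain with the `site:` operator is an optional addition to the manual check described above.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(phrase, domain=""):
    """Build a Google search URL for an exact phrase, optionally limited to one domain."""
    query = f'"{phrase}"'  # quotation marks request an exact match
    if domain:
        query = f"site:{domain} {query}"
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    print(exact_phrase_search_url("our company philosophy", "example.com"))
```

Open the resulting URL in a browser and check whether more than one of your pages is listed.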

Tip

Google hides duplicate search results so that users see only relevant pages. If you repeat the search with the omitted results included, the previously hidden results are displayed as well.

Search for duplicate content using an analysis tool

In the case of large numbers of sub-pages or in eCommerce, manual searches are time-intensive. Because many website operators face this problem, various tools are available that automatically search for duplicate content.
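To illustrate what such a tool does at its simplest, the sketch below groups pages whose text is identical after normalizing case and whitespace. It assumes you have already extracted the text of each page into a dictionary that maps URL to text. The function names are hypothetical, and real tools typically also detect near duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(pages):
    """Return lists of URLs whose page text has the same fingerprint."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    pages = {
        "/shirts/red-shirt": "Soft cotton shirt.  Machine washable.",
        "/sale/red-shirt": "soft cotton shirt. machine washable.",
        "/about": "We have been selling shirts since 1999.",
    }
    print(find_duplicate_groups(pages))
```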

The Google Search Console is a free and useful analysis tool that supports search engine optimization and the search for internal duplicate content. It shows the dominant keywords for which your website is found and viewed. You can then filter for exact queries. If the tool lists multiple subpages for one query, check them for duplicate content. In the “Index Coverage” report, under “Excluded Pages”, you can view the subpages that have been identified as duplicates and consequently excluded.

How to avoid duplicate content?

Now you know how to spot duplicate content. But it’s best to avoid it in the first place. These tips should help you:

Create sub-pages that are clearly distinguishable thematically and use different main keywords across each page. The best way to keep track of things is to work out a keyword strategy beforehand.
Avoid placing generalized paragraphs across multiple subpages.
Avoid copying pre-written texts (unless it’s a designated quotation or legal quote).
Pay attention to the consistency of internal links and avoid different URLs for the same page. These are typically created by adding index.htm to the homepage URL or through variants with and without the trailing slash (/).
Use top-level domains for the different language and country versions of your website, such as "https://www.example.com", as opposed to subdirectories like "https://www.example.org/en".

Generally, you cannot prevent external duplicate content from being created when other webmasters copy your content. It is therefore advisable to initiate crawling manually after creating a new page. If your page is the first to be indexed with the respective content, it gets classified as the original.

Technical causes for duplicate content

Technical causes often lead to internal duplicate content being created without the knowledge of the website operator. It is advisable to check your online presence for the following points:

Multiple variants of a web address

When you switch to encrypted HTTPS (Hypertext Transfer Protocol Secure), it’s important to set up a redirect from your old web address. If your old website remains accessible via http://, every page exists twice with fully identical content.

Make sure you check whether your website can be accessed using different spellings. Typical examples include:

Your homepage is accessible via index.php as well as with and without trailing slash.
Your website is accessible with and without www as part of the URL.
Your URLs can be reached in both uppercase and lowercase spellings.

If you spot two or more URLs that lead to the same subpage, choose one preferred variant and set up a 301 redirect from all other variants to it.
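The variants listed above can be reduced to one preferred form programmatically, for example to generate a list of redirects. The sketch below assumes a site that prefers HTTPS, the www host, lowercase paths, and no trailing slash except on the homepage. These policies and the function names are assumptions, so adjust them to your own setup. Lowercasing paths is only safe if your server does not serve different content under differently cased paths.

```python
from urllib.parse import urlsplit, urlunsplit

# Index file names that usually point to the same page as the directory itself.
INDEX_FILES = ("index.php", "index.htm", "index.html")


def preferred_url(url, use_www=True):
    """Map a URL variant to the single preferred spelling."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare

    path = parts.path.lower()
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # drop the index file, keep the directory
            break
    path = path.rstrip("/") or "/"  # no trailing slash except on the homepage
    return urlunsplit(("https", host, path, parts.query, ""))


def redirect_map(urls):
    """Return {variant: preferred URL} for every URL that needs a 301 redirect."""
    mapping = {}
    for url in urls:
        target = preferred_url(url)
        if url != target:
            mapping[url] = target
    return mapping


if __name__ == "__main__":
    variants = [
        "https://www.example.com/",
        "http://example.com/index.php",
        "https://www.example.com/Shop/",
    ]
    for source, target in redirect_map(variants).items():
        print(source, "->", target)
```

The resulting mapping can then be translated into redirect rules for your web server.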

Other technical measures

If your website has different language or country versions, mark them with hreflang to avoid duplicate content.
Check your URL parameters. Parameters often create many unique URLs that lead to the same content. This is a common cause of duplicate content, especially with filter functions in online shops.
Pay attention to session IDs that form part of a URL. With session IDs, crawlers may receive a new ID each time they access a subpage and thus reach a new URL for the same content.
Exclude printer-friendly versions of websites from indexing.
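To see how many of your URLs differ only in parameters that do not change the content, you can strip those parameters and compare what remains. In this sketch, the set of ignored parameter names is purely hypothetical. Which parameters are irrelevant depends on your shop or content management system.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical examples of parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid", "sort", "print"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Remove ignored query parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    kept.sort()  # the same parameters in a different order give the same URL
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_ignored_params(
        "https://www.example.com/shoes?sort=price&color=red&sid=abc123"
    ))
```

URLs that become identical after this step are candidates for a redirect or a canonical tag.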

How to handle duplicate content correctly

Manipulative and intentional copying of third-party content is prohibited for copyright reasons and according to Google guidelines. However, the occurrence of duplicate content online is normal and is not directly punished. Nevertheless, it’s advisable to avoid duplicate content whenever possible.

Where a website contains two or more sub-pages with similar or identical content, you can either merge the content into a single page or expand the respective sub-pages with unique content and individual keywords. The option that’s best for you depends on the relevance of the pages and whether there is an opportunity to strengthen keywords.

Choose stronger keywords for headings, your meta description, and the meta title. This helps keep the pages from being classified as duplicate content and improves your ranking across search engines. To modify existing texts and differentiate them, use bullet points, lists, and tables, or integrate media such as images and videos.

Note

Search engines can spot repeating elements in footers or headers and do not consider them duplicate content. Here, it’s not necessary to create different content for each sub-page.

If you want to share existing content on other sites, for example a blog article or a press release, use the canonical tag (rel="canonical") in the head section of the page. By doing so, you declare the selected URL as the standard resource or original URL. The tag is invisible to users but makes it clear to search engines how the pages are linked to one another.
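If your pages are generated from templates, the canonical tag can be produced by a small helper. The sketch below builds the standard `<link rel="canonical">` element for the head section of a page. The function name and the example URL are assumptions for illustration.

```python
from html import escape


def canonical_link_tag(original_url):
    """Return the <link> element that declares original_url as the canonical URL."""
    # Escape characters such as & and " so the attribute value stays valid HTML.
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://www.example.com/blog/duplicate-content"))
```

Place the generated element on every copy of the content, each pointing to the one URL you want treated as the original.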

Tip

Duplicate content is only one aspect of search engine optimization. The rankingCoach by IONOS optimizes your website step-by-step with the help of useful video tutorials.
