What Is TF IDF and How Can It Improve SEO?

What is TF IDF?

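TF IDF is an abbreviation of two English terms. TF stands for term frequency, i.e. how often a word occurs in a document. IDF stands for inverse document frequency, i.e. a measure of how rare the word is across a set of documents. Together they form a method of calculating the weight of words based on how often they occur in one document and in how many documents they appear.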

Let’s simplify:

TF IDF tells you how often a word occurs in a document and how important it is in the context of all examined documents (e.g. a set of websites).

Search engines can use this algorithm as one method of assessing content quality.

Fact: TF IDF is used in anti-plagiarism systems.

Formula and sample calculations

Simplified formula

TF IDF = TF * IDF

where:

TF: the number of occurrences of the word or phrase in the document (e.g. on a webpage) divided by the total number of words in that document. For example, if the word “fox” appears 10 times in the article and the entire article has 500 words, TF is 10/500 = 0.02.

IDF: the logarithm of the number of documents in the corpus divided by the number of documents in the corpus that contain the studied keyword.

The corpus is the collection of all examined documents, e.g. if we examine the pages in the top 10, our corpus consists of 10 documents.

Calculation example for IDF: if the corpus consists of 10 documents and the tested keyword appears in three of them, then IDF is log(10/3) = 0.52 (using the base-10 logarithm).

Once TF and IDF have been calculated separately, multiply them to get the TF IDF weight. In our example, it is 0.02 * 0.52 ≈ 0.01.
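
The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch of the simplified formula from this article; the function names are made up for illustration, and the base-10 logarithm is assumed because it matches the 0.52 result above.

```python
import math


def term_frequency(occurrences: int, total_words: int) -> float:
    """Share of the document's words taken up by the term."""
    return occurrences / total_words


def inverse_document_frequency(corpus_size: int, docs_with_term: int) -> float:
    """Base-10 log of (documents in corpus / documents containing the term)."""
    return math.log10(corpus_size / docs_with_term)


# "fox" appears 10 times in a 500-word article
tf = term_frequency(10, 500)

# the keyword appears in 3 of the 10 documents in the corpus
idf = inverse_document_frequency(10, 3)

tf_idf = tf * idf
print(f"TF = {tf:.2f}, IDF = {idf:.2f}, TF IDF = {tf_idf:.2f}")
# prints: TF = 0.02, IDF = 0.52, TF IDF = 0.01
```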

Fact: Very common words that appear in almost every article are called “stopwords”. Example: if a stopword such as “to” appears in all 10 documents of a 10-document corpus, the IDF is easy to calculate: log(10/10) = 0. Because TF is multiplied by IDF, the most common stopwords get a zero or very low weight from the TF IDF algorithm.
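
To see how quickly the weight drops for common words, here is a short sketch that computes IDF for a corpus of 10 documents and several document counts. The counts other than 10 and 3 are chosen only for illustration, and the base-10 logarithm is assumed as before.

```python
import math

corpus_size = 10

# IDF for a term that appears in 10, 9, 3 or 1 of the 10 documents
idf_by_doc_count = {
    docs_with_term: math.log10(corpus_size / docs_with_term)
    for docs_with_term in (10, 9, 3, 1)
}

for docs_with_term, idf in idf_by_doc_count.items():
    print(f"appears in {docs_with_term:>2} of {corpus_size} documents: IDF = {idf:.2f}")
```

A word found in every document gets an IDF of 0, so its TF IDF weight is 0 no matter how often it is repeated on a page.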

Why does TF IDF help SEO?

Currently, Google rewards websites that have high-quality content in its search results, i.e. articles that describe a topic in depth and in detail.

A simple example:

If the article uses all closely related keywords and is of the right length, it is an important signal to the search engine that it covers the topic in a complete manner.

The question is how to discover words that Google considers relevant and related to the topic of the article. To discover words that are related to a given topic, you can use the TF IDF algorithm and analyze pages that are already in the top 10 search results for a given keyword or phrase. Thanks to this, you will discover words that are important to include in the content.

How to do a TF IDF analysis for websites from the top 10 search results?

There is no point in doing the calculations manually, because there is too much data.

To automate the calculation process and make the analysis as easy as possible, it is best to use the TF IDF tool. It is very easy to use. You only need to enter the addresses of the top 10 pages in Google for the query for which you plan to write an article. The tool will analyze all the pages and list the words and phrases sorted by their TF IDF weights, highest first.
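
To give an idea of what such an analysis involves, here is a rough sketch in Python. It is not how the TF IDF tool itself works; it only applies the simplified formula from this article to single words. The three short texts stand in for downloaded page content and are invented for illustration, and scoring each word by its highest weight across the pages is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_terms(documents: list[str]) -> list[tuple[str, float]]:
    """Return (term, weight) pairs sorted by highest TF IDF weight first."""
    tokenized = [tokenize(doc) for doc in documents]
    corpus_size = len(tokenized)

    # number of documents in which each term appears at least once
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    best = {}
    for tokens in tokenized:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log10(corpus_size / doc_freq[term])
            # assumption: keep the highest weight the term reaches on any page
            best[term] = max(best.get(term, 0.0), tf * idf)

    # ties are broken alphabetically so the order is stable
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# placeholder texts standing in for the content of top-ranking pages
pages = [
    "the red fox hunts at night",
    "the fox den is hidden in the forest",
    "the forest is quiet at night",
]

ranking = rank_terms(pages)
for term, weight in ranking[:5]:
    print(f"{term}: {weight:.3f}")
```

In this toy corpus “the” appears on every page, so it ends up at the bottom of the list with a weight of 0, while words used on only one page rise to the top.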

Important point:

But what if your article is already written and published, but not optimized for TF IDF? In the TF IDF tool, enter your own page address in addition to the 10 pages of your competitors. This will show you exactly which words are missing from your page. You can paste the resulting list of words into the “related keywords” field in the Content Editor tool, which will check whether all the necessary keywords are included in your improved article. This greatly speeds up the process of improving the article.
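
The idea of finding missing words can be sketched as a simple comparison. The list of important words and the page text below are invented placeholders, the function name is hypothetical, and the sketch handles single words only, not phrases.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def missing_terms(important_terms: list[str], page_text: str) -> list[str]:
    """Return the important terms that do not occur in the page text."""
    page_tokens = set(tokenize(page_text))
    return [term for term in important_terms if term.lower() not in page_tokens]


# placeholder data: words with high TF IDF weights on competitors' pages
important_terms = ["fox", "den", "forest", "night"]
my_page = "The fox sleeps in its den during the day."

missing = missing_terms(important_terms, my_page)
print(missing)  # prints: ['forest', 'night']
```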

Summary

  • TF IDF has a practical and very important use in SEO, and it is especially helpful for creating and optimizing high-quality content.
  • It’s a good idea to do a TF IDF analysis of competitors’ pages from Google’s top 10 organic search results before writing an article. Thanks to this, you will discover the words that should be included in the article.
  • A TF IDF analysis is especially worth performing for already published articles that do not appear in the top positions. Thanks to this, you can discover missing words that should be added so that the article is of higher quality and describes the topic better.
  • Without TF IDF optimization, you run the risk of losing positions in the future, because your content may be classified as low quality (content that does not fully cover the topic).

Let me know in the comments what improvements you saw after using the TF IDF tool.