People say that LSI keywords have the power to boost Google rankings. Is this true, or is it yet another SEO myth?
Read almost any article about LSI keywords, and you'll be told two things:
- Google uses a technology called LSI to index web pages.
- Using LSI keywords in your content helps you rank higher on Google.
Both of these claims are technically false.
In this guide, you'll learn why that is and what to do about it.
But first, the basics…
What are LSI keywords?
LSI keywords are words and phrases that Google sees as semantically related to a topic, at least according to many in the SEO community. If you're talking about cars, then LSI keywords might be automobile, engine, road, tires, vehicle, and automatic transmission.
But, according to Google's John Mueller, LSI keywords don't exist:
“There's no such thing as LSI keywords – anyone who's telling you otherwise is mistaken, sorry.” – John Mueller
So whatās the deal here?
Before we answer that question, we first need to understand a bit more about LSI itself.
What is Latent Semantic Indexing (LSI)?
Latent Semantic Indexing (LSI), or Latent Semantic Analysis (LSA), is a natural-language processing technique developed in the 1980s.
Unfortunately, unless you're familiar with mathematical concepts like eigenvalues, vectors, and singular value decomposition, the technology itself isn't that easy to understand.
For that reason, we won't be tackling the math behind how LSI works.
Instead, we'll focus on the problem it was created to solve.
Here's how the creators of LSI define this problem:
The words a searcher uses are often not the same as those by which the information sought has been indexed.
But what does this actually mean?
Say that you want to know when summer ends and fall begins. Your WiFi is down, so you go old school and grab an encyclopedia. Instead of randomly flicking through thousands of pages, you look up “fall” in the index and flick to the right page.
Clearly, that's not the type of fall you wanted to learn about.
Not one to be defeated that easily, you flick back and realize that what you're looking for is indexed under “autumn,” another name for fall.
The problem here is that “fall” is both a synonym and a polysemic word.
What are synonyms?
Synonyms are words or phrases that mean the same or nearly the same thing as another word or phrase.
Examples include rich and wealthy, fall and autumn, and cars and automobiles.
Here's why synonyms are problematic, according to the LSI patent:
[…] there is a tremendous diversity in the words people use to describe the same object or concept; this is called synonymy. Users in different contexts, or with different needs, knowledge or linguistic habits will describe the same information using different terms. For example, it has been demonstrated that any two people choose the same main keyword for a single, well-known object less than 20% of the time on average.
But how does this relate to search engines?
Imagine that we have two web pages about cars. Both are identical, except that one replaces every instance of the word cars with automobiles.
If we were to use a primitive search engine that only indexes the words and phrases on the page, it would only return one of these pages for the query “cars.”
This is bad because both results are relevant; it's just that one describes what we're looking for in a different way. The page that uses the word automobiles instead of cars might even be the better result.
Bottom line: search engines need to understand synonyms to return the best results.
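To make this concrete, here's a minimal sketch of the kind of exact-match inverted index a primitive search engine might use. The page ids and text are invented for illustration; real search engines are far more sophisticated:

```python
# A toy exact-match inverted index, illustrating why a search engine
# that only indexes the literal words on a page misses synonyms.

def build_index(pages):
    """Map each word to the set of page ids that contain it."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index

# Two near-identical pages; one says "cars", the other "automobiles".
pages = {
    "page-1": "cars with automatic transmission",
    "page-2": "automobiles with automatic transmission",
}
index = build_index(pages)

# An exact-match search for "cars" finds only page-1, even though
# page-2 is just as relevant.
print(index.get("cars", set()))         # {'page-1'}
print(index.get("automobiles", set()))  # {'page-2'}
```

Unless the engine knows that cars and automobiles mean the same thing, one of the two relevant pages never shows up.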
What are polysemic words?
Polysemic words and phrases are those with multiple different meanings.
Examples include mouse (rodent / computer), bank (financial institution / riverbank), and bright (light / intelligent).
Here's why these cause problems, according to the creators of LSI:
In different contexts or when used by different people the same word takes on varying referential significance (e.g., “bank” in river bank versus “bank” in a savings bank). Thus the use of a term in a search query does not necessarily mean that a text object containing or labeled by the same term is of interest.
These words present search engines with a similar problem to synonyms.
For example, say that we search for “apple computer.” Our primitive search engine might return both a page about the tech company and a page about the fruit, even though one is clearly not what we're looking for.
Bottom line: search engines that don't understand the different meanings of polysemic words are likely to return irrelevant results.
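A minimal sketch (again with invented page text) shows why: to an exact-match index, every page containing the word “apple” looks equally relevant, regardless of which meaning it uses:

```python
# A toy exact-match index over-matches polysemic words because it
# cannot tell the different meanings of "apple" apart.
pages = {
    "page-1": "apple computer hardware and software",
    "page-2": "apple fruit recipes and nutrition",
}

index = {}
for page_id, text in pages.items():
    for word in text.lower().split():
        index.setdefault(word, set()).add(page_id)

# Both pages contain "apple", so an exact-match engine treats them
# as equally good results, even though only one is about computers.
print(sorted(index["apple"]))  # ['page-1', 'page-2']
```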
How does LSI work?
Computers are dumb.
They don't have the inherent understanding of word relationships that we humans do.
For example, everyone knows that big and large mean the same thing. And everyone knows that John Lennon was in The Beatles.
But a computer doesnāt have this knowledge without being told.
The problem is that there's no way to tell a computer everything. It would just take too much time and effort.
LSI solves this problem by using complex mathematical formulas to derive the relationships between words and phrases from a set of documents.
In simple terms, if we run LSA on a set of documents about seasons, the computer can likely figure out a few things:
- First, the word fall is synonymous with autumn.
- Second, words like season, summer, winter, fall, and spring are all semantically related.
- Third, fall is semantically related to two different sets of words.
Search engines can then use this information to go beyond exact-query matching and deliver more relevant search results.
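For the curious, here's a minimal sketch of the idea using a tiny, invented term-document matrix and truncated SVD (the mathematical core of LSA). This is an illustration of the technique only, not anything resembling Google's systems:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# The documents and counts are invented for illustration. Documents
# d1-d2 are about seasons; d3-d4 are about finance.
terms = ["fall", "autumn", "leaves", "bank", "money"]
# docs:        d1  d2  d3  d4
X = np.array([[2,  1,  0,  1],   # fall (also used in a finance doc)
              [1,  2,  0,  0],   # autumn
              [1,  1,  0,  0],   # leaves
              [0,  0,  2,  1],   # bank
              [0,  0,  1,  2]])  # money

# Truncated SVD projects each term into a low-dimensional "concept"
# space derived purely from co-occurrence patterns.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]

def similarity(a, b):
    """Cosine similarity between two terms in concept space."""
    va = term_vectors[terms.index(a)]
    vb = term_vectors[terms.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Without being told anything about English, the model places "fall"
# much closer to "autumn" than to "money".
print(similarity("fall", "autumn") > similarity("fall", "money"))  # True
```

The relationships emerge from the documents alone, which is exactly the point: nobody had to tell the computer that fall and autumn are related.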
Does Google use LSI?
Given the problems LSI solves, it's easy to see why people assume Google uses LSI technology. After all, it's clear that matching exact queries is an unreliable way for search engines to return relevant documents.
Plus, we see evidence every day that Google understands both synonymy and polysemy.
But despite this, Google almost certainly doesn't use LSI technology.
How do we know? Google representatives say so.
Don't believe them?
Here are three more pieces of evidence to back up this fact:
1. LSI is old technology.
LSI was invented in the 1980s before the creation of the World Wide Web. As such, it was never intended to be applied to such a large set of documents.
That's why Google has since developed better, more scalable technology to solve the same problems.
Bill Slawski puts it best:
LSI technology wasn't created for anything the size of the Web […] Google has developed a word vector approach (used for Rankbrain) which is much more modern, scales much better, and works on the Web. Using LSI when you have Word2vec available would be like racing a Ferrari with a go-cart.
2. LSI was created to index known document collections.
The World Wide Web is not only large but also dynamic.
This means that the billions of pages in Googleās index change regularly.
That's a problem because the LSI patent tells us that the analysis needs to run “each time there is a significant update in the storage files.”
That would take a lot of processing power.
3. LSI is a patented technology.
The Latent Semantic Indexing (LSI) patent was granted to Bell Communications Research, Inc. in 1989. Susan Dumais, one of the co-inventors, later joined Microsoft in 1997, where she worked on search-related innovations.
That said, US patents expire 20 years after filing, which means that the LSI patent expired in 2008.
Given that Google was pretty good at understanding language and returning relevant results much earlier than 2008, this is yet another piece of evidence to suggest that Google doesn't use LSI.
Once again, Bill Slawski puts it best:
Google does attempt to index synonyms and other meanings for words. But it isn't using LSI technology to do that. Calling it LSI is misleading people. Google has been offering synonym substitutions and query refinements based upon synonyms since at least 2003, but that doesn't mean that they are using LSI. It would be like saying that you are using a smart telegraph device to connect to the mobile web.
Can mentioning related words, phrases, and entities boost rankings?
Most SEOs see āLSI keywordsā as nothing more than related words, phrases, and entities.
If we roll with that definition, despite it being technically inaccurate, then yes, using some related words and phrases in your content can almost certainly help improve SEO.
How do we know? Google indirectly tells us so here:
Just think: when you search for “dogs”, you probably don't want a page with the word “dogs” on it hundreds of times. With that in mind, algorithms assess if a page contains other relevant content beyond the keyword “dogs”, such as pictures of dogs, videos or even a list of breeds.
On a page about dogs, Google sees names of individual breeds as semantically related.
But why do these help pages to rank for relevant terms?
Simple: Because they help Google understand the overall topic of the page.
For example, here are two pages that each mention the word “dogs” the same number of times:
Looking at other important words and phrases on each page tells us that only the first is about dogs. The second is mostly about cats.
Google uses this information to rank relevant pages for relevant queries.
How to find and use related words and phrases?
If you're knowledgeable about a topic, you'll naturally include related words and phrases in your content.
For example, it would be difficult to write about the best video games without mentioning words and phrases like “PS4 games,” “Call of Duty,” and “Fallout.”
But it's easy to miss important ones, especially with more complex topics.
For instance, our guide to nofollow links fails to mention anything about the sponsored and UGC link attributes.
Google likely sees these as important, semantically related terms that any good article about the topic should mention.
That may be part of the reason why articles that talk about these things outrank us.
With this in mind, here are nine ways to find potentially related words, phrases, and entities:
1. Use common sense.
Check your pages to see if youāve missed any obvious points.
For example, if the page is a biographical article about Donald Trump and doesn't mention his impeachment, it's probably worth adding a section about that.
In doing so, you'll naturally mention related words, phrases, and entities like “Mueller Report,” “Nancy Pelosi,” and “whistleblower.”
Just remember that there's no way to know for sure whether Google sees these words and phrases as semantically related. However, as Google aims to understand the relationships between words and entities that we humans inherently understand, there's something to be said for using common sense.
2. Look at autocomplete results.
Autocomplete results donāt always show important related keywords, but they can give clues about ones that might be worth mentioning.
For example, we see “Donald trump spouse,” “Donald trump age,” and “Donald trump twitter” as autocomplete results for “Donald trump.”
These aren't related keywords in themselves, but the people and things they're referring to might be. In this case, those are Melania Trump, 73 years old, and @realDonaldTrump.
Probably all things that should be mentioned in a biographical article, right?
3. Look at related searches.
Related searches appear at the bottom of the search results.
Like autocomplete results, they can give clues about potentially related words, phrases, and entities worth mentioning.
For example, “Donald trump education” refers to The Wharton School of the University of Pennsylvania, which he attended.
4. Use an “LSI keyword” tool.
Popular “LSI keyword” generators have nothing to do with LSI. However, they do occasionally kick back some useful ideas.
For example, if we plug “donald trump” into a popular tool, it pulls related people (entities) like his spouse, Melania Trump, and son, Barron Trump.
5. Look at other keywords the top pages rank for.
Use the “Also rank for” keyword ideas report in Ahrefs' Keywords Explorer to find potentially related words, phrases, and entities.
If there are too many to handle, try running a Content Gap analysis using three of the top-ranking pages, then set the number of intersections to “3.”
This shows keywords that all of the pages rank for, which often gives you a more refined list of related words and phrases.
6. Run a TF-IDF analysis.
TF-IDF has nothing to do with latent semantic indexing (LSI) or latent semantic analysis (LSA), but it can occasionally help uncover “missing” words, phrases, and entities.
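As a sketch of what such an analysis does (using an invented three-document corpus), TF-IDF scores a word highly when it's frequent in one document but rare across the rest, which is what surfaces topic-specific terms:

```python
import math

# Toy TF-IDF over a tiny invented corpus. Real tools use much larger
# corpora of top-ranking pages, but the math is the same.
docs = [
    "dogs breeds dogs training dogs food",
    "cats breeds cats food dogs",
    "dogs walking leash",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf_idf(word, doc_tokens):
    """Term frequency in one doc times inverse document frequency."""
    tf = doc_tokens.count(word) / len(doc_tokens)
    df = sum(1 for d in tokenized if word in d)
    idf = math.log(N / df)
    return tf * idf

# Score every word in the first document. "dogs" appears in all three
# documents, so its IDF (and score) is zero; "training" appears only
# here, so it stands out as a distinctive term.
scores = {w: tf_idf(w, tokenized[0]) for w in set(tokenized[0])}
top = sorted(scores, key=scores.get, reverse=True)
print(top[0])  # training
```

Running the same kind of comparison against pages that outrank you can reveal distinctive words your page is missing.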
7. Look at knowledge bases.
Knowledge bases like Wikidata.org and Wikipedia are fantastic sources of related terms.
Google also pulls knowledge graph data from these two knowledge bases.
8. Reverse-engineer the knowledge graph.
Google stores the relationships between lots of people, things and concepts in something called a knowledge graph. Results from the knowledge graph often show up in Google search results.
Try searching for your keyword and see if any data from the knowledge graph shows up.
Because these are entities and data points that Google associates with the topic, it's definitely worth talking about relevant ones where it makes sense.
9. Use Google's API to find entities.
Paste the text from a top-ranking page into Google's Natural Language API demo. Look for relevant and potentially important entities that you might have missed.
Final thoughts
LSI keywords don't exist, but semantically related words, phrases, and entities do, and they have the power to boost rankings.
Just make sure to use them where it makes sense, and not to haphazardly sprinkle them whenever and wherever.
In some cases, this may mean adding new sections to your page.
For instance, if you want to add words and entities like “impeachment” and “House Intelligence Committee” to an article about Donald Trump, that's probably going to require a couple of new paragraphs under a new subheading.