Latent Semantic Indexing: What Is It?

By Peter Nisbet

Latent semantic indexing was introduced in an attempt to improve the results that Google and other search engines offer to their users.

Keyword density has been the main part of search engine optimization for many years, easily understood by beginners in website design and article writing. The two major factors influencing search engine placement were the use of keywords and in-links (back links).

When Google introduced Adsense, however, it soon became apparent to entrepreneurs that there was a lot of money to be made by generating web-pages specifically designed to display Adsense ads. Thousands of dollars could be made daily by generating thousands of pages, using template-based page generation software specifically designed for the purpose. Content duplication was rife and the websites themselves were of little or no use to the visitor who was presented with nothing more than Adsense adverts.

One of the reasons, perhaps the main one, for the introduction of latent semantic indexing was to overcome this problem, and to ensure that websites were providing a useful service to those using Google’s search engine, though Google is not the only search engine using the algorithm.

Since the introduction of latent semantic indexing, many sites have been de-listed by Google as being of little use to the visitor, and for using duplicate content. A change of keyword was frequently the only difference between pages. Many internet marketers found their income slashed to almost zero overnight, and this change can often be traced back to the time that latent semantic indexing was introduced.

Latent semantic indexing was initially used in Adsense to enable adverts targeted to the theme of a webpage to appear on the page. The algorithm checks the wording on the page and determines the theme of the page. It was only later that Google applied the algorithm to search engine placement. The technique involves analysis of the words used in natural language: the synonyms and closely related words that appear when the general theme of a page is discussed. It complements, rather than replaces, keyword analysis.

Since it is based on a mathematical set of rules, or algorithm, it is not perfect and can lead to results which are justifiable mathematically, but have no meaning in natural language. Early in 2003, Google purchased a company, Applied Semantics, to develop it.

So what is latent semantic indexing? How does it work in layman’s terms?

Let’s look at the three words separately:


The word ‘latent’ means something which is present, but not obviously visible. For example, the latent heat of vaporization, or the heat required to vaporize water, is carried unseen in water vapor, and is only released when the vapor condenses back into droplets. The release of this heat helps to power thunderstorms. In terms of latent semantic indexing, it means that a word such as ‘lock’ can be present in a text with its meaning hidden until some other factor reveals it. ‘Lock’ can mean, among other things, a piece of hair, a security device or a means of conveying a barge between different heights in a canal. It is only the rest of the text which makes its meaning clear.


The word ‘semantic’ refers to the meaning of language or words, as opposed to the form in which they are said or written. In the use of the word ‘lock’ in ‘a lock of hair’, the semantics is the sense given to the word ‘lock’, which is made obvious by the expression ‘lock of hair’.


With reference to the use of the word in latent semantic indexing, ‘indexing’ is the identification of the meaning of a document from its subject content, and its listing in a form suitable for use by a search system.

Let’s make it clearer by giving an example. Software-generated pages used for Adsense tend to be very general and can be used as templates for any keyword or phrase. Here is how a typical piece of such text could read, using ‘the history of locks’ as the key-phrase.

“If you are seeking information on the history of locks, there is no better place than the internet. The information superhighway is full of sites specializing in the history of locks, and the history of locks is a very popular subject. We recommend that you check out the other pages on this site for information on other topics associated with the history of locks.”

This is very common with Adsense sites. You can replace the key-phrase ‘the history of locks’ with any keyword whatsoever, and the same text can be used countless times, with the software automatically substituting each keyword from a given list. It does not give the reader any information whatsoever. In fact, it does not even indicate what type of lock is being referred to. It could apply equally to a security lock and a canal lock. This is why latent semantic indexing was introduced to search engine ranking analysis.
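The substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of any real page-generation product: the template wording is shortened from the sample text, and the names TEMPLATE, generate_page and the keyword list are assumptions made for the example.

```python
# Hypothetical boilerplate of the kind described above. The wording is
# shortened from the sample text; {phrase} marks each slot that the
# software fills with a keyword.
TEMPLATE = (
    "If you are seeking information on {phrase}, there is no better place "
    "than the internet. The information superhighway is full of sites "
    "specializing in {phrase}, and {phrase} is a very popular subject."
)


def generate_page(phrase: str) -> str:
    """Return the template text with every slot filled by one key-phrase."""
    return TEMPLATE.format(phrase=phrase)


# Illustrative keyword list (an assumption, not taken from any real tool).
keywords = ["the history of locks", "the history of clocks"]
pages = {kw: generate_page(kw) for kw in keywords}

for text in pages.values():
    print(text)
    print()
```

Every page produced this way is identical apart from the key-phrase itself, which is exactly the kind of duplication the article describes.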

Now here is the same text including some qualifying wording:

“If you are seeking information on the history of locks, there is no better place than the internet. The information superhighway is full of sites specializing in the history of locks of all types, from the massive Roman door locks to sophisticated encrypted password security systems. From long-keyed safe locks to the history of combination locks that have given safecrackers so much trouble over the ages. The history of locks is a very popular subject and we recommend that you check out the other pages on this website dealing with topics such as general security, the history of cylinder and lever locks, and padlocks of various kinds.”

The ‘latency’ referred to in the term ‘latent semantic indexing’ is the hidden meaning of the word ‘lock’, which remains hidden in the first version until the semantics of the second reveal its meaning. Thus, by use of the algorithm, website content with similar keywords, but different meanings for the keywords, can be differentiated and, more importantly for the webmaster, the relevance of the site can be properly determined and indexed.
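To show how surrounding words can reveal the sense of ‘lock’, here is a toy Python sketch. It is not latent semantic indexing itself, which works statistically over a large collection of documents, and it is not how any search engine is known to work. It simply counts hand-picked context words for each sense. The word lists, the function names and the shortened sample texts are all assumptions made for the illustration.

```python
import re

# Hand-picked context words for two senses of "lock" (illustrative
# assumptions; a real system would learn such associations from many pages).
SENSE_TERMS = {
    "security": {"door", "key", "keyed", "safe", "padlocks",
                 "combination", "security", "cylinder", "lever"},
    "canal": {"canal", "barge", "water", "boat", "gates"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def sense_scores(text):
    """Count how many distinct context words of each sense occur in the text."""
    words = set(tokenize(text))
    return {sense: len(words & terms) for sense, terms in SENSE_TERMS.items()}


def likely_sense(text):
    """Return the best-scoring sense, or None if no context word is present."""
    scores = sense_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Shortened forms of the two versions of the sample text.
version_one = (
    "If you are seeking information on the history of locks, there is no "
    "better place than the internet. The history of locks is a very "
    "popular subject."
)
version_two = (
    "The information superhighway is full of sites specializing in the "
    "history of locks of all types, from the massive Roman door locks to "
    "sophisticated encrypted password security systems, and from "
    "long-keyed safe locks to combination locks."
)

print(likely_sense(version_one))
print(likely_sense(version_two))
```

In this sketch the first version gives no evidence for either sense, while the qualifying words in the second version point to security locks.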

No longer will sites with ambiguous keywords, as in version one above, be acceptable to search engines. The semantics of the page must make the meaning and topic of the page clear.

So what does that mean to you? It means that not only must you maintain a reasonable density of the specific keyword being targeted, since that is still the term being used by the searcher, but you must also use related words and terms to define the overall theme of the page. Prior to latent semantic indexing, a search for the term ‘the history of locks and canals’ would have been directed to both of the above texts. Since its introduction, such a search will be directed to neither. It will be directed to a page where it is obvious that the theme of the site is canal locks.
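The two checks described above, keyword density and the presence of related words, can be roughed out in Python. This is a minimal sketch under stated assumptions: density is taken here as the share of a page's words occupied by the key-phrase, which is one of several definitions in use, and the related-word list and sample sentence are invented for the example. No search engine's actual formula is implied.

```python
import re


def words_of(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_density(text, phrase):
    """Share of the page's words taken up by occurrences of the phrase.

    Assumed definition: (occurrences * words in phrase) / total words.
    """
    words = words_of(text)
    phrase_words = words_of(phrase)
    n = len(phrase_words)
    if not words or n == 0:
        return 0.0
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase_words
    )
    return hits * n / len(words)


def related_terms_found(text, related):
    """Return, in alphabetical order, the related words present in the text."""
    present = set(words_of(text))
    return sorted(term for term in related if term in present)


# Invented sample sentence and related-word list for a canal-lock page.
page = (
    "The history of locks on canals: how a lock raises a barge "
    "between water levels."
)
related = {"canal", "canals", "barge", "water", "towpath"}

print(keyword_density(page, "history of locks"))
print(related_terms_found(page, related))
```

A page that scores well on the first measure but finds none of the related words would resemble version one of the sample text: plenty of keyword, no discernible theme.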

This can only be good for the visitor to Google.

About The Author:

Peter is an industrial chemist with an interest in internet marketing and new technology.



