Latent semantic indexing was introduced
in an attempt to improve the results that Google and other search
engines offer to their users.
For many years, keyword density was the main part of search engine
optimization, and it was easily understood by beginners in website design
and article writing. The two major factors influencing search engine
placement were the use of keywords and in-links (backlinks).
When Google introduced Adsense, however, it soon became apparent to entrepreneurs
that there was a lot of money to be made by generating web pages specifically
designed to display Adsense ads. Thousands of dollars could be made daily
by generating thousands of pages, using template-based page generation
software specifically designed for the purpose. Content duplication was
rife, and the websites themselves were of little or no use to the visitor,
who was presented with nothing more than Adsense adverts.
One of the reasons, perhaps the main one, for the introduction of latent
semantic indexing was to overcome this problem, and to ensure that websites
were providing a useful service to those using Google’s search engine,
though Google is not the only search engine using the algorithm.
Since the introduction of latent semantic indexing, many sites have been
de-listed by Google as being of little use to the visitor, and for using
duplicate content. A change of keyword was frequently the only difference
between pages. Many internet marketers found their income slashed to almost
zero overnight, and this change can often be traced back to the time that
latent semantic indexing was introduced.
Latent semantic indexing was initially used in Adsense to enable adverts
targeted to the theme of a webpage to appear on the page. The algorithm
checks the wording on the page and determines its theme. It was only
later that Google applied the algorithm to search engine placement,
and it is used by search engines other than Google. It involves analysis
of the words used in natural language: the synonyms and closely related
words that appear when the general theme of a page is discussed. It
complements, rather than replaces, keyword analysis.
Since it is based on a mathematical set of rules, or algorithm, it is
not perfect and can lead to results which are justifiable mathematically
but have no meaning in natural language. Early in 2003, Google purchased
a company, Applied Semantics, to develop it.
So what is latent semantic indexing? How does it work in layman’s
terms?
Let’s look at the three words separately:
LATENT
The word ‘latent’ means something which is present, but not
obviously visible. For example, the latent heat of vaporization, or the
heat absorbed when water evaporates, is carried unseen in water vapor and
is released only when the vapor condenses. This release of heat is part
of what powers thunderstorms.
In terms of latent semantic indexing, it means that a word such as ‘lock’
can be present in a text with its meaning hidden until some other factor
reveals it. ‘Lock’ can mean, among other things, a piece of
hair, a security device or a means of conveying a barge between different
heights in a canal. It is only the rest of the text which makes its meaning
clear.
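The idea that the surrounding text reveals the meaning of ‘lock’
can be sketched in a few lines of Python. This is only a toy illustration,
not the algorithm any search engine actually uses: the cue-word lists and
the function name guess_sense are hypothetical, and a real system would
learn such associations from a large body of text.

```python
import re

# Hypothetical cue words for each sense of "lock" (an assumption made for
# this illustration; nothing here comes from a real search engine).
SENSE_CUES = {
    "hair": {"hair", "curl", "strand", "blonde"},
    "security": {"door", "key", "padlock", "safe", "burglar"},
    "canal": {"canal", "barge", "boat", "water", "gate"},
}


def guess_sense(text):
    """Return the sense whose cue words overlap most with the text.

    Returns None when no cue word is present, i.e. the meaning of
    'lock' stays latent.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_sense("The barge waited at the canal lock"))  # canal
print(guess_sense("She kept a lock of his hair"))          # hair
print(guess_sense("the history of locks"))                 # None
```

The last call returns None because nothing else in the phrase hints at
which kind of lock is meant.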
SEMANTIC
The word ‘semantic’ refers to the meaning of language or
words, as opposed to the form in which they are said or written. In
‘a lock of hair’, the semantics is the sense given to the
word ‘lock’, which is made obvious by the rest of the
expression.
INDEXING
With reference to its use in latent semantic indexing, ‘indexing’
is the identification of the meaning of a document from its subject content,
and its listing in a form suitable for use by a search system.
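To give a feel for what a ‘form suitable for use by a search
system’ might look like, here is a minimal sketch of a plain keyword
index that maps each word to the pages containing it. It is an assumption
for illustration only: the page names and the function build_index are
hypothetical, and this simple index records words, not meanings.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to the sorted list of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


# Hypothetical pages used only for this example.
docs = {
    "hair.html": "a lock of hair",
    "canal.html": "the canal lock raises the barge",
}
index = build_index(docs)

print(index["lock"])   # ['canal.html', 'hair.html']
print(index["barge"])  # ['canal.html']
```

A search for ‘lock’ alone finds both pages, which is exactly the
ambiguity that the semantic part of the analysis is meant to resolve.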
Let’s make it clearer by giving an example. Software-generated
pages used for Adsense tend to be very general and can be used
as templates for any keyword or phrase. Here is how a typical piece of
such text could read, using ‘the history of locks’ as the key-phrase.
“If you are seeking information on the history of locks, there
is no better place than the internet. The information superhighway is
full of sites specializing in the history of locks, and the history of
locks is a very popular subject. We recommend that you check out the other
pages on this site for information on other topics associated with the
history of locks.”
This is very common with Adsense sites. You can replace the key-phrase
‘the history of locks’ with any keyword whatsoever, and the
same text can be used countless times, with the key-phrase replaced
automatically by the software from a given list of keywords. It does not give the reader
any information whatsoever. In fact, it does not even give information
as to what type of lock is being referred to. It could apply equally to
a security lock and a canal lock. This is why latent semantic indexing
was introduced to search engine ranking analysis.
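The template approach described above can be sketched as follows. The
template is the sample text quoted earlier; the keyword list and the
function names are hypothetical, chosen only to show that pages generated
this way are identical once the key-phrase is taken out.

```python
# Template taken from the sample text; {kp} marks the key-phrase slot.
TEMPLATE = (
    "If you are seeking information on {kp}, there is no better place "
    "than the internet. The information superhighway is full of sites "
    "specializing in {kp}, and {kp} is a very popular subject. We "
    "recommend that you check out the other pages on this site for "
    "information on other topics associated with {kp}."
)

# Hypothetical keyword list of the kind such software would be fed.
KEY_PHRASES = ["the history of locks", "the history of clocks", "garden sheds"]


def generate_page(key_phrase):
    """Fill every slot in the template with the same key-phrase."""
    return TEMPLATE.format(kp=key_phrase)


def text_without_key_phrase(key_phrase):
    """Generate a page, then strip the key-phrase back out of it."""
    return generate_page(key_phrase).replace(key_phrase, "")


# Collect the distinct texts that remain once the key-phrase is removed.
remainders = {text_without_key_phrase(kp) for kp in KEY_PHRASES}
print(len(remainders))  # 1
```

Only one distinct text remains, however many keywords are fed in, which
is the kind of duplication described above.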
Now here is the same text including some qualifying wording:
“If you are seeking information on the history of locks, there
is no better place than the internet. The information superhighway is
full of sites specializing in the history of locks of all types, from
the massive Roman door locks to sophisticated encrypted password security
systems. From long-keyed safe locks to the history of combination locks
that have given safecrackers so much trouble over the ages. The history
of locks is a very popular subject and we recommend that you check out
the other pages on this website dealing with topics such as general security,
the history of cylinder and lever locks, and padlocks of various kinds.”
The ‘latency’ referred to in the term ‘latent semantic
indexing’ is the hidden meaning of the word ‘lock’,
which remains hidden in the first version until the semantics of the second
reveal its meaning. Thus, by use of the algorithm, website content with
similar keywords, but different meanings for the keywords, can be differentiated
and, more importantly for the webmaster, the relevance of the site can
be properly determined and indexed.
No longer will sites with ambiguous keywords, as in version one above,
be acceptable to search engines. The semantics of the page must make the
meaning and topic of the page clear.
So what does that mean to you? It means that not only must you maintain
a reasonable density of the specific keyword being targeted, since that
is still the term being used by the searcher, but you must also use related
words and terms to define the overall theme of the page. Prior to latent
semantic indexing, a search for the term ‘the history of locks and
canals’ would have been directed to both of the above texts. Since
its introduction, such a search will be directed to neither. It will be
directed to a page where it is obvious that the theme of the site is canal
locks.
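As a rough way to check a page against both requirements, the sketch
below measures keyword density and lists which related terms a page uses.
The density formula (words belonging to the key-phrase divided by total
words) is one common convention and an assumption here, not a figure
published by any search engine; the function names and the related-term
list are also hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_density(text, phrase):
    """Fraction of the page's words that belong to occurrences of phrase."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if not words or n == 0:
        return 0.0
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == target
    )
    return hits * n / len(words)


def related_terms_found(text, related_terms):
    """Return, in sorted order, the related terms that appear in the text."""
    words = set(tokenize(text))
    return sorted(term for term in related_terms if term in words)


# Hypothetical related terms for a page about security locks.
RELATED = {"door", "padlocks", "combination", "security", "safecrackers"}

page = "The history of locks covers Roman door locks and padlocks."
print(round(keyword_density(page, "history of locks"), 2))  # 0.3
print(related_terms_found(page, RELATED))  # ['door', 'padlocks']
```

A page like the first version above would score on density but return
an empty list of related terms.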
This can only be good for the visitor to Google.
About The Author:
Peter is an industrial chemist with an interest in internet marketing
and new technology, as exemplified by his website http://www.data-voip-solutions.com
and also in audiovisual formats, codecs and transmission, as covered on his other
site http://www.online-free-movies.com