Search engine optimization (SEO) is a set of methods aimed at improving
the ranking of a website in search engine listings, and could be
considered a subset of search engine marketing. The term SEO (Search
Engine Optimizers) also refers to an industry of consultants who carry
out optimization projects on behalf of clients' sites. Some
commentators, and even some SEOs, break down methods used by
practitioners into categories such as "white hat SEO" (methods
generally approved by search engines, such as building content and
improving site quality), or "black hat SEO" (tricks such as
cloaking and spamdexing). White hatters charge that black hat methods
are an attempt to manipulate search rankings unfairly. Black hatters
counter that all SEO is an attempt to manipulate rankings, and that the
particular methods one uses to rank well are irrelevant.
Search engines display different kinds of listings in the search
engine results pages (SERPs), including: pay-per-click advertisements,
paid inclusion listings, and organic search results. SEO is primarily
concerned with advancing the goals of a web site by improving the number
and position of its organic search results for a wide variety of
relevant keywords. SEO strategies can increase both the number and
quality of visitors, where quality means visitors who complete the
action hoped for by the site owner (e.g. purchase, sign up, learn
something). Search engine optimization is sometimes offered as a
stand-alone service, or as a part of a larger marketing effort, and can
often be very effective when incorporated into the initial development
and design of a site.
For competitive, high-volume search terms, the cost of pay-per-click
advertising can be substantial. Ranking well in the organic search
results can provide the same targeted traffic at a potentially
significantly lower cost. Site owners may choose to optimize their sites
for organic search if the cost of optimization is less than the cost of
advertising.
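The comparison above is simple arithmetic. The following is a minimal sketch, assuming that organic listings would deliver the same number of visits as the paid clicks they replace; the function names and all figures are hypothetical and only illustrate the break-even reasoning.

```python
def monthly_ppc_cost(clicks_per_month: int, cost_per_click: float) -> float:
    """Cost of buying the given number of visits through pay-per-click ads."""
    return clicks_per_month * cost_per_click


def seo_is_cheaper(seo_cost_per_month: float, clicks_per_month: int,
                   cost_per_click: float) -> bool:
    """True when optimization costs less than buying the same traffic.

    Assumption: organic results deliver the same number of visits.
    """
    return seo_cost_per_month < monthly_ppc_cost(clicks_per_month, cost_per_click)


# Hypothetical figures: 2,000 visits a month at 1.50 per click,
# compared with 1,800 a month spent on optimization work.
ppc_cost = monthly_ppc_cost(2000, 1.50)
print(ppc_cost)                          # 3000.0
print(seo_is_cheaper(1800, 2000, 1.50))  # True
```

In practice the two channels rarely deliver identical traffic, so a site owner would adjust the visit counts on each side before comparing.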
Not all sites have identical goals for search optimization. Some
sites are seeking any and all traffic, and may be optimized to rank
highly for common search phrases. A broad search optimization strategy
can work for a site that has broad interest, such as a periodical, a
directory, or a site that displays advertising with a CPM revenue model.
In contrast, many businesses try to optimize their sites for large
numbers of highly specific keywords that indicate readiness to buy.
Overly broad search optimization can hinder marketing strategy by
generating a large volume of low-quality inquiries that cost money to
handle, yet result in little business. Focusing on desirable traffic
generates better quality sales leads, allowing the sales force to close
more business. Search engine optimization can be very effective when
used as part of a smart niche marketing strategy.
History
Early search engines
Webmasters and content providers began optimizing sites for search
engines in the mid-1990s, as the first search-engines were cataloging
the early Web. Initially, all a webmaster needed to do was submit a site
to the various engines, which would run spiders (programs that
"crawl" the site) and store the collected data. The default search
scope was the entire webpage, which was scanned for so-called related
search-words, so a page with many different words matched more searches,
and a webpage containing a dictionary-type listing would match almost
all searches, limited only by unique names. The search-engines then
sorted the information by topic, and served results based on pages they
had spidered. As the number of documents online kept growing, and more
webmasters realized the value of organic search listings, popular
search engines began to sort their listings so they could display the
most relevant pages first. This was the start of a friction between
search engine and webmaster that continues to this day.
At first, search-engines were guided by the webmasters themselves.
Early versions of search algorithms relied on webmaster-provided
information such as category and keyword meta tags. Meta-tags provided a
guide to each page's content. When some webmasters began to abuse
meta-tags, causing their pages to rank for irrelevant searches, search
engines abandoned their consideration of meta-tags and instead developed
more complex ranking algorithms. These favored pages that focused on a
limited number of words (countering dictionary-type pages) and took into
account a more diverse set of factors, including:
- Text within the title tag
- Domain name
- URL directories and file names
- HTML tags: headings, bold and emphasized text
- Keyword density
- Keyword proximity
- Alt attributes for images
- Text within NOFRAMES tags
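Two of the factors listed above, keyword density and keyword proximity, can be illustrated with a short sketch. The definitions used here (occurrences divided by total words, and the smallest gap in words between two terms) are common working definitions, not formulas published by any search engine; the function names and sample text are hypothetical.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, keyword: str) -> float:
    """Share of all words on the page that are the given keyword."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def keyword_proximity(text: str, first: str, second: str) -> Optional[int]:
    """Smallest distance, in words, between any occurrence of two keywords."""
    words = tokenize(text)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    if not first_pos or not second_pos:
        return None  # at least one keyword does not appear
    return min(abs(i - j) for i in first_pos for j in second_pos)


sample = "Cheap flights to Paris. Book cheap flights and hotels in Paris today."
density = keyword_density(sample, "cheap")                # 2 of 12 words
proximity = keyword_proximity(sample, "flights", "paris")  # 2 words apart
print(round(density, 3), proximity)
```

Because both measures depend only on the page's own text, they are easy for a webmaster to inflate, which is the weakness of on-page factors in general.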
But by relying so extensively on factors that were still within the
webmasters' exclusive control, search-engines continued to suffer from
abuse and ranking manipulation. In order to provide better results to
their users, search-engines had to adapt to ensure their SERPs showed
the most relevant search results, rather than useless pages stuffed with
numerous keywords by unscrupulous webmasters, using a bait-and-switch
lure to display unrelated webpages. This led to the rise of a new kind
of search engine.
Organic search engines
Google was started by two PhD students at Stanford University, Sergey
Brin and Larry Page, and brought a new concept to evaluating web pages.
This concept, called PageRank, has been important to the Google
algorithm from the start. PageRank relies heavily on incoming links and
uses the logic that each link to a page is a vote for that page's value.
The more incoming links a page has, the more "worthy" it is. The
value of each incoming link itself varies directly based on the PageRank
of the page it comes from and inversely on the number of outgoing links
on that page.
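The voting logic described above can be sketched as a simple iterative calculation. This is a simplified illustration rather than Google's implementation: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the three-page example graph are all assumptions made for the sketch.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Approximate PageRank for a small link graph.

    `links` maps each page to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page's vote is split evenly among its outgoing links,
                # so each link is worth less when there are many of them.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # vote evenly over every page.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical site: both inner pages link back to the home page.
graph = {"home": ["about", "news"], "about": ["home"], "news": ["home"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this example the home page ends up with the highest score because it receives the full vote of two pages, while each inner page receives only half of the home page's vote.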
With help from PageRank, Google proved to be very good at serving
relevant results. Google became the most popular and successful
search-engine. Because PageRank measured an off-site factor, Google felt
it would be more difficult to manipulate than on-page factors.
But, manipulated it was. Webmasters had already developed
link-manipulation tools and schemes to influence the Inktomi
search-engine. These methods proved to be equally applicable to Google's
algorithm. Many sites focused on exchanging, buying, and selling links
on a massive scale. PageRank's reliance on the link as a vote of
confidence in a page's value was undermined as many webmasters sought to
garner links purely to influence Google into sending them more traffic,
irrespective of whether the link was useful to human site visitors.
Plus, the default search scope was still the entire webpage, scanned
for so-called related search-words, and a webpage containing a
dictionary-type listing would still match almost all searches (except
special names), now at an even higher priority given by link-rank. So it
also became a battle of high-ranking dictionary-pages.
It was time for Google -- and other search engines -- to look at a
wider range of off-site factors. There were other reasons to develop
more intelligent algorithms. The Internet was reaching a vast population
of non-technical users who were often unable to use advanced querying
techniques to reach the information they were seeking, and the sheer
volume and complexity of the indexed data were vastly different from those
of the early days. Search engines had to develop predictive, semantic,
linguistic and heuristic algorithms. Around the same time as the work
that led to Google, IBM had begun work on the Clever Project, and Jon
Kleinberg was developing the HITS algorithm.
A proxy for the PageRank metric is still displayed in the Google
Toolbar, but PageRank is only one of more than 100 factors that Google
considers in ranking pages.
Today, most search-engines keep their methods and ranking algorithms
secret, both to compete at finding the most valuable search-results and to
deter spampages from clogging those results. A search-engine may use
hundreds of factors in ranking the listings on its SERPs; the factors
themselves and the weight each carries may change continually.
Algorithms can differ widely: a webpage that ranks #1 in a particular
search-engine could rank #200 in another search-engine.
Much current SEO thinking on what works and what doesn't is
speculation and informed guesswork. Some SEOs have carried out controlled
experiments to gauge the effects of different approaches to
search-optimization.
The following factors are speculation on some of the considerations
search-engines may presently be using or which could be built into their
algorithms. A number of these are taken from one of Google's patent
applications, and may give some indication as to what is in the
pipeline. Some are pure speculation. Keep in mind that Google has over
180 patents and patent applications assigned to it at the US Patent and
Trademark Office (USPTO), and a number of those include possible
insights into other factors and other directions that the search engine
may follow, some of which may not be consistent with this list.
- Age of site
- Length of time domain has been registered
- Age of content
- Frequency of content: regularity with which new content is added
- Text size: number of words above 200-250 (not affecting Google in
2005)
- Age of link and reputation of linking site
- Standard on-site factors
- Negative scoring for on-site factors (for example, a dampening for
websites with extensive keyword meta-tags indicative of having been
optimized, or "SEO-ed")
- Uniqueness of content
- Related terms used in content (the terms the search-engine
associates as being related to the main content of the page)
- Google PageRank (only used in Google's algorithm)
- External links, the anchor text in those external links and in the
sites/pages containing those links
- Citations and research sources (indicating the content is of
research quality)
- Stem-related terms in the search engine's database
(finance/financing)
- Incoming backlinks and anchor text of incoming backlinks
- Negative scoring for some incoming backlinks (perhaps those coming
from low value pages, reciprocated backlinks, etc.)
- Rate of acquisition of backlinks: too many too fast could indicate
"unnatural" link buying activity
- Text surrounding outward links and incoming backlinks. A link
following the words "Sponsored Links" could be ignored
- Use of "rel=nofollow" to suggest that the search engine
should ignore the link
- Depth of document in site
- Metrics collected from other sources, such as monitoring how
frequently users hit the back button when SERPs send them to a
particular page
- Metrics collected from sources like the Google Toolbar, Google
AdWords/Adsense programs, etc.
- Metrics collected in data-sharing arrangements with third parties
(like providers of statistical programs used to monitor site
traffic)
- Rate of removal of incoming links to the site
- Use of sub-domains, use of keywords in sub-domains and volume of
content on sub-domains… and negative scoring for such activity
- Semantic connections of hosted documents
- Rate of document addition or change
- IP of hosting service and the number/quality of other sites hosted
on that IP
- Other affiliations of linking site with the linked site (do they
share an IP? have a common postal address on the "contact
us" page?)
- Technical matters like use of 301 to redirect moved pages, showing
a 404 server header rather than a 200 server header for pages that
don't exist, proper use of robots.txt
- Hosting uptime
- Whether the site serves different content to different categories
of users (cloaking)
- Broken outgoing links not rectified promptly
- Unsafe or illegal content
- Quality of HTML coding, presence of coding errors
- Actual click through rates observed by the search engines for
listings displayed on their SERPs
- Hand ranking by humans of the most frequently accessed SERPs
The relationship between SEO and the search engines
The first mentions of Search-Engine Optimization don't appear on
Usenet until 1997, a few years after the launch of the first Internet
search-engines. But the operators of search-engines quickly recognized
that some people in the webmaster community were making efforts to
rank well in their search-engines, and even manipulating the page
rankings in search-results. In some early search-engines, such as
Infoseek, ranking #1 was as easy as grabbing the source code of the
top-ranked page, placing it on your website, and submitting a URL to
instantly index and rank that page.
Due to the high value and targeting of search-results, there is an
adversarial relationship between search-engines and SEOs. In 2005, an
annual conference named AIRWeb was created to discuss bridging the gap
and minimizing the sometimes damaging effects of aggressive web-content
providers.
Some more aggressive site-owners and SEOs generate automated sites or
employ techniques which eventually get domains banned from the
search-engines. Many search-engine optimization companies, which sell
services, employ long-term low-risk strategies, and most SEO firms that
do employ high-risk strategies do so on their own affiliate,
lead-generation, or content sites, instead of risking client-websites.
Some SEO companies employ aggressive techniques that get their client
websites banned from the search-results. The Wall Street Journal
profiled a company, Traffic Power, which allegedly used high-risk
techniques and failed to disclose those risks to its clients. Wired
reported that the same company sued a blogger for mentioning that it was
banned. Google's Matt Cutts later confirmed that Google did in fact ban
Traffic Power and some of its clients.
Google has enforced webpage restrictions for years, such as for
hidden text (background and foreground colors of exactly the same hue).
In 2006, Google could punish a non-standard website by automatically
blocking it from search-results the next day for 30-35 days (or longer),
pending a reinstatement request. If the site was reinstated, Google
could revert its index to old, expired, or deleted webpages from a year
earlier, delaying the re-indexing of the current website for a total of
2-4 months.
Yahoo and MSN-Search do not automatically punish entire websites
for small amounts of accidental hidden text. Not surprisingly, Google's
market-share of daily searches has fallen rapidly from 75% to 56% over
the past few years, as other search-engines find many valuable webpages
that Google has banned and therefore cannot display, leaving Google with
a severely limited index. In early 2006, MSN-Search typically re-indexed
small websites every 14 days, and Yahoo also re-indexed quickly, much
faster than Google, but all three (MSN, Yahoo, and Google) could require
more than a month to index a new page (new file name) on an old website.
Some search-engines have also reached out to the SEO industry, and
are frequent sponsors and guests at SEO conferences and seminars. In
fact, with the advent of paid inclusion, some search-engines now have a
vested interest in the health of the optimization community. All of the
main search-engines (Google, Yahoo, and MSN) provide information and
guidelines to help with site optimization. Google has a Sitemaps
program to help webmasters learn if Google is having any problems
indexing their website; it also provides a large amount of data on
Google traffic to the website. Yahoo! has SiteExplorer, which provides a
way to submit URLs for free (as MSN and Google also allow), determine
how many pages are in the Yahoo index, and drill down on inlinks to deep
pages.
Yahoo! has an Ambassador Program and Google has a program for qualifying
Google Advertising Professionals.
Getting into search engines' listings
New sites do not need to be "submitted" to search engines
to be listed. A simple link from an established site will get the search
engines to visit the new site and begin to spider its contents. It can
take a few days or even weeks from the acquisition of a link from such
an established site for all the main search engine spiders to commence
visiting and indexing the new site.
Once the search engine has found the new site, it will generally
visit and start to index the pages on the site, as long as all the pages
are linked to with standard <a href> hyperlinks. Pages which are
accessible only through Flash or JavaScript links may not be findable by
the spiders.
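The difference between standard hyperlinks and script-driven navigation can be illustrated with a minimal sketch of link discovery. It assumes a spider that only reads href attributes on anchor tags; real crawlers are more elaborate, and the class name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the targets of standard <a href> hyperlinks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags with an href value are treated as links.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: two ordinary links and one script-only link.
page = """
<a href="/products.html">Products</a>
<a href="/contact.html">Contact</a>
<span onclick="window.location='/specials.html'">Specials</span>
"""

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # ['/products.html', '/contact.html']
```

The page at /specials.html is reachable by a human visitor who clicks the span, but a spider following only anchor links would never find it.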
Search engine crawlers may look at a number of different factors when
crawling a site, and many pages from a site may not be indexed by the
search engines until they gain more PageRank, links, or traffic.
Distance of pages from the root directory of a site may also be a factor
in whether or not pages get crawled, as well as other importance
metrics. Cho et al. (1998) described some standards for
those decisions as to which pages are visited and sent by a crawler to be
included in a search engine's index.
Webmasters can instruct spiders to not index certain files or
directories through the standard robots.txt file in the root directory
of the domain. Standard practice requires a search engine to check this
file upon visiting the domain, though a search engine crawler will keep
a cached copy of this file as it visits the pages of a site, and may not
update that copy as quickly as a webmaster does. The web developer can
use this feature to prevent pages such as shopping carts or other
dynamic, user-specific content from appearing in search engine results,
as well as keeping spiders from endless loops and other spider traps.
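As an illustration of how a well-behaved spider applies these rules, the sketch below uses the robots.txt parser from the Python standard library. The rules, the crawler name ExampleBot, and the example.com URLs are hypothetical; a real crawler would download the file from the root of the domain rather than parse an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all spiders out of the shopping cart
# and the internal search pages.
robots_txt = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

cart_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/cart/checkout")
article_allowed = parser.can_fetch(
    "ExampleBot", "http://www.example.com/articles/seo.html")

print(cart_allowed)     # False: the path falls under a Disallow rule
print(article_allowed)  # True: no rule matches this path
```

Compliance with robots.txt is voluntary on the crawler's side, so the file keeps pages out of cooperative search engines but does not protect private content.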
For those search engines that have their own paid submission (like
Yahoo), it may save some time to pay a nominal fee for submission,
though Yahoo's paid submission program does not guarantee inclusion in
its search results.
White hat methods
White hat methods of SEO involve following the search engines'
guidelines as to what is and what isn't acceptable. Their advice
generally is to create content for the user, not the search engines; to
make that content easily accessible to their spiders; and to not try to
game their system. Often webmasters make critical mistakes when
designing or setting up their web sites, inadvertently
"poisoning" them so that they will not rank well. White hat
SEOs attempt to discover and correct mistakes, such as
machine-unreadable menus, broken links, temporary redirects, or a poor
navigation structure.
Because search engines are text-centric, many of the same methods
that are useful for web accessibility are also advantageous for SEO.
Methods are available for optimizing graphical content, including ALT
attributes, and adding a text caption. Even Flash animations can be
optimized by designing the page to include alternative content in case
the visitor cannot read Flash.
Some methods considered proper by the search engines:
- Using a short, unique, and relevant title to name each page.
- Editing web pages to replace vague wording with specific
terminology relevant to the subject of the page, and that the
audiences that the site was developed for will expect to see on the
pages, and will search with to find the site.
- Increasing the amount of original content on a site.
- Using a reasonably-sized, accurate description meta tag without
excessive use of keywords, exclamation marks or off topic terms.
- Ensuring that all pages are accessible via regular links, and not
only via Java, JavaScript or Macromedia Flash applications; this can
be done through the use of a page listing all the contents of the
site (a site map).
- Developing links via natural methods: Google doesn't elaborate on
this somewhat vague guideline. Dropping an email to a fellow
webmaster telling him about a great article you've just posted, and
requesting a link, is most likely acceptable.
- Participating in a web ring with other web sites as long as the
other websites are independent, share the same topic, and are of
comparable quality.
Black hat methods
Spamdexing is the promotion of irrelevant, chiefly commercial, pages
through deceptive techniques and the abuse of the search algorithms.
Many search engine administrators consider any form of search engine
optimization used to improve a website's page rank as spamdexing.
However, over time a widespread consensus has developed in the industry
as to what are and are not acceptable means of boosting one's search
engine placement and resultant traffic.
As search engines operate in a highly automated way it is often
possible for webmasters to use methods and tactics not approved by
search engines to gain better ranking. These methods often go unnoticed
unless an employee from the search engine manually visits the site and
notices the activity, or a change in ranking algorithm causes the site
to lose the advantage thus gained. Sometimes a company will employ an
SEO consultant to evaluate competitors' sites, and report
"unethical" optimization methods to the search engines.
Spamdexing often gets confused with legitimate search engine
optimization techniques, which do not involve deceit. Spamming involves
getting web sites more exposure than they deserve for their keywords,
leading to unsatisfactory search results. Optimization involves getting
web sites the rank they deserve on the most targeted keywords, leading
to satisfactory search experiences.
Search engines may take action against those they discover
using unethical SEO methods. In February 2006, Google removed both
BMW Germany and Ricoh Germany from its index for use of these practices.
SEO and Marketing
While this article leans towards setting up a distinction between
Search Engine Optimizers as wearing one colored hat or another, that
portrayal of the industry is really of little concern to many within the
industry, who instead see their efforts as part of a larger, holistic
effort.
There is a sizable body of SEO practitioners who see
search engines as just another visitor to a site, and try to make the
site as accessible to those visitors as to any other who would come to
the pages. The focus of their work isn't primarily to rank the highest
for certain terms in search engines, but rather to help site owners
fulfill the business objectives of their sites. This may come in the
form of driving organic search traffic to pages, but it also may involve
the use of paid advertising on search engines and other pages, building
high quality web pages to engage and persuade, addressing technical
issues that may keep search engines from crawling and indexing those
sites, setting up analytics programs to enable site owners to measure
their successes, and making sites accessible and usable.
These SEOs may work in-house for an organization, or as consultants,
and search engine optimization may be only part of their daily
functions. Often their knowledge of how search engines function comes
from interacting and discussing the topics on forums, through blogs, at
popular conferences and seminars, and by experimentation on their own
sites. There are few college courses that cover online marketing from an
ecommerce perspective that can keep up with the changes that the web
sees on a daily basis.
While reviewing and working towards meeting the guidelines that are
posted by search engines can help one build a solid foundation for
success on the web, following those guidelines is really just the
start. Many see search engine marketing as a larger umbrella
under which search engine optimization fits, but it's possible that many
who focused primarily on SEO in the past are incorporating more and more
marketing ideas into their efforts, recognizing that search engines
themselves have expanded what they offer to include RSS feeds, video
search, local results, mapping, and more.
Legal issues
In 2002, search engine manipulator SearchKing filed suit in an
Oklahoma court against the search engine Google. SearchKing's claim was
that Google's tactics to prevent spamdexing constituted an unfair
business practice. This may be compared to lawsuits which email spammers
have filed against spam-fighters, as in various cases against MAPS and
other DNSBLs. In January of 2003, the court pronounced a summary
judgment in Google's favor.
Page Quality and Ranking
A webmaster who wants to maximize the value of a web site can read
the guidelines published by the search engines, as well as the coding
guidelines published by the World Wide Web Consortium. If the guidelines
are followed, and the site presents frequently updated, useful, original
content, and a few meaningful, useful inbound links are established, it
may be possible to obtain a significant amount of organic search
traffic.
When a site has useful and engaging content, there's a good chance
that other webmasters will naturally place links to the site, increasing
its PageRank and flow of visitors. When visitors discover a useful web
site, they tend to refer other visitors by tagging or bookmarking the
page, linking to it, and sending others links to it by email or instant
message.
As a result, SEO practices that improve web site quality are likely
to outlive short term practices that simply seek to manipulate search
rankings. The top SEOs recommend targeting the same thing that search
engines seek to promote: relevant, useful content for their users.
References
- Company Overview. Google. URL accessed on May 26, 2005.
- Editorial Guidelines for Ask.com. Ask Jeeves. URL accessed on May
26, 2005.
- Brin, Sergey and Page, Lawrence (1998). "The Anatomy of a Large-Scale
Hypertextual Web Search Engine". Proceedings of the seventh
international conference on World Wide Web 7, 107-117.
- Our Search: Google Technology. Google. URL accessed on June 11, 2005.
- Google Patent Application - Information Retrieval Based on Historical
Data. History. URL accessed on October 10, 2005.
- Cho, J., Garcia-Molina, H., and Page, L. (1998). "Efficient crawling
through URL ordering". In Proceedings of the seventh conference on
World Wide Web.
- The Clever Project. History. URL accessed on May 4, 2006.
- Kent, Peter (2004). Search Engine Optimization For Dummies. Wiley
Publishing Inc. ISBN 0-7645-6758-6.