A Complete Guide To Search Engines

by Jayaram V

How Search Engines Work

Search engines crawl the world wide web to gather information about websites and their content. This is usually done by automated programs known as robots, crawlers or bots, which roam the internet at great speed, doing what ordinary browsers do but with much greater efficiency and capacity.

The information so gathered is then passed on to indexers, which index the content according to a set of business rules, algorithms and other criteria and store it as indexed data in huge databases. Each search engine company develops its own rules and criteria for organizing the data it collects, based upon the business model it has chosen. Once the data is organized, client mechanisms such as search forms can be used to access it using keywords and various other criteria.

A typical search engine usually has four components. Together they constitute what we understand as the search engine mechanism.

Information gathering mechanism.
Indexing mechanism.
Ranking mechanism.
Retrieval mechanism.
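
The four components listed above can be illustrated with a small sketch. It is only a toy model and not how any real search engine is built: the PAGES dictionary, the example URLs and the word-count scoring are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("search engines crawl the web",
                       ["a.example/index", "b.example/rank"]),
    "a.example/index": ("indexers index content for search",
                        ["a.example/home"]),
    "b.example/rank": ("ranking orders search results", []),
}

def crawl(start):
    """Information gathering: follow links breadth-first from a start URL."""
    seen, queue = [], [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.append(url)
        queue.extend(PAGES[url][1])
    return seen

def build_index(urls):
    """Indexing: map each word to the pages that contain it, with counts."""
    index = defaultdict(dict)
    for url in urls:
        for word in re.findall(r"[a-z]+", PAGES[url][0].lower()):
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Retrieval and ranking: order pages by how often query words occur."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))

index = build_index(crawl("a.example/home"))
results = search(index, "search index")
```

Here the crawler gathers three pages, the indexer builds a word-to-page table, and a query is answered from that table alone, without visiting the pages again. Real ranking uses far more signals than word counts.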

Limitations of Search Engines

The search engine business is very cost intensive because of the amount of work involved in gathering and indexing information and keeping it up to date. To accomplish this task, search engine companies have to invest heavily in state-of-the-art technology and technically qualified staff to maintain, manage and manipulate the information and make it useful, convenient and meaningful for end users. The fast-expanding world wide web, with its complexity and incongruity, poses a multitude of problems and challenges to search engine companies in managing information and keeping their technologies scalable and effective. Government interference, internet threats, cyber crime, linguistic and regional variations, cross-cultural issues, the absence of uniform global internet policies, usability issues and people's unwillingness to pay for search are some of the serious issues that threaten the viability of the search engine business. They make it one of the most difficult businesses to manage on a long-term basis without recourse to search-based ads and paid listings. While these alternatives save the companies from financial problems, there is no guarantee that they do not undermine the quality of the information the companies provide.

Despite the advances made in search engine technology, most search engines do not have the necessary means to keep pace with the vast amount of data that is constantly being added to the world wide web and the new websites that are hosted every day. This results in some inefficiencies in the way search engines work, which are discussed below.

1. Search engines have built-in limitations in responding to users' queries because of constraints in their indexing mechanisms or the algorithms they use. They may also respond differently to each keyword or combination of keywords, letters and symbols, depending upon how they are programmed.

2. Because of the limitations in processing and indexing information, and the time and cost involved in removing irrelevant and useless information to keep the indexes clean and up to date, a substantial portion of the content available on the world wide web is either outdated or outside the reach of the search engines and the public who use them. The so-called invisible web is considered to be two to three times larger than the visible web.

3. Search engines distribute information across several servers to manage load, and not all of them are updated or available at the same time. So the results of a search query may vary depending upon which server receives the query.

4. Most search engines limit the number of pages they crawl on a website. Even for the pages they crawl, they index only a certain portion of the content and links on a page. Google, for example, indexes the first 101KB of a web page and the first 120KB of a PDF.

5. Since most websites do not keep reliable date stamps or records of when they add or modify their content, the date-search capability of search engines is unreliable.

6. Indexing is usually a long-drawn-out process, and it may take days or weeks before the information is processed and made available to the public. So the information is not always the latest.

7. Spamming, keyword manipulation and search engine optimization techniques dilute and slow down the efforts of search engines to maintain quality.

8. Paid submission policies used by Yahoo and other companies, as well as paid listings, compromise the quality of results and the ranking of websites based on merit.

9. The rules search engines have evolved to deal with duplicate content on the web often work against the original providers of the information. Search engines do not have a reliable mechanism to distinguish original content from duplicates because of limitations in date stamping. As a result, providers of original content often suffer due to illegal copying and reproduction.

10. best websites in each category. Hinduwebsite is one good example.
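
The effect of limitation 4 in the list above can be shown with a short sketch. The 101KB figure is the one quoted in that item. The per-site page limit, the function names and the sample page are hypothetical and chosen only for illustration.

```python
MAX_PAGES_PER_SITE = 2          # hypothetical crawl budget per website
MAX_INDEXED_BYTES = 101 * 1024  # the 101KB page limit quoted in item 4

def pages_to_crawl(site_pages, limit=MAX_PAGES_PER_SITE):
    """Only the first `limit` pages of a site are visited at all."""
    return site_pages[:limit]

def indexable_portion(body, limit=MAX_INDEXED_BYTES):
    """Only the first `limit` bytes of a page are passed to the indexer."""
    return body[:limit]

# A page whose only mention of "needle" sits beyond the size limit.
page = b"a" * (200 * 1024) + b"needle"
indexed = indexable_portion(page)
found = b"needle" in indexed
```

In this sketch `found` is false: the word exists on the page, yet a search for it could never return the page, because the indexer never saw that part of the content.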

Directory Services

A directory is a database of information about websites and their pages, organized alphabetically into categories. The organizing is usually done by humans, rather than by machines and automated software, using a set of predefined criteria. Users navigate the directory through a series of menus organized in a predictable manner to find the information they want. Unlike search engines, which require state-of-the-art technology to gather and index information, directories require a large workforce to create and maintain, since people must organize, evaluate and categorize the information. Hence directories are slow to develop and usually smaller than the indexes created by commercial search engine companies. One of the best examples of a web directory is the one maintained by dmoz.org, which, being a public-domain, non-commercial directory, is used by several search engines and websites like Hinduwebsite.com. Among commercial directories, Yahoo's directory is perhaps the best known and the largest. Besides general directories, there are also specialized directories dealing with a specific subject or category, also known as metasites.

The Directory vs. Search Engine

Directories are very useful when you are researching a general topic, a popular category or a particular subject. For example, if you are looking for information on religion, you can go to the society and culture part of a directory to begin your search. If you are looking for information on a particular religion such as Hinduism or Buddhism, you can scroll down the religion category in the directory and locate links to it easily. Besides categories of information, directory services usually provide an internal search engine with which you can look for information within the directory using a keyword or combination of keywords. Search engines are more useful when you are looking for in-depth, more recent or more specialized information on a subject, or information that is beyond the scope of the categories in a directory. The standard practice is to begin your search with directories and then move on to search engines.
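
The category browsing and internal keyword search described above can be pictured with a small sketch. The DIRECTORY structure, its category names and the example site addresses are assumptions made for illustration and are not taken from any real directory.

```python
# Hypothetical directory: categories nest, and each leaf is a list of sites
# that a human editor has filed alphabetically under that category.
DIRECTORY = {
    "Society and Culture": {
        "Religion": {
            "Buddhism": ["buddhism.example"],
            "Hinduism": ["hinduism.example", "vedas.example"],
        },
    },
}

def browse(directory, path):
    """Walk down the menus named in `path` and return what is found there."""
    node = directory
    for category in path:
        node = node[category]
    return node

def search_directory(directory, keyword, path=()):
    """Internal search: yield (category path, site) pairs matching a keyword."""
    keyword = keyword.lower()
    for name, child in sorted(directory.items()):
        here = path + (name,)
        if isinstance(child, dict):
            # A sub-category: keep descending.
            yield from search_directory(child, keyword, here)
        else:
            # A leaf: match the keyword against the category or site name.
            for site in child:
                if keyword in name.lower() or keyword in site.lower():
                    yield here, site

hinduism = browse(DIRECTORY, ["Society and Culture", "Religion", "Hinduism"])
matches = list(search_directory(DIRECTORY, "hindu"))
```

Browsing follows a fixed path of menus, while the internal search looks only at what editors have already filed, which is why it cannot find anything outside the directory's categories.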

Meta Search Engines

Meta search engines do not use their own crawlers or databases to gather and index information. Instead they use a complex set of routines to access the databases made publicly available by various search engines, gather information from them and present it to the public in an organized way. The advantage of meta search tools is that you can simultaneously access various search engine databases and subject directories without doing individual searches and see the results displayed in one place. The main disadvantage is that the results are not necessarily comprehensive. Firstly, a meta search tool can only fetch results from as many search engines as time, technology and resources permit. Secondly, due to the limitations placed by each search engine on retrieving information, you may not always get the best results or all the results. Besides, meta search tools retrieve information basically through simple search routines, so they are not ideal for advanced search. Despite these limitations, if you want an overview or a comparative view of how each search engine responds to a particular keyword or set of keywords, meta search engines are the best place to start.
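
A meta search tool of the kind described above can be sketched as follows. The two engine functions stand in for real search engines and return made-up results. Merging by summed reciprocal rank is one simple choice assumed here for illustration, not the method of any particular tool.

```python
def engine_a(query):
    """Stand-in for one search engine: returns a ranked list of URLs."""
    return ["x.example", "y.example", "z.example"]

def engine_b(query):
    """Stand-in for a second search engine with different results."""
    return ["y.example", "w.example"]

def meta_search(query, engines, per_engine_limit=10):
    """Query every engine and merge the ranked lists into a single list."""
    scores = {}
    for engine in engines:
        # Each engine contributes only its top few results, which is one
        # reason the merged list is not necessarily comprehensive.
        top = engine(query)[:per_engine_limit]
        for rank, url in enumerate(top, start=1):
            # A URL gains 1/rank per engine, so duplicates are combined.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))

merged = meta_search("hinduism", [engine_a, engine_b])
limited = meta_search("hinduism", [engine_a, engine_b], per_engine_limit=1)
```

A page returned by both engines rises to the top of the merged list. Lowering per_engine_limit shows how a cap on what each engine hands over drops results that a direct search would have shown.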

These links have been provided for informational purposes only. We have not evaluated them. Hinduwebsite.com does not have any relationship with them. Some of the links may be outdated as they have not been recently updated.

Best Search Engine Tools
Google All the Web Ask Jeeves Alta Vista Gigablast Lycos Teoma Yahoo AOL Search MSN Search Netscape Dipsie Fybersearch Mozdex Whatuseek Wisenut ExactSeek Lost Link/ Web Links Link Centre	Scubtheweb Jayde AOL Search HotBot Search.com Metacrawler Dogpile Mamma C4 Canada.com ixquick Infogrid WebInfoSearch Query Server 800go Debriefing Highway 61 Link Master Splat Search	37.com OneSeek MetaSpider Vivisimo PlanetSearch surfwax qbSearch ProFusion Proteus Go2 Net MegaGo.com WebFile myGO Megacrawler Search Climbers IX Quick Northern Light Subjex Zen Search	Kanoodle NBCi/ Snap Go InfoSeek 7Search Acclaim Search AllCrawl Amnesi Ampleo Deja.Com Deoji DevSearch Frequent Finders iBound Info Hiway Infomak GoshDarn! Jump City Z Search

Meta Search Engines
clusty.com HighBeam Research Dogpile Surfwax Copernic Metacrawler	IxQuick Search.com Fazzle Infogrid Vivismo Infonetware	Ithaki KillerInfo Mamma Profusion Kartoo QueryServer	Turbo10 Weblens Widow philb.com Zapmeta Searchy

Subject Directories
nthing.com Bubl Link Complete Planet Infomine Suite 101	Internet Public Library Joe Ant Librarian Index Open Directory Top Ten Links	Resource Discovery Asiaco Awesome Library BBCI Directory	Galaxy.com Gimpsy GoGuides Illumirate Ranks.com

Specialized Search Tools

Academic

Education

Biblio

Bus. Intell.

Other

Competitive Intelligence

Health

Kids

Legal

Media/Music

News/Blogs

Public Records/People

Reference

Religion

Statistics

Politics

Disclaimer: The external links provided herein are third-party links. We do not have any control over them and we cannot guarantee their accuracy or their authenticity. The links are being provided as a convenience and for informational purposes only; they do not constitute an endorsement or an approval by Hinduwebsite.com of any of the products, services or opinions of the corporation or organization or individual. Hinduwebsite.com bears no responsibility for the accuracy, legality or content of the external site or for that of subsequent links. Any transactions that you enter into with a vendor, merchant or other party listed in this site or linked from this site are solely between you and that vendor, merchant or other party. Contact the external site for answers to questions regarding its content.

© 2000-2019 Hinduwebsite.com. All Rights are reserved. No part of this website can be copied or reproduced in any manner. Hinduwebsite.com presents original articles on various subjects. They are for your personal and spiritual growth not for copying and posting on your website. We do not accept donations. We rely solely upon our content to serve you. If you want to promote our website please write an introduction and post a link to it on your blog or website. However, please do not copy information from the website and then tell us that you were trying to give us publicity. We like publicity, but not in this manner. Please protect Dharma by following its values, which include non-stealing. Your use of the website is subject to these Terms of Use.