The Life of a Search Query

DW staff (ktz), August 3, 2004

The Internet is full of possibilities. You’ll find everything from encyclopedic entries to telephone book listings, and it’s all available at the click of a search button. But how does an Internet search actually work?

When you want to find a specific word, you turn to a so-called search engine, a Web site designed specifically for combing the World Wide Web for pages containing the entered word or words. There are two types of search tools: crawler-based search engines and human-powered directories.

Crawlers

Crawler-based search engines, such as HotBot or Google, create their listings automatically. They "crawl" or "spider" the Web on a regular basis in search of information. They record page titles, text copy, links and any other elements in giant indexes, similar to those of a book, and then make the content available to their users.

Crawler-based search engines have three major elements. The first is the spider, which visits a Web page, reads it and then follows the links it contains to other pages.

The spider returns to each page on a regular basis and looks for changes. So if you update your Web pages, crawler-based search engines eventually find the changes, which then work their way into the search results.
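The way a spider follows links can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the "Web" here is a hypothetical dictionary called TOY_WEB rather than real pages fetched over a network, and the function name crawl is invented for the example.

```python
from collections import deque

# Hypothetical miniature "Web": each page maps to the links found on it.
TOY_WEB = {
    "home.html": ["news.html", "about.html"],
    "news.html": ["home.html", "story.html"],
    "about.html": [],
    "story.html": ["news.html"],
    "orphan.html": ["home.html"],  # no page links to it, so the spider never finds it
}

def crawl(start, web):
    """Visit pages breadth-first, following each link only once."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)          # "read" the page
        for link in web.get(page, []):
            if link not in seen:      # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("home.html", TOY_WEB))
```

Starting from home.html, the sketch reaches every page that can be found by following links. The page orphan.html is never visited because nothing links to it.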

The second element is the index. Everything the spider reads goes into a giant index, sometimes called the catalog. The index records every page the spider finds and is updated whenever new information is added. Until a Web page is indexed, it cannot be made available to a user.
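One common way to build such an index is to map every word to the pages that contain it, much like the index at the back of a book. The sketch below assumes a hypothetical handful of page texts (PAGES) and an invented helper named build_index. Real search engines store far more than this.

```python
import re
from collections import defaultdict

# Hypothetical page texts a spider might have collected.
PAGES = {
    "a.html": "Search engines crawl the Web",
    "b.html": "A spider follows links on the Web",
    "c.html": "Directories rely on human editors",
}

def build_index(pages):
    """Map each lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

index = build_index(PAGES)
print(sorted(index["web"]))      # pages containing "web"
print(sorted(index["spider"]))   # pages containing "spider"
```

A page that has not been added to PAGES never appears in the index, which mirrors the point that unindexed pages cannot be found.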

The third part of the crawler-based engine is the search software. It sifts through millions of indexed pages to find matches for a user’s search query. It then ranks them by what it judges to be most relevant. Search engines vary in the way they rank pages, but many rely largely on how often the keywords appear on a page.
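Ranking by keyword count might look roughly like the following sketch. The page texts and the function name rank are assumptions made for illustration. No real search engine is claimed to work exactly this way.

```python
import re
from collections import Counter

# Hypothetical indexed page texts.
PAGES = {
    "a.html": "Cheap flights and cheap hotels",
    "b.html": "Flights to Berlin",
    "c.html": "Hotel reviews",
}

def rank(query, pages):
    """Score each page by how often the query words occur in it."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    scores = {}
    for url, text in pages.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        score = sum(counts[term] for term in terms)
        if score > 0:                 # pages without any match are left out
            scores[url] = score
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(rank("cheap flights", PAGES))
# [('a.html', 3), ('b.html', 1)]
```

Here a.html comes first because the query words appear three times in it, while c.html contains neither word and is not listed at all.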

Human-powered directories

A human-powered directory, such as Yahoo, depends on people to enter its listings. Either Web site owners submit a short description of the entire site and pay for a listing, or editors from the directory company write entries for the sites they review.

Each entry is categorized and sub-categorized for easier access by users.

When you enter a query in this type of directory, the search software only looks for matches within the submitted descriptions.
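A directory search of this kind can be pictured as follows. The entries, the site names ending in .example and the function search_directory are all hypothetical. The sketch only illustrates that the match is made against the stored description, not against the site's own pages.

```python
# Hypothetical directory entries: each site is filed under a category and
# given a short description. The site's actual pages are never consulted.
DIRECTORY = [
    {"site": "news.example", "category": "News/International",
     "description": "International news and analysis"},
    {"site": "recipes.example", "category": "Home/Cooking",
     "description": "Vegetarian recipes and cooking tips"},
    {"site": "travel.example", "category": "Recreation/Travel",
     "description": "Budget travel tips for Europe"},
]

def search_directory(query, directory):
    """Return sites whose description contains every word of the query."""
    words = query.lower().split()
    results = []
    for entry in directory:
        description_words = entry["description"].lower().split()
        if all(word in description_words for word in words):
            results.append(entry["site"])
    return results

print(search_directory("tips", DIRECTORY))
print(search_directory("cooking tips", DIRECTORY))
print(search_directory("flights", DIRECTORY))
```

Even if travel.example wrote about flights on every page, the last query would return nothing, because the word is missing from its description.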

Changing and updating a Web site has no effect on its directory listing. Things that are useful for improving a ranking with a crawler-based search engine have no effect on a human-powered directory. The only exception is that a good site is likely to be reviewed for free and entered higher in the directory's listing.

New search technology

Google is a pioneer in the development of search technology. Whereas many search engines return results based on how often keywords appear in a website, Google relies on a series of simultaneous calculations and hypertext matching analysis.

At the heart of Google’s search engine is PageRank technology, a sophisticated measurement program that determines the importance of a Web site by solving an equation of 500 million variables and more than 2 billion terms.

Google does not simply count links or keyword matches on a Web site. Instead, PageRank uses the vast link structure of the Web as an organizational tool. When Google finds a link from page A to page B, it interprets it as a "vote" for page B. Google assesses a page’s importance by the number of votes it receives.

At the same time Google analyzes the pages casting the votes. Those votes cast by pages that are themselves "important" (i.e. have a lot of votes themselves) weigh more and contribute more to making other pages important. Thus, high-quality pages receive a higher page ranking in the query listings.
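The voting idea can be sketched as a simplified calculation in which every page repeatedly passes its current importance on to the pages it links to. This is not Google's actual implementation. The four-page link graph, the function name pagerank, the number of rounds and the damping value of 0.85 (a figure commonly quoted for PageRank, not taken from this article) are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: pages share their rank among the pages they link to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}       # start with equal importance
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Each link is a vote weighted by the voter's own rank.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: every link target is also listed as a page.
LINKS = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "B"],
    "D": ["B"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, B collects the most votes and ranks first. C receives only a single link, but it comes from B, so C still ranks second. D, which no page links to, ends up last.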

Unlike human-powered directories, Google relies solely on the link structure of the Web to determine a page’s importance.