One thing that distinguishes the online
world from the real one is that it is very easy to find things. To find a copy
of The Economist in print, one has to go to a news-stand, which may or may not
carry it. Finding it online, though, is a different proposition. Just go to
Google, type in "economist" and you will be instantly directed to economist.com.
Though it is difficult to remember now, this was not always the case. Indeed,
until Google, now the world’s most popular search engine, came on to the scene
in September 1998, it was not the case at all. As in the physical world,
searching online was a hit-or-miss affair. Google was vastly
better than anything that had come before: so much better, in fact, that it
changed the way many people use the web. Almost overnight, it made the web far
more useful, particularly for nonspecialist users, many of whom now regard
Google as the internet’s front door. The recent fuss over Google’s stock market
flotation obscures its far wider social significance: few technologies, after
all, are so influential that their names become used as verbs.
Google began in 1998 as an academic research project by Sergey Brin and
Lawrence Page, who were then graduate students at Stanford University in Palo
Alto, California. It was not the first search engine, of course. Existing search
engines were able to scan or "crawl" a large portion of the web, build an index,
and then find pages that matched particular words. But they were less good at
presenting those pages, which might number in the hundreds of thousands, in a
useful way. Mr Brin’s and Mr Page’s accomplishment was to devise
a way to sort the results by determining which pages were likely to be most
relevant. They did so using a mathematical recipe, or algorithm, called
PageRank. This algorithm is at the heart of Google’s success, distinguishing it
from all previous search engines and accounting for its apparently magical
ability to find the most useful web pages.

Untangling the web

PageRank works by analysing the structure of the web itself.
Each of its billions of pages can link to other pages, and can also, in turn, be
linked to. Mr Brin and Mr Page reasoned that if a page was linked to by many other
pages, it was likely to be important. Furthermore, if the pages that linked to a
page were important, then that page was even more likely to be important. There
is, of course, an inherent circularity to this formula: the importance of one
page depends on the importance of pages that link to it, the importance of which
depends in turn on the importance of pages that link to them. But using some
mathematical tricks, this circularity can be resolved, and each page can be
given a score that reflects its importance. The simplest way to
calculate the score for each page is to perform a repeating or "iterative"
calculation (see article). To start with, all pages are given the same score.
Then each link from one page to another is counted as a "vote" for the
destination page. Each page’s score is recalculated by adding up the
contribution from each incoming link, which is simply the score of the linking
page divided by the number of outgoing links on that page. (Each page’s score is
thus shared out among the pages it links to.) Once all the
scores have been recalculated, the process is repeated using the new scores,
until the scores settle down and stop changing (in mathematical jargon, the
calculation "converges"). The final scores can then be used to rank search
results: pages that match a particular set of search terms are displayed in
order of descending score, so that the page deemed most important appears at the
top of the list.
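
The iterative calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's own code: the three-page web, the function name rank_pages and the stopping tolerance are all invented for the example. It also assumes that every page has at least one outgoing link and that every link points to a page in the table. It follows only the simplified recipe given here, and leaves out refinements such as the "damping factor" in the published version of PageRank.

```python
def rank_pages(links, tolerance=1e-9, max_rounds=1000):
    """Score pages by repeatedly sharing each page's score among its links.

    `links` maps each page to the list of pages it links to. Every page is
    assumed to have at least one outgoing link (an assumption of this sketch).
    """
    pages = list(links)
    # To start with, all pages are given the same score.
    scores = {page: 1.0 / len(pages) for page in pages}

    for _ in range(max_rounds):
        new_scores = {page: 0.0 for page in pages}
        for page, targets in links.items():
            # Each link is a "vote" worth the linking page's score
            # divided by the number of outgoing links on that page.
            share = scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share

        # Stop once the scores have settled down (the calculation converges).
        change = max(abs(new_scores[p] - scores[p]) for p in pages)
        scores = new_scores
        if change < tolerance:
            break
    return scores


# A hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = rank_pages(links)

# Pages in order of descending score, as they would appear in search results.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy web, pages A and C settle at a score of about 0.4 each and page B at about 0.2. B comes last because its only incoming link is from A, which splits its vote between two pages.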