Andrea Orr
PALO ALTO: Dubbed the Wayback Machine (http://web.archive.org), the archive is
the work of San Francisco entrepreneur Brewster Kahle, who for
the past five years has been working on a library that would store not just
documents like old newspapers that are normally preserved, but a sampling of
everything that has ever been posted on the World Wide Web.
In an interview on Thursday, Kahle said the Wayback Machine includes some of
the most amateur-looking Web pages dating all the way back to 1996, helping
students to chronicle the evolution of Web design.
That is not all that is chronicled.
As the Web quickly became a worldwide communications tool over the past five
years, corporations and governments often hastily slapped up information with
much less care than they might have given to the contents of a bound book.
One archive from the official White House Web site now features remarks from
then-President Clinton, who, almost five years prior to the Sept. 11 attacks,
warned of the need to tighten airport security.
"I will direct that all airport and airline employees with access to
secure areas be given criminal background checks and FBI fingerprint
checks," Clinton said in a Sept. 9, 1996 address now preserved on the
Wayback Machine. "I will direct the FAA to begin full passenger bag match
for domestic flights at selected airports. And I'm proud to say that several of
the commission's recommendations will be put into place immediately."
Kahle said he expects the new archive, which is free, will be used by
everyone from journalists looking to dig up old statements by corporations to
students and others just looking for kicks.
Kahle used it himself late one night to locate an out-of-print computer
manual when one of his computers was giving him trouble. Much more salacious
material, like some fly-by-night porn sites thought to be extinct, is also
preserved.
The Wayback Machine, Kahle said, will contain copies of many defunct
magazines, which may or may not have maintained their own archives. It also
provides a way to track the ever-changing messages from different companies,
such as the Internet advertising company DoubleClick Inc., which routinely
amended its privacy policy over the past several years in response to customer
complaints.
"If you don't actively keep a record of digital materials, they're
gone," Kahle said. "This is a huge collection. It's a celebration of
the Web."
While impressive in size, the Wayback Machine is more a testament to Kahle's
commitment to save as much Web content as possible than it is to any
advanced technology.
He archived the Web pages with basic Web crawlers that repeatedly sweep the
entire Web, excluding some password-protected sites.
Because this Web sweep takes about two months, researchers will not
necessarily be able to find a page from a particular day, but they should be
able to get a sampling from a given two-month time frame.
(C) Reuters Limited.