WebIC: Software that predicts information needs

CIOL Bureau

Computer scientists from the University of Alberta recently developed software called WebIC that employs machine learning to foresee the information needs of web surfers.

The software can be integrated with search engines or downloaded directly to computers. Tingshao Zhu, a research scholar at the University of Alberta and one of the creators of WebIC, tells Kishore Kumar from CyberMedia News about the new software.






What exactly is WebIC? How does it simplify web browsing?

WebIC is a client-side, Internet Explorer-based Web browser. It uses a unique source of information, namely the user's browsing features, to learn the search strategy an actual user employs while trying to locate relevant information on the Web. To determine the relevant pages, WebIC first identifies the user's information need and then generates an appropriate query to a search engine to retrieve a relevant page.

WebIC works by anticipating users' needs: a user can click on an icon that leads to useful pages from anywhere on the Web without providing any explicit input, which is a step beyond the usual search engine index retrievals. WebIC is able to identify the information needs of new users effectively, leading them to previously unseen but relevant pages. It increases the ease, accessibility and efficiency of Web browsing, allowing people to save time while accessing better and more relevant data.

What were the challenges faced while developing the software? What is the role of machine learning technology in this particular software?

Given that WebIC is expected to help people with their general-purpose web search tasks, it must be independent of particular users, specific words and specific Web pages, so that it can be used to identify relevant pages in any new Web environment. The biggest challenge is to identify the user's information need based on passive observation, using only the information that can be gleaned by watching the user in the course of an ordinary browsing session.

Whenever we need to predict a page, we first identify the user's current information need, which is composed of hundreds of significant words, far too many to submit to any search engine. Furthermore, on most search engines the order of the keywords is very important, as the associations are made sequentially. WebIC uses machine learning to translate a human information need into the type of query a computer can fully understand.
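
The reduction from hundreds of significant words to a short, ordered query can be illustrated with a minimal Python sketch. It is not WebIC's actual method: the function name `build_query`, the `word_scores` mapping and the `max_terms` cut-off are hypothetical, and the sketch simply assumes each word already carries a relevance score and puts the highest-scoring words first.

```python
def build_query(word_scores, max_terms=5):
    """Turn a large set of scored words into a short, ordered query string.

    word_scores: dict mapping word -> relevance score (hypothetical input;
    in a real system the scores would come from a trained model).
    max_terms: how many keywords to keep (an assumed cut-off).
    """
    # Highest score first; ties are broken alphabetically so the
    # keyword order is deterministic.
    ranked = sorted(word_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(word for word, _ in ranked[:max_terms])


if __name__ == "__main__":
    scores = {"flights": 0.9, "paris": 0.8, "the": 0.1, "cheap": 0.8}
    print(build_query(scores, max_terms=3))  # prints: flights cheap paris
```

Ordering by score is only one possible choice; the point is that both the selection and the order of the keywords are decided by the system rather than typed in by the user.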

Machine learning is everywhere in WebIC. It uses a unique source of information, namely the user's browsing features, to make predictions. Here, WebIC interprets the browsing features as signals communicating the user's attitude towards the content of the current pages within the sequence, signals that therefore help to identify the user's information need.

We use a learning algorithm to build a classifier capable of predicting information need. After training, we may find some patterns, for example, “any word that appears in the title of three consecutive pages will be in the destination page”. Because no specific words were involved while training on these browsing feature values, we can apply the method to make predictions about pages that have never been visited.
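
The idea of word-independent browsing features can be sketched in Python as follows. This is an illustration under stated assumptions, not WebIC's code: the `Page` type, the feature names and the helper functions are hypothetical, and a single hand-written rule (the "three consecutive titles" pattern quoted above) stands in for the learned classifier.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One page in a browsing session (hypothetical minimal representation)."""
    title: str
    body: str


def tokens(text):
    """Lower-cased words of a text with surrounding punctuation stripped."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def longest_title_run(word, session):
    """Longest run of consecutive pages whose title contains the word."""
    best = run = 0
    for page in session:
        if word in tokens(page.title):
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def browsing_features(word, session):
    """Features that describe how a word was encountered, not which word it is."""
    n = max(len(session), 1)  # avoid division by zero on an empty session
    return {
        "title_ratio": sum(word in tokens(p.title) for p in session) / n,
        "body_ratio": sum(word in tokens(p.body) for p in session) / n,
        "longest_title_run": longest_title_run(word, session),
    }


def predict_need_words(session):
    """Apply the example pattern: a title word on three consecutive pages."""
    vocab = set()
    for page in session:
        vocab |= tokens(page.title) | tokens(page.body)
    return sorted(
        w for w in vocab
        if browsing_features(w, session)["longest_title_run"] >= 3
    )
```

Because the rule refers only to feature values such as `longest_title_run`, it applies unchanged to a session about any topic, which is the property the answer above describes.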






How many persons were involved in developing this software? When did you
start working on this project?

I was the only graduate student working on this project, from summer 2000 until December 2004, under the supervision of Dr. Russ Greiner and Dr. Gerald Haeubl. I formulated the ideas, wrote thousands of lines of code, designed and ran several careful user studies, and wrote up the results. There was even an eight-person team working on our LILAC (Learn from the Internet: Log, Annotation, Content) study.

Overall, the project was extremely successful, with an amazing number (10) of publications in prestigious conferences and workshops, including one that won the Best Student Paper award (at UM03).









Tell me something about the kind of testing done for this application before
coming up with the final version. Who all were involved in the testing part?







To develop the product to its current state, two user studies were conducted: one to collect annotated Web logs and one to have actual people evaluate WebIC.



To collect data for training, we conducted a user study (Travel Study) by enlisting 114 students. Each participant was instructed to annotate while browsing. 129 participants requested 15,105 pages, and 82.39 per cent of the URLs were visited only one or two times. Clearly, very few URLs had strong support in this dataset, which would make it very difficult to make recommendations based on page correlation and frequency. Using browsing behavior, we found an average accuracy of around 87.4 per cent for predicting the user's information need.
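
The "weak support" observation can be checked on any browsing log with a few lines of Python. The sketch below is illustrative only: the function name `low_support_share` and the toy log are assumptions, not the study's data or tooling.

```python
from collections import Counter


def low_support_share(visited_urls, max_visits=2):
    """Fraction of distinct URLs that were visited at most max_visits times."""
    counts = Counter(visited_urls)
    if not counts:
        return 0.0
    low = sum(1 for c in counts.values() if c <= max_visits)
    return low / len(counts)


if __name__ == "__main__":
    # Toy log: only "d" is visited more than twice.
    log = ["a", "b", "a", "c", "d", "d", "d"]
    print(low_support_share(log))  # prints: 0.75
```

When this share is high, most URLs occur too rarely for frequency- or correlation-based recommendation to work, which is why the study relied on browsing behavior instead.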

A total of 104 subjects participated in the five-week LILAC study. The process begins when a participant installs WebIC on his or her own computer and starts browsing web pages of his or her own choice. During this study, whenever a suggested page was generated, the subject was instructed to evaluate the recommendation by indicating whether the information on the page suggested by WebIC was relevant to his or her search task. The results indicate that WebIC works effectively, finding relevant pages approximately 70 per cent of the time. In a follow-up survey, around 75 per cent of the subjects were willing to continue using WebIC if it remained available after the study.






Where do you go from here after devising this software? Who will market it
and who all are your target customers?







Given the multiple benefits WebIC can provide to Web users and search engine developers, there are a number of commercialization options to bring this technology to market. There are three main routes to market for this new technology: WebIC can be incorporated into a search website similar to “Ask.com”, into a browser extension, or into an enterprise search engine. WebIC has potential appeal to anyone who uses the Internet.


Given the exceptional opportunities that WebIC represents, a spin-off company will be created to commercialize this technology. TEC Edmonton, the University of Alberta technology transfer office, will be a partner and stakeholder. It is expected that TEC Edmonton will provide WebIC with some key technology commercialization services (IP protection, VenturePrize, exposure to private capital) as well as access to infrastructure options, including space and preferred supplier pricing. A professional consultant team has also been hired to conduct extensive market research, which is funded by IRIS. The inclusion of TEC Edmonton and a professional market research team demonstrates the company's commitment to exploring the opportunity to commercialize WebIC.






What is the scope of further research and development?

The ability to learn and adapt to user behavior is a key element of WebIC. This core functionality should be subject to continual research and enhancement as the field of machine learning progresses. There is still a great deal of research and work to be done to make WebIC more commercially useful. The areas that need to be improved are: incorporating other browsing features, such as additional page content information; training a personalized model and then using it in combination with the generic ones; exploring natural language processing systems to extend the range of the predictions; and using emerging Web technologies, such as the Semantic Web, to develop a better understanding of the context of arbitrary pages.

© CyberMedia News