Computer scientists from the University of Alberta recently developed
software called WebIC that employs machine learning to foresee the
information needs of web surfers.
The software can be integrated with search engines or downloaded directly to
computers. Tingshao Zhu, a research scholar at the University of Alberta
and one of the creators of WebIC, tells Kishore Kumar from CyberMedia
News about the new software.
What exactly is WebIC? How does it simplify web browsing?
WebIC is a client-side, Internet Explorer-based Web browser. It uses a unique
source of information, namely the user's browsing features, to learn the
search strategy employed by an actual user during his or her quest to locate
relevant information on the Web. To determine the relevant pages, WebIC
first identifies the user's information need, and then generates an
appropriate query to a search engine to retrieve a relevant page.
WebIC works by anticipating users' needs: users can click on an icon that
leads to useful pages from anywhere on the Web without providing any explicit
input, which is a step beyond the usual search engine index retrieval. WebIC
is able to identify the information needs of new users effectively, leading them
to previously unseen but relevant pages. WebIC increases the ease,
accessibility and efficiency of Web browsing, allowing users to save time
while accessing better and more relevant data.
What were the challenges faced while developing the software? What is the
role of machine learning technology in this particular software?
Given that WebIC is expected to help people with their general-purpose web
search tasks, it must be independent of users, specific words and specific Web
pages, so that it can be used to identify relevant pages in any new Web
environment. The biggest challenge is to identify the user's information need based on
passive observation -- using only the information that can be gleaned by
watching the user in the course of an ordinary browsing session.
Whenever we need to predict a page, we first identify the user's current
information need, which is composed of hundreds of significant words - far too
many to submit to any search engine. Furthermore, on most search engines the
order of the keywords is very important as the associations are made
sequentially. WebIC uses machine learning to translate a human information need
into the type of query a search engine can process.
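As a rough illustration of that reduction step (not WebIC's actual method), the
sketch below assumes each significant word already carries a relevance score and
simply keeps the highest-scoring words, most important first. The function name
build_query, the scores and the max_terms cutoff are all hypothetical.

```python
def build_query(word_scores, max_terms=4):
    """Reduce a large scored vocabulary to a short, ordered query string.

    word_scores maps each word to a hypothetical relevance score; the
    scoring itself is assumed to happen elsewhere.
    """
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(word_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Keep only the top words, so the most important keyword comes first.
    return " ".join(word for word, _ in ranked[:max_terms])


# Made-up scores for illustration only.
scores = {"flight": 0.9, "edmonton": 0.8, "cheap": 0.7, "the": 0.1, "hotel": 0.4}
print(build_query(scores, max_terms=3))  # prints: flight edmonton cheap
```

Placing the strongest words first reflects the point above that keyword order
matters to most search engines.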
Machine learning is everywhere in WebIC. It uses a unique source of
information, namely the user's browsing features, to make predictions. Here,
WebIC interprets the browsing features as signals communicating the user's
attitude towards the content of the current pages within the sequence; these
signals therefore help to identify the user's information need.
We use a learning algorithm to build a classifier capable of predicting
information need. After training, we may find some patterns, for example, “any
word that appears in the title of three consecutive pages will be in the
destination page”. Because no specific words were involved while training on
these browsing feature values, we can apply the method to make predictions
about pages that have never been visited.
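The quoted pattern can be sketched in a few lines. This is only an illustration
of that single rule, assuming naive whitespace tokenization of page titles; a
trained classifier would presumably combine many such browsing features. The
function name and the sample session are hypothetical.

```python
def words_in_consecutive_titles(titles, run=3):
    """Return words that occur in the titles of `run` consecutive pages."""
    # Lower-case and split each title into a set of words (naive tokenizer).
    word_sets = [set(title.lower().split()) for title in titles]
    predicted = set()
    # Slide a window of `run` pages over the session and collect the words
    # shared by every title in the window.
    for i in range(len(word_sets) - run + 1):
        predicted |= set.intersection(*word_sets[i:i + run])
    return predicted


# Made-up browsing session for illustration only.
session = [
    "Cheap flights to Edmonton",
    "Edmonton flights compared",
    "Book flights Edmonton today",
    "Hotel deals",
]
print(sorted(words_in_consecutive_titles(session)))  # prints: ['edmonton', 'flights']
```

The rule refers only to where a word appears, never to which word it is, so the
same function applies unchanged to sessions on pages it has never seen.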
How many persons were involved in developing this software? When did you
start working on this project?
I was the only graduate student working on this project, from summer 2000 until
December 2004, under the supervision of Dr. Russ Greiner and Dr. Gerald Haeubl. I
formulated the ideas, wrote thousands of lines of code, designed and ran several
careful user studies, and wrote up the results. There was also an eight-person
team working on our LILAC (Learn from the Internet: Log, Annotation, Content)
study.
Overall, the project was extremely successful, producing 10 publications in
prestigious conferences and workshops, including one that won the Best Student
Paper award (at UM03).
Tell me something about the kind of testing done for this application before
coming up with the final version. Who all were involved in the testing part?
To bring the product to its current state, we conducted two user studies to
collect annotated Web logs and to have actual people evaluate WebIC.
To collect data for training, we conducted a user study (Travel Study) by
enlisting 114 students. Each participant was instructed to annotate while
browsing. 129 participants requested 15,105 pages, and 82.39 per cent of the
URLs were visited only one or two times. Clearly very few URLs had strong
support in this dataset, which would make it very difficult to make
recommendations based on page correlation and frequency. Using browsing
behavior, we found an average accuracy of around 87.4 per cent for predicting
the user's information need.
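The "support" figure above is the share of distinct URLs that were requested
only once or twice. A minimal sketch of that calculation, using a made-up
request log rather than the study's data (the function name and URLs are
hypothetical):

```python
from collections import Counter


def low_support_share(requests, max_visits=2):
    """Fraction of distinct URLs requested at most `max_visits` times."""
    counts = Counter(requests)  # URL -> number of requests
    if not counts:
        return 0.0  # an empty log has no URLs to measure
    low = sum(1 for n in counts.values() if n <= max_visits)
    return low / len(counts)


# Made-up log: one URL requested three times, three URLs requested once.
log = [
    "a.example/1", "a.example/2", "a.example/1",
    "b.example/", "a.example/1", "c.example/x",
]
print(f"{low_support_share(log):.0%}")  # prints: 75%
```

When most URLs fall in this low-support group, there is too little repeat
traffic for recommendations based on page frequency or co-occurrence.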
A total of 104 subjects participated in the five-week LILAC study. The
process begins when a participant installs WebIC on his or her own computer and
starts browsing his or her own choice of web pages. During this study, whenever a
suggested page was generated, the subject was instructed to evaluate the
recommendation to indicate whether the information provided on the page
suggested by WebIC was relevant for his or her search task. The results indicate
that WebIC works effectively, finding relevant pages approximately 70 per cent
of the time. In a follow-up survey, around 75 per cent of subjects said they
would be willing to continue using WebIC if it remained available after the study.
Where do you go from here after devising this software? Who will market it
and who all are your target customers?
Given the multiple benefits WebIC can provide to Web users and search engine
developers, there are a number of commercialization options to bring this
technology to market. There are three main routes: WebIC can be incorporated
into a search Website similar to “Ask.com”, offered as a browser extension, or
built into an enterprise search engine. WebIC has potential appeal to anyone
who uses the Internet.
Given the exceptional opportunities that WebIC represents, a spin-off company
will be created to commercialize this technology. TEC Edmonton, the University
of Alberta technology transfer office, will be a partner and stakeholder. It is
expected that TEC Edmonton will provide WebIC with some key technology
commercialization services (IP protection, VenturePrize, exposure to private
capital) as well as provide access to infrastructure options including space and
preferred supplier pricing. A professional consultant team has also been hired
to conduct extensive market research, which is funded by IRIS. The involvement
of TEC Edmonton and a professional market research team demonstrates the company's
commitment to exploring the opportunity to commercialize WebIC.
What is the scope of further research and development?
The ability to learn and adapt to user behavior is a key element of WebIC.
This core functionality should be subject to continual research and enhancement
as the field of machine learning progresses. There is still a great deal of
research and work that should be done to make WebIC more commercially useful.
The areas that need to be improved are: incorporating other browsing features,
such as additional page content information; training a personalized model and
then using it in combination with the generic ones; exploring natural language
processing systems to extend the range of the predictions; and using emerging
Web technologies such as the Semantic Web to develop a better understanding of
the context of arbitrary pages.
© CyberMedia News