EmTech Spl: A revolution called information extraction

CIOL Bureau

NEW DELHI, INDIA: It can not only make search results more relevant for users but also substantially reduce the editorial cost of manually filtering search content.

This technology is called Information Extraction (IE), and Rajeev Rastogi, VP and head, Yahoo! Labs, says that it can change the way search looks today.

At its most basic level, IE extracts information from millions of websites and stores the relevant information in a massive database. It then organizes this information and presents a comparative analysis.

Speaking on the potential and benefits of IE at the EmTech conference being held in New Delhi, Rastogi said that IE can filter results by criteria such as price, a store’s distance from the point of access, and other such information. “IE makes search more convenient so that users don’t have to click on multiple URLs,” he said.
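
To make the filtering idea concrete, here is a minimal sketch of what a query over already-extracted records might look like. The `Offer` record, its field names, and the sample stores and prices are all made up for illustration; the article does not describe how Yahoo! Labs actually stores or queries its data.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical record produced by an extraction step."""
    store: str
    price: float
    distance_km: float


def filter_offers(offers, max_price, max_distance_km):
    """Keep offers within both limits, cheapest (then nearest) first."""
    matches = [
        o for o in offers
        if o.price <= max_price and o.distance_km <= max_distance_km
    ]
    return sorted(matches, key=lambda o: (o.price, o.distance_km))


# Illustrative data only; not taken from any real site.
offers = [
    Offer("Store A", 199.0, 2.5),
    Offer("Store B", 189.0, 12.0),
    Offer("Store C", 210.0, 1.0),
    Offer("Store D", 195.0, 4.0),
]

nearby_cheap = filter_offers(offers, max_price=200.0, max_distance_km=5.0)
for offer in nearby_cheap:
    print(offer.store, offer.price, offer.distance_km)
```

Because the records are structured, the comparison happens in one place and the user never has to open each store's page separately, which is the convenience Rastogi describes.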

However, IE is not without its challenges and problems. The sheer number of sites to be scanned gives the task a mammoth scale that is not easy to achieve. Then there are the vastly varied Web layouts, which not only differ from site to site but also change very frequently.

“Targeting over a billion pages can be very challenging especially when the pages are very dynamic,” said Rastogi.

The technique used for Information Extraction is called Wrapper Induction. Although it works on a simple process, this technique isn’t without limitations. It doesn’t work across all websites because page layouts differ, and scaling it to thousands of sites is also a problem.
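
As a rough illustration of the idea and of its layout limitation, the sketch below learns a simple left/right delimiter pair from two labeled pages and applies it to new ones. This is a toy version written under assumptions: the function names, the sample HTML, and the delimiter-based approach are illustrative and are not the system Rastogi described.

```python
import os


def common_suffix(strings):
    """Longest suffix shared by all strings."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def learn_wrapper(examples):
    """Learn (left, right) delimiters from (page, value) training pairs.

    Each value must occur in its page; the first occurrence is used.
    """
    lefts, rights = [], []
    for page, value in examples:
        start = page.index(value)
        lefts.append(page[:start])
        rights.append(page[start + len(value):])
    left = common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("examples share no usable delimiters")
    return left, right


def extract(page, wrapper):
    """Return the text between the learned delimiters, or None."""
    left, right = wrapper
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    if end == -1:
        return None
    return page[start:end]


# Hypothetical training pages from one site, labeled with the price.
examples = [
    ('<html><h1>Camera A</h1><span class="price">$199</span></html>', "$199"),
    ('<html><h1>Phone B</h1><span class="price">$349</span></html>', "$349"),
]
wrapper = learn_wrapper(examples)

same_layout = '<html><h1>Laptop C</h1><span class="price">$899</span></html>'
other_layout = '<div><p>Tablet D</p><b id="cost">$120</b></div>'

print(extract(same_layout, wrapper))   # price found on the known layout
print(extract(other_layout, wrapper))  # None: the wrapper does not transfer
```

The second call returns nothing because the learned delimiters are tied to one site's markup. A wrapper has to be induced per layout and re-induced whenever that layout changes, which is why the approach is hard to scale.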

Rastogi said the Holy Grail of IE research is Unsupervised IE, which happens with minimal external intervention. He said the future direction of IE lies in building Machine Learned (ML) Models for every attribute.
