Many new solutions are emerging, of which Apache Spark and Storm are rapidly gaining ground
BANGALORE, INDIA: The most discussed topic in today's tech era is the buzzword "Big Data". To understand why big data has become the focal point of almost all current discussions, it helps to break the subject down into its various aspects and look at its prominence in the world of technology.
Firstly, a drastic reduction in storage cost has made it viable for enterprises to store very large volumes of data economically. The large-scale use of technology in enterprises is, in turn, creating more opportunities for big data to be generated.
This generation and storage of big data is creating further opportunities to discover new insights that can be of great value to enterprises. How to store big data effectively and efficiently, and how to mine it for real-time and long-term insight, is becoming the focus of technology solutions.
Technology solutions in the big data space can broadly be divided into four areas: big data storage, big data processing, big data analytics/mining and big data visualization. In big data storage, solutions first look at the nature of the data and then find better ways to store and retrieve it. Big data need not necessarily be in relational format. It can take the form of documents, tweets/short messages, JSON-style data, log files, call records and so on. This is where solutions such as MongoDB, Cassandra, HBase and Bigtable come in. Both OLTP and OLAP databases need to be redesigned to handle big data effectively and efficiently, and new solutions can be found in each of these categories.
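To make the point about non-relational data concrete, here is a minimal sketch in Python. It is only an illustration: the sample records and the `find` helper are hypothetical and do not represent the API of any of the databases named above. It shows records of different shapes living side by side and being queried by field, without a fixed table schema.

```python
import json

# Hypothetical sample records: each has a different set of fields,
# which would be awkward to fit into one fixed relational table.
raw_records = [
    '{"type": "tweet", "user": "alice", "text": "big data!"}',
    '{"type": "call", "caller": "user-17", "duration_s": 42}',
    '{"type": "log", "level": "ERROR", "message": "disk full", "host": "node-7"}',
]

# Parse each JSON string into a Python dict (a "document").
documents = [json.loads(record) for record in raw_records]


def find(docs, **criteria):
    """Return the documents whose fields match every given key/value pair.

    A document that lacks a requested field simply does not match.
    """
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    for doc in find(documents, type="log"):
        print(doc["host"], doc["message"])
```

Document stores generally build on this idea, adding indexing, distribution and durability on top.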
In the big data processing space, Hadoop currently seems to be the most popular platform. Many new solutions are emerging, of which Apache Spark and Storm are rapidly gaining ground. A look at Google Trends clearly highlights the surge of interest in these two solutions. Hadoop's MapReduce programming model is restrictive for applications that need to process the same data multiple times. Spark overcomes this limitation and makes this kind of processing a lot faster. Storm comes in very handy for real-time stream processing.
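The difference for multi-pass workloads can be sketched with a toy example in plain Python. This is an assumption-laden illustration, not Hadoop or Spark code: `CountingSource`, `refine` and both `iterate_*` functions are hypothetical names invented here. The first loop re-reads its input from storage on every pass, roughly as a chain of independent jobs would; the second reads it once and keeps it in memory, which is the idea behind caching a dataset across iterations.

```python
from typing import List


class CountingSource:
    """Hypothetical stand-in for slow storage; counts full scans."""

    def __init__(self, records: List[float]) -> None:
        self._records = records
        self.reads = 0  # number of times the whole dataset was read

    def load(self) -> List[float]:
        self.reads += 1
        return list(self._records)


def refine(estimate: float, data: List[float]) -> float:
    """One iteration: move the estimate halfway toward the data mean."""
    mean = sum(data) / len(data)
    return estimate + 0.5 * (mean - estimate)


def iterate_without_cache(source: CountingSource, passes: int) -> float:
    """Re-read the input from storage on every pass."""
    estimate = 0.0
    for _ in range(passes):
        data = source.load()
        estimate = refine(estimate, data)
    return estimate


def iterate_with_cache(source: CountingSource, passes: int) -> float:
    """Read the input once, then reuse the in-memory copy on every pass."""
    data = source.load()
    estimate = 0.0
    for _ in range(passes):
        estimate = refine(estimate, data)
    return estimate


if __name__ == "__main__":
    slow, fast = CountingSource([1.0, 2.0, 3.0]), CountingSource([1.0, 2.0, 3.0])
    print(iterate_without_cache(slow, 5), "storage reads:", slow.reads)
    print(iterate_with_cache(fast, 5), "storage reads:", fast.reads)
```

Both loops arrive at the same estimate; only the number of storage reads differs, and on a large dataset that is where much of the time goes.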
Big data analytics and mining is one area where few open source platforms provide higher-level functionality, except perhaps WEKA and MOA from the University of Waikato. WEKA is a data mining framework, whereas MOA (Massive Online Analysis) provides real-time big data stream mining capability. Both are written in Java. WEKA provides Java APIs, and MOA (source: http://moa.cs.waikato.ac.nz/) can be easily integrated and used with Hadoop or Storm. Big data analytics and mining is the area where big data is analyzed to draw meaningful insights. Investment in big data storage and maintenance proves useful only when this layer of the technology stack delivers genuinely useful capability to decision makers.
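Stream mining differs from batch mining in that each record is seen once and the model is updated incrementally, without storing the stream. A minimal sketch of that idea, using Welford's well-known online algorithm for mean and variance, is shown below in Python. It does not use or imitate the MOA API; the `RunningStats` class is a hypothetical name chosen for this illustration.

```python
class RunningStats:
    """Incrementally tracks count, mean and variance of a stream.

    Uses Welford's online algorithm: each value is processed once
    and then discarded, so memory use stays constant.
    """

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(reading)
        print(stats.count, stats.mean, stats.variance)
```

Real stream mining frameworks apply the same one-pass discipline to far richer models, such as classifiers and clustering algorithms.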
Big data visualization addresses how the results of big data analysis, and the insights and patterns discovered, are presented to the decision maker. Here it may be difficult to find a truly comprehensive visualization platform, though several visualization tools, such as Dygraphs, are available. A visualization platform should provide a multi-layered, multi-level view to the decision maker, let the variables under analysis be changed dynamically, and render the results quickly.
Together, big data storage, processing, analysis and visualization solutions should deliver business value to an enterprise in the form of cost savings, better service, and optimization of infrastructure and networks. It is this value that justifies investment in these technologies and solutions.
(The author is senior vice president, Huawei Technologies India Pvt Ltd)