BANGALORE, INDIA: We deal with gigabytes of data in our daily lives, and the volume keeps growing with each passing day. The gap between the data generated and the data analyzed is growing too, so it pays to look for techniques that make analysis easier. Machine learning is one such technique: it searches a very large space of possible hypotheses to find the one that best fits the observed data and any prior knowledge held by the learning system. Data mining augments the search and understanding of electronically stored data.
What is WEKA?
Waikato Environment for Knowledge Analysis (WEKA), developed at the University of Waikato, New Zealand, is a collection of machine learning algorithms with data preprocessing tools to provide input to these algorithms. The tool was developed in Java and runs on Linux as well as Windows. It can also be used to develop and analyze new machine learning algorithms.
It is open source software distributed under the terms of the GNU General Public License. The input to the machine learning algorithms takes the form of a relational table in the ARFF (Attribute-Relation File Format) format. Weka comes with API documentation generated using Javadoc. More details on Weka and its usage are available across a few chapters of the book by Ian H Witten and Eibe Frank, 'Data Mining: Practical Machine Learning Tools and Techniques,' 2nd Edition, Morgan Kaufmann Series, San Francisco, 2005.
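To give a feel for the ARFF input mentioned above, here is a minimal sketch of a reader written in Python rather than with Weka's own Java API. The toy "weather" dataset and the `parse_arff` helper are invented for illustration, and the sketch assumes only numeric and nominal attributes, with no quoting, sparse rows or missing values.

```python
# Minimal ARFF reader: a sketch, not Weka's own loader.
# Assumes numeric and nominal attributes only, no quoted values,
# no sparse rows and no missing values.
ARFF_TEXT = """\
% Hypothetical toy dataset, invented for illustration
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Return (relation name, attribute list, data rows)."""
    relation = None
    attributes = []  # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                row.append(float(value) if kind == "numeric" else value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # Nominal attribute: the braces list the allowed values
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            elif kind.lower() in NUMERIC_TYPES:
                kind = "numeric"
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
for row in rows:
    print(row)
```

The header declares the relation and one attribute per column, and everything after `@data` is one comma-separated instance per line, which is why the format maps naturally onto a relational table.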
How it helps
Some key features of WEKA include:
Preprocess: Weka has file format converters for spreadsheets, C4.5 file formats and serialized instances. It can also open a URL and use HTTP to download an ARFF file from the Web or open a database using JDBC, and retrieve instances using SQL. It also provides a list of filters to delete specified attributes from a dataset.
Classify: Weka trains and tests learning schemes that perform classification or regression. The classifiers can be divided into Bayesian, trees, rules, functions and lazy. It also builds a linear regression model and allows the user to build their own classifiers interactively. It also provides options for a number of meta learners.
Cluster: Weka shows the clusters and the number of instances in the cluster. Thereafter it determines the majority class in each cluster and gives the confusion matrix.
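The cluster summary described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Weka's code: the cluster assignments, class labels and the `classes_to_clusters` helper are hypothetical, and ties for the majority class are broken by first occurrence.

```python
from collections import Counter, defaultdict

# Hypothetical cluster assignments and true class labels for eight instances.
clusters = [0, 0, 0, 1, 1, 1, 1, 0]
classes = ["yes", "yes", "no", "no", "no", "yes", "no", "yes"]


def classes_to_clusters(clusters, classes):
    """Summarise clusters: sizes, majority class and a confusion matrix."""
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(clusters, classes):
        per_cluster[cluster][label] += 1

    # Number of instances in each cluster
    sizes = {c: sum(counts.values()) for c, counts in per_cluster.items()}
    # Most frequent class label in each cluster
    majority = {c: counts.most_common(1)[0][0]
                for c, counts in per_cluster.items()}
    # Confusion matrix: one row per cluster, one column per class label
    labels = sorted(set(classes))
    matrix = {c: [per_cluster[c][label] for label in labels]
              for c in sorted(per_cluster)}
    # Instances whose class differs from their cluster's majority class
    errors = sum(sizes[c] - per_cluster[c][majority[c]] for c in per_cluster)
    return sizes, majority, labels, matrix, errors


sizes, majority, labels, matrix, errors = classes_to_clusters(clusters, classes)
print("sizes:", sizes)
print("majority class:", majority)
print("columns:", labels)
for cluster, row in matrix.items():
    print("cluster", cluster, row)
print("incorrectly clustered:", errors)
```

Reading the matrix row by row shows how cleanly each cluster lines up with one class; the off-majority counts add up to the number of incorrectly clustered instances.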
Associate: Weka contains three algorithms for determining association rules: Apriori, Predictive Apriori and filtered associators. It has no methods for evaluating such rules.
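As a rough illustration of what an Apriori-style associator computes, the following Python sketch mines frequent itemsets level by level and then derives rules from them. It is a simplified textbook version under stated assumptions, not Weka's implementation: the shopping-basket transactions, the thresholds and the function names are all invented for the example.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        size = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: keep a candidate only if all of its
        # (size - 1)-item subsets are already known to be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent
                        for s in combinations(c, size - 1))
                 and support(c, transactions) >= min_support]
    return frequent


def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs + rhs) / support(lhs)."""
    rules = []
    for itemset, supp in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


frequent = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), "confidence", confidence)
```

Support and confidence are the two thresholds that drive the search. Lowering either one yields more rules, at the cost of weaker ones.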
Attribute Selection: Weka gives access to several methods for attribute selection, which involves an attribute evaluator and a search method. Attribute selection can be performed using the full training set or cross-validation.
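The pairing of an attribute evaluator with a search method can be sketched as follows. Here the evaluator is information gain and the "search" is a plain ranking by score, which is one common combination. The sketch is in Python for brevity and does not use Weka's API; the four-row dataset and the function names are hypothetical, and all attributes are assumed to be nominal.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index):
    """Evaluator: reduction in class entropy after splitting on one attribute."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, names, class_index):
    """Search method: score every non-class attribute and sort best first."""
    scores = [(information_gain(rows, i, class_index), names[i])
              for i in range(len(names)) if i != class_index]
    return sorted(scores, reverse=True)


# Hypothetical data: outlook fully determines play, windy tells us nothing.
names = ["outlook", "windy", "play"]
rows = [
    ("sunny", "true", "no"),
    ("sunny", "false", "no"),
    ("rainy", "true", "yes"),
    ("rainy", "false", "yes"),
]

ranked = rank_attributes(rows, names, class_index=2)
for score, name in ranked:
    print(name, round(score, 3))
```

Running the same ranking inside each fold of a cross-validation, instead of once on the full training set, would show how stable the chosen attributes are across different samples of the data.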