Output of the Naïve Bayes classifier on the Age dataset, in terms of errors, accuracy by class and confusion matrix.
View of an ARFF dataset which consists of a list of instances, and the attribute values for each instance separated by commas.
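To make the file layout concrete, here is a minimal sketch of reading such a file in Python. The sample data, the attribute names and the `parse_arff` helper are hypothetical and are not the Age dataset used in the article. The sketch assumes a simple dense ARFF file with no quoted values and no sparse-format rows, and it keeps every value as a string.

```python
def parse_arff(text):
    """Parse a minimal ARFF document into (relation, attributes, instances)."""
    relation, attributes, instances = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # After @data, each line is one instance with comma-separated values.
            values = [v.strip() for v in line.split(",")]
            # '?' marks a missing value in ARFF.
            instances.append([None if v == "?" else v for v in values])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, instances


# Hypothetical toy data for illustration only.
SAMPLE = """\
% toy example, not the article's Age dataset
@relation age_toy
@attribute income {low,high}
@attribute student {yes,no}
@attribute age_group {young,old}
@data
low,yes,young
high,no,old
high,?,old
"""

relation, attributes, instances = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(instances)
```

The header declares the relation and its attributes, and everything after `@data` is the list of instances described above.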
Analyzing the result
The result shows a summary of the dataset, followed by the algorithm used to analyze it. It also reports the predictive performance of the machine-learning algorithm on the dataset. The confusion matrix then shows how many instances were classified correctly and how many were misclassified. The classification error is reported as the mean absolute error and the root mean squared error of the class probability estimates.
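The figures described above can be reproduced by hand on a small example. The following Python sketch is an illustration, not Weka's own code: it assumes that each class probability estimate is compared with a 0/1 indicator of the true class and that the errors are averaged over all classes and instances. The `evaluate` function, the class labels and the probability values are hypothetical.

```python
import math


def evaluate(actual, prob_estimates, classes):
    """Build a confusion matrix and error measures from class probabilities."""
    index = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    # confusion[i][j] counts instances of actual class i predicted as class j.
    confusion = [[0] * k for _ in range(k)]
    abs_err = 0.0
    sq_err = 0.0
    for truth, probs in zip(actual, prob_estimates):
        predicted = max(range(k), key=lambda i: probs[i])
        confusion[index[truth]][predicted] += 1
        for i in range(k):
            # Assumed convention: target is 1 for the true class, 0 otherwise.
            target = 1.0 if i == index[truth] else 0.0
            diff = probs[i] - target
            abs_err += abs(diff)
            sq_err += diff * diff
    n = len(actual)
    correct = sum(confusion[i][i] for i in range(k))
    return {
        "confusion": confusion,
        "accuracy": correct / n,
        "mae": abs_err / (n * k),
        "rmse": math.sqrt(sq_err / (n * k)),
    }


# Hypothetical predictions for four instances and two classes.
classes = ["young", "old"]
actual = ["young", "young", "old", "old"]
prob_estimates = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]

result = evaluate(actual, prob_estimates, classes)
print(result["confusion"])
print(result["accuracy"], result["mae"], result["rmse"])
```

In this toy run, three of the four instances land on the diagonal of the confusion matrix, and the one "young" instance predicted as "old" appears off the diagonal.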
Processing huge datasets
If the dataset is very large, running to a few thousand attributes and a few lakh (hundred thousand) records, Weka can run into an 'OutOfMemory' exception. Most Java virtual machines give Java programs a maximum amount of memory that is much less than the available RAM. However, the memory available to the virtual machine can be extended by setting the appropriate options, for example the JVM's -Xmx maximum heap size option. Alternatively, Weka offers several filters for re-sampling a dataset and generating a new, smaller dataset. In addition, some schemes can be trained incrementally rather than only in batch mode, unlike most classifiers, which need all the data before they can be trained. With such a scheme, the dataset is loaded incrementally and fed to the classifier instance by instance.
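The idea of incremental training can be sketched in a few lines of Python. This is a minimal illustration, not Weka's implementation: the `IncrementalNaiveBayes` class, the `stream` generator and the toy data are hypothetical, and the sketch assumes nominal attributes only and Laplace smoothing. The point is that the model keeps only running counts, so the full dataset never has to be held in memory.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Naive Bayes for nominal attributes, updated one instance at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.value_counts = defaultdict(int)  # (class, attr index, value) -> count
        self.values_seen = defaultdict(set)   # attr index -> distinct values seen
        self.total = 0

    def update(self, attributes, label):
        """Fold a single instance into the running counts."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(attributes):
            self.value_counts[(label, i, value)] += 1
            self.values_seen[i].add(value)

    def predict(self, attributes):
        """Return the class with the highest log posterior score."""
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)
            for i, value in enumerate(attributes):
                # Laplace smoothing avoids zero probabilities for unseen pairs.
                num = self.value_counts.get((label, i, value), 0) + 1
                den = count + len(self.values_seen[i])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def stream():
    """Stand-in for reading a large file one instance at a time."""
    yield ("low", "yes"), "young"
    yield ("low", "no"), "young"
    yield ("high", "no"), "old"
    yield ("high", "yes"), "old"
    yield ("low", "yes"), "young"


model = IncrementalNaiveBayes()
for attrs, label in stream():
    model.update(attrs, label)  # only the counts are kept, not the instances

print(model.predict(("high", "no")))
print(model.predict(("low", "yes")))
```

A batch learner would need the whole list of instances before training. Here each instance can be discarded as soon as `update` returns.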
Conclusion
It is difficult for a single machine learning tool to suit all data mining requirements, and the universal learner is still a distant dream. To obtain an accurate model of a real dataset, the learning algorithm must match the domain. Data mining is an experimental science, and Weka provides a workbench of data preprocessing tools and machine learning algorithms for it. Weka helps in realizing the goal of data mining by predicting missing values and validating that the predicted values are correct.
Abhinav Gupta & Sumit Goswami