Using Waffles for Naive Bayes

Intro

Waffles is an open-source machine learning library for C++ that also has a command-line interface. I am interested in using it for some projects, but the documentation has no full examples of how to use it, just a few one-liners here and there. The Wikipedia article on Naive Bayes has a good worked example, so I intend to implement that example with Waffles.

Data

The Wikipedia article claims the following input data can predict gender:

sex      height (feet)   weight (lbs)   foot size (inches)
male     6               180            12
male     5.92            190            11
male     5.58            170            12
male     5.92            165            10
female   5               100            6
female   5.5             150            8
female   5.42            130            7
female   5.75            150            9

Then, given the sample {6, 130, 8} (height, weight, foot size), it should produce posterior numerator (male) = 6.1984e-09 and posterior numerator (female) = 5.3778e-04, so the sample is classified as female.
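
For reference, here is a minimal Python sketch of how those two numbers can be worked out by hand. It assumes what the Wikipedia example assumes: each feature is modeled as a Gaussian per class, the variance is the sample variance (dividing by n - 1), and both classes have a prior of 0.5. The function names are made up for illustration and have nothing to do with Waffles.

```python
import math
from statistics import mean, variance

# Training rows from the table above: (height, weight, foot size)
train = {
    "male": [(6, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10)],
    "female": [(5, 100, 6), (5.5, 150, 8), (5.42, 130, 7), (5.75, 150, 9)],
}
sample = (6, 130, 8)


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_numerator(rows, x, prior):
    """Prior times the product of the per-feature Gaussian likelihoods."""
    result = prior
    # zip(*rows) turns the list of rows into one tuple per feature column
    for column, value in zip(zip(*rows), x):
        # statistics.variance is the sample variance (n - 1 in the denominator)
        result *= gaussian_pdf(value, mean(column), variance(column))
    return result


numerators = {
    label: posterior_numerator(rows, sample, prior=0.5)
    for label, rows in train.items()
}
for label, value in numerators.items():
    print(f"{label}: {value:.4e}")
```

The printed values should land close to the two posterior numerators quoted above, with the female one being the larger.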

So, I saved this training data as a CSV file, gender.csv.
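
If you want to recreate the file, a small Python sketch like the following would do. It writes the rows without a header line, and it puts the sex column first, which is an assumption on my part based on the -labels 0 option used later in this post.

```python
import csv

# Training rows: sex, height, weight, foot size (label column first, no header)
rows = [
    ("male", 6, 180, 12),
    ("male", 5.92, 190, 11),
    ("male", 5.58, 170, 12),
    ("male", 5.92, 165, 10),
    ("female", 5, 100, 6),
    ("female", 5.5, 150, 8),
    ("female", 5.42, 130, 7),
    ("female", 5.75, 150, 9),
]

# newline="" lets the csv module control line endings; plain "\n" is used here
with open("gender.csv", "w", newline="") as f:
    csv.writer(f, lineterminator="\n").writerows(rows)
```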

Process

First, I converted the CSV file to arff, the format Waffles uses. Note that the CSV file must not have a header row:

waffles_transform import gender.csv > gender.arff

Then, I tried to create a naivebayes model using the newly created arff file:

waffles_learn train gender.arff naivebayes

This failed with the error: GNaiveBayes does not support continuous attributes. You should discretize first to convert real values to nominals.

I’m not sure if discretizing the data will produce the desired result, but I tried it anyway:

waffles_learn train gender.arff discretize naivebayes > gender.twt
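
To make the idea of discretizing concrete, here is a minimal Python sketch of equal-width binning applied to the height column. This is only an illustration: the helper equal_width_bins and the choice of 3 bins are assumptions, and the Waffles discretize filter may choose its buckets differently.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bucket index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bucket, so clamp it
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Height column from the training data, in row order
heights = [6, 5.92, 5.58, 5.92, 5, 5.5, 5.42, 5.75]
print(equal_width_bins(heights, 3))
```

Every height collapses to one of three bucket labels, so a model trained on discretized data sees far less detail than the Gaussian calculation in the Wikipedia example.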

Then, I tried some cross-validation:

waffles_learn crossvalidate -reps 50 -folds 2 gender.arff -labels 0 discretize naivebayes

I got: Attr: 3, Mean accuracy: 0.74, Deviation: 0.23815367248307. Not bad!

The -labels 0 option tells Waffles to use the first column of data (male, female) as the label. Running the same command several times gives different results, presumably because the data is split into folds randomly on each run.

Now, let’s convert the test data into an arff file.

waffles_transform import test.csv > test.arff

And let’s test it:

waffles_learn predict gender.twt test.arff

The raw numbers are definitely different from the numbers given by Wikipedia:

Class    Waffles           Wikipedia
male     10.285714285714   6.1984e-09
female   6.8571428571428   5.3778e-04

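I’m not 100% sure what each number means. Both the magnitude and the ordering differ: for Waffles the male number is the larger one, while for Wikipedia it is the smaller one. One possible cause is that the train command above omits -labels 0; if waffles_learn then treats the last column as the label, these values would be foot-size predictions rather than class scores.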

Getting Help

During this process, I often got help by running the usage command of the tool I was working with. For example, waffles_transform usage shows many options. Also, the waffles_wizard command is a GUI that helps users build a command-line string.

2011-01-17 10:47 +0000