Naive Bayes Bean Properties and Use

Properties

Mode
Select one of the following agent modes:
Train: the network learns from the data as it is processed.
Test: the network classifies each record and computes the classification accuracy as the data is processed.
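
The difference between the two modes can be illustrated with a minimal sketch. This is not the bean's implementation or API: the class `NaiveBayesSketch`, its `process` method, and the use of add-one smoothing are all assumptions made for illustration only.

```python
import math


class NaiveBayesSketch:
    """Hypothetical categorical naive Bayes with a train/test mode switch."""

    def __init__(self, mode="train"):
        self.mode = mode
        self.class_counts = {}   # label -> number of training records
        self.value_counts = {}   # (field, value, label) -> count
        self.field_values = {}   # field -> set of distinct values seen
        self.tested = 0
        self.correct = 0

    def classify(self, pattern):
        """Return the label with the highest log posterior score."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for field, value in enumerate(pattern):
                k = len(self.field_values.get(field, ()))
                count = self.value_counts.get((field, value, label), 0)
                # Add-one smoothing (an assumption) avoids log(0).
                score += math.log((count + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def process(self, pattern, label):
        """Train mode updates counts; test mode classifies and scores."""
        if self.mode == "train":
            self.class_counts[label] = self.class_counts.get(label, 0) + 1
            for field, value in enumerate(pattern):
                key = (field, value, label)
                self.value_counts[key] = self.value_counts.get(key, 0) + 1
                self.field_values.setdefault(field, set()).add(value)
            return None
        predicted = self.classify(pattern)
        self.tested += 1
        self.correct += int(predicted == label)
        return predicted

    def accuracy(self):
        return self.correct / self.tested if self.tested else 0.0


# Made-up records: (pattern fields, classification name).
records = [
    (("sunny", "hot"), "no"),
    (("rainy", "mild"), "yes"),
    (("sunny", "mild"), "no"),
    (("rainy", "hot"), "yes"),
]

nb = NaiveBayesSketch(mode="train")
for pattern, label in records:
    nb.process(pattern, label)

nb.mode = "test"
predicted = nb.process(("sunny", "hot"), "no")
```

In train mode the sketch only accumulates counts and returns nothing; in test mode it leaves the counts untouched, returns a classification, and keeps a running accuracy.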

Use

The Naive Bayes Bean panel is used to create a classifier network. Set the Mode either to train the network bean or to test it against an independent data source and confirm that training is sufficient.

Typically the bean is trained by providing a data source, connecting that data source to a filter, and connecting the filter as input to the classifier bean. The steps described below assume this scenario. Warning: If using data processing, ensure that DataFlow is enabled on the General tab.

To configure the bean:

  1. Create an import data source and open it. This source should have one and only one field defined as output - that's the field that represents the name of the classification. The other fields represent the pattern. The output field should be discrete or categorical.
  2. Create a filter bean if needed, either by adding it to the canvas or generating it from the import.
  3. Connect the import to the filter.
  4. The filter records may need modification; for example, values may need to be shifted by 1 to accommodate a different index origin. Add -1 to each field if needed.
  5. Connect the filter to the Naive Bayes bean. Make sure the Naive Bayes bean has DataFlow enabled.
  6. On the property panel Naive Bayes tab, set the Mode to Train.
  7. Press the Create network button. This configures the sizes of various data elements in the bean based on the inputs from the import.
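
The index shift in step 4 and the sizing in step 7 can be sketched outside the tool as follows. The source does not say what the Create network button computes, so the sizing logic here is an assumption; the helper names `shift_records` and `table_sizes` and the sample records are hypothetical.

```python
def shift_records(records, offset=-1):
    """Add `offset` to every field, e.g. to turn 1-based codes into 0-based ones."""
    return [[value + offset for value in record] for record in records]


def table_sizes(records, output_index):
    """Derive table sizes from 0-based integer category codes.

    Returns (number of classes, number of distinct codes per input field),
    assuming each field's codes run from 0 to that field's maximum.
    """
    n_fields = len(records[0])
    sizes = [max(record[i] for record in records) + 1 for i in range(n_fields)]
    n_classes = sizes[output_index]
    input_sizes = [s for i, s in enumerate(sizes) if i != output_index]
    return n_classes, input_sizes


# Made-up records with 1-based codes; the last field is the single output field.
raw = [[1, 3, 1], [2, 1, 2], [2, 2, 1]]
shifted = shift_records(raw)
n_classes, input_sizes = table_sizes(shifted, output_index=2)
```

Exactly one field is treated as the output here, matching the requirement in step 1 that one and only one field be defined as output.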

To train the bean:

  1. Train the naive bayes bean by pressing the Cycle button on the Agent Editor toolbar.
  2. Set up an inspector on the input and output buffer arrays to watch those values as the bean trains.

To test the bean:

  1. Once the bean has processed each record in the data source, set the Mode to Test.
  2. Step through one record at a time to see whether the network's classification shown in its output buffer matches that of the actual data.