How can we get training data?


A multilayer perceptron is a type of neural network. A neural network is a method for approximating arbitrarily complex functions; in particular, it can approximate the boundary between data points and thereby divide them into several classes. To achieve this, the network must be trained. The features used for classification are fed into the network through the input neurons of the first layer. These neurons are connected by edges to the neurons of the next layer, which in turn are connected to the neurons of the layer after them, and so on up to the last layer, the output layer. The edges carry weights, and it is these weights that are trained.
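The flow of information described above can be sketched as a minimal forward pass in NumPy. The layer sizes, activations, and random weights here are purely illustrative, not taken from our classifier:

```python
import numpy as np

def forward(x, weights, biases):
    """Propagate an input vector through the layers of a small MLP.
    Each layer computes a weighted sum of its inputs (the trained edge
    weights) followed by a non-linear activation."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, W @ a + b)   # hidden layer: ReLU activation
    z = weights[-1] @ a + biases[-1]     # output layer
    e = np.exp(z - z.max())              # softmax: turn scores into
    return e / e.sum()                   # class probabilities

# Toy network: 4 input features, one hidden layer of 3 neurons, 2 classes.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
biases = [np.zeros(3), np.zeros(2)]
probs = forward(np.array([1.0, 0.0, 2.0, 0.5]), weights, biases)
```

The output is one probability per class; training would adjust `weights` and `biases` so that the correct class receives the highest probability.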

Labeled training data are used to train the network: a sample e-mail whose recipient is known is processed by the network, and the recipient the network predicts is compared with the correct one. If the two classes do not match, the weights within the network must be adjusted. The backpropagation learning algorithm determines how the edge weights have to change in order to improve the classification. This procedure is repeated over all available training e-mails until the network reaches a certain accuracy on the training examples; it can then be used to classify new, previously unseen e-mails.
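As a rough sketch of this training loop, scikit-learn's `MLPClassifier` runs backpropagation internally when `fit()` is called. The feature vectors and "recipient" labels below are synthetic stand-ins, not our actual e-mail data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for labeled training e-mails: each row is a feature
# vector, each label the index of the correct recipient (both invented).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy "recipient" labels

# fit() repeatedly applies backpropagation over the training set until
# the loss stops improving (or max_iter is reached).
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)
train_acc = clf.score(X, y)   # accuracy on the training examples
```

Once the training accuracy is acceptable, `clf.predict()` can be applied to new, unseen feature vectors.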

A multilayer perceptron is a very simple form of neural network. It consists of an input layer, at least one hidden layer, and an output layer. Information flows exclusively from the input layer toward the output layer; there are no recurrent connections. Because the neurons apply non-linear functions, a multilayer perceptron can also classify data that is not linearly separable.
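The classic illustration of data that no linear classifier can separate is the XOR problem; a small multilayer perceptron with a non-linear activation typically handles it. This sketch uses scikit-learn with hyperparameters chosen only for the demonstration:

```python
from sklearn.neural_network import MLPClassifier

# XOR: no single straight line separates class 0 from class 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# A hidden layer with a non-linear (tanh) activation makes the
# problem learnable; lbfgs converges well on tiny datasets.
clf = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=5000, random_state=1)
clf.fit(X, y)
preds = clf.predict(X).tolist()
```

With a purely linear model (or an MLP without non-linear activations), no choice of weights could fit these four points.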

Ensemble of methods

Another method we tested is an ensemble of the models listed above. Instead of a single classifier, several are used to classify an e-mail: each model receives the e-mail and votes for one of the possible classes, and the e-mail is ultimately assigned to the class chosen by the majority of the classifiers.
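The majority vote itself is simple to sketch; the classifier names below are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class chosen by the most classifiers.
    `predictions` holds one predicted class per model in the ensemble."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers vote on the recipient of one e-mail:
winner = majority_vote(["alice", "alice", "bob"])  # -> "alice"
```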

Does a minimum probability improve the ensemble? To make the combined decision more reliable, the individual models should be relatively certain of their votes. In this experiment, a classifier's decision was therefore only counted in the ensemble if the probability of its selected class reached a threshold of, for example, at least 70 percent. However, this procedure led to a slight deterioration in the ensemble result.
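A minimal sketch of this filtered vote, assuming each model exposes a class-probability distribution (the dictionaries and class names here are invented for illustration):

```python
from collections import Counter

def confident_vote(model_probs, threshold=0.7):
    """Majority vote that only counts models whose top class reaches
    `threshold` probability. `model_probs` is a list with one dict
    {class: probability} per model. Returns None if no model qualifies."""
    votes = []
    for probs in model_probs:
        cls, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            votes.append(cls)   # model is confident enough to vote
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]

# Two confident models vote "alice"; the uncertain third model is dropped:
result = confident_vote([
    {"alice": 0.9, "bob": 0.1},
    {"alice": 0.8, "bob": 0.2},
    {"alice": 0.4, "bob": 0.6},   # top class below threshold -> ignored
])
```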

With the ensemble of methods and no minimum probability, we were able to increase the accuracy from roughly 90 percent for the individual methods to 95 percent.

Further approaches

In addition to the models listed, we tried out other approaches, in particular more complex neural networks; however, these delivered significantly poorer results.