Plesk

How does SpamAssassin training work in Plesk?

Question

How does SpamAssassin training work in Plesk?

Answer

SpamAssassin training relies on the Bayes system, which has its own internal database. The database is updated each time a message is marked as spam, moved to the Spam folder, or moved back to the Inbox folder.

The Bayes database contains words and patterns from processed messages, known as tokens, which Bayes uses to classify a message as spam or non-spam (ham). For example, if you mark a message whose body or subject reads "Buy! This is magical offer" as spam, the words from that message are added to the Bayes database. However, this does not mean that every subsequent message containing the words "buy" or "offer" will be marked as spam.
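To make the idea of tokens more concrete, here is a minimal sketch that splits a message into distinct lowercase words. It is an illustration only: the function name bayes_tokens_sketch is hypothetical, and the real SpamAssassin tokenizer is considerably more elaborate (it also looks at headers and other patterns).

```shell
#!/bin/sh
# Simplified illustration of tokenization; NOT SpamAssassin's actual algorithm.
# Reads message text on stdin and prints each distinct word once, lowercased.
bayes_tokens_sketch() {
    # Turn every run of non-alphanumeric characters into a single newline,
    # lowercase the result, drop empty lines, and keep unique words only.
    tr -cs '[:alnum:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        sed '/^$/d' |
        LC_ALL=C sort -u
}
```

It can be tried with: printf '%s\n' 'Buy! This is magical offer' | bayes_tokens_sketch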

Standalone SpamAssassin and SpamAssassin inside the Plesk Email Security extension work slightly differently:

Standalone SpamAssassin

The Bayes system is enabled only after its database contains a certain number of spam and non-spam (ham) messages.

The default threshold is 200 for both spam and non-spam messages. This value can be changed:

  1. Connect to the server via SSH

  2. Add new values to /etc/mail/spamassassin/local.cf using the following SpamAssassin options (the lowest possible value is 10):

    bayes_min_ham_num 100
    bayes_min_spam_num 100

  3. Restart the SpamAssassin daemon:

    # systemctl restart spamassassin.service
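
After editing the file, it can be useful to confirm which thresholds are actually set. The following is a minimal sketch, not a Plesk or SpamAssassin tool: the helper name show_bayes_min is hypothetical, and it assumes that the last definition of an option in the file is the one that takes effect. When an option is absent, it prints the default of 200 mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the Bayes thresholds found in a SpamAssassin
# config file (defaults to the local.cf path used in the steps above).
show_bayes_min() {
    conf="${1:-/etc/mail/spamassassin/local.cf}"
    awk '
        # Ignore commented-out lines.
        /^[[:space:]]*#/ { next }
        # Remember the most recent value of each option (assumption: last wins).
        $1 == "bayes_min_ham_num"  { ham = $2 }
        $1 == "bayes_min_spam_num" { spam = $2 }
        END {
            # Fall back to the standalone default of 200 when an option is unset.
            print "bayes_min_ham_num="  (ham  == "" ? 200 : ham)
            print "bayes_min_spam_num=" (spam == "" ? 200 : spam)
        }
    ' "$conf"
}
```

Run show_bayes_min without arguments to inspect /etc/mail/spamassassin/local.cf, or pass another file path as the first argument.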

Note that in standalone SpamAssassin both spam and non-spam (ham) messages are learned during the Daily Task execution, so there is no need to train it on non-spam messages manually.

Training progress can be checked with the following command:

# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 10 0 non-token data: nspam
0.000 0 10 0 non-token data: nham
0.000 0 59312 0 non-token data: ntokens
0.000 0 1574146076 0 non-token data: oldest atime
0.000 0 1596425418 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

The higher the values in the nspam and nham lines, the more tokens the Bayes filter database contains.
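
The comparison between these counters and the threshold can also be scripted. The sketch below is an assumption-based helper (the name bayes_ready is hypothetical): it reads the output of sa-learn --dump magic on standard input, takes the third field of the nspam and nham lines as the counters, and succeeds only when both have reached the minimum passed as the first argument (200 if omitted).

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# report whether both message counters have reached the given minimum.
bayes_ready() {
    min="${1:-200}"
    awk -v min="$min" '
        # The counter is the third field; the label is the last field.
        $NF == "nspam" { nspam = $3 }
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
            # Exit status 0 when both counters meet the minimum, 1 otherwise.
            exit !(nspam + 0 >= min + 0 && nham + 0 >= min + 0)
        }
    '
}
```

For example: sa-learn --dump magic | bayes_ready 100 && echo "Bayes threshold reached"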

SpamAssassin as part of Plesk Email Security

Training is available only in the paid version of the extension.

The Bayes system is enabled only after the Bayes database contains a certain number of spam and non-spam (ham) messages. The default threshold is 10 for both spam and non-spam messages. To increase the number of non-spam (ham) messages, move a message to the Spam folder or mark it as spam, and then move it back to the Inbox folder or mark it as not spam.

Training progress can be checked with the following command:

# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 10 0 non-token data: nspam
0.000 0 10 0 non-token data: nham
0.000 0 59312 0 non-token data: ntokens
0.000 0 1574146076 0 non-token data: oldest atime
0.000 0 1596425418 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

The higher the values in the nspam and nham lines, the more tokens the Bayes filter database contains.