Blocking adult content with keyword detection on Mastodon

5 October 2023 18:07:23
I have written a bot that detects posts matching keyword and hashtag lists and can take automated moderation actions. When using this bot, PLEASE BE VERY CAREFUL NOT TO OVERBLOCK, especially if you use autosuspend, because that will cut people off from their followers. Retest thoroughly after every configuration change, including whenever you add new keywords or hashtags for detection!

Motivation, keywords and hashtags

[Image: Example DM notifying about an autosuspension]
This bot originally started as an attempt to detect the usage of slurs, with a few spam hashtags thrown in. It turned out not to be effective: the only hits I have had for slurs so far were people describing themselves. So it looks like my server blocks are taking care of common slurs well enough for now.

Another thing to watch out for is that a slur in one language might be a harmless word in another, e.g. "retard" is the French word for "delay". So I added the capability to restrict a keyword to a given list of languages. Note, though, that people often do not set their language correctly, and that not all Fediverse software allows choosing a posting language.
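As a rough illustration of the language restriction, here is a minimal Python sketch. It is not taken from the bot itself: the `Keyword` class, the `keyword_matches` function and the whole-word matching are my own assumptions for the example.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Keyword:
    """Hypothetical keyword entry; not the bot's real data model."""
    term: str
    # None means the keyword applies to posts in any language.
    languages: Optional[FrozenSet[str]] = None


def keyword_matches(keyword: Keyword, text: str, post_language: Optional[str]) -> bool:
    """Return True if the keyword occurs as a whole word in an eligible post."""
    # Skip posts whose declared language is outside the keyword's list.
    # A post without a declared language never matches a restricted keyword.
    if keyword.languages is not None and post_language not in keyword.languages:
        return False
    pattern = r"\b" + re.escape(keyword.term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

In this sketch a restricted keyword ignores posts with no language set, which favours underblocking. Because people often mislabel their posts, a French post tagged as English would still match, so a human review of the hits remains necessary.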

I turned to a different issue next: my server is open to all ages, but other servers allow content aimed at an adult audience, and their posts can reach my federated timeline. So I decided to make the bot more powerful. It now supports rulesets for groups of keywords that can trigger automated moderation actions. Each action also notifies me via DM so that I can review it, and creates an audit trail via reports. The DM comes with convenient links to the profile, the moderation page and the first few posts found.

For adult content, there is also the option to trigger the keywords only if the post is not marked as sensitive. This is useful if your server allows adult content and you want to check that CWs have been applied.
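A minimal sketch of that option in Python, assuming the post is a dict shaped like a Mastodon status, which carries a boolean `sensitive` field. The function name `should_scan` and the flag `only_if_not_sensitive` are made up for this example and are not the bot's actual option names.

```python
def should_scan(post: dict, only_if_not_sensitive: bool) -> bool:
    """Decide whether a ruleset's keywords should be checked against a post."""
    if not only_if_not_sensitive:
        # The ruleset applies to every post, marked sensitive or not.
        return True
    # Only posts that are NOT marked as sensitive are checked.
    return not post.get("sensitive", False)
```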

When designing your hashtag and keyword lists, keep the "keywords" list to very specific terms; the "hashtag" list can be just a little bit broader.

Rulesets and automated moderation actions

[Image: Report with posts attached in the moderation UI]
Each ruleset has three thresholds that need to be crossed to trigger the action:
  1. Number of posts found that contain any of the keywords or hashtags that the bot is searching for
  2. Overall number of keywords or hashtags found
  3. Number of distinct keywords or hashtags found
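The three thresholds can be sketched in Python as follows. This is an illustration rather than the bot's actual code: the names are hypothetical, and I assume here that all three thresholds must be met and that reaching a threshold counts as crossing it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical threshold settings for one ruleset."""
    min_posts: int           # posts containing at least one term
    min_total_hits: int      # overall number of terms found
    min_distinct_terms: int  # number of different terms found


def ruleset_triggers(hits_per_post: List[List[str]], thresholds: Thresholds) -> bool:
    """hits_per_post holds, for each scanned post, the list of terms found in it."""
    matching_posts = [hits for hits in hits_per_post if hits]
    total_hits = sum(len(hits) for hits in matching_posts)
    distinct_terms = {term for hits in matching_posts for term in hits}
    # Assumption: the action fires only if all three thresholds are reached.
    return (
        len(matching_posts) >= thresholds.min_posts
        and total_hits >= thresholds.min_total_hits
        and len(distinct_terms) >= thresholds.min_distinct_terms
    )
```

For example, with the hits `[["a", "b"], [], ["a"]]` there are 2 matching posts, 3 hits overall and 2 distinct terms, so `Thresholds(2, 3, 2)` triggers while `Thresholds(3, 3, 2)` does not.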

The actions that can be taken are:
  1. Nothing
  2. Send yourself a DM without any action on the account - I recommend this category especially when detecting slurs in order not to overblock
  3. Mark as sensitive (useful for reviewing adult content so I don’t have to look at the pictures during the review, and false positives are quickly undone with 2 mouse clicks)
  4. Silence
  5. Suspend (this is a destructive action, so set the thresholds very high!)

All the actions taken are local only, with no report forwarding. Also, only local users are notified of reports via e-mail, so that people on other servers do not feel harassed by automated reports. When I block adult content, that content is usually perfectly fine according to their own server’s rules, so I don’t want to bother them! This also paid off while I was working on the bot: a hashtag that should not have been listed slipped into my configuration, and I was able to undo the damage quickly before it affected anybody.
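To illustrate what a local-only report looks like, here is a sketch in Python that builds the form data for Mastodon's `POST /api/v1/reports` endpoint with `forward` set to false. The helper name and the comment wording are hypothetical, and this is not necessarily how the bot files its reports.

```python
from typing import Dict, Iterable, List, Union


def build_local_report(
    account_id: str,
    status_ids: Iterable[str],
    matched_terms: Iterable[str],
) -> Dict[str, Union[str, bool, List[str]]]:
    """Build the parameters for a report that stays on the local server."""
    comment = "Automated keyword detection: " + ", ".join(sorted(matched_terms))
    return {
        "account_id": account_id,
        "status_ids": list(status_ids),  # posts attached as evidence
        "comment": comment[:1000],       # Mastodon limits comments to 1000 characters by default
        "forward": False,                # never forward to the remote server
    }
```

Keeping `forward` off means the remote server's moderators never see the report, which matches the goal of not bothering people whose posts are fine under their own server's rules.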

While reviewing posts, I found that people are generally very well behaved about marking their images as sensitive. So, a big thank you for that!

The script is available at Codeberg.