Scenario B

You oversee the online discussion board of a nonprofit organization for which you volunteer on weekends. It is important to allow your target audience to interact, but you worry about hate speech and spam. Assigning volunteer staff to screen all posts and comments seems unfeasible, not only because of the workload but also because of the constant exposure to potential toxicity.

Illustration by Robert Couse-Baker, Wikimedia, reused with Creative Commons licence.

You have been talking to your colleagues at the organization about your plans at work for detecting pavement defects, and they encourage you to explore options that would employ computational intelligence to do the heavy lifting for you in this task as well.

Phase 1

The forum log you requested is attached. It lists each posting
on the discussion board, indicating the author and the number
of upvotes. I got the volunteers to label some of the posts
either as spam or good and set the third column as none for
the posts nobody looked at. Let me know if you need anything
else. Cheers, Robin.

Robin has been supporting your data-analysis efforts and has helped clean up a dataset of the forum posts (download the log file). Your current hypothesis is that people who post abnormally often and get few upvotes might be undesirable users.
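To make the hypothesis concrete, it can help to turn the log into per-author features first. The following is a minimal sketch, assuming each row holds an author ID, an upvote count, and a label, as Robin's message suggests. The sample rows and the function name author_features are hypothetical and stand in for the real log file, whose exact layout should be checked before reuse.

```python
from collections import defaultdict

# Hypothetical rows in the layout Robin describes: (author ID, upvotes, label).
# These values are made up for illustration; they are not taken from the log.
rows = [
    (101, 0, "spam"),
    (101, 1, "spam"),
    (101, 0, "none"),
    (202, 12, "good"),
    (303, 7, "none"),
]

def author_features(rows):
    """Return {author_id: (post_count, mean_upvotes)} over all postings."""
    counts = defaultdict(int)
    upvotes = defaultdict(int)
    for author, votes, _label in rows:
        counts[author] += 1
        upvotes[author] += votes
    return {a: (counts[a], upvotes[a] / counts[a]) for a in counts}

features = author_features(rows)
for author, (post_count, mean_upvotes) in sorted(features.items()):
    print(author, post_count, round(mean_upvotes, 2))
```

An author with a high post count and a low mean upvote count would match the hypothesis, although what counts as "abnormally often" remains a judgment to make from the actual data.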

Authors are represented by their numerical user IDs since you do not want anyone to think ill of an individual just because your system might, at an early stage, imply that they are a spammer or a troll.

Hands-on option

Basic stage for the hands-on option

With your preferred computational tool (the Python example from class is just fine), train a perceptron to classify the posts that were not manually labelled (those that say none in the third column) as either spam or good. Remember that your training set can only contain data points that are manually labelled, and you might want to set a part of those aside for testing purposes. Discuss your code in writing and report the results of the model. Include a confusion matrix.
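One possible shape for this stage is sketched below in plain Python. It is not the example from class, and everything in it is an assumption made for illustration: the two features (number of posts by the author, upvotes on the post), the 1 = spam and 0 = good encoding, and the small made-up data sets. It trains on the labelled points only, evaluates on a held-out part of them, and then labels the points that were marked none.

```python
# Hypothetical labelled examples: ((posts by author, upvotes on post), label),
# with label 1 = spam and 0 = good. Real values would come from the forum log.
train = [((9, 0), 1), ((1, 8), 0), ((8, 1), 1), ((2, 10), 0),
         ((10, 2), 1), ((1, 6), 0), ((7, 0), 1), ((3, 9), 0)]
# Labelled examples set aside for testing; never used during training.
test = [((9, 1), 1), ((2, 7), 0), ((8, 0), 1), ((1, 12), 0)]

def predict(weights, bias, x):
    """Step activation: 1 (spam) if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    """Classic perceptron rule: adjust weights only on misclassified points."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict(weights, bias, x)
            if error != 0:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return weights, bias

def confusion_matrix(data, weights, bias):
    """Rows are actual labels, columns are predicted labels (0 = good, 1 = spam)."""
    matrix = [[0, 0], [0, 0]]
    for x, y in data:
        matrix[y][predict(weights, bias, x)] += 1
    return matrix

weights, bias = train_perceptron(train)
matrix = confusion_matrix(test, weights, bias)

# Hypothetical posts whose third column says none.
unlabelled = [(6, 1), (1, 5)]
predicted = ["spam" if predict(weights, bias, x) else "good" for x in unlabelled]
print(matrix, predicted)
```

The toy data is linearly separable, so the training loop stops early. Real forum data may not be, in which case the epoch limit ends training and the confusion matrix will show some misclassified posts.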

In-depth stage for the hands-on option

Select and compute at least three performance measures (either based on the confusion matrix or on some other aspect of the resulting model and/or of the training/testing calculations).
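As an illustration of measures that follow directly from a confusion matrix, the sketch below computes accuracy, precision, recall, and the F1 score, treating spam as the positive class. The counts passed in at the end are hypothetical placeholders, not results from the forum data, and the function name performance_measures is chosen here for the example.

```python
def performance_measures(tp, fp, fn, tn):
    """Compute common measures from confusion-matrix counts (spam = positive)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of posts flagged as spam that really are spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual spam posts that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
measures = performance_measures(tp=8, fp=2, fn=4, tn=6)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

For a moderation task, precision and recall pull in different directions: low precision means good posts get flagged, while low recall means spam slips through, so reporting both is usually more informative than accuracy alone.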

Conceptual option

Basic stage for the conceptual option

Read the first chapter of the online textbook Neural Networks and Deep Learning (Nielsen, 2019), and then sketch a rudimentary illustrated glossary on the main components and elements of a simple neural network.
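A glossary entry can be easier to pin down next to a few lines of code. The sketch below labels the usual parts of a single sigmoid neuron (inputs, weights, bias, weighted input, activation) in comments. The numbers are arbitrary and the function names are chosen here for illustration, not taken from the textbook.

```python
import math

def sigmoid(z):
    """Activation function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus the bias, then activation."""
    # Weighted input z = sum of (weight * input) over all inputs, plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

inputs = [1.0, 1.0]    # values fed into the neuron
weights = [0.5, -0.5]  # one weight per input: how much each input matters
bias = 0.0             # shifts how easily the neuron produces a high output

output = neuron_output(inputs, weights, bias)
print(output)
```

A layer is a group of such neurons that share the same inputs, and a network chains layers so that the outputs of one layer become the inputs of the next.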

In-depth stage for the conceptual option

Browse through the textbook An Introduction to Neural Networks (Gurney, 1997) and write down any interesting advances, components, and elements that more complex neural networks can contain and apply.