Introduction

The previous module focused on supervised models, in which each training data point is accompanied by a human-provided label. This module covers an unsupervised model: the data comes without labels. Instead, we computationally search for similarities within the provided data and use them to sort the data into subgroups. Once the subgroups are constructed, we can inspect each one and decide what label would describe it ("the big purple ones at the top" versus "the small green ones in the bottom corner", for example). The task of finding groups of similar data points is called clustering. Our simulated data points have several attributes: two coordinates (horizontal and vertical), three color channels (RGB), and a radius.
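To make the idea concrete, here is a minimal sketch of clustering simulated points with the six attributes listed above. It is not necessarily the algorithm this module uses: it assumes a simple k-means-style loop with two groups, attributes scaled to the range 0 to 1, and a deterministic starting guess. The group centers and the helper names (simulate, nearest, cluster_two_groups) are hypothetical and chosen only for illustration.

```python
import random

# Assumed attribute order for this sketch: x, y, red, green, blue, radius.
# All values are scaled to 0..1 so that no single attribute dominates the distance.
BIG_PURPLE_TOP = [0.5, 0.9, 0.6, 0.1, 0.8, 0.8]      # hypothetical group center
SMALL_GREEN_CORNER = [0.1, 0.1, 0.1, 0.8, 0.1, 0.2]  # hypothetical group center


def simulate(center, count, spread, rng):
    """Draw `count` points scattered around `center` with Gaussian noise."""
    return [[rng.gauss(value, spread) for value in center] for _ in range(count)]


def squared_distance(a, b):
    """Squared Euclidean distance: smaller means more similar."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def cluster_two_groups(points, max_iterations=50):
    """K-means-style loop for two groups; stops when the labels no longer change."""
    # Starting guess (an assumption of this sketch): the first point and the
    # point farthest from it.
    first = points[0]
    farthest = max(points, key=lambda p: squared_distance(first, p))
    centroids = [list(first), list(farthest)]
    labels = None
    for _ in range(max_iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # converged: nothing was reassigned
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(column) / len(members)
                                for column in zip(*members)]
    return labels, centroids


rng = random.Random(0)  # fixed seed so the simulated data is reproducible
points = (simulate(BIG_PURPLE_TOP, 20, 0.05, rng)
          + simulate(SMALL_GREEN_CORNER, 20, 0.05, rng))
labels, centroids = cluster_two_groups(points)
print(labels)
```

No labels are passed to the clustering function. It only sees the attribute values, and a description such as "big and purple" can be read off afterwards by inspecting each group's centroid.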

Learning outcomes

This module will help you do the following:

Group data into clusters
Vary model parameters to improve clustering results
Visualize multi-attribute data
Examine algorithmic convergence
Adjust a model to new data

Warm-up

Warm-up assessment

Based on the video, explain in a couple of sentences what "community" means in a biological sense. Then, try to extend this meaning of the word "community" to at least two non-biological systems, such as technological systems. For each of your examples, explain why "community" is a good word to describe that structure.

Concepts

After this module, you should be familiar with the following concepts:

Quantitative
Qualitative
Similarity
Clustering

Remember that you can always look up concepts in the glossary. If an entry is missing or unclear, please report it.