Introduction

Last module we worked with images. This module, we pivot to text as our input data, drawing on an open-access repository of full-text electronic books. We will learn to extract structure from text documents (split them into pieces such as chapters, paragraphs, sentences, and words) and to clean the text (remove punctuation, alter character case, skip non-informative stop words, and reduce plural forms to their singular counterparts, which is a simple case of stemming).
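To make these steps concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under simplifying assumptions, not the method the module will teach: the function names, the tiny stop-word list, and the sample text are all made up for this example, sentence splitting is a naive rule based on end punctuation, and the plural handling simply drops a trailing "s", which gives wrong results for many English words.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "are", "were"}


def split_paragraphs(text):
    """Split text into paragraphs on one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def naive_singular(word):
    """Crude plural reduction: drop a trailing 's' (but not 'ss').

    This is an assumption-laden shortcut and mishandles many words.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean_words(sentence):
    """Lower-case, strip punctuation, drop stop words, reduce simple plurals."""
    table = str.maketrans("", "", string.punctuation)
    no_punct = sentence.lower().translate(table)
    return [naive_singular(w) for w in no_punct.split() if w not in STOP_WORDS]


# Made-up sample text with two paragraphs.
sample = "The cats sat on the mats. Dogs were barking!\n\nA new chapter begins."

for paragraph in split_paragraphs(sample):
    for sentence in split_sentences(paragraph):
        print(clean_words(sentence))
```

Each printed list holds the cleaned words of one sentence. Keeping the splitting and cleaning steps in separate small functions makes it easy to swap any one of them for a more careful version later, for example a proper stemmer in place of naive_singular.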

Learning outcomes

This module will help you do the following:

Warm-up

Warm-up assessment

Based on the warm-up video, make a list of about a dozen applications of NLP that you can envision but have not yet encountered. Then, for each, assess whether you think it will be easy, moderate, or hard to implement. Also assess whether it would be a wholesome, positive thing to have, a neutral development of technology, or potentially harmful if used for dishonest or discriminatory purposes.

Concepts

After this module, you should be familiar with the following concepts:

Remember that you can always look concepts up in the glossary. Should anything be missing or insufficient, please report it.