Introduction

Whenever a machine-learning approach works with non-numerical input data, that data must first be encoded into a numerical representation. The field of natural language processing (NLP) provides the link from human-generated (or at least human-readable) text to such a numerical representation, which typically involves word frequencies or word-occurrence probabilities in some sense. If the input is speech rather than text, speech-to-text (STT) conversion is needed first; the reverse direction is text-to-speech (TTS). The conversion of handwritten (or printed) text into machine-represented text is called optical character recognition (OCR).
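
As a rough illustration of what a numerical representation based on word frequencies can look like, the following minimal sketch builds a simple bag-of-words encoding using only the Python standard library. The function names `tokenize` and `bag_of_words` and the two example sentences are assumptions made for this illustration; they are not part of the module materials, and practical NLP pipelines usually rely on more careful tokenization and weighting.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, count_vectors) for a list of strings.

    Each document becomes a list of integers: one word count per
    vocabulary entry, in alphabetical vocabulary order.
    """
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Two made-up example sentences, purely for illustration.
docs = ["The movie was great", "The movie was not great, not at all"]
vocab, vectors = bag_of_words(docs)

print(vocab)
for vector in vectors:
    print(vector)
```

Each sentence is now a fixed-length list of numbers that a machine-learning model could take as input. Note that word order is lost in this encoding, which is one reason why cues such as negation or sarcasm are hard to capture with word counts alone.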

Learning outcomes

This module will help you do the following:

Warm-up

Browse the web articles below before the in-class discussion:

Warm-up assessment

Please browse the articles above first. Then, without consulting any sources, describe in writing how you yourself go about establishing the tone or sentiment expressed in a paragraph of writing such as an email, an online product review, or a post on social media. What cues do you rely on to determine whether it is a joke, whether the author is pleased or aggravated, whether it is intended as sarcastic, and so forth? How certain are you, in general, that your assessment of the sentiment the writer intended to express is accurate?

Concepts

After this module, you should be familiar with the following concepts:

Remember that you can always look concepts up in the glossary. Should anything be missing or insufficient, please report it.