Fine-Tuning Language Models

TODO: introductory paragraph or two. In C1 we saw that LMs can perform a variety of tasks without any further training, through the power of ‘prompting’. In this chapter we’ll see how to further improve an LM’s performance on a specific task by fine-tuning it on a dataset of examples. We’ll look at fine-tuning for text generation, fine-tuning for text classification, and learning from human feedback. (or something to this effect)

Generating a specific kind of text

Taking a general LM and fine-tuning it to produce a specific kind of text, such as conversation. This is known as supervised fine-tuning (SFT). Example: write like me, by training on your own tweets! TODO
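The core of SFT is ordinary next-token training, just on a narrower corpus. The following is a minimal sketch of that idea using a toy bigram model and numpy (a stand-in for a real pretrained LM; the vocabulary and "tweet" corpus are invented for illustration): we minimise the cross-entropy of each next token given the previous one, and the model's continuations drift toward the fine-tuning data's style.

```python
import numpy as np

# Toy "language model": a table of bigram logits over a tiny vocabulary.
# This is a hypothetical minimal sketch, not a real pretrained LM.
vocab = ["<s>", "i", "love", "hate", "coffee", "tea"]
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))  # W[i, j]: logit of token j following token i

def step(W, pairs, lr=0.5):
    """One gradient step of next-token cross-entropy on (prev, next) pairs."""
    loss = 0.0
    grad = np.zeros_like(W)
    for prev, nxt in pairs:
        logits = W[prev]
        p = np.exp(logits - logits.max())
        p /= p.sum()                      # softmax over the next token
        loss -= np.log(p[nxt])
        g = p.copy()
        g[nxt] -= 1.0                     # d(loss)/d(logits) for softmax cross-entropy
        grad[prev] += g
    W -= lr * grad / len(pairs)           # in-place gradient descent update
    return loss / len(pairs)

# Fine-tuning corpus in the style we want to imitate (e.g. your tweets).
corpus = ["<s> i love coffee".split(), "<s> i love tea".split()]
pairs = [(idx[a], idx[b]) for sent in corpus for a, b in zip(sent, sent[1:])]

losses = [step(W, pairs) for _ in range(200)]
```

After a couple of hundred steps the loss drops and "love" becomes the model's top-ranked continuation of "i", which is the fine-tuning effect in miniature. A real SFT run does exactly this with a transformer, a tokenizer, and minibatches, but the loss is the same.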

Classifying text

Taking a general LM and fine-tuning it for a specific classification task, typically by adding a classification head on top of its representations. Example: sentiment analysis. TODO
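A minimal sketch of the classification-head idea, with numpy standing in for the heavy machinery: here frozen bag-of-words counts play the role of the pretrained LM's features, and we train only a small logistic-regression head on top of them. The vocabulary and training examples are invented for illustration.

```python
import numpy as np

# Bag-of-words features stand in for a pretrained LM's frozen representations.
vocab = ["great", "awful", "fun", "boring", "movie"]
idx = {w: i for i, w in enumerate(vocab)}

def features(text):
    x = np.zeros(len(vocab))
    for w in text.split():
        if w in idx:
            x[idx[w]] += 1.0
    return x

# Tiny labelled sentiment dataset: 1 = positive, 0 = negative.
train = [("great fun movie", 1), ("awful boring movie", 0),
         ("great movie", 1), ("boring movie", 0)]
X = np.stack([features(t) for t, _ in train])
y = np.array([lab for _, lab in train], dtype=float)

# Train a logistic-regression "head" with gradient descent.
w, b = np.zeros(len(vocab)), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(positive)
    g = p - y                                # gradient of binary cross-entropy
    w -= 0.1 * X.T @ g / len(y)
    b -= 0.1 * g.mean()

def predict(text):
    """Return 1 for positive sentiment, 0 for negative."""
    return int(1.0 / (1.0 + np.exp(-(features(text) @ w + b))) > 0.5)
```

In practice the head sits on a transformer's final hidden state and the whole network (or just the head) is trained with the same cross-entropy loss; only the feature extractor differs.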

Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) and other modern approaches to alignment.
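One central step of RLHF is reward modelling: fitting a scalar reward so that responses humans preferred score higher than responses they rejected, via the Bradley-Terry loss L = -log sigmoid(r(chosen) - r(rejected)). A minimal numpy sketch, with hypothetical 2-d feature vectors standing in for real responses and a linear reward standing in for a neural reward model:

```python
import numpy as np

# Linear reward model r(x) = w @ x over hypothetical response features.
rng = np.random.default_rng(1)
w = rng.normal(size=2) * 0.1

# Each preference pair: (features of chosen response, features of rejected one).
pairs = [(np.array([1.0, 0.0]), np.array([0.0, 1.0])),
         (np.array([0.9, 0.2]), np.array([0.1, 0.8]))]

for _ in range(300):
    grad = np.zeros_like(w)
    for xc, xr in pairs:
        margin = w @ xc - w @ xr              # r(chosen) - r(rejected)
        s = 1.0 / (1.0 + np.exp(-margin))
        grad += (s - 1.0) * (xc - xr)         # d/dw of -log sigmoid(margin)
    w -= 0.1 * grad / len(pairs)              # gradient descent on the pair loss

def reward(x):
    return w @ x
```

After training, every chosen response outscores its rejected partner. The full RLHF pipeline then uses such a reward model to optimise the LM's outputs with a policy-gradient method such as PPO, which is beyond this sketch.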

Ethics, Access, Scale

Some note about how the ethical and practical picture changes as these models get ever larger and more capable: who can afford to train them, who gets access to them, and what they can be used for. TODO

Summary

TODO