Fine-Tuning Language Models
In C1 we saw that language models can be useful for a wide variety of tasks without any further training, through the power of prompting. In this chapter we'll see how to improve an LM's performance on a specific task by fine-tuning it on a dataset of examples: first to generate a particular kind of text, then to classify text. We'll then look at how models learn from human feedback, and close with some broader questions of ethics, access, and scale.
Generating a specific kind of text
Taking a general LM and fine-tuning it to produce a specific kind of text, such as conversation; this is known as supervised fine-tuning (SFT). Example: "write like me", fine-tuning on your own tweets. TODO
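At its core, SFT reuses the next-token cross-entropy loss from pretraining, but typically computes it only over the tokens we want the model to produce (the response), masking out the prompt. A toy sketch of that masked loss in plain Python, with made-up probabilities standing in for a real model's outputs:

```python
import math

def sft_loss(target_probs, loss_mask):
    """Mean negative log-likelihood over response tokens only.

    target_probs: the model's probability for the correct next token at
        each position (a real trainer derives these from logits).
    loss_mask: 1 for response tokens we train on, 0 for prompt tokens.
    """
    losses = [-math.log(p) for p, m in zip(target_probs, loss_mask) if m]
    return sum(losses) / len(losses)

# Toy example: 3 prompt tokens (masked out) then 2 response tokens.
probs = [0.10, 0.20, 0.30, 0.50, 0.25]  # hypothetical model probabilities
mask = [0, 0, 0, 1, 1]
loss = sft_loss(probs, mask)  # averages -log(0.5) and -log(0.25)
```

Masking the prompt means the model is only pushed toward imitating the responses, not toward regenerating the prompts themselves.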
Classifying text
Taking a general LM and fine-tuning it for a specific classification task. Example: sentiment analysis. TODO
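A common recipe is to put a small linear "head" on top of the LM's final hidden state and fine-tune the two together with a cross-entropy loss over the class labels. A toy sketch of the head alone, in plain Python; the hidden state and weights here are made-up numbers (a real hidden state has hundreds or thousands of dimensions and comes from the model):

```python
import math

def classify(hidden_state, head_weights, head_bias):
    """Linear head over the model's final hidden state, then softmax.

    Returns one probability per class; during fine-tuning the head (and
    usually the LM itself) is trained with cross-entropy on these.
    """
    logits = [
        sum(w * h for w, h in zip(row, hidden_state)) + b
        for row, b in zip(head_weights, head_bias)
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-class sentiment head over a 3-dim hidden state (hypothetical values).
h = [0.5, -1.0, 2.0]
W = [[1.0, 0.0, 0.5], [-1.0, 0.5, 0.0]]
b = [0.0, 0.0]
probs = classify(h, W, b)  # two probabilities that sum to 1
```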
Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) and other modern approaches to alignment.
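One concrete piece of the RLHF pipeline is the reward model, trained on human comparisons between pairs of responses. A standard choice is the Bradley-Terry loss, which models the probability that the chosen response beats the rejected one as a sigmoid of their reward difference. A toy sketch (the scalar rewards here are invented; in practice they come from a learned model):

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss for training a reward model from human
    comparisons: -log sigmoid(r_chosen - r_rejected)."""
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# The loss shrinks when the reward model agrees with the human label.
loss_agree = preference_loss(2.0, -1.0)     # chosen scored higher: small loss
loss_disagree = preference_loss(-1.0, 2.0)  # chosen scored lower: large loss
```

Once trained, the reward model's scores serve as the optimization signal for the policy (e.g. via PPO in classic RLHF).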
Ethics, Access, Scale
Some note about how things change as these models get ever larger and more capable. TODO
Summary
TODO