dialogsum: a dataset of dialogues, one of many available in the Hugging Face datasets library. It is used here to practice the summarization task. Each example in the test split includes a human-written summary of the dialogue.
AutoModelForSeq2SeqLM: the class used to load FLAN-T5, which is a sequence-to-sequence model.
tokenizer: each model has its own associated tokenizer. We can load the tokenizer for a particular model with AutoTokenizer.from_pretrained.
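The two loading steps above can be sketched as follows; the checkpoint name "google/flan-t5-small" is an assumption (any FLAN-T5 size should work the same way):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"  # assumed checkpoint; other sizes work too

# AutoModelForSeq2SeqLM picks the right seq2seq class for the checkpoint.
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# AutoTokenizer.from_pretrained loads the tokenizer matching that model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
```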
Notes
We pass the tokens corresponding to the dialogue as input to the model.generate method. generate itself outputs tokens, which we then pass as input to the tokenizer.decode method to recover the text.
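A sketch of that tokenize → generate → decode round trip; the checkpoint name, the prompt wording, and the toy dialogue are assumptions for illustration:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A toy dialogue in the DialogSum speaker format (made up for this example).
dialogue = "#Person1#: Hi, how was the meeting?\n#Person2#: Good, we agreed on the new schedule."
prompt = f"Summarize the following conversation.\n\n{dialogue}\n\nSummary:"

# The tokenizer turns the prompt into input token ids.
inputs = tokenizer(prompt, return_tensors="pt")

# model.generate outputs token ids...
output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)

# ...which tokenizer.decode turns back into text.
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```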
In my experiments, FLAN-T5 was not good at the summarization task. For some of the dialogues, the output of zero-shot inference was exactly the same as that of few-shot inference. Using the pre-built prompt templates helped, though.