AI & MACHINE LEARNING

Zero-shot and few-shot learning

Regimes in which a model solves a task with no labeled examples of the target class (zero-shot) or very few (few-shot). In language models, they take the form of in-context learning, with the task specified in the prompt itself.

Extended definition

Zero-shot and few-shot learning are regimes in which a model must solve a task with no labeled examples of the target class (zero-shot) or with very few (few-shot). The motivation is scarcity: labeling data is expensive, and many classes have few or no examples. In classical machine learning, zero-shot relies on auxiliary semantic information, such as attributes that describe a never-seen class, so that the class can be recognized from its description. Xian and colleagues (2019) systematize this approach in vision and show, with a unified benchmark, how inconsistent evaluation protocols had inflated reported results. Few-shot, in turn, uses prior knowledge to generalize from a few examples; Wang and colleagues (2020), in the reference review, organize the methods into three fronts: augmenting the data, constraining the hypothesis space, and adapting the search algorithm. In language models, these regimes took on a new form: in-context learning, where the task is specified in the prompt itself, with no examples (zero-shot) or a few examples (few-shot), without updating the model’s weights.
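
The attribute-based idea can be illustrated with a minimal sketch. It assumes that some upstream predictor, trained only on seen classes, outputs attribute scores for an input, and that each class (including never-seen ones) has a hand-written attribute description. The attribute names, class table, and scores are hypothetical, and cosine similarity is just one reasonable matching rule, not the method of any cited work.

```python
import math

# Hypothetical attribute descriptions: (has_stripes, has_hooves, lives_in_water).
# "zebra" may never have appeared in training; only its description is needed.
CLASS_ATTRIBUTES = {
    "zebra": (1.0, 1.0, 0.0),
    "horse": (0.0, 1.0, 0.0),
    "dolphin": (0.0, 0.0, 1.0),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def zero_shot_classify(predicted_attributes, class_attributes):
    """Return the class whose description best matches the predicted attributes."""
    return max(
        class_attributes,
        key=lambda name: cosine(predicted_attributes, class_attributes[name]),
    )

# Assumed output of an attribute predictor for one image.
scores = (0.9, 0.8, 0.1)
print(zero_shot_classify(scores, CLASS_ATTRIBUTES))  # prints "zebra"
```

The quality of the result depends entirely on the class descriptions: if two classes share nearly the same attribute vector, this rule cannot separate them.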

When it applies

Zero-shot and few-shot apply when labeling data for the target task is infeasible or too expensive. They suit long-tail problems, with many rare classes, and fast-changing domains, where collecting and labeling data for each new category cannot keep pace. In language models, they cover everyday use: requesting a task with no examples is zero-shot, while including a few examples in the prompt is few-shot and usually improves the result at no training cost. Wang and colleagues (2019) catalog zero-shot applications in vision, language, and information retrieval. They also serve as an alternative to fine-tuning when there is neither data nor budget to adjust the model, and as a quick baseline before investing in labeling.
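
The difference between the two prompting regimes comes down to whether examples are included. The sketch below shows one possible prompt layout; the function name `build_prompt`, the "Input:/Label:" format, and the sample reviews are illustrative assumptions, not a standard, and the resulting string would still have to be sent to whatever model is in use.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a prompt: zero-shot with no examples, few-shot with some."""
    parts = [instruction]
    for text, label in examples:
        # Each demonstration shows the model an input with its expected label.
        parts.append(f"Input: {text}\nLabel: {label}")
    # The query is left unlabeled for the model to complete.
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)

instruction = "Classify the sentiment of the review as positive or negative."

# Zero-shot: the task description alone.
zero_shot = build_prompt(instruction, "The plot dragged on forever.")

# Few-shot: the same task with two hypothetical demonstrations.
few_shot = build_prompt(
    instruction,
    "The plot dragged on forever.",
    examples=[("A delightful surprise.", "positive"), ("Flat and lifeless.", "negative")],
)
print(few_shot)
```

In both cases the model's weights are untouched; only the text of the prompt changes.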

When it does not apply

These regimes do not apply when abundant labeled data exists and the task is stable: there, supervised training or fine-tuning delivers superior and more reliable performance. Zero-shot does not apply without quality auxiliary semantic information; poor or misleading class descriptions collapse performance. Few-shot prompting is not stable: the choice, order, and format of the examples can change the result considerably. In zero-shot vision, Xian and colleagues (2019) warn that careless evaluation overestimates real capability. These regimes do not apply when the cost of error is high and confidence must be guaranteed, since generalization from few or no examples is fragile. And they do not apply as a substitute for data when the signal simply does not exist in the available information.
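
One way to gauge the instability of few-shot prompting is to score the same examples in every possible order and look at the spread. The sketch below assumes an `evaluate` callable that takes an ordered list of examples and returns an accuracy on a held-out set; `toy_evaluate` is a made-up placeholder so the code runs, and its numbers are not measurements of any model.

```python
from itertools import permutations

def order_sensitivity(examples, evaluate):
    """Score every ordering of the examples and summarize the spread."""
    scores = [evaluate(list(order)) for order in permutations(examples)]
    return {
        "n_orders": len(scores),
        "min": min(scores),
        "max": max(scores),
        "spread": max(scores) - min(scores),
    }

def toy_evaluate(ordered_examples):
    # Placeholder only: pretends accuracy depends on the label of the last
    # example. Replace with a real evaluation of the model on held-out data.
    return 0.8 if ordered_examples[-1][1] == "positive" else 0.6

EXAMPLES = [
    ("great film", "positive"),
    ("dull plot", "negative"),
    ("loved it", "positive"),
]

print(order_sensitivity(EXAMPLES, toy_evaluate))
```

The number of orderings grows factorially, so with more than a handful of examples a random sample of orderings is the practical choice.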

Applications by field

  • Language processing: use of language models via prompt, with no or few examples, without weight adjustment.
  • Computer vision: recognition of never-seen classes from the semantic attributes that describe them.
  • Information retrieval: classification and search in new categories without specific labeled data.
  • Long-tail domains: areas with many rare classes, where labeling each category is infeasible.

Common pitfalls

The first pitfall is treating zero-shot and few-shot as equivalent to a well-trained supervised model: generalization from few examples is more fragile and variable. The second is ignoring how sensitive prompting is to the selection and order of examples, which can change the result significantly. The third is evaluating with a careless protocol, contaminating the test set with seen classes and overestimating capability. The fourth is relying on poor auxiliary semantic information in zero-shot, expecting recognition from vague descriptions. The fifth is choosing these regimes by inertia when labeled data is available, giving up the superior performance that supervised training or fine-tuning would offer.
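
The third pitfall can be partly guarded against with a simple check before evaluation. The sketch below assumes that class labels are available as plain strings for both splits; the function name and the sample labels are hypothetical. It only catches overlap between the labeled splits, not leakage through other routes such as pretrained features.

```python
def leaked_classes(train_labels, test_labels):
    """Return classes that appear in both the training and zero-shot test sets."""
    return sorted(set(train_labels) & set(test_labels))

train_labels = ["horse", "dolphin", "horse", "tiger"]
test_labels = ["zebra", "horse", "whale"]

leaks = leaked_classes(train_labels, test_labels)
if leaks:
    # A zero-shot test set should contain only never-seen classes.
    print(f"Seen classes in the zero-shot test set: {leaks}")
```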
