3

Quantifying Adaptability in Pre-trained Language Models with 500 Tasks

Sparse Distillation: Speeding Up Text Classification by Using Bigger Models

A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

Linformer: Self-Attention with Linear Complexity