News
July 2025: Introducing ✨ Delta Learning ✨! We post-train SOTA 8B language models with only weak data, making open post-training accessible to all. Key idea: learn from the *differences* in weak data pairs.
May 2025: Introducing Spurious Rewards! Even random rewards improve Qwen models with RLVR 🤯.
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains
Scott Geng,
Hamish Ivison,
Chun-Liang Li,
Maarten Sap,
Jerry Li,
Ranjay Krishna,
Pang Wei Koh
COLM, 2025
arXiv /
code
Our hypothesis: the relative quality delta between two weak data points suffices to improve a stronger student model. Scaling up, delta learning enables dead-simple, state-of-the-art language model post-training with only weak (cheap) data.
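A minimal sketch of the idea, assuming a DPO-style preference loss (my illustration; the names and setup are not from the paper's released code). Both responses in a pair come from weak models, and the student learns only from their relative ordering:

    import torch.nn.functional as F

    def delta_preference_loss(policy_logp_better, policy_logp_worse,
                              ref_logp_better, ref_logp_worse, beta=0.1):
        # DPO-style loss on a *weak* pair: both responses come from weak
        # models; only their quality delta supplies the training signal.
        # Inputs are summed token log-probs of each full response.
        delta = ((policy_logp_better - ref_logp_better)
                 - (policy_logp_worse - ref_logp_worse))
        return -F.logsigmoid(beta * delta).mean()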
Spurious Rewards: Rethinking Training Signals in RLVR
Rulin Shao*,
Shuyue Stella Li*,
Rui Xin*,
Scott Geng*,
Yiping Wang,
Sewoong Oh,
Simon Shaolei Du,
Nathan Lambert,
Sewon Min,
Ranjay Krishna,
Yulia Tsvetkov,
Hannaneh Hajishirzi,
Pang Wei Koh,
Luke Zettlemoyer
arXiv, 2025
arXiv /
blog /
code
RLVR with very silly rewards (rewards assigned at random, rewards for incorrect labels) can massively boost math performance in Qwen models, but not in other models! This suggests that RLVR (at current scales) mostly elicits knowledge the model already has.
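For concreteness, a hedged sketch of the spurious reward functions (illustrative only; function names are mine, not the released code). Either one can be dropped into an RLVR loop in place of a real verifier:

    import random

    def random_reward(response: str) -> float:
        # Coin-flip reward that ignores the response entirely.
        return float(random.random() < 0.5)

    def incorrect_label_reward(response: str, wrong_answer: str) -> float:
        # Rewards agreement with a deliberately *incorrect* reference answer.
        return float(wrong_answer in response)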
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng,
Cheng-Yu Hsieh,
Vivek Ramanujan,
Matthew Wallingford,
Chun-Liang Li,
Pang Wei Koh*,
Ranjay Krishna*
NeurIPS, 2024
arXiv /
code
Does synthetic data from generative AI truly allow us to bootstrap and surpass the original real data used to train the generator? We propose a principled baseline to ground this question empirically, and find the answer is no, not yet.
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng*,
Revant Teotia*,
Purva Tendulkar,
Sachit Menon,
Carl Vondrick
arXiv, 2023
arXiv /
project page /
dataset
We introduce a language-augmented vision framework for modeling social interactions in videos of two-person conversations. To study this problem, we create the RealTalk video dataset with 100+ hours of in-the-wild conversations.
Understanding Zero-shot Adversarial Robustness for Large-Scale Models
Chengzhi Mao*,
Scott Geng*,
Junfeng Yang,
Xin Wang,
Carl Vondrick
ICLR, 2023
arXiv /
code
We identify the novel problem of zero-shot adversarial robustness and propose a new text-grounded adversarial training objective that can help make CLIP robust while preserving its ability to generalize.
NeuDep: Neural Binary Memory Dependence Analysis
Kexin Pei,
Dongdong She*,
Michael Wang*,
Scott Geng*,
Zhou Xuan,
Yaniv David,
Junfeng Yang,
Suman Jana,
Baishakhi Ray
ESEC/FSE, 2022
arXiv /
code
The semantic meaning of code is explicitly measurable as the CPU's runtime memory values. Predicting execution traces is thus a natural self-supervised task, which we leverage to learn good code representations.
Cerebellar Oscillations in Familial and Sporadic Essential Tremor
Shi-Bing Wong,
Yi-Mei Wang,
Chih-Chun Lin,
Scott Geng,
Nora Vanegas-Arroyave,
Seth Pullman,
Sheng-Han Kuo,
Ming-Kai Pan
The Cerebellum, 2021
paper
Low-frequency brain waves correlate with symptom severity in sporadic essential tremor but not in familial (i.e., genetic) essential tremor, suggesting a difference in underlying mechanism.
Jon Barron has a very clean website.
Last updated: July 15th, 2025.