News
July 2025: Introducing ✨ Delta Learning ✨! We post-train SOTA 8B language models with only weak data, making open post-training accessible to all. Key idea: learn from the *differences* in weak data pairs.
May 2025: Introducing Spurious Rewards! Even random rewards improve Qwen models with RLVR 🤯.
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains
Scott Geng,
Hamish Ivison,
Chun-Liang Li,
Maarten Sap,
Jerry Li,
Ranjay Krishna,
Pang Wei Koh
COLM, 2025
arXiv /
code
Our hypothesis: the relative quality delta between two weak data points suffices to improve a stronger student model. Scaling up, delta learning enables dead-simple, state-of-the-art language model post-training with only weak (cheap) data.
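A minimal sketch of the idea, assuming a DPO-style preference loss (my illustration; the names and setup are not from the paper's released code). Both responses in a pair come from weak models, and the student learns only from their relative ordering:

    import torch.nn.functional as F

    def delta_preference_loss(policy_logp_better, policy_logp_worse,
                              ref_logp_better, ref_logp_worse, beta=0.1):
        # DPO-style loss on a *weak* pair: both responses come from weak
        # models; only their quality delta supplies the training signal.
        # Inputs are summed token log-probs of each full response.
        delta = ((policy_logp_better - ref_logp_better)
                 - (policy_logp_worse - ref_logp_worse))
        return -F.logsigmoid(beta * delta).mean()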
Spurious Rewards: Rethinking Training Signals in RLVR
Rulin Shao*,
Shuyue Stella Li*,
Rui Xin*,
Scott Geng*,
Yiping Wang,
Sewoong Oh,
Simon Shaolei Du,
Nathan Lambert,
Sewon Min,
Ranjay Krishna,
Yulia Tsvetkov,
Hannaneh Hajishirzi,
Pang Wei Koh,
Luke Zettlemoyer
arXiv, 2025
arXiv /
blog /
code
RLVR with very silly rewards (rewards assigned at random, rewards for incorrect labels) can massively boost math performance in Qwen models, but not in other models! This suggests that RLVR (at current scales) mostly elicits knowledge the model already has.
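For concreteness, a hedged sketch of the spurious reward functions (illustrative only; function names are mine, not the released code). Either one can be dropped into an RLVR loop in place of a real verifier:

    import random

    def random_reward(response: str) -> float:
        # Coin-flip reward that ignores the response entirely.
        return float(random.random() < 0.5)

    def incorrect_label_reward(response: str, wrong_answer: str) -> float:
        # Rewards agreement with a deliberately *incorrect* reference answer.
        return float(wrong_answer in response)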
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
Scott Geng,
Cheng-Yu Hsieh,
Vivek Ramanujan,
Matthew Wallingford,
Chun-Liang Li,
Pang Wei Koh*,
Ranjay Krishna*
NeurIPS, 2024
arXiv /
code
Does synthetic data from generative AI truly allow us to bootstrap and surpass the original real data used to train the generator? We propose a principled baseline to ground this question empirically, and find the answer is no, not yet.
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng*,
Revant Teotia*,
Purva Tendulkar,
Sachit Menon,
Carl Vondrick
arXiv, 2023
arXiv /
project page /
dataset
We introduce a language-augmented vision framework for modeling social interactions in videos of two-person conversations. To study this problem, we create the RealTalk video dataset with 100+ hours of in-the-wild conversations.
Understanding Zero-shot Adversarial Robustness for Large-Scale Models
Chengzhi Mao*,
Scott Geng*,
Junfeng Yang,
Xin Wang,
Carl Vondrick
ICLR, 2023
arXiv /
code
We identify the novel problem of zero-shot adversarial robustness and propose a new text-grounded adversarial training objective that can help make CLIP robust while preserving its ability to generalize.
NeuDep: Neural Binary Memory Dependence Analysis
Kexin Pei,
Dongdong She*,
Michael Wang*,
Scott Geng*,
Zhou Xuan,
Yaniv David,
Junfeng Yang,
Suman Jana,
Baishakhi Ray
ESEC/FSE, 2022
arXiv /
code
The semantic meaning of code is explicitly measurable as the CPU's runtime memory values. Predicting execution traces is thus a natural self-supervised task, which we leverage to learn good code representations.
Cerebellar Oscillations in Familial and Sporadic Essential Tremor
Shi-Bing Wong,
Yi-Mei Wang,
Chih-Chun Lin,
Scott Geng,
Nora Vanegas-Arroyave,
Seth Pullman,
Sheng-Han Kuo,
Ming-Kai Pan
The Cerebellum, 2021
paper
Low-frequency brain waves correlate with symptom severity in sporadic essential tremor but not in familial (i.e., genetic) essential tremor, suggesting a difference in underlying mechanism.
Jon Barron has a very clean website.
Last updated: July 15th, 2025.