I'm a Research Scientist at Apple, working on foundations of machine learning.
My work aims broadly to understand generalization in deep learning. Recent topics: calibration [1] [2] [3] [4] [5], Transformer generalization [6], and diffusion [7] [8].
I completed my PhD at Harvard, having the unique pleasure of being advised by Madhu Sudan and Boaz Barak. In my postdoc I worked with Misha Belkin. I am grateful for past support from NSF and the Simons Institute. Go Bears!
Interns at Apple whom I have hosted or collaborated with:
- Hattie Zhou. PhD student, Université de Montréal and Mila.
- Annabelle Carrell. PhD student, University of Cambridge.
- Lunjia Hu. PhD student, Stanford. (hosted by Parikshit Gopalan)
- Shivam Garg. PhD student, Stanford. (hosted by Kunal Talwar)
- Rylee Thompson. MASc student, University of Guelph. (hosted by Shuangfei Zhai)
- Elan Rosenfeld. PhD student, CMU. (hosted by Fartash Faghri)
Selected Research
See [publications] for the full list.
Arwen Bradley, Preetum Nakkiran
[arXiv] [tweet]
Preetum Nakkiran, Arwen Bradley, Hattie Zhou, Madhu Advani
[arXiv] [tweet]
What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
ICLR 2024.
[arXiv] [tweet]
A Unifying Theory of Distance from Calibration
Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
STOC 2023.
[arXiv] [tweet] [slides: aspen] [poster: ICLR] [code]
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
ICLR 2021.
[arXiv] [tweet]
Distributional Generalization: A New Kind of Generalization
Preetum Nakkiran*, Yamini Bansal*
[arXiv] [talk] [slides]
Deep Double Descent: Where Bigger Models and More Data Hurt
Preetum Nakkiran, Gal Kaplun*, Yamini Bansal*, Tristan Yang, Boaz Barak, Ilya Sutskever
ICLR 2020.
[arXiv]
General Strong Polarization
Jarosław Błasiok, Venkatesan Guruswami, Preetum Nakkiran, Atri Rudra, Madhu Sudan
STOC 2018, JACM 2022.
[arXiv]
Near-Optimal UGC-hardness of Approximating Max k-CSP_R
Pasin Manurangsi, Preetum Nakkiran, Luca Trevisan
APPROX-RANDOM 2016.
[arXiv]
Compressing Deep Neural Networks Using a Rank-Constrained Topology
Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada
INTERSPEECH 2015.
[pdf]
Theses
Preetum Nakkiran
Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
[pdf] [cite]
All papers
Machine Learning Theory
- What Algorithms can Transformers Learn? A Study in Length Generalization [tweet]
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
ICLR 2024.
- When Does Optimizing a Proper Loss Yield Calibration? [tweet]
Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
NeurIPS 2023.
- Loss Minimization Yields Multicalibration for Large Neural Networks
Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran
ITCS 2024.
- A Unifying Theory of Distance from Calibration [tweet]
Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran
STOC 2023.
- The Calibration Generalization Gap
A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran
ICML 2022 DFUQ Workshop.
- Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting
Neil Mallinar*, Jamie Simon*, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran
NeurIPS 2022.
- Deconstructing Distributions: A Pointwise Framework of Learning [demo]
Gal Kaplun*, Nikhil Ghosh*, Saurabh Garg, Boaz Barak, Preetum Nakkiran
ICLR 2023.
- What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning
Bogdan Kulynych*, Yao-Yuan Yang*, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran
NeurIPS 2022.
- Knowledge Distillation: Bad Models Can Be Good Role Models
Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
NeurIPS 2022.
- Limitations of the NTK for Understanding Generalization in Deep Learning
Nikhil Vyas, Yamini Bansal, Preetum Nakkiran
Preprint. 2022.
- Limitations of Neural Collapse for Understanding Generalization in Deep Learning [tweet]
Like Hui, Mikhail Belkin, Preetum Nakkiran
Preprint. 2022.
- Turing-Universal Learners with Optimal Scaling Laws
Preetum Nakkiran
Manuscript. 2021.
- Revisiting Model Stitching to Compare Neural Representations [tweet]
Yamini Bansal, Preetum Nakkiran, Boaz Barak
NeurIPS 2021.
- The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers [slides] [tweet]
Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
ICLR 2021.
- Distributional Generalization: A New Kind of Generalization [10m talk] [slides]
Preetum Nakkiran*, Yamini Bansal*
  - Desk-Rejected from NeurIPS 2020.
  - Rejected from ICLR 2021. [reviews]
  - Rejected from ICML 2021. [reviews] [rebuttal] [meta]
  - Rejected from NeurIPS 2021. [reviews]
  - Rejected from ICLR 2022. [reviews]
- Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems [tweet]
Preetum Nakkiran
OPT2020 Workshop (Best Student Paper).
- Optimal Regularization Can Mitigate Double Descent
Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma
ICLR 2021.
- Deep Double Descent: Where Bigger Models and More Data Hurt
Preetum Nakkiran, Gal Kaplun*, Yamini Bansal*, Tristan Yang, Boaz Barak, Ilya Sutskever
ICLR 2020.
- More Data Can Hurt for Linear Regression: Sample-wise Double Descent
Preetum Nakkiran
Manuscript. 2019.
- SGD on Neural Networks Learns Functions of Increasing Complexity
Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak
NeurIPS 2019 (Spotlight).
- Adversarial Examples are Just Bugs, Too
Preetum Nakkiran
Distill 2019.
- Adversarial Robustness May Be at Odds With Simplicity
Preetum Nakkiran
Merged paper appears in COLT 2019.
- The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science
Preetum Nakkiran, Jarosław Błasiok
Manuscript. 2018.
Theory
- Algorithmic Polarization for Hidden Markov Models
Venkatesan Guruswami, Preetum Nakkiran, Madhu Sudan
ITCS 2019.
- General Strong Polarization
Jarosław Błasiok, Venkatesan Guruswami, Preetum Nakkiran, Atri Rudra, Madhu Sudan
STOC 2018.
- Tracking the L2 Norm with Constant Update Time
Chi-Ning Chou, Zhixian Lei, Preetum Nakkiran
APPROX-RANDOM 2018.
- Near-Optimal UGC-hardness of Approximating Max k-CSP_R
Pasin Manurangsi, Preetum Nakkiran, Luca Trevisan
APPROX-RANDOM 2016.
Machine Learning
- Compressing Deep Neural Networks Using a Rank-Constrained Topology
Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada
INTERSPEECH 2015.
- Automatic Gain Control and Multi-style Training for Robust Small-Footprint Keyword Spotting with Deep Neural Networks
Rohit Prabhavalkar, Raziel Alvarez, Carolina Parada, Preetum Nakkiran, Tara Sainath
ICASSP 2015.
About Me
For talks, you can use this [bio].

I did my undergrad in EECS at UC Berkeley. I'm broadly interested in theory and science. In the past, I have interned at OpenAI (with Ilya Sutskever), Google Research (with Raziel Alvarez), and Google Brain (with Behnam Neyshabur and Hanie Sedghi), and have also done research in error-correcting codes, distributed storage, and cryptography. I am grateful for past support from the NSF GRFP and the Google PhD Fellowship. An (outdated) CV is available here. [ORCID]
See also my old website for more. This version borrowed in part from Luca Trevisan and Jon Barron.
What People are Saying
a "high-level" scientist —colleague (ML)
makes plots and draws lines through them —colleague (TCS)
has merits that outweigh flaws —Reviewer 2
Selected Tweets
- on science for science's sake
- the "definitional obstacle" to DL theory
- the "Natural Distributions" obstacle to generalization
- resources on causality
- how "causally explaining generalization" is not even wrong
- traps of defining objects which don't exist
- measures of dependence between RVs
- complaints about DL that are not actually about DL
- complaints about "science of ML" missing from ICML
- on calibration and overparameterization