Weijie Su

Associate Professor
Wharton Statistics and Data Science Department
and, by courtesy, Departments of Mathematics and of Computer and Information Science
Co-Director
Penn Research in Machine Learning
University of Pennsylvania

Office: 411 Academic Research Building
Email: suw AT wharton DOT upenn DOT edu

Research Interests
(Long-term) Mathematical, compute-light approaches to understanding deep learning and AI
(Current) Statistical foundations of large language models, privacy-preserving machine learning, high-dimensional statistics, mathematical optimization

Editorial Board
Journal of Machine Learning Research
Journal of the American Statistical Association
Operations Research
Foundations and Trends® in Statistics
Journal of the Operations Research Society of China
Communications in Mathematical Sciences

We’re hiring a postdoc focused on the theoretical/statistical foundations of large language models. Drop me an email if you’re interested in this position.

Recent Featured Papers

Do Large Language Models (Really) Need Statistical Foundations? W. Su.
Statistical Impossibility and Possibility of Aligning LLMs with Human Preferences: From Condorcet Paradox to Nash Equilibrium. K. Liu, Q. Long, Z. Shi, W. Su, and J. Xiao.
The 2020 United States Decennial Census Is More Private Than You (Might) Think. B. Su, W. Su, and C. Wang.
A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation and Blackwell's Theorem. W. Su.
A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules. X. Li, F. Ruan, H. Wang, Q. Long, and W. Su.
A Law of Next-Token Prediction in Large Language Models. H. He and W. Su.

News

May 2025. Our paper on the ICML ranking experiment will appear as a Discussion Paper in JASA.
May 2025. I was elected a Fellow of the IMS.
May 2025. I'll serve as the Program Chair for the ASA Text Analysis Section starting January 1, 2026.
March 2025. We offered (with Qi Long and Inyoung Choi) a short course on large language models at ENAR 2025.
January 2025. Our third ranking experiment (with Buxin Su) was successfully conducted at ICML 2025. See more details in OpenRank.
October 2024. Our work (with Buxin Su and Chendi Wang) on the 2020 United States Decennial Census was featured by New Scientist.
August 2024. We offered (with Emily Getzen and Linjun Zhang) a short course on large language models at JSM in Portland, Oregon.
July 2024. Our work (with Jinshuo Dong and Aaron Roth) on $f$-differential privacy received the International Congress of Basic Science (ICBS) Frontiers of Science Award in Mathematics.