Ashwinee Panda

I am a Postdoctoral Fellow at UMD working with Tom Goldstein and his amazing students on large language models. My research is generously supported by a series of grants from Open Philanthropy.

I received my PhD from Princeton University working with Prateek Mittal. During my PhD, I received the OpenAI Superalignment Fast Grant for our work showing that current safety alignment is shallow, which won the Outstanding Paper Award at ICLR 2025.

I am currently on the job market.

CV  /  Google Scholar  /  Twitter  /  GitHub


News

  • Nov 2025: I received a $310,000 grant from Open Philanthropy to build pretraining datasets.
  • Oct 2025: I received a $310,000 grant from Open Philanthropy to build reasoning models.
  • Sep 2025: 3 papers accepted to NeurIPS 2025: Dense Backprop, Gemstones, and FineGRAIN.
  • Jul 2025: 2 papers accepted to COLM 2025: Refusal Tokens and LoRI.
  • Jun 2025: I received a $218,000 grant from Open Philanthropy to study encoded reasoning.
  • Apr 2025: ICLR 2025 Outstanding Paper Award for Safety Alignment.
  • Apr 2025: I received a $150,000 grant from Longview to study the construction of pretraining datasets.

Research

The goal of my research is to build useful systems. To me, a useful system (1) does what you want it to do, (2) is personalized to you, and (3) can be developed and deployed by you. These desiderata inform the three focuses of my research agenda: safety (the ability to control the behavior of the system), privacy (the ability to personalize the system to you), and efficiency (the ability to develop and deploy the system quickly and cheaply).

See below for selected publications.

My research in safety focuses on understanding how users can control the behavior of systems. In Shallow Alignment, we show how the mechanisms users rely on to control their systems (prompting, prefilling, modifying sampling parameters, and finetuning the model) can just as easily be used to remove the system's alignment. In Refusal Tokens, we show how to calibrate multiple kinds of refusal messages. In DynaGuard, we show how to redefine safety as a dynamic process that users can control.

deep alignment Safety Alignment Should be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson
At ICLR 2025 (Outstanding Paper Award)
paper / code

We analyze safety alignment and show that it is largely shallow. We propose methods for making it deeper.

refusal tokens Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models
Neel Jain, Aditya Shrivastava, Chenyang Zhu, Daben Liu, Alfy Samuel, Ashwinee Panda, Anoop Kumar, Micah Goldblum, Tom Goldstein
At COLM 2025
paper / models

Refusal tokens enable controlling a single model's refusal rates without further fine-tuning, by selectively intervening during generation.
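
A minimal sketch of the inference-time intervention, assuming a special refusal token was added to the vocabulary during training (the token id below is hypothetical):

    import torch

    REFUSE_TOKEN_ID = 50257  # hypothetical id for the special [refuse] token

    def should_refuse(first_token_logits: torch.Tensor,
                      bias: float = 0.0,
                      threshold: float = 0.5) -> bool:
        # Shift the refusal token's logit to raise or lower the refusal
        # rate, then threshold its probability at the first position.
        logits = first_token_logits.clone()
        logits[REFUSE_TOKEN_ID] += bias
        p_refuse = torch.softmax(logits, dim=-1)[REFUSE_TOKEN_ID].item()
        return p_refuse > threshold

    # toy usage: random logits stand in for a real model's output
    print(should_refuse(torch.randn(50258), bias=2.0, threshold=0.1))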

dynaguard DynaGuard: A Dynamic Guardian Model With User-Defined Policies
Monte Hoover, Vatsal Baherwani, Neel Jain, Khalid Saifullah, Joseph Vincent, Chirag Jain, Melissa Kazemi Rad, C. Bayan Bruss, Ashwinee Panda, Tom Goldstein
arXiv 2025
paper / models

DynaGuard is a dynamic guardian model that evaluates text based on user-defined policies, surpassing static models in detection accuracy.

VLM Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi*, Kaixuan Huang*, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal
At AAAI 2024 (Oral)
paper / code

We propose the first method for generating visual adversarial examples that can serve as transferable universal jailbreaks against aligned large language models.
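
At its core the attack is projected gradient descent on the image. Here is a generic PGD sketch (not the paper's exact recipe), where loss_fn is a hypothetical wrapper returning the negative log-likelihood of the adversary's target continuation under the frozen VLM:

    import torch

    def pgd_attack(image, loss_fn, eps=16/255, alpha=1/255, steps=100):
        # Perturb the image within an L-infinity ball to minimize the loss
        # (i.e., to maximize the likelihood of the adversary's target text).
        adv = image.clone().detach().requires_grad_(True)
        for _ in range(steps):
            loss = loss_fn(adv)
            grad, = torch.autograd.grad(loss, adv)
            with torch.no_grad():
                adv -= alpha * grad.sign()             # signed gradient step
                adv.clamp_(image - eps, image + eps)   # stay in the eps-ball
                adv.clamp_(0, 1)                       # valid pixel range
        return adv.detach()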

neurotoxin Neurotoxin: Durable Backdoors in Federated Learning
Zhengming Zhang*, Ashwinee Panda*, Linyue Song, Yaoqing Yang, Prateek Mittal, Joseph Gonzalez, Kannan Ramchandran, Michael Mahoney
In ICML 2022 (Spotlight)
paper / poster / code

Neurotoxin is a novel model poisoning attack for federated learning that stays present in the system for up to 5X longer than the baseline attack.
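
A sketch of the core idea, assuming the attacker can estimate the aggregated benign gradient (an assumption of this sketch): attack only the coordinates that benign clients update least, so later benign rounds do not overwrite the backdoor.

    import torch

    def neurotoxin_mask(benign_update: torch.Tensor, top_frac: float = 0.1):
        # Forbid the top-frac coordinates (by magnitude) of the benign
        # update; the attack lives in the remaining, rarely-used coordinates.
        k = max(1, int(top_frac * benign_update.numel()))
        top = torch.topk(benign_update.abs().flatten(), k).indices
        mask = torch.ones(benign_update.numel(), dtype=torch.bool)
        mask[top] = False
        return mask.view_as(benign_update)

    def project_attack(malicious_update, mask):
        return malicious_update * mask  # zero out heavily-used coordinates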

sparsefed SparseFed: Mitigating Model Poisoning Attacks in Federated Learning via Sparsification
Ashwinee Panda, Saeed Mahloujifar, Arjun Bhagoji, Supriyo Chakraborty, Prateek Mittal
In AISTATS 2022
paper / code

SparseFed is a provably robust defense against model poisoning attacks in federated learning that uses server-side sparsification to avoid updating malicious neurons.
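
A sketch of one server round, with the error-feedback accumulator that keeps sparsification compatible with convergence (the names below are mine, not the paper's):

    import torch

    def sparsefed_round(client_updates, error_acc, k):
        # Average the client updates, add the carried-over residual, and
        # apply only the k largest-magnitude coordinates to the model;
        # the rest are fed back into the accumulator for the next round.
        agg = torch.stack(client_updates).mean(dim=0).flatten() + error_acc
        top = torch.topk(agg.abs(), k).indices
        sparse = torch.zeros_like(agg)
        sparse[top] = agg[top]
        return sparse, agg - sparse  # (update to apply, new error_acc)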

privacy auditing Privacy Auditing of Large Language Models
Ashwinee Panda*, Xinyu Tang*, Milad Nasr, Christopher A. Choquette-Choo, Prateek Mittal
At ICLR 2025
paper / thread

We present the first method for doing privacy auditing of LLMs.
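
The standard auditing recipe (sketched here under simplifying assumptions, not the paper's exact procedure) inserts canaries into training data, runs membership inference, and converts the attack's true/false positive rates into a lower bound on the training run's DP epsilon:

    import math

    def audited_epsilon_lower_bound(tpr: float, fpr: float) -> float:
        # A membership-inference attack achieving (tpr, fpr) on canaries
        # certifies that the mechanism cannot satisfy any smaller epsilon.
        return max(math.log(tpr / fpr), math.log((1 - fpr) / (1 - tpr)))

    print(audited_epsilon_lower_bound(tpr=0.5, fpr=0.01))  # ~3.9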

gpt phish Teach LLMs to Phish: Stealing Private Information from Language Models
Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, Prateek Mittal
At ICLR 2024
talk / paper / thread

We propose a new practical data extraction attack that we call "neural phishing". This attack enables an adversary to target and extract sensitive or personally identifiable information (PII), e.g., credit card numbers, from a model trained on user data.

dp zo Private Fine-tuning of Large Language Models with Zeroth-order Optimization
Xinyu Tang*, Ashwinee Panda*, Milad Nasr*, Saeed Mahloujifar, Prateek Mittal
At TMLR 2025, TPDP 2024 (Oral)
paper

We propose the first method for performing differentially private fine-tuning of large language models without backpropagation. Our method is the first to provide a nontrivial privacy-utility tradeoff under pure differential privacy.
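
A minimal sketch of one DP-ZO step: the gradient is estimated from two loss evaluations along a random direction, and only the scalar loss difference touches the data, so clipping and noising that scalar privatizes the whole update (the paper clips per example; this sketch clips the batch scalar):

    import torch

    def dpzo_step(params, loss_fn, lr=1e-4, mu=1e-2, clip=1.0, sigma=1.0):
        z = [torch.randn_like(p) for p in params]  # random direction
        with torch.no_grad():
            for p, zi in zip(params, z):
                p.add_(mu * zi)
            loss_plus = float(loss_fn())
            for p, zi in zip(params, z):
                p.sub_(2 * mu * zi)
            loss_minus = float(loss_fn())
            for p, zi in zip(params, z):
                p.add_(mu * zi)  # restore the original parameters
            scalar = (loss_plus - loss_minus) / (2 * mu)
            scalar = max(-clip, min(clip, scalar))    # bound sensitivity
            scalar += sigma * torch.randn(()).item()  # Gaussian mechanism
            for p, zi in zip(params, z):
                p.sub_(lr * scalar * zi)              # SPSA-style update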

dp icl Privacy-Preserving In-Context Learning for Large Language Models
Tong Wu*, Ashwinee Panda*, Tianhao Wang*, Prateek Mittal
At ICLR 2024
talk / paper / code / thread

We propose the first method for performing differentially private in-context learning. Our method generates responses via in-context learning while keeping the in-context exemplars differentially private, and can be applied to black-box APIs (e.g., RAG).
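
A minimal sketch of the private-voting flavor of the idea, one simple instantiation: partition the exemplars into disjoint shards, query the model once per shard, and release a noisy argmax over the answers, so each exemplar influences at most one vote (the llm callable below is hypothetical):

    import random
    from collections import Counter

    def dp_icl_answer(query, exemplars, llm, num_shards=10, sigma=1.0):
        shards = [exemplars[i::num_shards] for i in range(num_shards)]
        votes = Counter(llm(shard, query) for shard in shards)
        # Each exemplar appears in exactly one shard, so adding Gaussian
        # noise to the vote counts yields a differentially private release.
        noisy = {ans: c + random.gauss(0, sigma) for ans, c in votes.items()}
        return max(noisy, key=noisy.get)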

linear scaling A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
Ashwinee Panda*, Xinyu Tang*, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal
At ICML 2024
talk / paper / code

We find that applying scaling laws to differentially private hyperparameter optimization significantly outperforms prior work in both privacy and compute cost.

dp random priors Differentially Private Image Classification by Learning Priors from Random Processes
Xinyu Tang*, Ashwinee Panda*, Vikash Sehwag, Prateek Mittal
At NeurIPS 2023 (Spotlight)
paper / code

We pretrain networks with synthetic images that have strong performance on downstream private computer vision tasks.

dp diffusion Differentially Private Generation of High Fidelity Samples From Diffusion Models
Vikash Sehwag*, Ashwinee Panda*, Ashwini Pokle, Xinyu Tang, Saeed Mahloujifar, Mung Chiang, J Zico Kolter, Prateek Mittal
At ICML 2023 GenAI Workshop
paper / poster

We generate differentially private images from non-privately trained diffusion models by analyzing the inherent privacy of stochastic sampling.

Dense Backpropagation Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Therien, Supriyo Chakraborty, Tom Goldstein
At NeurIPS 2025
paper / code / thread

We present a lightweight approximation method that gives the MoE router a dense gradient update while continuing to sparsely activate its parameters.
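
A sketch of the idea, with the missing experts' contributions filled in by a cached estimate (the EMA buffer here is an assumption of the sketch, not necessarily the paper's exact estimator), so that every router gate appears in the loss and receives a gradient:

    import torch

    def dense_router_forward(x, experts, router_logits, k, expert_ema):
        gates = torch.softmax(router_logits, dim=-1)
        active = torch.topk(gates, k).indices
        out = torch.zeros_like(x)
        for e in range(len(experts)):
            if e in active:
                y = experts[e](x)  # real (sparse) expert computation
                expert_ema[e] = 0.9 * expert_ema[e] + 0.1 * y.detach()
            else:
                y = expert_ema[e]  # cheap stand-in, no expert FLOPs
            # every gates[e] enters the output, so the router weights
            # receive a dense gradient even though activation stays sparse
            out = out + gates[e] * y
        return out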

gemstones Gemstones 💎: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Yu Miller, Siddharth Singh, Abhinav Bhatele, Micah Goldblum, Ashwinee Panda♠️, Tom Goldstein
At NeurIPS 2025
paper / code / models / website / thread

We release the Gemstone model suite for open-source scaling laws.

LoRI LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation
Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
At COLM 2025
paper / models

LoRI reduces cross-task interference in multi-task low-rank adaptation, enabling better performance when fine-tuning on multiple tasks.

Video Diffusion Attention Analysis of Attention in Video Diffusion Transformers
Yuxin Wen, Jim Wu, Ajay Jain, Tom Goldstein, Ashwinee Panda
arXiv 2025
paper / website / thread

We conduct an in-depth analysis of attention in video diffusion transformers (VDiTs) and report a number of novel findings.

Attention Sink Dormant Heads Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs
Pedro Sandoval-Segura, Xijun Wang, Ashwinee Panda, Micah Goldblum, Ronen Basri, Tom Goldstein, David Jacobs
arXiv 2025
paper / thread

We propose a new definition for attention heads dominated by attention sinks, which we call dormant attention heads.

lota Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal
At ICML 2024 WANT (Best Paper) / ES-FoMO (Oral)
paper / code / thread

Lottery Ticket Adaptation (LoTA) is a new adaptation method that handles challenging tasks, mitigates catastrophic forgetting, and enables model merging across different tasks.
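
A sketch of the recipe as I understand it (the mask-extraction details are simplified): fine-tune once, keep only the largest-magnitude weight deltas as a sparse "lottery ticket" mask, rewind to the base weights, and retrain only the masked coordinates.

    import torch

    def lota_masks(base_params, tuned_params, sparsity=0.9):
        masks = []
        for b, t in zip(base_params, tuned_params):
            delta = (t - b).abs()
            k = max(1, int((1 - sparsity) * delta.numel()))
            thresh = torch.topk(delta.flatten(), k).values.min()
            masks.append(delta >= thresh)  # True = trainable coordinate
        return masks

    # second pass: rewind to base_params, then during fine-tuning apply
    #   p.grad *= mask
    # so only the lottery-ticket coordinates are updated.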

fetchsgd FetchSGD: Communication-Efficient Federated Learning with Sketching
Daniel Rothchild*, Ashwinee Panda*, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora
In ICML 2020
paper / code

FetchSGD is a communication-efficient federated learning algorithm that compresses gradient updates with sketches.
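
The key property is that count sketches are linear, so the server can sum the clients' sketched gradients before recovering the heavy coordinates. A minimal count-sketch implementation (mine, not the paper's):

    import numpy as np

    class CountSketch:
        def __init__(self, dim, rows=5, cols=1000, seed=0):
            rng = np.random.default_rng(seed)
            self.h = rng.integers(0, cols, size=(rows, dim))  # bucket hashes
            self.s = rng.choice([-1, 1], size=(rows, dim))    # sign hashes
            self.table = np.zeros((rows, cols))

        def accumulate(self, grad):
            # Linear: accumulating g1 then g2 equals sketching g1 + g2,
            # which is what lets the server sum client sketches.
            for r in range(self.h.shape[0]):
                np.add.at(self.table[r], self.h[r], self.s[r] * grad)

        def query(self, i):
            # Median-of-rows estimate of coordinate i of the summed gradient.
            rows = np.arange(self.h.shape[0])
            return np.median(self.table[rows, self.h[:, i]] * self.s[:, i])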

softpbt SoftPBT: Leveraging Experience Replay for Efficient Hyperparameter Schedule Search
Ashwinee Panda, Eric Liang, Richard Liaw, Joey Gonzalez
paper / code

Not Research

WeChat  /  LinkedIn  /  Instagram  /  Yelp  /  Goodreads /  Spotify

I've lived all over America and traveled across the world. Feel free to ask me for food recs in NYC, LA, the SF Bay Area, or the DMV.

While at Berkeley I founded DiscreetAI (now inactive), a venture-backed startup building privacy-preserving machine learning as a service. You can check out our ProductHunt launch or our GitHub for more information. Among other things, we won the first Y Combinator Hackathon and built federated learning solutions for Fortune 500 companies.

I read a lot and post my thoughts (mostly now on Twitter, previously on Goodreads).

  • 2025: Twitter thread of 60 books read.
  • 2024: Twitter thread of 23 books read.
  • 2023: Elo rating tier list (72 books).
  • 2021: Goodreads year in review (100 books).

  • Website template from Jon Barron.