I am a Postdoctoral Fellow at UMD working with Tom Goldstein and his amazing students on large language models. My research is generously supported by a series of grants from Open Philanthropy.
Jun 2025: I received a $218,000 grant from Open Philanthropy to study encoded reasoning.
Apr 2025: ICLR 2025 Outstanding Paper Award for Safety Alignment.
Apr 2025: I received a $150,000 grant from Longview to study the construction of pretraining datasets.
Research
The goal of my research is to build useful systems. To me, a useful system (1) does what you want it to do, (2) is personalized to you, and (3) can be developed and deployed by you. These desiderata inform the three focus areas of my research agenda: safety (the ability to control the behavior of the system), privacy (the ability to personalize the system to you), and efficiency (the ability to develop and deploy the system quickly and cheaply).
See below for selected publications.
My research in safety focuses on understanding how users can control the behavior of systems. In Shallow Alignment, we show that the very interfaces through which users exert control over their systems (prompting, prefilling, modifying sampling parameters, and finetuning the model) can also be used to easily strip away the system's alignment. In Refusal Tokens, we show how to calibrate multiple kinds of refusal messages. In DynaGuard, we show how to redefine safety as a dynamic process that can be controlled by users.
My research in privacy is aimed at personalizing systems to user data, where the primary obstacle is the potential for privacy violations. My work here has two sides: attacks and defenses. In Neural Phishing and Privacy Auditing, we develop new attacks that upper bound how much information can be extracted from a system. In the rest of my work, we develop efficient methods for adapting models to user data with differential privacy.
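As a concrete illustration of the defense side, here is a minimal sketch of DP-SGD-style adaptation (per-example gradient clipping followed by Gaussian noise). It is a simplified illustration rather than the exact recipe from any of my papers; the model, loss, clipping norm, and noise multiplier are placeholder assumptions.

```python
# Minimal sketch of one DP-SGD-style update: clip each example's gradient,
# sum the clipped gradients, add Gaussian noise, and apply the averaged update.
# `model`, `loss_fn`, and the hyperparameters are illustrative placeholders.
import torch

def dp_sgd_step(model, loss_fn, batch, lr=1e-3, clip_norm=1.0, noise_mult=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in batch:  # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-12)).clamp(max=1.0)  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = noise_mult * clip_norm * torch.randn_like(s)  # Gaussian mechanism
            p.add_(-(lr / len(batch)) * (s + noise))
```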
My research in efficiency focuses on building systems that can be developed and deployed by users. In LoTA and LoRI, we work on parameter-efficient methods for adapting models to user data. In Gemstones and Dense Backprop, we investigate how to shape models for efficient pretraining.
We propose the first method for generating visual adversarial examples that can serve as transferable universal jailbreaks against aligned large language models.
SparseFed is a provably robust defense against model poisoning attacks in federated learning that uses server-side sparsification to avoid updating malicious neurons.
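As a rough sketch of the core mechanism (with illustrative names and shapes, not the exact implementation from the paper): the server averages client updates and keeps only the top-k coordinates of the aggregate, which limits how much a few poisoned coordinates can move the global model.

```python
import torch

def server_topk_aggregate(client_updates, k):
    """Sketch of server-side sparsification: average the (flattened) client
    updates and keep only the k largest-magnitude coordinates."""
    agg = torch.stack(client_updates).mean(dim=0)
    keep = torch.topk(agg.abs(), k).indices
    sparse = torch.zeros_like(agg)
    sparse[keep] = agg[keep]          # zero out every other coordinate
    return sparse

# Usage (hypothetical): global_flat += lr * server_topk_aggregate(updates, k=10_000)
```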
We propose a new practical data extraction attack that we call "neural phishing". This attack enables an adversary to target and extract sensitive or personally identifiable information (PII), e.g., credit card numbers, from a model trained on user data.
We propose the first method for performing differentially private fine-tuning of large language models without backpropagation. Our method is the first to provide a nontrivial privacy-utility tradeoff under pure differential privacy.
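A loose sketch of the backpropagation-free idea, not the exact algorithm from the paper: estimate a directional derivative with two forward passes, then clip and noise that scalar before stepping. All names and hyperparameters below are illustrative, and the real method privatizes per-example quantities rather than a single batch loss.

```python
import torch

def dp_zo_step(params, loss_fn, lr=1e-4, eps=1e-3, clip=1.0, noise_mult=1.0):
    """Sketch of a differentially private zeroth-order step: two forward
    passes along a random direction give a scalar loss difference, which is
    clipped and noised before being used as the step size along that direction."""
    direction = [torch.randn_like(p) for p in params]
    with torch.no_grad():
        for p, z in zip(params, direction):
            p.add_(eps * z)                       # theta + eps * z
        loss_plus = loss_fn()
        for p, z in zip(params, direction):
            p.sub_(2 * eps * z)                   # theta - eps * z
        loss_minus = loss_fn()
        for p, z in zip(params, direction):
            p.add_(eps * z)                       # restore theta

        scalar = (loss_plus - loss_minus) / (2 * eps)
        scalar = scalar.clamp(-clip, clip)                     # bound sensitivity
        scalar = scalar + noise_mult * clip * torch.randn(())  # Gaussian noise
        for p, z in zip(params, direction):
            p.sub_(lr * scalar * z)               # step along the random direction
```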
We propose the first method for performing differentially private in-context learning. Our method generates text from in-context learning while keeping the in-context exemplars differentially private, and it can be applied to black-box APIs (e.g., RAG).
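One way to picture this (a simplified sketch, not necessarily the exact aggregation used in the paper): partition the private exemplars into disjoint subsets, let each subset propose the next token through the black-box API, and release only a noisy vote winner. `propose_next_token` is a hypothetical wrapper around an LLM API.

```python
import random
from collections import Counter

def dp_generate_token(exemplar_subsets, prompt, propose_next_token, noise_scale=1.0):
    """Sketch of one privately generated token: each disjoint exemplar subset
    votes for a next token, and we release the argmax of the noisy vote counts
    (report-noisy-max), so no single exemplar can change the output much."""
    votes = Counter(propose_next_token(prompt, subset) for subset in exemplar_subsets)
    noisy = {tok: count + random.gauss(0.0, noise_scale) for tok, count in votes.items()}
    return max(noisy, key=noisy.get)
```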
We find that using scaling laws for differentially private hyperparameter optimization significantly outperforms prior work in both privacy and compute cost.
Lottery Ticket Adaptation (LoTA) is a new adaptation method that handles challenging tasks, mitigates catastrophic forgetting, and enables model merging across different tasks.
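As a rough sketch of the lottery-ticket flavor of the idea (simplified; the sparsity level and masking details here are illustrative assumptions): compare an adapted model to its base, keep only the largest-magnitude weight deltas as a mask, and restrict subsequent training to those coordinates.

```python
import torch

def lottery_ticket_mask(base_state, adapted_state, keep_frac=0.1):
    """Sketch: build a sparse mask over the largest-magnitude weight deltas
    between a base model and an adapted model. Training is then restricted to
    the masked coordinates (e.g., by multiplying gradients by the mask)."""
    masks = {}
    for name, base_w in base_state.items():
        delta = (adapted_state[name] - base_w).abs().flatten()
        k = max(1, int(delta.numel() * keep_frac))                # coordinates to keep
        threshold = delta.kthvalue(delta.numel() - k + 1).values  # k-th largest delta
        masks[name] = (delta >= threshold).float().view_as(base_w)
    return masks
```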