I am a Postdoctoral Fellow at UMD working with Tom Goldstein and his amazing students on large language models. My research is generously supported by a series of grants from Open Philanthropy.
Jun 2025: I received a $218,000 grant from Open Philanthropy to study encoded reasoning.
Apr 2025: ICLR 2025 Outstanding Paper Award for Safety Alignment.
Apr 2025: I received a $150,000 grant from Longview to study the construction of pretraining datasets.
Research
The goal of my research is to build useful systems. To me, a useful system (1) does what you want it to do, (2) is personalized to you, and (3) can be developed and deployed by you. These desiderata inform the three focus areas of my research agenda: safety (the ability to control the behavior of the system), privacy (the ability to personalize the system to you), and efficiency (the ability to develop and deploy the system quickly and cheaply).
See below for selected publications.
My research in safety focuses on understanding how users can control the behavior of systems. In Shallow Alignment, we show that the very interfaces through which users exert control over their systems (prompting, prefilling, modifying sampling parameters, and finetuning the model) can also be used to easily strip away the system's alignment. In Refusal Tokens, we show how to calibrate multiple kinds of refusal messages. In DynaGuard, we show how to redefine safety as a dynamic process that can be controlled by users.
My research in privacy is aimed at personalizing systems to user data, where the primary obstacle is the potential for privacy violations. My work here has two sides: attacks and defenses. In Neural Phishing and Privacy Auditing, we develop new attacks that upper bound how much information can be extracted from a system. In the rest of my work, we develop efficient methods for adapting models to user data with differential privacy.
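As a concrete illustration of the defense side, here is a minimal sketch of DP-SGD-style adaptation (per-example gradient clipping followed by Gaussian noise). It is a simplified illustration rather than the exact recipe from any of my papers; the model, loss, clipping norm, and noise multiplier are placeholder assumptions.

```python
# Minimal sketch of one DP-SGD-style update: clip each example's gradient,
# sum the clipped gradients, add Gaussian noise, and apply the averaged update.
# `model`, `loss_fn`, and the hyperparameters are illustrative placeholders.
import torch

def dp_sgd_step(model, loss_fn, batch, lr=1e-3, clip_norm=1.0, noise_mult=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in batch:  # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-12)).clamp(max=1.0)  # clip to clip_norm
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = noise_mult * clip_norm * torch.randn_like(s)  # Gaussian mechanism
            p.add_(-(lr / len(batch)) * (s + noise))
```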
My research in efficiency focuses on building systems that can be developed and deployed by users. In LoTA and LoRI, we work on parameter-efficient methods for adapting models to user data. In Gemstones and Dense Backprop, we investigate how to shape models for efficient pretraining.
We propose the first method for generating visual adversarial examples that can serve as transferable universal jailbreaks against aligned large language models.
SparseFed is a provably robust defense against model poisoning attacks in federated learning that uses server-side sparsification to avoid updating malicious neurons.
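As a rough sketch of the core mechanism (with illustrative names and shapes, not the exact implementation from the paper): the server averages client updates and keeps only the top-k coordinates of the aggregate, which limits how much a few poisoned coordinates can move the global model.

```python
import torch

def server_topk_aggregate(client_updates, k):
    """Sketch of server-side sparsification: average the (flattened) client
    updates and keep only the k largest-magnitude coordinates."""
    agg = torch.stack(client_updates).mean(dim=0)
    keep = torch.topk(agg.abs(), k).indices
    sparse = torch.zeros_like(agg)
    sparse[keep] = agg[keep]          # zero out every other coordinate
    return sparse

# Usage (hypothetical): global_flat += lr * server_topk_aggregate(updates, k=10_000)
```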
We propose a new practical data extraction attack that we call "neural phishing". This attack enables an adversary to target and extract sensitive or personally identifiable information (PII), e.g., credit card numbers, from a model trained on user data.
We propose the first method for performing differentially private fine-tuning of large language models without backpropagation. Our method is the first to provide a nontrivial privacy-utility tradeoff under pure differential privacy.
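A loose sketch of the backpropagation-free idea, not the exact algorithm from the paper: estimate a directional derivative with two forward passes, then clip and noise that scalar before stepping. All names and hyperparameters below are illustrative, and the real method privatizes per-example quantities rather than a single batch loss.

```python
import torch

def dp_zo_step(params, loss_fn, lr=1e-4, eps=1e-3, clip=1.0, noise_mult=1.0):
    """Sketch of a differentially private zeroth-order step: two forward
    passes along a random direction give a scalar loss difference, which is
    clipped and noised before being used as the step size along that direction."""
    direction = [torch.randn_like(p) for p in params]
    with torch.no_grad():
        for p, z in zip(params, direction):
            p.add_(eps * z)                       # theta + eps * z
        loss_plus = loss_fn()
        for p, z in zip(params, direction):
            p.sub_(2 * eps * z)                   # theta - eps * z
        loss_minus = loss_fn()
        for p, z in zip(params, direction):
            p.add_(eps * z)                       # restore theta

        scalar = (loss_plus - loss_minus) / (2 * eps)
        scalar = scalar.clamp(-clip, clip)                     # bound sensitivity
        scalar = scalar + noise_mult * clip * torch.randn(())  # Gaussian noise
        for p, z in zip(params, direction):
            p.sub_(lr * scalar * z)               # step along the random direction
```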
We propose the first method for performing differentially private in-context learning. Our method generates text from in-context learning while keeping the in-context exemplars differentially private, and it can be applied to black-box APIs (e.g., RAG).
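One way to picture this (a simplified sketch, not necessarily the exact aggregation used in the paper): partition the private exemplars into disjoint subsets, let each subset propose the next token through the black-box API, and release only a noisy vote winner. `propose_next_token` is a hypothetical wrapper around an LLM API.

```python
import random
from collections import Counter

def dp_generate_token(exemplar_subsets, prompt, propose_next_token, noise_scale=1.0):
    """Sketch of one privately generated token: each disjoint exemplar subset
    votes for a next token, and we release the argmax of the noisy vote counts
    (report-noisy-max), so no single exemplar can change the output much."""
    votes = Counter(propose_next_token(prompt, subset) for subset in exemplar_subsets)
    noisy = {tok: count + random.gauss(0.0, noise_scale) for tok, count in votes.items()}
    return max(noisy, key=noisy.get)
```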
We find that using scaling laws for differentially private hyperparameter optimization significantly outperforms prior work in both privacy and compute cost.
Lottery Ticket Adaptation (LoTA) is a new adaptation method that handles challenging tasks, mitigates catastrophic forgetting, and enables model merging across different tasks.
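As a rough sketch of the lottery-ticket flavor of the idea (simplified; the sparsity level and masking details here are illustrative assumptions): compare an adapted model to its base, keep only the largest-magnitude weight deltas as a mask, and restrict subsequent training to those coordinates.

```python
import torch

def lottery_ticket_mask(base_state, adapted_state, keep_frac=0.1):
    """Sketch: build a sparse mask over the largest-magnitude weight deltas
    between a base model and an adapted model. Training is then restricted to
    the masked coordinates (e.g., by multiplying gradients by the mask)."""
    masks = {}
    for name, base_w in base_state.items():
        delta = (adapted_state[name] - base_w).abs().flatten()
        k = max(1, int(delta.numel() * keep_frac))                # coordinates to keep
        threshold = delta.kthvalue(delta.numel() - k + 1).values  # k-th largest delta
        masks[name] = (delta >= threshold).float().view_as(base_w)
    return masks
```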