Research

A Reasoning-Focused Legal Retrieval Benchmark, ACM CS&LAW (2025, forthcoming) (with Lucia Zheng, Neel Guha, Javokhir Arifov, Sarah Zhang, Michal Skreta, Christopher D. Manning, and Peter Henderson)

AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County (with Faiz Surani, Mirac Suzgun, Vyoma Raman, Peter Henderson and Christopher D. Manning).

Considerations for governing open foundation models, 386 Science 151 (2024) (with Rishi Bommasani, Sayash Kapoor, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Daniel Zhang, Marietje Schaake, Arvind Narayanan, and Percy Liang).

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models, arXiv (2024) (with Andy K. Zhang, Neil Perry et al.)

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, J. Empirical Legal Stud. (2025, forthcoming) (with Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun and Christopher D. Manning)

Statistical Uncertainty in Word Embeddings: GloVe-V, EMNLP (2024, forthcoming) (with Andrea Vallebueno, Cassandra Handan-Nader, and Christopher D. Manning)

Locating and Measuring Marine Aquaculture Production with Remote Sensing: A Computer Vision Approach in the French Mediterranean, Sci. Advances (2024, forthcoming) (with Andrea Vallebueno, Sebastian Quaade, Olivia Alcabes, and Kit Rodolfa).

Drop a Line, Submit on Time? Randomized Tailored Reminders Improve Pollution Reporting Timeliness, J. Ass’n Envtl. & Resource Econ. (2024, forthcoming) (with Elinor Benami, Nathanael Jo, and Elizabeth S. Ragnauth).

Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models, 16 J. Legal Analysis 64 (2024) (with Matthew Dahl, Varun Magesh, and Mirac Suzgun).

Governing by Assignment, 173 U. Pa. L. Rev. 157 (2024) (with Isaac Cui, Anne Joseph O’Connell, and Olivia Martin).

Quantifying the Uncertainty of Imputed Demographic Disparity Estimates: The Dual-Bootstrap, NBER (2024) (with Benjamin Lu, Jia Wan, Derek Ouyang, and Jacob Goldin).

Corpus Enigmas and Contradictory Linguistics: Tensions between Empirical Semantic Meaning and Judicial Interpretation, 25 Minn. J.L. Sci. & Tech. 127 (2024) (with Peter Henderson, Andrea Vallebueno, and Sandy Handan-Nader).

Not (Officially) in My Backyard: Characterizing Informal Accessory Dwelling Units and Informing Housing Policy with Remote Sensing, J. Am. Planning Ass’n (2024) (with Nathanael Jo, Andrea Vallebueno, and Derek Ouyang).

On the Societal Impact of Open Foundation Models, ICML (2024) (with Sayash Kapoor, Rishi Bommasani et al.).

The Spectrum of AI Integration: The Case of Benefits Adjudication, in AI: Legal Issues, Policy, and Practical Strategies (2024, forthcoming) (with Olivia Martin, Faiz Surani, Kit Rodolfa, Amy Perez, and Daniel E. Ho).

Mapping Poultry Operations at Scale, in AI for Good: Applications in Sustainability, Humanitarian Action, and Health (Juan M. Lavista Ferres & William B. Weeks eds. 2024) (with Caleb Robinson).

Regulating AI Adaptation: An Analysis of AI Medical Device Updates, 248 AHLI Proc. Health, Inference & Learning 477 (CHIL) (2024) (with Kevin Wu, Eric Wu, Kit Rodolfa, and James Zou).

Limitations of Reporting Requirements under California’s Livestock Antimicrobial Restriction Law, Envtl. Health Persp. (2024) (with Sebastian Quaade, Joan A. Casey, Keeve E. Nachman, and Sara Y. Tartof).

Estimating and Implementing Conventional Fairness Metrics With Probabilistic Protected Features, SaTML (2024, forthcoming) (with Hadi Elzayn, Emily Black, Patrick Vossler, Nathanael Jo, and Jacob Goldin).

Measuring and Mitigating Racial Disparities in Tax Audits, Quarterly Journal of Economics (2024, forthcoming) (with Hadi Elzayn, Evelyn Smith, Cameron Guage, Thomas Hertz, Arun Ramesh, Robin Fisher, and Jacob Goldin)

AI Regulation Has Its Own Alignment Problem: The Technical and Institutional Feasibility of Disclosure, Registration, Licensing, and Auditing, Geo. Wash. L. Rev. (2024, forthcoming) (with Neel Guha, Christie M. Lawrence, Lindsey A. Gailmard, Kit T. Rodolfa, Faiz Surani, Rishi Bommasani, Inioluwa Deborah Raji, Mariano-Florentino Cuéllar, Colleen Honigsberg, and Percy Liang)

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models, NeurIPS D&B (2023, forthcoming) (with Neel Guha, Julian Nyarko, Chris Ré et al.)

Potential for Allocative Harm in an Environmental Justice Data Tool, Nature Machine Intelligence (2023) (with Benjamin Q. Huynh, Elizabeth T. Chin, Allison Koenecke, Derek Ouyang, Mathew V. Kiang, and David H. Rehkopf)

Leveraging Genomic Sequencing Data to Evaluate Disease Surveillance Strategies, iScience (2023) (with Benjamin Anderson, Derek Ouyang, Alexis D’Agostino, Brandon Bonin, Emily Smith, Vit Kraushaar, and Sarah L. Rudman)

Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools, EAAMO (2023) (with Emily Black, Rakshit Naidu, Rayid Ghani, Kit T. Rodolfa, and Hoda Heidari)

Estimation of Racial Disparities When Race is Not Observed, Working Paper (2023) (with Cory McCartan, Jacob Goldin, and Kosuke Imai)

Implications of Predicting Race Variables from Medical Images, Science (2023) (with James Zou, Judy Wawira Gichoya, and Ziad Obermeyer)

The Bureaucratic Challenge to AI Governance: An Empirical Assessment of Implementation at U.S. Federal Agencies, ACM AI, Ethics & Soc’y (2023, forthcoming), earlier version: Implementation Challenges to Three Pillars of America’s AI Strategy, HAI-RegLab White Paper (2022) (with Christie Lawrence and Isaac Cui)

How Redundant are Redundant Encodings? Blindness in the Wild and Racial Disparity when Race is Unobserved, ACM FAccT (2023) (with Lingwei Cheng, Isabel Gallegos, Derek Ouyang, and Jacob Goldin)

The Privacy-Bias Tradeoff: Data Minimization and Racial Disparity Assessments in U.S. Government, ACM FAccT (2023) (with Arushi Gupta, Helen Webley-Brown, Victor Wu, and Jen King)

Integrating Social Services with Disease Investigation: A Randomized Trial of COVID-19 High-Touch Contact Tracing, PLOS ONE (2023) (with Lisa Lu, Derek Ouyang, Alexis D’Agostino, Angelica Diaz, and Sarah L. Rudman)

Integrating Water Quality Data with a Bayesian Network Model to Improve Spatial and Temporal Phosphorus Attribution: Application to the Maumee River Basin, 360 Journal of Environmental Management 121120 (2024) (with Zihan Wei, Sarfaraz Alam, Miki Verma, Margaret Hilderbran, Yuchen Wu, Brandon Anderson, and Jenny Suckale)

Automated vs. Manual Case Investigation and Contact Tracing for Pandemic Surveillance: Evidence from a Stepped Wedge Cluster Randomized Trial, Lancet: eClinicalMedicine (2022) (with Cameron Raymond, Derek Ouyang, Alexis D’Agostino, and Sarah L. Rudman)

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset, NeurIPS (2022, forthcoming) (with Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, and Dan Jurafsky)

Detecting Environmental Violations with Satellite Imagery in Near Real Time: Land Application under the Clean Water Act, ACM CIKM (2022) (with Ben Chugg, Nicolas Rothbacher, Alex Feng, and Xiaoqi Long)

Beyond Ads: Sequential Decision-Making Algorithms in Law and Public Policy, ACM CSLAW 87 (2022) (with Peter Henderson, Ben Chugg, and Brandon Anderson)

Entropy Regularization for Population Estimation, AAAI (2023) (with Peter Henderson, Ben Chugg, and Jacob Goldin)

How to Build Academic-Public Health Partnerships: The Stanford – Santa Clara County Experience with COVID-19 Response, in Build Me the Evidence (Tamar Bauer ed. 2022, forthcoming) (with Sara H. Cody)

Advances, Challenges and Opportunities in Creating Data for Trustworthy AI, Nature Machine Intelligence (2022) (with Weixin Liang, Girmaw Abebe Tadesse, Fei-Fei Li, Matei Zaharia, Ce Zhang, and James Zou)

Algorithmic Fairness and Vertical Equity: Income Fairness with IRS Tax Audit Models, ACM FAccT 1479 (2022) (with Emily Black, Hadi Elzayn, Alexandra Chouldechova, and Jacob Goldin)

Integrating Reward Maximization and Population Estimation: Sequential Decision-Making for Internal Revenue Service Audit Selection, AAAI (2023) (with Peter Henderson, Ben Chugg, Brandon Anderson, Kristen Altenburger, Alex Turk, John Guyton, and Jacob Goldin)

Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance, ACM AI, Ethics & Soc’y (2022, forthcoming) (with Inioluwa Deborah Raji, Peggy Xu, and Colleen Honigsberg)

Can Transportation Subsidies Reduce Failures to Appear in Criminal Court? Evidence from a Pilot Randomized Controlled Trial, 216 Economics Letters 110540 (2022) (with Rebecca Brough, Matthew Freedman, and David C. Phillips)

Designing Accountable Health Care Algorithms: Lessons from Covid-19 Contact Tracing, New England Journal of Medicine: Catalyst (2022) (with Lisa Lu, Alexis D’Agostino, Sarah L. Rudman and Derek Ouyang)

Science Translation During the COVID-19 Pandemic: An Academic-Public Health Partnership to Assess Capacity Limits in California, 112 American Journal of Public Health 308 (2022) (with Peter Maldonado, Angie Peng, Derek Ouyang, and Jenny Suckale)

Mapping Industrial Poultry Operations at Scale with Deep Learning and Aerial Imagery, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2022) (with Caleb Robinson, Ben Chugg, Brandon Anderson, and Juan M. Lavista Ferres)

Executive Control of Agency Adjudication: Capacity, Selection and Precedential Rulemaking, Journal of Law, Economics, and Organization (2024, forthcoming) (with David Hausman, Mark Krass, and Anne McDonough)

A Language-Matching Model to Improve Equity and Efficiency of COVID-19 Contact Tracing, 118 PNAS (2021) (with Lisa Lu, Benjamin Anderson, Raymond Ha, Alexis D’Agostino, Sarah L. Rudman, and Derek Ouyang)

Artificial Intelligence for Adjudication: The Social Security Administration and AI Governance, Oxford Handbook on AI Governance (2022, forthcoming) (with Kurt Glaze, Gerald Ray, and Christine Tsang)

Evaluation of Allocation Schemes of COVID-19 Testing Resources in a Community-Based Door-to-Door Testing Program, JAMA Health Forum 2(8):e212260 (2021) (with Ben Chugg, Lisa Lu, Derek Ouyang, Benjamin Anderson, Raymond Ha, Alexis D’Agostino, Anandi Sujeer, Sarah L. Rudman, and Analilia Garcia)

Building a National AI Research Resource: A Blueprint for the National Research Cloud (2021), 3 Notre Dame J. Emerging Tech. 71 & HAI White Paper (with Jen King, Russell Wald & Chris Wan)

On the Opportunities and Risks of Foundation Models (2021) (with Rishi Bommasani, Percy Liang et al.)

Enhancing Environmental Enforcement with Near Real-Time Monitoring: Likelihood-Based Detection of Structural Expansion of Intensive Livestock Farms, 103 International Journal of Applied Earth Observation and Geoinformation 102463 (2021) (with Ben Chugg, Brandon Anderson, Seiji Eicher, and Sandy Lee)

Improving the Reliability of Food Safety Disclosure: Restaurant Grading in Seattle-King County, 84 Journal of Environmental Health 30 (2021) (with Zoe C. Ashwood and Becky Elias)

When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings, ICAIL Proceedings 159 (2021) (with Lucia Zheng, Neel Guha, Peter Henderson, and Brandon Anderson)

Context-Aware Legal Citation Recommendation using Deep Learning, ICAIL Proceedings 79 (2021) (with Zihan Huang, Charles Low, Mengqiu Teng, Hongyi Zhang, Mark Krass, and Matthias Grabmair)

How Medical AI Devices Are Evaluated: Limitations and Recommendations from an Analysis of FDA Approvals, 27 Nature Medicine 582 (2021) (with Eric Wu, Kevin Wu, Roxana Daneshjou, David Ouyang, and James Zou)

Temporal Cluster Matching for Change Detection of Structures from Satellite Imagery, ACM COMPASS 138 (2021) (with Caleb Robinson, Anthony Ortiz, Juan M. Lavista Ferres and Brandon Anderson)

Disparate Limbo: How Administrative Law Erased Antidiscrimination, 131 Yale Law Journal 370 (2021) (with Cristina Isabel Ceballos and David Freeman Engstrom)

Mandatory Retirement and Age, Race, and Gender Diversity of University Faculties, 23 American Law and Economics Review 100 (2021) (with Oluchi Mbonu and Anne McDonough)

The Distributive Effects of Risk Prediction in Environmental Compliance: Algorithmic Design, Environmental Justice, and Public Policy, ACM FAccT 90 (2021) (with Elinor Benami, Reid Whitaker, Vincent La, Hongjin Lin, and Brandon R. Anderson)

How U.S. Law Will Evaluate Artificial Intelligence for Covid-19, 372 BMJ n.234 (2021) (with Mark Krass, Peter Henderson, Michelle M. Mello, and David M. Studdert)

Leveraging Administrative Data for Bias Audits: Assessing Disparate Coverage with Mobility Data for COVID-19 Policy, ACM FAccT 173 (2021) (with Amanda Coston, Neel Guha, Derek Ouyang, Lisa Lu, and Alexandra Chouldechova)

Deep Learning with Satellite Imagery to Enhance Environmental Enforcement, in Data Science Applied to Sustainability Analysis (2021) (with Sandy Handan-Nader and Larry Y. Liu)

Affirmative Algorithms: The Legal Grounds for Fairness as Awareness, University of Chicago Law Review Online (2020) (with Alice Xiang)

Evaluating Facial Recognition Technology: A Protocol for Performance Assessment in New Domains, 98 Denv. L. Rev. 753 (2021), Stanford HAI White Paper (with Emily Black, Maneesh Agrawala, and Fei-Fei Li)

Feasible Policy Evaluation by Design: A Randomized Synthetic Stepped-Wedge Trial in King County, Evaluation Review (2020) (with Cassandra Handan-Nader and Becky Elias)

Algorithmic Accountability in the Administrative State, Yale Journal of Regulation (2020) (with David Freeman Engstrom)

Improving Scientific Judgments in Law and Government: A Field Experiment of Patent Peer Review, 17 Journal of Empirical Legal Studies 190-223 (2020) (with Lisa Larrimore Ouellette)

Menu Labeling, Calories, and Nutrient Density: Evidence from Chain Restaurants, 15 PLoS ONE 1-16 (2020) (with Oluchi Mbonu, Anne McDonough, and Rebecca Pottash)

The Effectiveness of a Neighbor-to-Neighbor Get-Out-the-Vote Program: Evidence from the 2017 Virginia State Elections, Journal of Experimental Political Science (2021) (with Cassandra Handan-Nader, Alison Morantz, and Tom Rutter)

Government by Algorithm: Artificial Intelligence in Federal Administrative Agencies, Report to the Administrative Conference of the United States (2020) (with David Freeman Engstrom, Catherine M. Sharkey, and Mariano-Florentino Cuéllar)

Artificially Intelligent Government: A Review and an Agenda, in Big Data Law (Roland Vogl ed. 2021) (with David Freeman Engstrom)

Did Restaurant Hygiene Grading in Los Angeles Immediately Reduce Foodborne Illness by 20% Across All of Southern California? A Response to Jin & Leslie, SIEPR Working Paper (2019) (with Cassandra Handan-Nader)

Deep Learning to Map Concentrated Animal Feeding Operations, 2 Nature Sustainability 298 (2019) (with Cassandra Handan-Nader)

Quality Review of Mass Adjudication: A Randomized Natural Experiment at the Board of Veterans Appeals, 2003-16, 35 Journal of Law, Economics, and Organization 239 (2019) (with Cassandra Handan-Nader, David Ames, and David Marcus)

Due Process and Mass Adjudication: Crisis and Reform, 72 Stanford Law Review 1 (2020) (with David Ames, Cassandra Handan-Nader, and David Marcus)

Is Yelp Actually Cleaning Up the Restaurant Industry? A Re-Analysis on the Relative Usefulness of Consumer Reviews, WWW Proceedings (2019) (with Kristen M. Altenburger)

Making Street-Level Bureaucracy Work: Safer Food in Seattle and King County, in Evidence Works: Cases Where Evidence Meaningfully Informed Policy (Nick Hart & Meron Yohannes eds. 2019) (with Becky Elias)

When Algorithms Import Private Bias into Public Enforcement: The Promise and Limitations of Statistical Debiasing Solutions, 175 Journal of Institutional and Theoretical Economics 98-122 (2019) (with Kristen M. Altenburger)

New Evidence on Information Disclosure through Restaurant Hygiene Grading, 11 American Economic Journal: Economic Policy 404-28 (2019) (with Zoe C. Ashwood and Cassandra Handan-Nader)

Does Peer Review Work? An Experiment of Experimentalism, 69 Stanford Law Review 1-119 (2017)

Do Checklists Make a Difference? A Natural Experiment from Food Safety Enforcement, Journal of Empirical Legal Studies (2018) (with Sam Sherman and Phil Wyman)

Judging Statistical Criticism, Observational Studies (2017)

Managing Street-Level Arbitrariness: The Evidence Base for Public Sector Quality Improvement, 13 Annual Review of Law and Social Science 251-72 (2017) (with Sam Sherman)

Equity in Bureaucracy, UC Irvine Law Review (2017)

New Measurement Technologies: A Review and Application to Nuremberg and Justice Jackson, Oxford Handbook of Law and the Judiciary (2017) (with Michael Morse)

Testing the Marketplace of Ideas, 90 New York University Law Review 1160-1228 (2015) (with Fred Schauer)

Randomizing . . . What? A Field Experiment of Child Access Voting Laws, 171 Journal of Institutional and Theoretical Economics 150-70 (2015)

Does Class Size Affect the Gender Gap? A Natural Experiment in Law, 43 Journal of Legal Studies 291-321 (2014) (with Mark G. Kelman)

Foreword: Conference Bias, 10 Journal of Empirical Legal Studies 603-11 (2013)

Introduction: The Empirical Revolution in Law, 65 Stanford Law Review 1195-1202 (2013) (with Larry Kramer)

Do Police Reduce Crime? A Reexamination of a Natural Experiment, Empirical Legal Analysis: Assessing Performance of Legal Institutions (2013) (with John J. Donohue III and Patrick Leahy)

Fudging the Nudge: Information Disclosure and Restaurant Grading, 122 Yale Law Journal 574-688 (2012)

Credible Causal Inference for Empirical Legal Studies, 7 Annual Review of Law and Social Science 17-40 (2011) (with Donald B. Rubin)

MatchIt: Nonparametric Preprocessing for Parametric Causal Inference, 42 Journal of Statistical Software 1-28 (2011) (with Kosuke Imai, Gary King, and Elizabeth A. Stuart)

Did a Switch in Time Save Nine?, 2 Journal of Legal Analysis 69-113 (2010) (with Kevin M. Quinn)

Did Liberal Justices Invent the Standing Doctrine? An Empirical Study of the Evolution of Standing, 1921-2006, 62 Stanford Law Review 591-667 (2010) (with Erica L. Ross)

How Not to Lie with Judicial Votes: Misconceptions, Measurement, and Models, 98 California Law Review 813-76 (2010) (with Kevin M. Quinn)

Reconciling Punitive Damages Evidence: Comment, 166 Journal of Institutional and Theoretical Economics 27-32 (2010)

Measuring Agency Preferences: Experts, Voting, and the Power of Chairs, 59 DePaul Law Review 333-70 (2010)

Viewpoint Diversity and Media Consolidation: An Empirical Study, 61 Stanford Law Review 781-868 (2009) (with Kevin M. Quinn)

The Role of Theory and Evidence in Media Regulation and Law: Response to Baker and a Defense of Empirical Legal Studies, 61 Federal Communications Law Journal 673-713 (2009) (with Kevin M. Quinn)

Evaluating Course Evaluations: An Empirical Analysis of a Quasi-Experiment at the Stanford Law School, 2000-2007, 58 Journal of Legal Education 388-412 (2009) (with Timothy H. Shapiro)

Measuring Explicit Political Positions of Media, 3 Quarterly Journal of Political Science 353-77 (2008) (with Kevin M. Quinn)

Estimating Causal Effects of Ballot Order from a Randomized Natural Experiment: The California Alphabet Lottery, 1978-2002, 72 Public Opinion Quarterly 216-40 (2008) (with Kosuke Imai)

Improving the Presentation and Interpretation of Online Ratings Data with Model-based Figures, 62 American Statistician 279-88 (2008) (with Kevin M. Quinn)

Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference, 15 Political Analysis 199-236 (2007) (with Kosuke Imai, Gary King, and Elizabeth A. Stuart)

The Impact of Damage Caps on Malpractice Claims: Randomization Inference with Difference-in-Differences, 4 Journal of Empirical Legal Studies 69-102 (2007) (with John J. Donohue III).

Randomization Inference with Natural Experiments: An Analysis of Ballot Effects in the 2003 California Recall Election, 101 Journal of the American Statistical Association 888-900 (2006) (with Kosuke Imai)

The Effect of War on the Supreme Court, Principles and Practice of American Politics (Samuel Kernell and Steven S. Smith eds., 3d ed. 2006) (with Lee Epstein, Gary King, and Jeffrey A. Segal)

Why Affirmative Action Does Not Cause Black Students to Fail the Bar, 114 Yale Law Journal 1997-2004 (2005)

Affirmative Action’s Affirmative Actions: A Reply to Sander, 114 Yale Law Journal 2011-16 (2005)

The Supreme Court During Crisis, 80 New York University Law Review 1-116 (2005) (with Lee Epstein, Gary King, and Jeffrey A. Segal)

Compliance with International Soft Law: Why Do Countries Implement the Basle Accord?, 5 Journal of International Economic Law 647-88 (2002)