Late last month, Utah’s Medical Licensing Board called for the immediate suspension of the state’s pilot program with the AI company Doctronic. The program lets a chatbot evaluate patients and recommend prescription renewals for nearly 200 chronic condition drugs, with the state planning to phase out physician review of each case.
The board said that it only learned about the pilot after it had launched. Its warning was blunt: proceeding without proper clinical oversight “potentially places Utah citizens at risk.”
The backlash was predictable, and avoidable. Utah is one of at least 47 states now considering more than 250 bills governing clinical AI, producing a patchwork of rules on bias audits, payment policy, and patient consent. Meanwhile, the federal government’s main tool for regulating medical software, the Food and Drug Administration’s device-approval process, is structurally unfit for regulating autonomous clinical AI. It was built for static products like imaging algorithms, not for adaptive systems that keep improving.
This comes at a time when the country faces a worsening physician shortage. National projections show shortfalls of tens of thousands of doctors over the next decade, especially in primary care and rural areas. Traditional fixes such as more medical school seats and more residency slots will take years to make a meaningful difference. Artificial intelligence offers a faster path, but it requires a regulatory framework that fits.
In two years, large language models have gone from barely passing medical licensing exams to performing comparably to physicians on complex clinical reasoning. A 2025 prospective study of nearly 40,000 primary care visits in Kenya found that AI-supported clinicians made substantially fewer diagnostic and treatment errors. In the NOHARM trial published in December, a head-to-head comparison on routine clinical tasks, doctors did not beat the strongest large language models on any measured dimension. For a meaningful share of cognitive work in primary care (reviewing records, taking histories, generating differential diagnoses, and managing chronic disease), autonomous AI can already deliver clinically adequate care.
The constraint now is regulatory. The FDA’s framework assumes a product that is fixed at the moment of approval. Generative AI doesn’t work that way: New capabilities, new failure modes, and refreshed training data arrive with every model update. Predetermined change-control plans help at the margin, but they still require one-time certification and lockdown periods between updates. The framework cannot keep pace.
State-level fragmentation poses another problem. California bars insurers from using AI to deny coverage based on medical necessity. Colorado mandates bias assessments for high-risk systems. Each new law adds complexity without resolving the core question: How do we know an autonomous AI system is competent to practice medicine? The result is sluggish deployment, higher compliance costs for developers, and uneven access for patients in the communities AI could help most, especially rural areas that suffer from an acute shortage of care providers.
The Utah controversy actually points toward an answer. The medical board’s central objection was about sequence: Utah deployed the system first and brought clinical oversight in afterward. State officials emphasize that Phase 1 currently includes physician review of every prescription, but the program’s design phases that review out once volume and safety benchmarks are met, moving first to retrospective audits and then to random-sample checks.
An AI that has demonstrated competency before practicing, through testing, supervised deployment, and ongoing monitoring, is a different proposition. A credentialing model built on those principles already exists for physicians, nurse practitioners, and physician assistants.
In a Viewpoint published April 29 in JAMA, my colleagues Robert Wachter, Ezekiel Emanuel, and I propose adapting that model for autonomous clinical AI, meaning systems that make care determinations without per-case clinician review.
The framework has four core elements.
1. Demonstrated competency. Every autonomous AI model would have to perform at or above the median score of recent human test-takers on the USMLE and any specialty boards relevant to its intended scope, and then enter a supervised clinical deployment phase, analogous to residency, during which it would demonstrate noninferior performance on real patients, at scale.
2. A defined scope of practice. Licensure would specify which conditions, settings, and tasks the AI is authorized to handle independently, and when it must escalate to a human clinician.
3. Ongoing monitoring with periodic renewal. Authorization would be time-limited, perhaps biennial, and contingent on continuous real-world performance tracking. Models that drift below standards lose their license.
4. Federal preemption with layered accountability. A new federal Office of Clinical AI Oversight within the Department of Health and Human Services would be created to certify competency. An act of Congress would be required to transfer authority over autonomous clinical AI from the FDA to this office. Developers would bear primary responsibility for model performance; deploying institutions would be responsible for workflow integration, supervision protocols, and adverse-event reporting. States would retain authority over scope of practice, supervision rules, and enforcement, but could not impose duplicative competency assessments.
Three main objections should be addressed upfront.
1. State control. Medical licensure has traditionally been a state function. But autonomous AI crosses state lines instantly, much like telemedicine. A national competency standard reduces administrative duplication and ensures that a patient in Utah and a patient in Mississippi receive equivalent safety guarantees.
2. Equivalence with physicians. Licensing AI does not equate it with a doctor. Clinicians will remain essential for complex judgment, moral reasoning, and the human elements of care that no language model replicates. Licensure simply acknowledges what the trial evidence already shows: For defined, lower-risk tasks, a well-regulated AI can practice safely.
3. Implementation capacity. HHS does not currently have the infrastructure to evaluate clinical AI at scale. Developer user fees, modeled on the FDA’s, can build it. Such fees have helped fund drug review since 1992 and device review since 2002.
More states will face their own Doctronic moments. Without a coherent federal framework, patients in underserved areas will keep waiting for care that AI could safely deliver, while states cycle through ad hoc deployments and predictable backlash. Congress should authorize an Office of Clinical AI Oversight built on the licensure principles above, before 49 more states have to find out the hard way.
Alon Bergman, Ph.D., is an assistant professor of medical ethics and health policy at the Perelman School of Medicine at the University of Pennsylvania, where he studies provider behavior, medical technology adoption, and access to care.