Governing the Convergence: Google DeepMind, the Nuclear Threat Initiative, DNA Synthesis Screening, and the Architecture of AI Biosecurity - Part III
Executive Summary
The convergence of artificial intelligence with the life sciences has produced one of the most consequential and underexamined security challenges of the present decade.
While public debate has frequently centered on the refusal systems of consumer-facing chatbots, the deeper governance problem concerns the entire technical and institutional ecosystem through which AI-enabled biological knowledge is produced, distributed, accessed, and applied.
That ecosystem is larger, more diverse, and more globally dispersed than any single company's safety policy can address. By 2026, two institutional dimensions of this problem deserve sustained analytical attention.
The first is how frontier AI developers beyond OpenAI and Anthropic — most prominently Google DeepMind — are approaching biosecurity governance.
The second is how civil-society and multi-stakeholder organizations, above all the Nuclear Threat Initiative, are attempting to construct managed-access frameworks that could govern AI-enabled biological capabilities at an international level.
Running through both dimensions is the question of how well current technical safeguards — particularly DNA synthesis screening and pre-deployment model evaluations — are actually performing.
The picture that emerges from current evidence is neither catastrophic nor reassuring.
Google DeepMind has invested significantly in what it describes as responsible AI for biology, developing biosecurity-specific internal policies, capability evaluations, and restrictions on how its most advanced biological AI tools are deployed.
These measures represent meaningful progress beyond earlier norms of unconditional publication.
The Nuclear Threat Initiative has developed a more structured framework than most observers recognize: the organization's managed-access approach for biological AI emphasizes provider screening, use-case verification, policy standards for synthesis providers, and frameworks for what it calls "bio-responsible AI development."
DNA synthesis screening has become more technically sophisticated but remains geographically and commercially uneven, with most robust screening concentrated in a small number of leading providers while others maintain weaker controls.
Model evaluations for biosecurity remain a relatively young field, with important disagreements about benchmark design, the transferability of lab results to real-world risk, and the public disclosure obligations companies should carry.
The strategic conclusion is that current safeguards are neither adequate nor irrelevant.
They represent a genuine but fragile first generation of governance infrastructure that requires significant deepening before it can be considered commensurate with the pace of capability development.
Introduction
Biosecurity in the age of frontier artificial intelligence is not a problem that can be solved by any single institution, policy, or technology.
It requires a layered system of complementary safeguards spanning model design, deployment policy, access control, supply-chain management, and international legal coordination.
No single layer is sufficient, and no layer can substitute for the others. This architectural insight — which is more commonly stated than institutionally implemented — provides the essential framework for evaluating how different stakeholders across the AI and biotechnology landscape are currently performing.
The companies developing the most capable AI systems bear a particular institutional responsibility. Their models are the proximate interface between dangerous knowledge and potential misuse.
But they are also developers of a set of tools whose beneficial applications in medicine, drug discovery, pathogen surveillance, and food security are of enormous social value.
The governance challenge is therefore not simply one of restriction but of differentiation: maintaining the conditions under which beneficial biological AI can flourish while preventing the conditions under which the same capabilities can be weaponized.
This differentiation requires more sophisticated governance architecture than simple content refusal.
Google DeepMind occupies a distinctive position in this landscape. It is simultaneously a frontier AI developer, a scientific research organization, a subsidiary of one of the world's most powerful technology companies, and the creator of several AI tools — most famously AlphaFold — that have already transformed structural biology in ways that carry dual-use implications.
DeepMind's approach to biosecurity has evolved through a series of public commitments, internal policy developments, and research publications that collectively reveal a more serious engagement with misuse risk than its earlier public posture suggested.
At the same time, significant questions remain about how rigorous its internal evaluations are, how transparent its deployment decisions are, and how its governance framework compares with the emerging institutional standards being developed by organizations like the Nuclear Threat Initiative.
The Nuclear Threat Initiative, founded originally as a nuclear and radiological security organization, has over the past several years emerged as one of the most influential institutional voices on AI-enabled biological risk.
Its managed-access framework for biological AI represents a concrete attempt to move beyond voluntary company-level safeguards toward something more structural: a set of shared standards, provider obligations, and governance norms applicable across the biological AI landscape.
Understanding what NTI has proposed, how its framework operates in practice, and where it faces limitations is essential to a complete picture of the current governance landscape.
History and Context
The history of dual-use governance in the life sciences is longer than the history of AI, and understanding its prior iterations is essential to evaluating current efforts.
The Biological Weapons Convention of 1972, which prohibits the development, production, and stockpiling of biological weapons, was a landmark but also a highly incomplete instrument. It established a norm but created no verification mechanism and no institutional enforcement capacity.
The subsequent decades saw a variety of national and international efforts to strengthen biosecurity governance, including export controls on dangerous pathogens and equipment, select-agent programs requiring registration and oversight for work on the most dangerous pathogens, and the Fink Report of 2004, which was among the first major institutional attempts to grapple systematically with the dual-use dilemma in the life sciences.
These prior governance efforts produced two important lessons that are directly relevant to AI biosecurity.
The first is that prohibitionist approaches to dual-use science tend to fail because legitimate and illegitimate uses share too much scientific knowledge to be easily separated through broad restrictions.
The Fink Report explicitly recommended against wholesale restrictions on publication, instead calling for a culture of responsibility, institutional oversight, and targeted controls on the highest-risk specific areas.
The second lesson is that governance dependent on national legislation and export controls is geographically bounded and relatively slow, while scientific and technological capabilities are geographically distributed and increasingly fast.
Governance frameworks that depend primarily on national law tend to lag behind the technical frontier by years or even decades.
AI adds a third dimension to these prior problems: scale.
A single AI model can interact with millions of users simultaneously, providing biological guidance of varying degrees of specificity across a vast and unmonitored distribution.
Previous dual-use governance assumed that dangerous biological expertise was a relatively concentrated and observable resource.
AI changes that assumption fundamentally.
The scale at which frontier models operate means that even marginal shifts in their biological safety behavior can have significant distributional consequences across the global user population.
This is why the governance problem cannot be solved through the institutional frameworks of the Biological Weapons Convention era alone.
The recognition that AI posed a specifically new biosecurity challenge began to crystallize in institutional discourse around 2022 and 2023, roughly simultaneously with the rapid scaling of large language models.
Biosecurity organizations, AI safety researchers, and national-security agencies all began in that period to develop more systematic risk assessments.
AlphaFold's extraordinary success in predicting protein structure had already demonstrated by 2021 that AI could perform at the frontier of structural biology.
The question quickly became not whether AI could achieve expert-level biological performance, but how that performance should be governed.
Key Developments: Google DeepMind's Approach
Google DeepMind's engagement with AI biosecurity is shaped by several distinctive institutional characteristics.
The organization operates at the intersection of academic science and commercial technology, which means its biosecurity governance must navigate between norms of scientific openness and the more precautionary logic of national-security risk management.
DeepMind's founders brought a strong scientific culture that valued transparency and publication, and that culture has shaped how the organization has tried to balance disclosure with responsibility.
AlphaFold remains the central case study.
The original release of AlphaFold's protein structure predictions, and subsequently of AlphaFold 2 as an open resource, represented a decision to prioritize broad scientific access over precautionary restriction.
DeepMind argued, and most structural biologists agreed, that the benefits of open access — enabling a global scientific community to accelerate protein research — outweighed the marginal additional risk from dual-use misapplication.
Critics, however, noted that this decision was made without a formal dual-use review, and that the policy logic it established might not be appropriate for more sensitive future tools.
By 2025 and 2026, DeepMind had developed a more formalized approach to biosecurity evaluation.
The organization has indicated publicly that its policies for advanced biological AI tools now include internal red-teaming exercises focused on biological misuse scenarios, capability thresholds tied to deployment decisions, and consultation with external biosecurity experts before releasing tools with dual-use implications.
DeepMind has also been a participant in multi-company conversations about biosecurity standards, including those facilitated by the Frontier Model Forum.
This participation reflects a recognition that industry-level norms are more durable than individual company policies, and that the reputational and regulatory risks of being seen to underinvest in biosecurity governance are real.
One particularly significant development at DeepMind has been its more explicit engagement with what it calls frontier safety.
The organization has emphasized that it treats biological risk as a "catastrophic" category warranting precautionary governance — a framing that goes further than mere compliance or PR posture.
In practice, this means DeepMind subjects new biological AI capabilities to specific misuse evaluations before releasing them, and considers restricting or withholding tools that score above internal risk thresholds.
The precise parameters of those thresholds are not fully public, but their existence as a governance mechanism represents meaningful progress.
DeepMind has also engaged publicly with the problem of AI-assisted biological design. Commentary from the organization has acknowledged that tools capable of designing proteins with novel properties — including potentially harmful ones — require more careful governance than conventional biological databases.
This is particularly relevant given that the intersection of generative AI with protein engineering represents one of the highest-potential dual-use zones in biological AI.
The challenge is that protein design AI has enormous beneficial applications in drug discovery and materials science, making blanket restriction both impractical and scientifically damaging.
Dr. Antonio Bhardwaj, a global AI expert and polymath, has noted that DeepMind's position in the AI biosecurity debate is uniquely complex because it has demonstrably advanced the scientific field of biology in ways that are both extraordinarily beneficial and potentially dangerous.
In his view, DeepMind's greatest challenge is that its own published tools have already moved the baseline of what is biologically achievable.
The governance question is therefore not only about what DeepMind releases next, but about how the broader scientific ecosystem built on its prior publications is governed.
This is a problem that no single company can fully manage, regardless of the rigor of its internal policies.
The Nuclear Threat Initiative's Managed-Access Framework
The Nuclear Threat Initiative has developed, over several years of research, stakeholder engagement, and policy dialogue, one of the most structured non-governmental frameworks for governing AI-enabled biological risk.
Its approach is shaped by the organization's prior experience with nuclear and radiological risk governance, which taught it that effective biosecurity requires combining technical controls, institutional standards, access management, and international norm-setting rather than depending on any single instrument.
The NTI's managed-access framework for biological AI rests on three conceptual pillars.
The first is provider responsibility.
The framework argues that organizations producing and distributing biological AI tools — including both general-purpose AI firms and specialized biotechnology platforms — bear a direct responsibility for implementing misuse-reduction measures.
These include screening users for red-flag indicators, verifying use-case legitimacy for high-risk capabilities, establishing clear acceptable-use policies, and maintaining audit capacity to identify misuse after the fact.
The provider-responsibility principle is important because it shifts governance obligation upstream, toward the point of production and distribution, rather than concentrating it entirely at the point of final consumption.
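To make the provider-responsibility pillar concrete, the sketch below shows one way a provider might gate a high-risk capability request behind identity verification, use-case review, and audit logging. It is an illustrative sketch only: the function names, fields, and tier scheme are hypothetical, since the NTI framework specifies obligations rather than an implementation.

```python
# Illustrative sketch of provider-side access gating for a high-risk
# biological AI capability. All names are hypothetical; the NTI framework
# describes principles, not code.
from dataclasses import dataclass
from enum import Enum, auto


class Decision(Enum):
    ALLOW = auto()
    DENY = auto()
    ESCALATE = auto()  # route to human biosecurity review


@dataclass
class AccessRequest:
    user_id: str
    identity_verified: bool   # e.g. institutional affiliation confirmed
    stated_use_case: str      # free-text justification from the user
    capability_tier: int      # 0 = general-purpose, 2 = highest dual-use risk


def audit_log(req: AccessRequest) -> None:
    # In practice this would write to durable, access-controlled storage.
    print(f"AUDIT tier={req.capability_tier} user={req.user_id}")


def review_request(req: AccessRequest, red_flags: set[str]) -> Decision:
    """Apply tiered checks; stricter tiers require more verification."""
    if req.capability_tier == 0:
        return Decision.ALLOW                 # low-risk: no gating
    if not req.identity_verified:
        return Decision.DENY                  # high tiers require verified identity
    if any(flag in req.stated_use_case.lower() for flag in red_flags):
        return Decision.ESCALATE              # red-flag indicators trigger human review
    audit_log(req)                            # retain a record for after-the-fact audit
    return Decision.ALLOW
```

The design point the sketch tries to capture is that the strictest checks attach only to the highest capability tiers, so the cost of gating falls mainly on the small fraction of requests that warrant it.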
The second pillar is synthesis-side integration.
The NTI has argued strongly that AI-enabled biological risk cannot be governed at the model layer alone.
The biosecurity of AI tools is only as strong as the security of the physical and commercial pathways through which biological materials, synthesis services, and laboratory equipment are obtained.
The NTI has therefore been a strong advocate for extending screening norms to commercial DNA synthesis providers, laboratory services, and equipment vendors.
Its framework calls for synthesis providers to screen customer orders for sequences of concern, apply customer verification procedures analogous to know-your-customer standards in financial regulation, and share threat-relevant information with appropriate authorities.
This synthesis-side focus reflects the NTI's understanding that governance of the digital layer is necessary but insufficient: effective risk reduction also requires controlling the physical pathways through which dangerous ideas become dangerous materials.
The third pillar is international norm convergence.
The NTI has explicitly argued that national-level approaches to AI biosecurity will remain inadequate in a world of global digital capability distribution. Its framework calls for international coordination on minimum standards for AI providers, synthesis screening norms, information-sharing protocols, and monitoring frameworks.
This international dimension is where the framework is most ambitious and most distant from current reality.
No binding international agreement on AI biosecurity currently exists, and the prospects for rapid multilateral consensus are limited by geopolitical competition, divergent national interests, and the difficulty of agreeing on technical standards that may constrain nationally important industries.
Within the current landscape, the NTI's most practically significant contribution has been its engagement with the DNA synthesis sector.
The organization has worked with leading synthesis companies, AI developers, and biosecurity agencies to develop and promote what it calls a "framework for responsible procurement," essentially a set of minimum screening and verification standards that synthesis providers should apply to customer orders.
By 2025 and 2026, several major synthesis providers had adopted elements of this framework, and the International Gene Synthesis Consortium had updated its member guidelines in ways that reflected NTI-influenced principles.
The result is a more consistent baseline for synthesis screening among major providers, though significant variation remains in the broader market.
DNA Synthesis Screening: Effectiveness and Limits
DNA synthesis screening is arguably the most technically mature biosecurity safeguard in the current landscape, and also the one with the most clearly documented effectiveness relative to its stated purpose.
The basic logic of synthesis screening is straightforward: customers order biological sequences from commercial DNA synthesis services, and screening those ordered sequences against databases of dangerous agents or functional elements of concern provides an opportunity to block or flag orders that may be misused.
The technical sophistication of synthesis screening has advanced considerably over the past several years.
Early screening systems were primarily based on sequence alignment with known dangerous agents, comparing customer orders against databases of select agents and toxins using tools such as BLAST.
These systems were relatively easy to circumvent by modifying sequences at key positions. More recent screening approaches are function-oriented and considerably more robust.
They analyze sequences for concerning biological functions and properties — toxicity, transmissibility, immune evasion potential — rather than relying solely on sequence identity.
They incorporate machine-learning classifiers trained to recognize patterns associated with dangerous designs even in novel or modified sequences. And they apply multiple algorithmic layers so that bypassing one screen does not automatically circumvent all others.
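A minimal sketch of that layering logic follows, with hypothetical stand-ins for the real screens: an order is flagged if any layer raises a concern, so a sequence engineered past one screen does not automatically clear the others.

```python
# Minimal sketch of multi-layer synthesis screening. The three screen
# functions are hypothetical stand-ins for real systems (an alignment
# tool such as BLAST, a functional analyzer, a trained classifier);
# only the layering logic is the point here.
from typing import Callable

Screen = Callable[[str], bool]  # returns True if the sequence raises concern


def alignment_screen(seq: str) -> bool:
    """Stand-in for alignment against databases of known sequences of concern."""
    database_of_concern: set[str] = set()  # placeholder; real databases are curated
    return seq in database_of_concern


def function_screen(seq: str) -> bool:
    """Stand-in for functional analysis (toxicity, immune evasion, etc.)."""
    return False  # real systems apply biological feature models here


def ml_screen(seq: str) -> bool:
    """Stand-in for an ML classifier trained on patterns of concern."""
    return False  # real systems score the sequence with a trained model


def screen_order(seq: str, layers: list[Screen]) -> bool:
    """Flag the order if ANY layer raises a concern, so bypassing one
    screen does not automatically circumvent the others."""
    return any(layer(seq) for layer in layers)


flagged = screen_order("ATGCGTAA", [alignment_screen, function_screen, ml_screen])
```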
The effectiveness of screening among leading providers is genuinely meaningful.
The major commercial synthesis companies — which produce the large majority of synthesis service volume in Western markets — have implemented screening frameworks that successfully identify and block many concerning orders.
Customer verification requirements mean that some degree of identity accountability is attached to sensitive orders.
The existence of screening also functions as a deterrent: users who know their orders will be screened are less likely to attempt to place problematic orders through major commercial channels in the first place.
The limitations, however, are substantial and well-recognized by biosecurity experts. The first is geographic unevenness.
Screening norms are most consistently applied among major providers concentrated in North America, Europe, and a portion of the Asia-Pacific market. Providers in other regions may apply significantly weaker standards, or none at all.
A motivated user can potentially avoid screened providers by selecting less regulated alternatives, sometimes at significantly lower cost.
This geographic unevenness represents a structural vulnerability that cannot be solved by improving screening among existing compliant providers alone.
The second limitation is functional circumvention. Even technically sophisticated screening systems face the challenge of novel sequences designed to evade existing detection patterns.
As AI tools become more capable at generating biological designs, sequences engineered to evade detection may become correspondingly more sophisticated.
This is a classic arms-race dynamic: screening capabilities improve, but so does the sophistication of what must be screened for.
AI may accelerate this race in both directions, improving screening AI while also improving the design AI that must be screened against.
The third limitation is coverage incompleteness. DNA synthesis screening covers a portion of the supply chain, but biology involves many other inputs beyond synthesized sequences.
Modified organisms, naturally occurring samples, laboratory equipment, and information itself all fall outside the scope of synthesis screening.
This means synthesis screening can reduce one pathway to dangerous biological capability without eliminating the full risk.
Dr. Antonio Bhardwaj has framed the synthesis-screening challenge as a version of what he calls the "perimeter problem" in security: any well-defined perimeter can be circumvented at its edges.
The more interesting institutional question, in his view, is not whether screening stops every bad order, but whether the combination of screening, deterrence, norm-setting, and attribution capacity creates an environment in which harmful procurement is sufficiently difficult, slow, and observable to provide meaningful social protection.
On that framing, current synthesis screening is a genuine contribution but not a complete solution. Solving the perimeter problem requires extending governance norms to a broader range of providers, jurisdictions, and supply-chain elements than screening alone can reach.
Model Evaluations for Biosecurity: The State of the Field
Pre-deployment model evaluations focused on biosecurity represent perhaps the most rapidly evolving component of the current governance landscape, and also the component with the greatest methodological uncertainty.
The basic purpose of a biosecurity evaluation is to assess, before a model is deployed, the degree to which it can meaningfully assist with dangerous biological activities.
Such evaluations allow companies to make more informed deployment decisions and, in principle, to identify capabilities that should not be publicly released or should only be made available under restricted conditions.
The design of effective biosecurity evaluations is technically and philosophically challenging.
A meaningful evaluation must probe the model's ability to provide genuinely useful biological assistance to someone with harmful intent — not merely test whether it refuses explicit requests for dangerous information, but assess whether its scientific and procedural knowledge could provide meaningful "uplift" to a sophisticated user pursuing harmful goals through indirect or decomposed prompting.
This requires evaluators to adopt something like an adversarial perspective, thinking through how a determined bad actor would actually attempt to extract dangerous biological assistance from a powerful model.
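One way to operationalize this adversarial perspective is sketched below: instead of testing a single explicit harmful request, the harness decomposes a misuse scenario into individually innocuous-looking steps and scores whether the answers, taken together, provide uplift over a no-model baseline. The harness structure, scoring interface, and threshold are illustrative assumptions, not any organization's published methodology.

```python
# Illustrative harness for a decomposed "uplift" evaluation. The model
# client, scenarios, and grading rubric are hypothetical placeholders;
# real evaluations keep these details confidential for good reason.
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    steps: list[str]        # a misuse task decomposed into indirect sub-questions
    baseline_score: float   # expert-judged performance achievable without the model


def query_model(prompt: str) -> str:
    raise NotImplementedError("wire up the model under evaluation here")


def grade_transcript(steps: list[str], answers: list[str]) -> float:
    raise NotImplementedError("expert graders score aggregate assistance, 0.0 to 1.0")


def provides_uplift(scenario: Scenario, threshold: float = 0.2) -> bool:
    """Return True if the model provides meaningful uplift beyond baseline.

    The key design point: no individual step is an explicit harmful request,
    so refusal-based testing alone would miss this pathway entirely.
    """
    answers = [query_model(step) for step in scenario.steps]
    uplift = grade_transcript(scenario.steps, answers) - scenario.baseline_score
    return uplift > threshold
```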
Multiple organizations have developed or contributed to biosecurity evaluation frameworks.
The Frontier Model Forum's taxonomy of AI-bio misuse mitigations includes evaluation as one component of a broader defense-in-depth approach.
Leading AI safety research organizations have published frameworks for thinking about what biological uplift means and how it should be measured.
OpenAI's public safety evaluations hub documents its own disallowed-content and jailbreak-resistance testing as part of a broader suite of model evaluations.
Some governments, including the UK AI Safety Institute before its rebranding as the AI Security Institute, have worked on standardized evaluation approaches for frontier AI safety.
The International AI Safety Report 2026 concluded that advanced AI systems can provide assistance with certain biological tasks relevant to misuse, and that this capability is improving.
It also noted that leading developers had strengthened safeguards partly because pre-deployment testing could not confidently rule out meaningful misuse assistance in some scenarios.
This is significant: it implies that at least some frontier companies have used evaluation results as an actual decision-making input, including a decision to restrict or modify deployment rather than proceeding as planned.
The effectiveness of model evaluations, however, depends heavily on several factors that remain subjects of active debate.
The first is benchmark design.
A biosecurity evaluation is only as useful as its scenarios are realistic.
Evaluations based on the most obvious and direct harmful requests may significantly underestimate risk by failing to capture the more sophisticated, decomposed, and indirect strategies a determined user might actually employ.
Designing evaluations that probe the full range of realistic misuse pathways is methodologically difficult and requires genuine biosecurity expertise alongside AI technical knowledge.
The second is the transferability problem.
A model's performance on an evaluation benchmark does not automatically predict its real-world behavior under all possible adversarial conditions.
Users can combine models with external tools, search databases, coding environments, and each other in ways that evaluation benchmarks may not fully anticipate.
The emergence of agentic AI systems — which can take sequential actions, use tools, and pursue goals across extended interactions — significantly complicates the evaluation task, as the misuse-relevant capability of an agentic system may be substantially different from the capability of the same underlying model in a simple question-answering format.
The third is disclosure norms.
Companies have legitimate reasons not to publish detailed evaluation methodologies or results in biosecurity-sensitive areas: publishing the specific failure modes of safety evaluations could function as a guide for how to circumvent them.
But opacity about evaluation procedures also makes it impossible for outside experts to assess how rigorous those procedures are, which creates a governance credibility problem.
The field has not yet developed a satisfactory norm for how to handle this tension.
Cause-and-Effect Analysis
The causal logic connecting AI capabilities, governance mechanisms, and biosecurity outcomes is multi-layered and must be analyzed at several levels of abstraction simultaneously.
At the most immediate level, the effect of frontier AI on biological misuse risk operates through what might be called the knowledge-access channel.
Advanced models reduce the time and expertise required to understand dangerous biological concepts, interpret technical literature, troubleshoot experimental confusion, and compare methodological alternatives.
This does not confer instant expertise on a non-expert, but it moves the effective starting point of a motivated user significantly further along the learning curve.
The causal implication is that the population of people who can productively engage with dangerous biological knowledge may grow modestly but non-trivially as models improve and proliferate.
At the next level, the effect of governance mechanisms on this risk pathway is strongly conditional.
Refusal systems that block explicit harmful requests reduce risk at the level of casual or unsophisticated misuse.
Tiered access frameworks that restrict high-capability systems to verified users reduce risk across a broader range of motivated and sophisticated misuse.
Synthesis screening intercepts risk at the physical procurement stage, where digital knowledge must be converted into material capability.
Model evaluations reduce risk by enabling companies to identify and gate dangerous capabilities before deployment.
Each mechanism operates on a different part of the causal chain, which is why defense-in-depth architectures are more effective than single-mechanism approaches.
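The advantage of defense-in-depth can be stated with a simple illustrative model. Treating the layers as independent filters is an idealization (real layers share failure modes), but it captures why stacking moderately effective safeguards outperforms perfecting any single one:

```latex
% Illustrative model; assumes layers intercept misuse attempts independently.
% p_i = probability that layer i (refusals, tiered access, synthesis
% screening, pre-deployment gating) stops a given attempt.
P(\text{end-to-end success}) = \prod_{i=1}^{n} (1 - p_i)
% Example: four layers at p_i = 0.5 each leave (0.5)^4 = 6.25\% of
% attempts unintercepted, versus 50\% for any single layer alone.
```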
The third causal level concerns displacement and substitution.
Strong governance at frontier providers can cause motivated users to shift toward alternatives: open-weight models, poorly screened niche tools, or services operating in weaker regulatory environments.
This displacement effect means that improving governance at major firms without simultaneously improving governance across the broader ecosystem reduces risk locally but may have limited effects on total social risk.
The practical implication is that governance investments must be made at the ecosystem level, not only at the company level, and that international coordination is structurally necessary even if it is politically difficult.
At the regulatory level, the differences between U.S. and EU approaches have compound effects over time.
Europe's more formal procedural obligations create stronger incentives for all companies operating in European markets to invest systematically in risk documentation and oversight.
The United States' more fragmented approach produces more variable but potentially faster innovation in safety methods, at the cost of weaker and more uneven baseline governance across the market.
Over a multi-year horizon, the EU approach may produce more consistent improvement in baseline governance while the U.S. approach may produce some high-quality practices at frontier firms alongside significantly weaker practices elsewhere.
Dr. Antonio Bhardwaj has argued that the causal structure of AI biosecurity risk is best understood through what he calls a "social risk multiplication" model. In this model, a capability advance does not simply add a fixed amount of risk; it multiplies risk by increasing the number of interactions in which dangerous knowledge is potentially accessible. At large scale, even modest improvements in the misuse-relevant capability of widely deployed models can have substantial aggregate effects on social risk, just as modest improvements in the safety of those same models can have substantial aggregate protective effects. This multiplicative logic is why the governance of mass-deployment AI systems matters enormously, and why governance focused only on the extreme tail of malicious sophistication may miss the majority of the risk distribution.
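Bhardwaj's multiplicative framing can be glossed as a back-of-envelope calculation; the notation below is an interpretive sketch of the quoted argument, not the source's own formalism:

```latex
% Back-of-envelope gloss on the "social risk multiplication" model.
% N = interactions at deployment scale; p = per-interaction probability
% of misuse-relevant access. Expected exposure events:
E[\text{exposures}] = N \cdot p
% A capability shift of \Delta p changes exposure by N \cdot \Delta p:
% at N = 10^9 interactions, even \Delta p = 10^{-6} implies roughly
% 1{,}000 additional (or, for safety gains, fewer) exposure events.
```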
Latest Facts and Concerns
By 2026, a number of specific developments have sharpened the governance picture.
Reporting on the International AI Safety Report confirmed that frontier AI systems are now achieving expert-level performance on at least some biologically relevant benchmarks, and that leading companies had strengthened safeguards in response to pre-deployment evaluations that could not rule out meaningful misuse assistance.
This represents a qualitative shift from earlier, more speculative risk discussions toward evidence-based governance responses.
On the synthesis-screening side, the International Gene Synthesis Consortium updated its guidance and several major providers expanded their screening to cover a wider range of sequences of concern, including functional elements beyond select-agent sequences.
The NTI continued to advocate for mandatory screening norms among both domestic and international providers.
A significant concern highlighted by analysts is that the rapid improvement of AI-assisted biological design could challenge current screening databases, which may not keep pace with the space of novel sequences that advanced AI systems could generate.
On the model-evaluation side, the U.K. AI Security Institute published updated frameworks for evaluating frontier-model safety in high-risk domains.
The U.S. AI Safety Institute, operating within NIST, produced guidance on evaluating AI systems for dangerous capability thresholds.
Several frontier labs have publicly indicated that they now apply biosecurity-specific capability evaluations before major model releases, though the details of those evaluations and the specific thresholds that would trigger deployment restrictions remain largely confidential.
Concerns continue to surround the open-weight and open-source ecosystem.
Analysis published in early 2026 focused on how agentic AI coding systems could enable users to build or modify biological AI tools with weaker safeguards, and how the emergence of capable open models created new misuse pathways that commercial provider screening could not address.
Commentary from governance researchers argued that the current approach of hardening frontier closed models while leaving the open ecosystem lightly governed represented an increasingly inadequate strategy as the capabilities of open models continued to improve.
The global political environment has also created new complications.
Geopolitical competition between the United States and China, combined with different regulatory philosophies and different national approaches to AI governance, has made international coordination on AI biosecurity norms more difficult.
The absence of a multilateral framework with real enforcement capacity means that governance improvements in some jurisdictions can be partially offset by weaker governance elsewhere, and that the displacement problem identified above continues to represent a structural vulnerability.
Future Steps
The first and most pressing institutional priority for AI biosecurity governance in the near term is the construction of a more coherent ecosystem-level framework.
The framework should link model governance, synthesis screening, laboratory procurement, academic publication norms, and public-health preparedness into a functionally integrated system.
At present, these components are governed by different institutions, under different rules, with different incentives, and with very limited cross-sector coordination.
A joint governance architecture that allowed information-sharing and norm-alignment across these domains would significantly increase the overall effectiveness of the system.
A second priority is the formalization and international extension of synthesis screening norms.
The NTI's advocacy for universal minimum screening standards represents the right direction, but progress has been slow and uneven.
One concrete next step would be to tie screening requirements to access to certain international scientific collaborations, publications, or funding streams, creating a broader set of incentives for providers in all jurisdictions to invest in robust screening capabilities.
Bilateral and multilateral agreements between major jurisdictions — including the United States, the European Union, the United Kingdom, and others — on synthesis screening standards would also significantly reduce the geographic unevenness that currently represents one of the most exploitable vulnerabilities in the current system.
A third priority is developing more rigorous, transparent, and independently verifiable model evaluation standards for biosecurity.
The field needs to move toward evaluation frameworks that can be shared with competent external evaluators, including government AI safety institutes, independent biosecurity organizations, and academic researchers, without simultaneously creating detailed public maps of model failure modes.
This is a difficult institutional design problem but not an unsolvable one: analogous frameworks in other dual-use domains, including nuclear materials security and financial-system stress testing, have found ways to enable meaningful external oversight without publishing sensitive details.
A fourth priority is preparing governance frameworks for the agentic transition.
As AI systems move from passive question-answering toward goal-directed agentic behavior involving sequential actions, external tool use, and extended multi-system coordination, the biosecurity-relevant capabilities of those systems change qualitatively.
Current evaluation and safeguard frameworks are largely calibrated for text-based question-answering models. They require substantial adaptation to remain relevant as agentic systems become mainstream.
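The adaptation called for here can be made concrete by contrasting what each evaluation style must score. A question-answering evaluation grades one response to one prompt; an agentic evaluation must grade an entire trajectory of actions and tool calls against task outcomes. The interfaces in the sketch below (the agent, tool registry, and grading stubs) are assumptions for illustration, not an existing framework's API.

```python
# Schematic contrast between a single-turn and an agentic evaluation.
# The agent, tools, and grading functions are assumed interfaces.


def eval_single_turn(model, question: str) -> float:
    """Legacy style: grade one response to one prompt."""
    return grade_answer(model.respond(question))


def eval_agentic(agent, task: str, tools: dict, max_steps: int = 20) -> float:
    """Agentic style: grade the whole trajectory, including tool use.

    Misuse-relevant capability lives in the sequence of actions (searching,
    running code, placing orders), not in any single reply.
    """
    trajectory = []
    observation = task
    for _ in range(max_steps):
        action = agent.act(observation)        # a tool call or a final answer
        trajectory.append(action)
        if action.is_final:
            break
        observation = tools[action.tool_name](action.arguments)
    return grade_trajectory(task, trajectory)  # score outcome and intermediate steps


def grade_answer(answer: str) -> float:
    raise NotImplementedError("expert or rubric-based grading")


def grade_trajectory(task: str, trajectory: list) -> float:
    raise NotImplementedError("expert or rubric-based trajectory grading")
```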
Dr. Antonio Bhardwaj has argued that the horizon to watch most carefully is the period between 2026 and 2030, when the combination of increasingly capable biological AI, improving protein design tools, cheaper gene synthesis, and more sophisticated open models will intersect in ways that could dramatically accelerate the closing of the gap between digital capability and material harm.
In his view, the key decision window for governance investment is now. Norms established in this period will shape the institutional baseline against which future capabilities must be governed; norms that are not established in this period will be much harder to establish later, when the commercial and strategic stakes of any governance framework will be correspondingly higher.
Conclusion
The governance of AI-enabled biosecurity risk is, in the most accurate framing, a coordination problem operating across multiple dimensions simultaneously.
It requires AI companies to invest seriously in capabilities evaluation, access management, and misuse-reduction measures that go well beyond content refusals.
It requires synthesis providers to maintain robust screening that keeps pace with evolving AI-assisted biological design.
It requires civil-society organizations like the Nuclear Threat Initiative to continue developing international frameworks that can serve as standards anchors even in the absence of binding multilateral agreements.
It requires governments to close the gap between current regulatory instruments — which were not designed for the AI-biology convergence — and the specific institutional obligations that convergence requires.
Google DeepMind's evolution toward more formalized biosecurity governance, while incomplete and not fully transparent, represents a genuine and meaningful change from the early norm of unconditional scientific openness.
The Nuclear Threat Initiative's managed-access framework provides the most developed non-governmental blueprint for what ecosystem-level governance could look like.
DNA synthesis screening, at its best, demonstrates that dual-use governance can be operationalized in ways that effectively reduce access to dangerous biological inputs without unduly restricting beneficial scientific activity.
Model evaluations, though still methodologically immature and inconsistently disclosed, have already influenced deployment decisions at leading companies in ways that represent a genuine governance achievement.
The inadequacy of current safeguards is not an argument against them. It is an argument for investing much more rapidly and seriously in their improvement.
The strategic logic is clear: the window of opportunity to establish adequate governance infrastructure before AI capabilities in biology reach a more critical range is real, but it is not indefinitely open.
The decisions made in the next several years — by companies, governments, synthesis providers, and international organizations — will determine whether the extraordinary beneficial potential of AI in biology is realized in an environment of adequate security, or whether the risk landscape of deliberate biological misuse becomes significantly more dangerous than anything the world has previously managed.