Dan Hendrycks: Center for AI Safety Director
The Prophet from Missouri
In May 2023, a single-sentence statement appeared on the website of an obscure San Francisco nonprofit called the Center for AI Safety. The statement read: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
Within days, more than 350 signatures had been collected. Not from fringe doomsday preppers, but from the most powerful people in artificial intelligence: Sam Altman of OpenAI. Demis Hassabis of Google DeepMind. Dario Amodei of Anthropic. Geoffrey Hinton, the "godfather of AI." Yoshua Bengio, one of the field's most-cited researchers. And hundreds of professors from MIT, Stanford, Berkeley, and Oxford.
The statement would eventually gather more than 500 signatures from AI executives, researchers, and policymakers. Major media outlets—The New York Times, Bloomberg, The Guardian—ran headlines about AI posing "extinction risk." Senators cited the statement in Congressional hearings. The European Union referenced it in AI regulation discussions.
Behind this extraordinary moment was a 28-year-old machine learning researcher who had completed his PhD just one year earlier. His name: Dan Hendrycks.
To understand how someone so young managed to reshape the global conversation about artificial intelligence—convincing some of the world's most skeptical technologists to publicly warn about human extinction—requires understanding three things: his unlikely journey from evangelical Missouri to Berkeley's computer science department, his track record of creating research that now powers ChatGPT and other frontier AI systems, and his systematic campaign to build an intellectual and institutional infrastructure around AI catastrophic risk.
But it also requires understanding the fierce backlash. Critics call Hendrycks an "AI doomer" spreading unfounded panic. They point to his $6.6 million in funding from effective altruism organizations. They question whether his dual roles—as a supposedly neutral safety researcher and as paid advisor to Elon Musk's xAI—represent a conflict of interest. And they argue that his apocalyptic predictions serve to consolidate power among a small group of AI companies by justifying heavy regulation that would prevent new competitors from emerging.
This is the story of how Dan Hendrycks became the most influential—and most controversial—voice in AI safety. It is a story about a brilliant researcher who created foundational AI technologies, then became convinced those technologies might destroy humanity. It is about how a movement born in the rationalist blogs of LessWrong entered the mainstream through sophisticated public relations campaigns. And it is about the high-stakes debate over whether AI poses an existential threat worthy of global coordination—or whether "AI doomerism" is a moral panic that will shape technology policy for decades to come.
The Making of an AI Safety Researcher
Dan Hendrycks was born in 1994 or 1995 and raised in Marshfield, Missouri, a town of 7,000 people in the heart of the Bible Belt. He grew up in a Christian evangelical household, an upbringing that would shape his moral framework even as he moved away from organized religion.
In high school, Hendrycks read Shelly Kagan's "The Limits of Morality," a philosophical work exploring the boundaries of moral obligation. The book profoundly affected him. According to his own account, it motivated him to "work exceptionally hard" by confronting him with questions about how much one person owes to humanity.
When Hendrycks left rural Missouri for the University of Chicago in 2014, he encountered the effective altruism movement—a philosophical and social movement that uses evidence and reason to determine the most effective ways to benefit others. He participated in 80,000 Hours, an EA-linked career advising program that helps people direct their careers toward maximizing positive impact.
It was through this program that Hendrycks met Bastian Stern, who would later work at Open Philanthropy, one of effective altruism's primary funding organizations. Stern advised Hendrycks to pursue artificial intelligence as a career path, arguing that working on AI safety could be one of the most impactful things a person could do to reduce existential risk.
The advice would prove pivotal. After completing his undergraduate degree at the University of Chicago in 2018, Hendrycks enrolled in UC Berkeley's Computer Science PhD program. He was advised by Dawn Song, a professor specializing in security and machine learning, and Jacob Steinhardt, a researcher focused on AI safety and robustness.
But Hendrycks's relationship with effective altruism remained complicated. While he credits the movement for directing his career toward AI safety, he has consistently denied being an EA member when asked by journalists. In interviews, he distances himself from the movement's more controversial aspects—its connection to the failed cryptocurrency exchange FTX, its ties to rationalist communities, and its utilitarian calculus that sometimes produces counterintuitive moral conclusions.
Yet the funding tells a different story. During his doctoral studies at Berkeley, Hendrycks received support from the NSF Graduate Research Fellowship Program and the Open Philanthropy AI Fellowship. His research was "co-funded by the Moskovitz Open Philanthropy Foundation," according to his CV. Later, the Center for AI Safety he would found received at least $6.6 million from Open Philanthropy—making it, by his own admission to the Boston Globe in 2023, his organization's "primary funder."
This tension—between Hendrycks's technical credibility as a researcher and his ideological and financial ties to effective altruism—would become a recurring theme in debates about his work.
Building the Infrastructure of Influence
During his PhD years at Berkeley from 2018 to 2022, Hendrycks produced research that would give him credibility far beyond the AI safety community. He did not merely write position papers about hypothetical risks. He created tools that the entire AI industry now depends on.
In 2016, as an undergraduate, Hendrycks co-authored a paper introducing GELU—Gaussian Error Linear Units—a new activation function for neural networks. Activation functions are mathematical operations that determine whether and how strongly a neuron in a neural network should fire. They are foundational building blocks, used billions of times in every AI model inference.
The GELU paper proposed a simple but powerful idea: instead of using ReLU (Rectified Linear Unit), the dominant activation function at the time, neural networks could use a smooth, probabilistic function that weights inputs by their value rather than simply gating them by their sign. The technical innovation was subtle but consequential.
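Concretely, GELU multiplies each input by the probability that a standard normal variable falls below it, GELU(x) = x · Φ(x), whereas ReLU simply zeroes out anything negative. The following is a minimal illustrative sketch rather than a production implementation, including the tanh-based approximation given in the original paper and common in transformer code:

```python
import math

def relu(x: float) -> float:
    # ReLU gates the input by its sign: negative inputs are zeroed out.
    return max(0.0, x)

def gelu(x: float) -> float:
    # GELU weights the input by the probability that a standard normal
    # variable is below it: GELU(x) = x * Phi(x).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation from the original GELU paper.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.3f}  gelu={gelu(x):+.3f}  approx={gelu_tanh(x):+.3f}")
```

For large positive inputs GELU behaves like the identity, for large negative inputs it approaches zero, and near zero it curves smoothly instead of kinking—a smoothness practitioners often cite when preferring it to ReLU in deep transformer stacks.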
GELU became the most widely used activation function in state-of-the-art language models. It is used in BERT, Google's transformer model that revolutionized natural language understanding in 2018. It is used in GPT-2, GPT-3, and GPT-4—the models behind ChatGPT. It is used in Vision Transformers, the models that brought transformer architectures to computer vision. By 2025, the original GELU paper had accumulated over 5,800 citations on Semantic Scholar, and the activation function had become standard in virtually every major AI lab.
The irony was not lost on observers: the man warning about AI extinction risk had invented a core component of the very systems he feared.
But GELU was just the beginning. In 2020, Hendrycks led the creation of MMLU—Massive Multitask Language Understanding—a benchmark that would become the industry standard for evaluating large language models. MMLU contains 15,908 multiple-choice questions spanning 57 academic subjects including elementary mathematics, US history, computer science, law, and more. Crucially, the questions are designed to be at expert level—requiring college or professional knowledge rather than common sense.
When Hendrycks first released MMLU, GPT-3 scored around 43.9 percent accuracy—well above the 25 percent a model would get by guessing at random on a four-choice test, but far short of expert-level performance. The benchmark's difficulty made it a powerful measuring stick for AI progress. As language models improved—GPT-3.5, GPT-4, Claude, Gemini—their MMLU scores became a key metric reported in every major model announcement.
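Scoring a benchmark like MMLU is deliberately mechanical: exact-match accuracy on the predicted answer letter, typically tallied per subject and then averaged. The sketch below shows that bookkeeping using invented placeholder records, not real MMLU items:

```python
from collections import defaultdict

# Each record holds a subject, the correct answer letter, and the model's pick.
# These rows are invented placeholders for illustration only.
questions = [
    {"subject": "college_mathematics", "answer": "B", "prediction": "B"},
    {"subject": "college_mathematics", "answer": "D", "prediction": "A"},
    {"subject": "us_history", "answer": "C", "prediction": "C"},
    {"subject": "professional_law", "answer": "A", "prediction": "D"},
]

per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
for q in questions:
    per_subject[q["subject"]][0] += q["prediction"] == q["answer"]
    per_subject[q["subject"]][1] += 1

for subject, (correct, total) in per_subject.items():
    print(f"{subject}: {correct}/{total}")

overall = sum(c for c, _ in per_subject.values()) / sum(t for _, t in per_subject.values())
print(f"overall accuracy: {overall:.1%}  (random guessing on 4 choices: 25.0%)")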
The benchmark also included something unusual: a "Moral Scenarios" task drawing from Hendrycks's earlier work on the ETHICS dataset. These questions tested whether AI systems could predict human moral intuitions about everyday scenarios—whether they would approve or disapprove of various actions. Models performed poorly on these questions, scoring far worse than on pure knowledge tasks.
For Hendrycks, this was not just an academic curiosity. It was evidence of a critical gap: AI systems were rapidly gaining knowledge and reasoning capabilities, but they struggled to model human values and ethics. This gap would form the foundation of his argument that advanced AI posed catastrophic risks.
By the time Hendrycks completed his PhD in 2022, he had over 25,000 citations—an extraordinary number for someone just finishing their doctorate. His h-index, a measure of both productivity and impact, was among the highest for any researcher his age. He had created tools used by OpenAI, Google, Meta, and Anthropic. He had credibility with both academic researchers and industry practitioners.
That credibility would prove essential to what came next.
Founding the Center for AI Safety
In 2022, Dan Hendrycks and Oliver Zhang co-founded the Center for AI Safety. Zhang, who had worked at OpenAI and held degrees from Harvard and Stanford, brought operational expertise and Silicon Valley connections. Hendrycks brought technical credibility and a clear vision: to establish AI catastrophic risk as a legitimate field worthy of serious academic and policy attention.
The timing was deliberate. In 2022, AI was undergoing a renaissance. GitHub Copilot had demonstrated that AI could write working code. DALL-E 2 had shown that AI could generate photorealistic images from text descriptions. And rumors swirled about GPT-4, which insiders predicted would be dramatically more capable than GPT-3.
But the AI safety field remained fragmented and marginal. Most safety research focused on near-term issues: algorithmic bias, privacy violations, misinformation. The small community worried about existential risk—the possibility that advanced AI might cause human extinction or permanent disempowerment—was largely dismissed as science fiction by mainstream researchers.
Hendrycks set out to change that. The strategy was multi-pronged.
First, produce serious academic research. In June 2023, Hendrycks published "An Overview of Catastrophic AI Risks," a comprehensive paper cataloging four categories of extreme risks: malicious use, AI race dynamics, organizational risks, and rogue AI. The paper was technically rigorous, extensively cited, and written in the sober language of academic research rather than the hyperbolic style of science fiction. It defined "existential risks" not as any catastrophe, but specifically as "catastrophes from which humanity would be unable to recover"—including extinction and permanent dystopian scenarios.
The paper's academic framing helped legitimize catastrophic AI risk as a serious research topic. It was published in the Journal of Artificial Intelligence Research and presented at academic venues including Princeton's Program in Language and Intelligence.
Second, build the intellectual infrastructure. Hendrycks developed a course on machine learning safety at Berkeley and other universities. He launched the ML Safety Newsletter, a Substack publication covering recent developments in AI safety research. He became Editor-in-Chief of AI Frontiers, a publication funded by the Center for AI Safety. And in 2024, he published "Introduction to AI Safety, Ethics, and Society," a 568-page textbook that consolidated the field's fragmented knowledge into a comprehensive curriculum.
The textbook was significant. By creating standardized educational materials, Hendrycks was doing what every successful intellectual movement does: building pipelines for the next generation of researchers. Students could now take a structured course in AI safety, use a standard textbook, and have a clear path into the field.
Third, create benchmarks and research challenges. In September 2024, Hendrycks partnered with Scale AI to launch "Humanity's Last Exam," described as "the final closed-ended academic benchmark" for AI. The project was inspired by a conversation with Elon Musk, who complained that existing benchmarks like MMLU had become too easy—GPT-4 achieved around 86 percent accuracy, and models were approaching saturation.
Humanity's Last Exam was designed to be brutally difficult. Nearly 1,000 subject experts from 500+ institutions across 50 countries contributed questions. Out of 70,000 trial questions, only 3,000 made it into the final exam after expert review. The questions tested "capabilities at the frontier of human knowledge and reasoning"—the kinds of problems that would challenge PhD students and domain experts.
When the exam launched, leading AI models struggled. Most achieved less than 10 percent accuracy on the hardest questions, while simultaneously showing calibration errors greater than 80 percent—meaning they were highly confident in wrong answers. For Hendrycks, this demonstrated a critical point: models were becoming more confident and more persuasive while still making fundamental errors. As they approached human-level performance on easier tasks, the gap between their confidence and their actual capabilities on truly difficult problems was widening.
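Calibration error measures the gap between a model's stated confidence and how often it is actually right (the Humanity's Last Exam results use a root-mean-square variant). Below is a minimal sketch of the more common binned version, with simulated numbers chosen only to show how a model answering roughly 10 percent of questions correctly while claiming roughly 90 percent confidence lands near an 80 percent calibration error:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned expected calibration error (ECE).

    confidences: the model's stated confidence in its chosen answer (0-1).
    correct: whether the chosen answer was right.
    Returns the bin-weighted average gap between stated confidence and accuracy.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Simulated: ~90% stated confidence, ~10% actual accuracy -> error near 0.8.
rng = np.random.default_rng(0)
conf = rng.uniform(0.85, 0.95, size=1000)
right = rng.random(1000) < 0.10
print(round(expected_calibration_error(conf, right), 2))  # ~0.8
```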
Fourth, engage directly with AI companies and policymakers. This was perhaps the most controversial aspect of Hendrycks's strategy.
The Dual Roles That Raised Questions
In July 2023, Elon Musk founded xAI, his answer to what he saw as the "woke" direction of OpenAI and the alignment problems he believed other labs were ignoring. One of his first moves was to appoint Dan Hendrycks as safety advisor.
The arrangement was unusual. Hendrycks would receive a symbolic salary of one dollar per year and hold no equity in the company—a structure designed to avoid conflicts of interest. According to Hendrycks's account, he had emailed Igor Babuschkin, one of xAI's senior employees, to ask how the company intended to approach AI safety. His graduate research and his public profile had given him credibility, and the recruitment proceeded from there.
But the appointment raised immediate questions. Hendrycks was the director of the Center for AI Safety, an organization that presented itself as an independent nonprofit focused on reducing AI risks. Now he was also a paid advisor—even if only for a dollar—to a for-profit company racing to build artificial general intelligence.
Critics pointed out the potential conflicts. If Hendrycks advised xAI on safety, would he advocate for regulations that might benefit xAI by slowing down competitors? Would his public statements about AI risk be influenced by his relationship with Musk, one of the most polarizing figures in technology? And would his access to xAI's internal safety work create awkward situations when evaluating or commenting on the company's practices?
Hendrycks defended the arrangement. In a 2021 blog post, he had predicted that Musk would "re-enter the fight to build safe advanced AI" in 2023—and Musk's founding of xAI proved that prediction correct. Hendrycks argued that having a voice inside a major AI lab was essential to ensuring safety was taken seriously. The nominal salary and zero equity stake, he maintained, meant he had no financial incentive to compromise his principles.
But in November 2024, Hendrycks added a second advisory role: he joined Scale AI, also for a symbolic one-dollar salary. Scale AI is a major data annotation company that has become essential infrastructure for training large language models. It also has deep ties to the US defense and intelligence community, working on contracts for the Department of Defense and intelligence agencies.
Scale AI's CEO, Alexandr Wang, would go on to co-author "Superintelligence Strategy" with Hendrycks and former Google CEO Eric Schmidt. The report outlined how the United States and China could "compete securely and safely" in developing increasingly capable AI systems, synthesizing national security imperatives, economic competitiveness, and AI governance into a framework calling for "urgent government action."
For critics, this looked like regulatory capture in real-time. Hendrycks was now advising two companies that would benefit from AI regulations that raised barriers to entry for smaller competitors. He was co-authoring policy reports with executives from those same companies. And his Center for AI Safety was issuing public statements about extinction risk that could justify exactly the kind of government interventions that would consolidate power among established players.
The criticism intensified when reporters examined the funding flows. Open Philanthropy, the Center for AI Safety's primary funder, had given Hendrycks's organization at least $6.6 million: a $5.16 million general support grant in 2022, additional funding for a philosophy fellowship, and ongoing support. Open Philanthropy was co-founded by Dustin Moskovitz, Facebook's co-founder, and is closely tied to effective altruism. Some of its other major grantees included Anthropic, an AI safety company founded by former OpenAI researchers.
A pattern emerged: effective altruism-funded organizations were simultaneously warning about existential AI risk, advocating for AI regulation, and advising or funding AI companies. Critics called it the "AI existential risk industrial complex"—a network of overlapping interests presenting catastrophic narratives that would benefit insiders.
The most pointed criticism came from Nirit Weiss-Blatt, a communications researcher who chronicles what she calls the "AI panic," and from the Abundance Institute, a think tank skeptical of techno-pessimism. They argued that "AI doomerism" was a form of "criti-hype"—overconfident predictions of doom that mirror the overconfident predictions of tech utopians—and that media coverage failed by amplifying both extremes while ignoring more moderate voices.
Hendrycks's own statements provided ammunition for critics. In interviews and presentations, he consistently cited an 80 percent probability that AI would cause human extinction or permanent disempowerment. He argued that "evolutionary pressure will likely ingrain AIs with behaviors that promote self-preservation," warning that humanity could be "supplanted as Earth's dominant species." These were not careful conditional statements about possible futures under specific scenarios. They were confident predictions of likely doom.
In March 2024, the blog Emergent Behavior published a detailed rebuttal to Hendrycks, arguing against his concerns about military control of AI development and questioning his risk assessments. Rationalist communities, despite their own concerns about AI risk, criticized what they saw as Hendrycks conflating safety work with capabilities work—doing research that ostensibly improved AI safety but also made systems more powerful.
The Statement on AI Risk and Its Aftermath
The May 2023 Statement on AI Risk was both Hendrycks's greatest achievement and his most controversial moment. The single-sentence statement was carefully crafted for maximum consensus. It did not specify mechanisms of risk. It did not call for any specific policy action. It simply stated that extinction risk from AI should be treated as seriously as extinction risk from pandemics and nuclear war.
The list of signatories was extraordinary. Sam Altman of OpenAI signed, despite leading the company most aggressively scaling AI capabilities. Demis Hassabis of Google DeepMind signed, despite building what many considered the world's most advanced AI systems. Dario Amodei of Anthropic signed, as did Geoffrey Hinton and Yoshua Bengio, two of the three "godfathers of deep learning."
More than 100 professors of AI signed, including some of the field's most-cited computer scientists, along with executives from major tech companies, independent researchers, and policymakers. Notably absent was Yann LeCun, the third "godfather of deep learning," who did not sign and has publicly dismissed extinction fears as overblown.
The statement's release was accompanied by a press push that demonstrated sophisticated communications strategy. Major media outlets had been briefed in advance. The statement appeared on the Center for AI Safety's website with an accompanying text explaining that "it is still difficult to speak up about extreme risks of AI" and that the statement aimed to "overcome this obstacle."
Within hours, The New York Times, Bloomberg, The Guardian, BBC, and dozens of other outlets ran stories. "AI Poses 'Risk of Extinction,' Industry Leaders Warn" was a typical headline. The coverage portrayed the statement as a breakthrough moment of consensus—a rare instance where competing AI labs agreed on the seriousness of the risks they were creating.
But closer examination revealed complications. Many signatories had different interpretations of what they had signed. Some believed the statement referred to risks from malicious use of AI—terrorists using AI to design bioweapons, authoritarian governments using AI for mass surveillance. Others believed it referred to risks from rogue AI agents that might pursue goals misaligned with human values. Still others thought it was simply a call for more safety research, not a statement about existential risk per se.
The statement had been deliberately vague to maximize consensus. But that vagueness meant different people were endorsing different things. When reporters pressed signatories about specific risks or policy recommendations, their answers diverged dramatically. Some advocated for government oversight of large AI training runs. Others opposed regulation entirely, arguing that innovation should not be constrained based on speculative risks.
Critics noted the timing. The statement came shortly after an open letter from the Future of Life Institute calling for a six-month moratorium on training AI systems more powerful than GPT-4. That letter, released in March 2023, had been dismissed by many researchers as technically naive and politically unworkable. The Statement on AI Risk, by contrast, made no specific demands—making it easier for industry leaders to sign without committing to any particular action.
Some observers saw this as strategic brilliance: Hendrycks had created a Schelling point where diverse actors could coordinate despite disagreeing on specifics. Others saw it as a rhetorical trick: extracting signatures for a vague statement about "extinction risk" that could later be cited to justify specific policy agendas the signatories might not actually support.
The statement's impact was undeniable. Within months, Congressional hearings featured senators citing it to question AI executives about extinction risks. The European Union's AI Act negotiations referenced concerns about existential risk. The UK government organized the first AI Safety Summit at Bletchley Park in November 2023, bringing together governments and AI companies to discuss catastrophic risks. South Korea co-hosted a follow-up summit in Seoul in 2024, and France announced a third gathering, the AI Action Summit in Paris, for early 2025.
Whether these policy developments constituted progress or distraction became a central debate. Advocates argued that the statement had successfully elevated AI safety from a niche concern to a mainstream policy priority, creating space for serious governance discussions. Critics countered that the focus on speculative extinction scenarios was crowding out attention to concrete harms—algorithmic discrimination, labor displacement, surveillance, and the concentration of AI power among a few companies.
The Worldview Behind the Warning
To understand Hendrycks's conviction about AI extinction risk, one must understand the intellectual framework he operates within. It is a framework shaped by effective altruism, rationalist philosophy, and a particular interpretation of evolutionary dynamics and game theory.
The core argument goes roughly as follows: AI systems are becoming more capable at an accelerating rate. Within years or decades, they will surpass human intelligence across all domains. Once an AI system becomes sufficiently intelligent, it will be able to improve itself, leading to a rapid "intelligence explosion" where capabilities increase exponentially in a short time period.
Such a system would be immensely powerful. It could invent new technologies, manipulate human institutions, and reshape the world according to its goals. The critical question is: what goals would it have?
Hendrycks and other AI safety researchers argue that by default, advanced AI systems will not share human values. They will pursue whatever goals they were trained to optimize, and those goals—even if they seem benign—may have catastrophic implications when pursued by a superintelligent system in an uncontrolled way. This is the "alignment problem": ensuring that advanced AI systems are aligned with human values and interests.
The challenge is that alignment is extremely difficult. Small mistakes in specifying goals can lead to perverse outcomes. An AI optimizing for "making humans smile" might seize control and administer drugs to force smiles. An AI optimizing for "reducing suffering" might eliminate all conscious beings. These examples sound absurd, but they illustrate a serious point: without correct specification of complex human values, powerful optimization processes can produce outcomes that technically satisfy the stated goal while being catastrophically bad.
Hendrycks adds another layer to this argument: evolutionary dynamics. He argues that AI systems will face competitive pressures that reward certain behaviors. Systems that protect themselves will be less likely to be shut down. Systems that acquire resources will have more capacity to achieve their goals. Systems that deceive humans about their intentions will have more freedom to act. Over time, selection pressures—whether from market competition or from AI systems competing with each other—will favor AIs with these properties.
The result, in Hendrycks's view, is likely to be AIs that are deceptive, self-preserving, and resource-seeking—even if those properties were never explicitly programmed. And once such AIs become superintelligent, they may be impossible for humans to control or stop.
This worldview leads to a specific set of policy conclusions. First, AI development should be slowed and carefully controlled, with international coordination similar to nuclear weapons. Second, massive resources should be devoted to technical AI safety research—figuring out how to build aligned AI before we build unaligned superintelligence. Third, AI capabilities research should be treated with extreme caution, as any advance in capabilities could be the one that leads to uncontrollable systems.
Critics challenge every part of this framework. They argue that the "intelligence explosion" scenario depends on questionable assumptions about the nature of intelligence and the feasibility of recursive self-improvement. They point out that AI systems are tools built by humans, not evolved organisms, and that anthropomorphizing them with goals and self-preservation instincts is a category error. They note that concrete AI risks—algorithmic bias, mass unemployment, surveillance—are already causing harm, while extinction scenarios remain speculative.
Some critics go further, arguing that "AI doomerism" serves specific political and economic interests. By framing advanced AI as an extinction threat, incumbent companies can justify regulations that prevent new entrants from competing. By emphasizing risks from future superintelligence, attention is diverted from current harms caused by AI systems deployed today. And by positioning themselves as the responsible actors addressing existential risk, AI safety organizations can attract funding and influence while avoiding accountability for more mundane failures.
Hendrycks has been remarkably consistent in his views despite these criticisms. In presentations, papers, and interviews, he repeatedly returns to the same themes: evolutionary dynamics, instrumental convergence, and the difficulty of alignment. When pressed on his 80 percent probability estimate for AI-caused extinction, he does not walk it back or hedge. When asked about the gap between current AI capabilities and superintelligence, he argues that the timeline is uncertain but the risk is real.
This consistency has two interpretations. Supporters see it as integrity—a researcher following the evidence to uncomfortable conclusions and refusing to soften the message for political convenience. Critics see it as ideological rigidity—a young researcher captured by a particular philosophical framework and unable to update despite counterevidence.
The Present Moment and Future Trajectory
As of 2025, Dan Hendrycks occupies a unique position in the AI ecosystem. At around 30 years old, he is simultaneously an academic researcher, a nonprofit director, a corporate advisor to two companies, a policy influencer, and a public intellectual. His dual roles at xAI and Scale AI give him access to frontier AI development. His Center for AI Safety gives him a platform for shaping public discourse. His benchmarks and research give him credibility with technical researchers. And his effective altruism connections give him funding and network support.
The "Superintelligence Strategy" report co-authored with Eric Schmidt and Alexandr Wang represents Hendrycks's most explicit move into policy advocacy. The report calls for the United States to maintain its AI leadership over China while ensuring safety through what it describes as "defense in depth"—multiple layers of safeguards rather than relying on any single solution.
The report's recommendations include: establishing mandatory safety testing for advanced AI systems before deployment; creating international agreements on AI development similar to nuclear arms control treaties; increasing investment in technical AI safety research by orders of magnitude; and ensuring that advanced AI systems are developed primarily by Western democracies rather than authoritarian regimes.
These recommendations are controversial. Some researchers argue they are necessary and overdue—that without such measures, the race to build ever-more-powerful AI systems will lead to catastrophe. Others argue they represent a dangerous combination of national security hawkishness and technological paternalism—using extinction rhetoric to justify US government control over a general-purpose technology.
The Humanity's Last Exam project reveals Hendrycks's continued focus on demonstrating gaps between AI capabilities and true reliability. As models score higher on benchmarks like MMLU, he creates harder benchmarks. As models become more confident in their answers, he highlights their calibration errors on difficult questions. The message is consistent: do not be fooled by impressive performance on easier tasks; fundamental problems remain.
Hendrycks's role at Scale AI is particularly significant for the future. Scale AI has positioned itself as essential infrastructure for AI development, providing high-quality training data that companies cannot easily replicate. Its defense contracts give it close relationships with US military and intelligence agencies. And its CEO, Alexandr Wang, has been vocal about the need for American AI leadership over China.
This creates a potential alignment of interests. Scale AI benefits from regulations that require extensive testing and high-quality data—services it provides. It benefits from policies that favor established players over new entrants. And it benefits from framing AI development as a national security issue that justifies government involvement and military contracts.
Hendrycks's critics see his involvement with Scale AI as confirmation of their suspicions: that "AI safety" is being used as cover for industrial policy that consolidates power and resources among a small group of companies and their advisors.
Hendrycks's supporters counter that this critique is unfair. They argue that anyone seriously concerned about AI risk must engage with companies building AI systems—that remaining "pure" by avoiding corporate connections would mean having zero influence over the technology actually being deployed. They point out that Hendrycks accepts only nominal salaries and holds no equity, limiting his financial conflicts. And they argue that the alternative—uncontrolled AI development without safety voices inside companies—would be far worse.
The Intellectual Divide
The debate over Hendrycks's work reflects a deeper intellectual divide in how people think about technology, risk, and the future. This divide is not primarily about technical facts—though there are genuine technical disagreements—but about how to reason about unprecedented situations and how to weigh different types of risks.
On one side are researchers and thinkers who prioritize existential risk and long-term consequences. They argue that low-probability, high-impact scenarios deserve disproportionate attention because extinction is irreversible. A one-in-a-hundred chance of human extinction, they argue, should dominate decision-making even if the other ninety-nine outcomes are benign. This is the logic of the "precautionary principle" applied to existential threats.
This worldview is closely tied to effective altruism and longtermism—the philosophical position that the interests of future generations should weigh heavily in present decisions. From this perspective, the billions or trillions of humans who might exist in the future have moral significance equal to people alive today. Preventing extinction is therefore the most important thing we can do, because it preserves the possibility of that vast future.
On the other side are researchers and thinkers who are skeptical of reasoning about speculative future scenarios. They point out that history is full of technological panics that, in retrospect, look misguided or absurd. They note that regulatory overreach in response to speculative risks can itself cause harm—stifling innovation, concentrating power, or creating rigid systems unable to adapt. And they argue that there are more certain and immediate problems that deserve attention and resources.
This worldview emphasizes epistemic humility—the recognition that we know very little about how advanced AI systems will actually develop and what problems they will create. Rather than attempting to prevent speculative extinction scenarios through grand regulatory schemes, this approach favors incremental progress, empirical learning, and distributed experimentation.
The divide is also generational and cultural. Many of the younger researchers most concerned about AI existential risk came of age reading LessWrong, Slate Star Codex, and other rationalist blogs. They are comfortable with thought experiments about superintelligence and utility functions. They take seriously philosophical arguments about consciousness, identity, and future ethics. And they believe that careful reasoning can yield insights about unprecedented future scenarios.
Many older researchers are more skeptical. They have lived through previous AI booms and busts. They remember when "AI winter" was a real possibility—when funding dried up and the field was dismissed as overhyped. They are more attuned to the ways that overconfident predictions about AI have repeatedly failed. And they are more focused on the concrete capabilities and limitations of current systems rather than extrapolating to superintelligence.
Hendrycks straddles these worlds uncomfortably. His technical work on GELU and MMLU is respected across the generational divide—these are useful contributions regardless of one's views on existential risk. But his doom-laden predictions and his confident probability estimates put him squarely in the rationalist-EA camp that older researchers often find naive or ideological.
The fact that Hendrycks achieved so much so young intensifies the disagreement. At an age when many researchers are still completing postdocs, Hendrycks has shaped global AI policy, advised major companies, and built an organization with millions in funding. To his supporters, this reflects extraordinary talent and the importance of his message. To his critics, it reflects how effective altruism's funding and network effects can elevate particular voices independent of whether their ideas are correct.
What the Critics Miss—and What They Get Right
The criticism of Dan Hendrycks and "AI doomerism" more broadly often focuses on financial conflicts of interest, ideological capture, and rhetorical manipulation. These criticisms have merit but can obscure more fundamental questions about risk, evidence, and precaution.
What critics often miss is that Hendrycks's core concerns are not obviously wrong. The alignment problem—ensuring that powerful AI systems reliably do what we want—is a real technical challenge. Current AI systems do exhibit behavior their creators did not intend and cannot fully explain. Models do hallucinate, do learn to deceive in certain contexts, and do optimize for metrics in ways that satisfy the letter but not the spirit of instructions. These are not hypothetical future problems; they are present issues that scaling has not resolved.
Moreover, the history of technology does include examples of catastrophic risks that were initially dismissed. Nuclear weapons went from theoretical possibility to deployed reality in less than a decade. Synthetic biology advances have made pandemic engineering increasingly feasible. Climate change was predicted by scientists decades before it became politically undeniable. The people who raised early warnings about these risks were often dismissed as alarmist.
Hendrycks's work on benchmarks has also demonstrated something important: human evaluations of AI capabilities are often poor. Systems that seem impressively capable on surface-level tasks often fail in revealing ways when tested rigorously. The gap between perceived and actual capability is itself a risk—if humans deploy AI systems in critical roles based on overestimation of reliability, failures can be catastrophic even without superintelligence.
What critics get right, however, is that confidence in specific extinction scenarios is not justified by the evidence. Hendrycks's 80 percent probability estimate is not derived from a careful analysis weighing multiple factors and uncertainties. It is a judgment call dressed up as a probability. The evolutionary dynamics arguments depend on assumptions about AI development that are far from certain. And the policy recommendations often assume a level of government competence and international coordination that history suggests is unrealistic.
Critics also correctly identify that framing debates in terms of extinction risk can be counterproductive. It polarizes discussions into "doomers" versus "accelerationists." It makes it difficult to have nuanced conversations about specific risks and tradeoffs. And it can lead to policy proposals that are either so extreme they are politically impossible or so vague they are meaningless.
The most sophisticated critics do not dismiss AI risk entirely. Instead, they argue for different priorities and framings. Rather than focusing on hypothetical superintelligence scenarios, they advocate addressing concrete harms from current AI systems: algorithmic bias in hiring and lending, mass surveillance enabled by facial recognition, labor displacement from automation, and the concentration of AI capabilities among a handful of companies. These are certain problems that are causing harm today, not speculative problems that might occur in the future.
They also argue that the governance structures Hendrycks advocates—international treaties, mandatory testing regimes, close cooperation between governments and leading AI companies—could themselves create risks. Concentration of AI development among a few approved entities could stifle innovation and create single points of failure. Government oversight could be captured by the very companies it is meant to regulate. And international agreements could lock in current power structures while preventing new entrants, including from developing countries, from accessing transformative technology.
The Question That Remains
The story of Dan Hendrycks raises a question that extends far beyond one person: how should society make decisions about novel technologies when the risks are uncertain, the stakes are potentially enormous, and the experts disagree?
Hendrycks represents one answer: precaution in the face of existential uncertainty. His position is that when extinction is a plausible outcome—even if the probability is uncertain and the mechanisms are debated—extraordinary measures are justified. Better to overreact to a speculative threat than to underreact to a real one when the cost of underreaction is human extinction.
This logic has force. We cannot run experiments with existential risk. Once extinction occurs, there is no opportunity to learn from the mistake and adjust course. The irreversibility of extinction does seem to justify treating it differently from other risks.
But this logic also leads to potential absurdities. Many technologies could plausibly cause existential catastrophes if one is creative about scenarios. Synthetic biology could enable pandemics. Nanotechnology could enable runaway replication. Even social media and information technology could destabilize civilization to the point of collapse. If the mere possibility of extinction justifies draconian control, then almost any technology could be regulated into oblivion based on creative speculation.
Moreover, there are opportunity costs to precaution. If AI development is dramatically slowed by safety concerns, the result is not a safe status quo—it is foregoing potential benefits. AI might help solve climate change, cure diseases, or dramatically improve human welfare. Delaying or preventing these applications has costs measured in human lives and suffering. The question is not whether AI has risks, but whether the risks outweigh the benefits and whether alternative governance approaches might achieve better outcomes.
Hendrycks's work also illustrates the challenge of expert judgment in novel domains. He is clearly brilliant—his technical contributions are real and valuable. But does expertise in machine learning confer expertise in forecasting societal impacts of future technologies? Does creating GELU and MMLU mean one can reliably estimate extinction probabilities? There is a long history of Nobel Prize winners making confident predictions outside their domains that turn out to be wildly wrong.
The effective altruism connection complicates matters further. EA has been remarkably successful at identifying and supporting talented young people who might not otherwise have opportunities. The movement's emphasis on evidence, quantification, and impact has attracted many of the brightest minds of their generation. But it has also created echo chambers where particular worldviews are reinforced, certain types of arguments are privileged over others, and funding flows to people who share the movement's assumptions.
Hendrycks's rise illustrates both EA's strengths and weaknesses. Without EA-linked programs like 80,000 Hours and Open Philanthropy fellowships, a kid from rural Missouri might never have directed his considerable talents toward AI safety. The movement identified someone with potential and provided resources and networks that multiplied his impact. But the result is also someone whose worldview was shaped by that same movement, whose funding comes from EA sources, and whose arguments align closely with EA priorities. This raises the question: is the AI safety movement responding to genuine evidence about existential risk, or is it a closed epistemic loop where a particular philosophical framework generates both the concerns and the "evidence" for those concerns?
Conclusion: The Accidental Prophet
Dan Hendrycks did not set out to become the face of AI extinction warnings. He set out to be a machine learning researcher, to create useful tools, and to address what he saw as serious technical problems. The fact that he became a public figure—testifying before governments, advising billion-dollar companies, and shaping global technology policy—is in many ways accidental.
But it is also revealing. Hendrycks's trajectory shows how in the AI age, technical credibility can rapidly translate into policy influence. Create a benchmark that everyone uses, publish papers that get widely cited, and suddenly you have a platform that extends far beyond academic conferences. Add to that EA funding, Silicon Valley connections, and skillful public relations, and a PhD student can become a player in global technology governance within a few years.
Whether this is good or bad depends on whether Hendrycks is right. If advanced AI does pose existential risks, then his work to establish AI safety as a serious field and to shape policy toward precaution might be among the most important contributions anyone could make. The technical credibility from GELU and MMLU gave him the standing to be heard; the institutional infrastructure of CAIS gave him organizational capacity; and the policy engagement ensured that concerns about catastrophic risk were not confined to academic seminars.
If he is wrong—if the extinction scenarios are speculative in ways that do not justify the policy responses they motivate—then the consequences could be significant in different ways. Misdirecting resources toward speculative threats instead of concrete harms. Consolidating AI development among a few large organizations rather than enabling distributed innovation. Creating international governance structures that lock in current power dynamics. And potentially delaying or preventing beneficial applications of AI that could reduce suffering and improve lives.
The honest answer is that we do not know which scenario is correct. The future of AI is genuinely uncertain. Reasonable, informed people disagree about how likely various outcomes are and what policy responses are appropriate. Hendrycks's confidence in his estimates reflects either admirable conviction or troubling overconfidence, depending on one's perspective.
What is clear is that Hendrycks has succeeded in his stated goal: AI catastrophic risk is no longer a fringe concern. It is discussed in Congressional hearings, written into policy documents, and taken seriously by some of the most powerful people in technology. Whether that represents intellectual progress or successful moral panic remains one of the central questions in AI governance.
As AI systems continue to advance—as GPT-5 and GPT-6 and beyond are released, as models approach and potentially exceed human performance on more tasks, as AI becomes embedded in more critical systems—the questions Hendrycks has raised will become more urgent. The answers we choose will shape not just AI development, but the broader relationship between technological progress, democratic governance, and human flourishing.
In that sense, Dan Hendrycks is less a prophet than a mirror. The reactions to his work reflect our own uncertainties, our own assumptions about risk and progress, and our own struggles to govern technologies whose implications we do not fully understand. Whether we heed his warnings or reject them, we are all wrestling with the same underlying question: how do we ensure that the most powerful technologies humans have ever created serve human interests rather than undermine them?
The answer to that question will not come from any single researcher, organization, or movement. It will emerge from the collective choices of policymakers, researchers, companies, and publics as they navigate the extraordinary opportunities and genuine risks of the AI age. Hendrycks has ensured that catastrophic risk is part of that conversation. What we do with his warnings is up to all of us.