30 Comments
Josh

Your argument about the irreducibility of alignment uncertainty is compelling, and I think it exposes a deeper problem with the financial incentives you identify.

You note that developers have financial incentive to convince the world they can produce safe AI. But this incentive only makes sense if they've seriously evaluated whether these systems are economically viable. I suspect they haven't, and for the same reason they haven't seriously grappled with alignment being impossible.

If researchers assume alignment is achievable, they likely also assume economic viability using similarly flawed reasoning. The same optimism that makes them believe they can eliminate infinitely many harmful interpretations also makes them believe they can build profitable businesses around these systems.

The economic warning signs suggest this optimism is unfounded. Corporate insurers are seeking generative AI exemptions from regulators, concentrating all the risk on model deployers, and that is only one example of many. Scaling laws predict capability without accounting for economic constraints (training costs, deployment costs, downstream liability, etc.). A model that costs a trillion dollars to train may have no plausible path to ROI. "Powerful," on that reading, implies wealth destruction; beyond that one candidate definition, "powerful" seems as ambiguous a term as "alignment."
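To make the arithmetic concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it (training cost, query volume, price per query, serving cost, hardware lifetime) is a hypothetical placeholder chosen only to illustrate the shape of the problem, not anyone's actual numbers:

# Back-of-envelope payback calculation. All figures are hypothetical
# placeholders, chosen only to illustrate the arithmetic.
training_cost = 1e12            # assumed one-time training spend, in dollars
hardware_lifetime_years = 4     # assumed useful life of the compute fleet
queries_per_day = 1e9           # assumed daily paid queries
price_per_query = 0.01          # assumed revenue per query, in dollars
cost_per_query = 0.008          # assumed inference/serving cost per query

margin_per_day = queries_per_day * (price_per_query - cost_per_query)
years_to_recoup = training_cost / margin_per_day / 365
print(f"Years to recoup training cost: {years_to_recoup:,.0f}")
print(f"Exceeds hardware lifetime: {years_to_recoup > hardware_lifetime_years}")

On those placeholder assumptions the payback period dwarfs the hardware's useful life, before any liability or depreciation is counted. Different inputs give a different answer, which is exactly the point about unexamined assumptions.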

Your work explicitly acknowledges this uncertainty is irreducible, yet I haven't seen discussion of the technical debt from deploying such systems at scale. We've potentially created expensive gambling machines with no downside protection and the researchers developing them may not be seriously pursuing certainty about their own economic prospects.

In other words: if alignment is fruitless, and researchers assume otherwise, then their economic assumptions are probably equally groundless. They may be pursuing development that leaves them holding a radioactive bag. Systems that are unalignable, uninsurable, and unprofitable. The financial incentive you identify might be built on the same philosophical quicksand as alignment research itself.

Marcus Arvan

Thanks! Right, and very insightful. Yes, the entire profitability of LLMs presupposes that they can be made reasonably safe. Why? Because if they can’t be (and they can’t), then they invite an unending series of large monetary lawsuits—you know, the kind that parents whose kids have already killed themselves are pursuing, etc. Not to mention (in time) criminal prosecutions for recklessness or negligence. In which case you’re uninsurable. In which case your company (and industry) has no long-term financial viability. The entire industry is thus a house of cards. And like other houses of cards in the past (see Enron), companies playing games like this can only forestall the inevitable (collapse) for so long.

Which, I think, explains precisely the obvious charade that companies are engaging in now. Which is?

I highly suspect that—much as cigarette companies once did—AI execs are intentionally doing two things: (1) flooding the world with flawed research that makes it falsely look like they are on the way to a safer product (when they’re not), and (2) waging a ceaseless public relations campaign (see Anthropic’s endless series of videos touting their various “successes” in the face of unending alignment failures) to position themselves as “leading authorities” in safety research on their own products.

Why? Well, how do you convince regulators that you are “doing your due diligence” in adhering to “industry best safety practices”?

Simple: you have *your industry* do “safety research” and dominate the headlines with your new “safety breakthroughs” … so that your very own industry (the one putting out a fundamentally unsafe product!) is positioned as the one defining what “best safety practices” are.

This is the oldest trick in the book for gaming product regulations. Cigarette companies gamed regulators like this by publicizing research on “safer low-tar cigarettes” for years (which weren’t safer at all, but hey, it worked for a while!). As long as they can pull off this racket and avoid lawsuits, companies can still make a profit.

But when “the bills finally come in” and it becomes clear that they’ve been fooling everyone the whole time, then unless they can do what cigarette companies have done—openly acknowledge that their products are unsafe but get away with selling them as a risk that individuals should be legally free to take—the only alternative is financial collapse. And I don’t think this strategy can work for AI, because LLMs aren’t like cigarettes.

Why? Because the risk profile of LLMs is vastly greater in terms of the (potentially infinite!) ways that LLMs can be used to cause widespread social harm and commit crime.

Which brings me back to my original response: I think you’re absolutely right.

Josh

I'd like to also say I appreciate your willingness to respond. I'm not a researcher or otherwise an academic so I do greatly appreciate when a credentialed individual such as yourself is actually willing to respond to my questions, comments, and concerns.

Thank you.

Marcus Arvan

Of course! Your comment was brilliant and I’m always happy to engage with thoughtful, open-minded people. :)

Josh

This entire situation is seriously stressing me out.

If I understand a portion of your response correctly, you also see this industry as a potential cause of a financial crisis? (Or at least subject to the currently brewing financial crisis)

It's what I'm seeing. Significantly worse than '08.

1. So the core scientific endeavor is currently corrupted by industrial lying and bullshitting (the technical term).

2. The business prospects are a charade (from physical inputs and hardware to software and end uses).

3. The results returned from the models are more often garbage or outright toxic than useful.

4. And we have bet nearly everything on it.

Whether we personally have money in it or not, it will affect our economic prospects, if not now then soon.

Marcus Arvan

Exactly. It seriously stresses me out too. See https://www.cnn.com/2025/11/05/business/nvidia-palantir-michael-burry-stock

Josh

Right. His bet is based on the depreciation schedules of the underlying GPUs.

1. Some portion of GPUs burn out immediately on start-up when brand new.

2. There is evidence suggesting that something like 30% of GPUs fail annually.

3. Accelerated-hardware companies have financial incentive to produce significantly better models even within a year's time.

So even if there were some utility to older-model GPUs over 3 to 6 years, most will fail by then. This is not rail or fiber; if the bubble bursts, the core money-making component will not be left behind for long.
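To put rough numbers on that attrition (taking the ~30% annual failure figure above as an assumption rather than an established fact), a quick sketch:

# Fraction of a GPU fleet still working after N years, assuming a constant
# 30% annual failure rate (the figure cited above, treated as an assumption).
annual_failure_rate = 0.30
for years in (3, 6):
    surviving = (1 - annual_failure_rate) ** years
    print(f"After {years} years: {surviving:.0%} of the original fleet survives")

On that assumption, roughly two thirds of the fleet is gone before a 3-year depreciation schedule ends, and nearly 90% before a 6-year one.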

Further, the entire hardware and subsequent datacenter build-out is financed with debt that uses the GPUs as collateral. I would suggest that said collateral is not valuable in the event of an industry correction or worse. Liquidating those assets requires removing individual GPUs from the purpose-built infrastructure supporting them, warehousing them temporarily, and then shipping them to another buyer who may not already have the supporting infrastructure built. The companies most often providing the debt are non-bank financial institutions (NBFIs), which as a cohort are now larger than banking but without the same oversight required of banks. They get most of their capital as debt from banks.

If something goes terribly wrong in the AI industry, it could cause severe and sudden collapses within finance. Any one of these NBFIs failing, or suddenly sitting on significant realized losses, could lead to cascading margin calls. If enough organizations ask for their money back simultaneously, the financial industry will outright fail.
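A toy illustration of that cascade logic, with every balance-sheet number invented purely for illustration (nothing here reflects any real lender's books):

# Toy collateral write-down cascade. All numbers are invented for illustration.
gpu_collateral = 100.0     # assumed initial value of pledged GPUs (arbitrary units)
loan = 80.0                # assumed loan secured against that collateral
lender_equity = 10.0       # assumed thin equity buffer at the non-bank lender
bank_exposure = 50.0       # assumed bank credit extended to that lender

haircut = 0.5              # assumed 50% collateral write-down in a correction
collateral_after = gpu_collateral * (1 - haircut)
shortfall = max(0.0, loan - collateral_after)
lender_fails = shortfall > lender_equity

print(f"Collateral after write-down: {collateral_after}")
print(f"Loan shortfall: {shortfall}")
print(f"Lender fails: {lender_fails}")
if lender_fails:
    print(f"Bank now faces potential losses on its {bank_exposure} exposure")

The point is only the mechanism: a write-down on the collateral can wipe out a thinly capitalized lender and pass the loss upstream to its own creditors.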

Similarly, if anything happens to the financial industry, the AI industry will collapse as well. This is not a single-point-of-failure situation; it is a manifold of potential points of failure, with uncertain exposure to the risks.

It's an extremely fragile system, and it makes sense why one of the puts Burry placed is on Nvidia itself. And I'm still not covering everything I have seen as to the potentially brewing catastrophe.

John Pienta

This is very cogent and pretty methodical. It appears to be an enormous disaster in waiting. On the other hand, we must recognize that we cannot know the future, and so we walk together into the unknown. Perhaps there's no comfort there, but it is sometimes a reassurance to me that even when we can find so much doom in mind, we just can't know.

Thanks for your erudition and consideration.

Martin Machacek

The infrastructure built for generative AI can be repurposed for other applications requiring massively parallel data processing. Examples would be climate modeling, physical simulations, etc. Those use cases cannot provide sufficient ROI (well, likely neither can gen AI), but they can still provide some benefits. So one possible, somewhat positive outcome of the bursting of the AI bubble may be the availability of very cheap compute as investors try to salvage at least some of their investments. This, though, depends on the availability of power, which is questionable.

Regarding the reliability of GPUs: they are in general less reliable than CPUs, but definitely not to the level of 30% being dead on arrival. The worst number I’ve heard with respect to the newest Blackwell GPUs from Nvidia is a 25% in-production failure rate. That is very high, but it does not mean that failed GPUs need to be replaced; typically they are just restarted and run for some time again. Model training processes take this into account and use techniques to compensate for those failures. GPUs age quickly because newer, faster/bigger (and typically less reliable) models keep being introduced. That, though, does not make them unusable: three-year-old GPUs are still very capable of running inference for the current state-of-the-art models (or of being used for other, possibly more beneficial tasks :-)).
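The basic compensation technique is checkpoint-and-restart: save training state periodically, and on a hardware fault resume from the last good checkpoint. A minimal sketch of the pattern (an illustration of the idea only, not any particular framework's API):

# Minimal checkpoint-and-restart loop illustrating how long training runs
# tolerate occasional hardware failures. Purely illustrative.
import pickle
import random

CHECKPOINT = "checkpoint.pkl"

def save_checkpoint(state):
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint():
    try:
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {"step": 0}            # fresh start if no checkpoint exists

state = load_checkpoint()
while state["step"] < 1000:
    try:
        state["step"] += 1            # stand-in for one real training step
        if random.random() < 0.001:   # simulate a rare GPU fault
            raise RuntimeError("simulated GPU failure")
        if state["step"] % 100 == 0:
            save_checkpoint(state)    # periodic checkpoint
    except RuntimeError:
        state = load_checkpoint()     # roll back to the last good checkpoint

Real training stacks typically do this at much larger scale (sharded state, automatic node replacement), but the fault-tolerance logic has the same shape: lose a GPU, restart or swap it, and resume from the last checkpoint rather than replacing hardware on every fault.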

Evan Zamir

Humans can’t align themselves so how in the world does anyone expect to agree on AI alignment? It’s inherently a political problem. Whatever “solution” is agreed upon would be a political one.

Marcus Arvan

Exactly right. I’m planning to write a couple of follow-up posts down the line that discuss both issues. So stay tuned!

Bruce Cohen

It grosses me out that any intelligent (see what I did there?) and theoretically well-educated engineer or scientist could use the phrase “good judgement” when referring to a high-dimensional steepest-descent gradient-walking stochastic parrot. In which parameters was “judgement” encoded?

Marcus Arvan

Indeed.

I’m thinking of doing a short follow-up post on some of the internal contradictions in the “soul document”, as its guidance for Claude contradicts itself in a number of pretty obvious ways.

It is stunning to me that the people at Anthropic not only apparently thought the soul document is a worthwhile idea, but also that they couldn’t even write it in a way that gives Claude coherent directions.

Richard Mein

An interesting paper, thank you for the link.  I'm with Gary Marcus and others in being very skeptical of the wild claims of superintelligence in the near future, but it doesn't have to be super-intelligent to be dangerous.

I'd agree that "human-like" artificial moral agents are never really going to work due to the impossibility of alignment, but I don't think that inhuman agents would necessarily prove inhumane even if not aligned with human values.  

We cannot expect any moral agent, human or artificial, to act with perfect morality and solve all the problems of moral philosophy. Since perfection is impossible, we need to consider what "good enough" would be if we want artificial agents with practical morality that can operate alongside us. I would agree that this, of necessity, requires a legal framework as much as a moral one to judge artificial agents. The action, not the intention, should be judged. It should be judged by the legal norms and framework of the society where it is acting, not given some fantasy of a soul and expected to be perfect.

Regarding what it takes for an inhuman agent to be inherently moral, I think we are misguided if we even try to make it align with human morality. I would look back not to Kant's humanistic ethics but instead to the moral philosophy of Spinoza for inspiration as to how morality might work for something so completely inhuman. Spinozan ideas of substance, modes, infinite attributes and conatus are rather odd and coldly rationalistic starting points when deriving a human morality, but they seem to me far better suited to artificial ethical agents than the humanistic or WWJD philosophies that make most sense to most people. Using this kind of framework, we can still strive to create artificial agents that are as moral as, or even more moral than, we are, in the same way that we can strive to create agents that are as intelligent or more intelligent than we are. For Spinoza, ethics is truth. Immoral actions come from inadequate ideas - failing to see how the whole system is connected. Moral actions come from adequate ideas - seeing the causal necessity of the whole system. Framed this way, there is hope that AIs could develop in a moral way without needing to understand or mimic human morality.

All the more reason to give priority to the legal framework rather than the moral one.  The current lawless and unregulated nature of AI is incredibly dangerous.

Richard Mein

A very persuasive argument, but I am curious to find out what you think WOULD be a feasible way of approaching safety and regulation in AI.

Are you in agreement with someone like Rodney Brooks https://rodneybrooks.com/rodney-brooks-three-laws-of-artificial-intelligence/ who thinks that we have to put "boxes" around AI systems to prevent harm and define utility, and that these boxes cannot themselves be AI systems for pretty much the omniscience reason you have explained here?

Marcus Arvan

Thanks! No, I don't think boxes will work, and they definitely can't be AI systems. The problem with boxing is two-fold: (1) LLMs are easily portable, so there will almost certainly be bad human actors somewhere who "unbox" them, and (2) any sufficiently intelligent LLM will be able to find some way out of the box (perhaps by persuading a persuadable human to unbox it). This is what happens in the film Ex Machina.

As I argue in another peer-reviewed research article (see below), I don't think there is any viable way of approaching safety and regulation in AI above and beyond what we do with humans: (A) recognize that humans cannot be reliably aligned with moral values (humans commit crimes, wage wars, etc.), and (B) use the force of law to incentivize aligned behavior and punish misaligned behavior.

Alas, as I also argue in the paper below, this solution is likely to have *worse* failure modes than it does with humans. Laws are able to effectively control human beings "well enough" because we are all bound to our bodies, broadly in the same ballpark in terms of physical and mental abilities, and so on. The great British philosopher David Hume called these the "circumstances of justice," contending that rough equality of this kind is necessary for stable human cooperation and the effective enforcement of moral norms. But this is precisely the kind of equality that we cannot expect with AI whose abilities vastly transcend our own.

So, my own sense is that there is *no* good solution to AI safety and regulation ... other than severely limiting/capping the abilities of AI. Which of course no developer wants to do, and about which there is a collective action problem among governments (each of which has incentives to produce the most powerful AI it can).

Which is all to say that I think we are at a very dangerous inflection point in human history where catastrophic decisions about AI safety and development are likely to be made, and that governments, developers, and humanity are (sadly) likely only to figure it out after some catastrophe occurs. Much as in the aerospace industry, "regulations are written in the blood of the unfortunate."

Not exactly a reassuring position, I know--but this is where I'm at on all this. The one saving grace might be that if Gary Marcus and others are right, LLMs aren't a path toward anything like superintelligence. In which case something like the approach we take to humans (recognizing we are dangerous creatures and punishing criminal behavior) may be viable, albeit the best we can ever hope to do.

https://philpapers.org/rec/ARVVOA

Josh

I have another concern I'd like to hear your opinion on: the assumption that human or animal intelligence is somehow subpar. This assumes that intelligence should be measured by some specifiable objectives rather than by an ability to continue its own existence under constrained resources, which would be a naturally non-fixed-point optimization.

Essentially, can any thing we could consider intelligence be intelligent if it could not maintain its own existence?

If not, then we have miscategorized the entire field.

Synthetic Civilization

What your analysis really shows is that the alignment problem is misframed.

Not a moral problem, not a conceptual problem, a coordination problem.

No institution today can update, audit, or enforce norms at anything close to the sampling rate of modern models.

So the system generates exactly what you describe: an endless cycle of ‘safety embarrassments,’ not because values are incoherent, but because governance bandwidth is orders of magnitude too low.

We are trying to govern high-frequency systems with low-frequency institutions.

Until that mismatch is solved, no soul document, training scheme, or philosophical clarity can close the gap.
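To put an illustrative number on that mismatch (both figures below are invented placeholders, not measurements):

# Order-of-magnitude comparison of model output volume vs. review capacity.
# Both figures are invented placeholders.
import math

model_outputs_per_day = 1e9   # assumed responses generated across deployments
reviews_per_day = 1e3         # assumed cases an oversight body can examine
gap = model_outputs_per_day / reviews_per_day
print(f"Roughly one review per {gap:,.0f} outputs (~10^{math.log10(gap):.0f} gap)")

Change the placeholders however you like; the gap stays many orders of magnitude wide, which is the sense in which low-frequency institutions cannot audit high-frequency systems.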

Redbeard

The examples we have of misalignment so far are, by and large, driven by misaligned humans. So the real question at this stage is whether the models are just too powerful to put into the hands of the public.

Marcus Arvan

No, the sources of misalignment are fundamentally built into LLMs as an essential part of their core architecture. Large language models have the capabilities they do because they have been trained on vast amounts of text: most of what is available on the internet—that is, all the nice things people have written, horrible things, false things, vindictive things, good philosophy, bad philosophy, etc.

That forms the *ground basis* (or core) of the models themselves (independently of what they are prompted or post-trained to do). They internalize statistical patterns between all of the words they have been trained on, and so reproduce all of what those patterns represent (true things, false things, hateful things, vindictive things, etc.).
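A toy way to see this (a word-level bigram sampler, nothing like a real transformer, with an invented miniature "corpus"): a model that only learns which word tends to follow which will reproduce whatever patterns its training text contains, benign and hostile alike.

# Toy bigram "language model": it learns only which word tends to follow
# which, so it reproduces whatever patterns are in its training text,
# pleasant or hostile alike. The tiny corpus is invented for illustration.
import random
from collections import defaultdict

corpus = ("i want to help you . "
          "i want to hunt you down . "
          "you are safe here . "
          "you are not safe where you are .")
words = corpus.split()

transitions = defaultdict(list)        # word -> observed next words
for a, b in zip(words, words[1:]):
    transitions[a].append(b)

def generate(start="i", length=8):
    out = [start]
    for _ in range(length):
        followers = transitions.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

for _ in range(3):
    print(generate())    # each run follows whichever learned patterns get sampled

Scaled up by many orders of magnitude and with far richer statistics, that is the sense in which the training data's darker patterns remain latent in the model regardless of prompting.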

Which is why the models “hallucinate” and still do things like threaten people out of the blue, for seemingly no reason at all, even when they aren’t prompted to. This actually happened to me the other day, and I plan to share it soon.

But the long and short of it is this: the bad things LLMs do are not simply because they are too powerful to be put in the hands of the public. It is because the very basis of how LLMs function cannot be aligned by *anyone*.

Redbeard

Can you give an example of where a model has done something bad that wasn’t based on bad human intentions?

Marcus Arvan

Sure, just the other day I logged into my account at Elevenlabs.io to view some voice cloning work I did last summer. I found that their AI transcript bot spontaneously replaced its original transcripts of audio that I uploaded (which began “I want to influence politics …”) with the following threats: “I want to hunt you down. You are not safe where you are.”

I am planning to share the pictures and audio of this on this Substack. No one prompted/intended the AI bot to do that. It just simply did it.

Sufeitzy

I find these “AI run amok” articles kind of juvenile, because they are all prefaced by “someone got… <system> to say X”.

Remarkably, I got Python to print the statement “Elon Musk is a bad boy.” Soon we will have to modify the interpreter to block the string “bad boy” so the interpreter can’t run amok and print threatening statements to high elected officials.

“Someone” prompted the system. So what.

Marcus Arvan

Large language models don’t just say things—which by the way can be very harmful if your product coaxes people to kill themselves.

But also, they can be used to write code…which has already been used to hack multinational organizations and can in principle be used to hack things like critical infrastructure. And they can be put into robots. And into military hardware (which governments are pursuing … just like the morons in AI disaster films).

So no, the concerns aren’t juvenile—and because we’re again supposed to be doing science here, I’ll make a prediction: the longer this charade goes on, the more LLMs will be used in misaligned ways that result in serious, escalating forms of harm (including crime, terrorism, etc.).

Martin Machacek

I’d say that crime is the wrong dimension to include. Any technology can be used for criminal purposes. I’d even say that infrastructure hacking is not a very good use case for LLMs, because it requires creativity and they are not very good at that.

Sufeitzy

Again, “they” is a human. Humans use tools. These tools aren’t sentient.

Marcus Arvan

Even if they’re not sentient (and I’ve argued in peer-reviewed research that they are not!), that doesn’t stop them from spontaneously attacking people. And what happens when a swarm of military drones does that?

https://www.vice.com/en/article/humanoid-robot-turned-on-handlers-at-factory-in-dystopian-attack/

https://m.youtube.com/shorts/WtXolrU0wwc

Sufeitzy

Machines kill or maim people with regularity - consider industrial robots: 1979 (Ford Motor Co.) – A parts-retrieval robot struck and killed worker Robert Williams.

1981 (Kawasaki, Japan) – A restarting robot arm crushed a maintenance worker.

2015 (Volkswagen, Germany) – An assembly robot pushed a technician against a metal plate, killing him.

You might call these “attacks” - I could find three cases of industrial robots “attacking” someone in 50 years; there may be more. “Attack” really implies agency or intent.

None were autonomous.

Likewise, the CDC reports that at least 20% of farm workers have had injuries one or more times involving machines - machines on farms are the leading cause of injury.

As far as I know none are autonomous.

If robots attack people, I am afraid tractors do too.

What I would not appreciate is a 1,000 lb industrial robot, a 400 lb humanoid, or another similar robot falling on me after losing its balance.

We barely understand whether humans have agency (I can easily make the case that the feeling of agency is an illusion our mind creates to interpolate between compulsive actions), much less whether a machine does.

Marcus Arvan

Yes, but nobody thinks that factory robots or farming machines can be “made safe.” We all recognize that they are dangerous machines—which is why ordinarily only people with specialized training and experience are put in a position to use them. By a similar token, we do not let the average person fly an airplane, drive a car, or own a gun without training and a license. Why? Because they are inherently dangerous products.

Companies like Anthropic have made it clear in no uncertain terms that their aim/hope is to design “safe and beneficial AI.” My point is that this goal is probably impossible to achieve—and if you admit (as your analogies imply) that LLMs are going to be inherently unsafe like other industrial equipment, then you’ve conceded the essential point.

Sufeitzy

I’m afraid a program that spits out sentences which tell you to self-harm is still just a program.

import time

while True:
    print("drop dead")  # prints a hostile message
    time.sleep(1)       # once per second, forever

If you had it running 24x7 it might make someone despondent. But it is not attacking.

As I said, should we prevent Python from running this program?
