Discussion about this post

Josh

Your argument about the irreducibility of alignment uncertainty is compelling, and I think it exposes a deeper problem with the financial incentives you identify.

You note that developers have a financial incentive to convince the world they can produce safe AI. But that incentive only makes sense if they've seriously evaluated whether these systems are economically viable. I suspect they haven't, and for the same reason they haven't seriously grappled with the possibility that alignment is impossible.

If researchers assume alignment is achievable, they likely also assume economic viability using similarly flawed reasoning. The same optimism that makes them believe they can eliminate infinitely many harmful interpretations also makes them believe they can build profitable businesses around these systems.

The economic warning signs suggest this optimism is unfounded. Corporate insurers are seeking generative AI exemptions from regulators, which concentrates all of the risk on model deployers, and that is only one example of many. Scaling laws predict capability without accounting for economic constraints (training costs, deployment costs, downstream liability, etc.). A model that costs a trillion dollars to train may have no plausible path to ROI. In that case, "powerful" implies wealth destruction; beyond that one potential definition, "powerful" seems as ambiguous a term as "aligned."
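To make the ROI worry concrete, here's a minimal back-of-envelope sketch in Python. Every number in it is hypothetical, invented purely for illustration (the trillion-dollar figure just echoes the example above); the point is the shape of the arithmetic, not the specific values.

```python
# Back-of-envelope sketch. All numbers are hypothetical and chosen only to
# illustrate the shape of the problem, not to describe any real model.

training_cost = 1e12            # hypothetical: $1T to train, per the example above
gross_margin_per_query = 0.002  # hypothetical: $0.002 profit per query, net of serving cost
queries_per_day = 1e9           # hypothetical: 1 billion paid queries per day

daily_profit = gross_margin_per_query * queries_per_day
years_to_break_even = training_cost / daily_profit / 365

print(f"Daily gross profit: ${daily_profit:,.0f}")
print(f"Years to recover training cost alone: {years_to_break_even:,.1f}")
# Roughly 1,370 years under these made-up assumptions, ignoring ongoing
# retraining, liability, and the insurance gap mentioned above.
```

Even under fairly generous assumptions, recovering the training cost alone takes centuries; that is the "no plausible path to ROI" scenario stated in more explicit terms.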

Your work explicitly acknowledges that this uncertainty is irreducible, yet I haven't seen any discussion of the technical debt that comes from deploying such systems at scale. We've potentially created expensive gambling machines with no downside protection, and the researchers developing them may not be seriously pursuing certainty about their own economic prospects.

In other words: if alignment is fruitless and researchers assume otherwise, then their economic assumptions are probably equally groundless. They may be pursuing development that leaves them holding a radioactive bag: systems that are unalignable, uninsurable, and unprofitable. The financial incentive you identify might be built on the same philosophical quicksand as alignment research itself.

Evan Zamir

Humans can’t align themselves, so how in the world does anyone expect to agree on AI alignment? It’s inherently a political problem. Whatever “solution” is agreed upon would be a political one.

