8 Comments

Marcus, two questions: In what ways might chatbots think, and why might we think they have inner lives marked by phenomena like desire, planning, etc.?

Your argument suggests that it might be right to attribute these characteristics to chatbots, but it isn't clear to me why this is so.

Hi Andy: The argument here doesn't rely on any robust, realist attribution of mental states (desire, etc.) or any assumption of a conscious inner life. All it presupposes is that (1) AI systems like ChatGPT and other large language models are programmed to generate certain types of output behaviors, and (2) there is as of now no adequate method for ensuring that those output behaviors are appropriately controllable by us or aligned with human values, because (3) the internal functions of their training algorithms are inscrutable and (4) every potential strategy for 'training them correctly' runs into problems for which there is currently no known resolution.

I merely talk here *as though* Sydney has mental states as a kind of shorthand to make the problems with AI alignment intuitive, adopting what Dan Dennett calls "the intentional stance" (roughly, we can use mental-state talk to describe and predict a system's behavior, even if we bracket whether it "really" has mental states).

Marcus, thanks for your reply.

It seems inevitably true that we can't predict the outputs of large-language-model chatbots, and it also seems inevitable -- especially with potentially billions of human users -- that those outputs will sometimes have wicked consequences.

But these problems seem quite different from the sci-fi scenarios you cite of AI conspiracies and of AI spontaneously choosing destruction. Is there reason to believe that chatbots might mean what they say, might mean something other than what they say, might be seeking to learn how to destroy, etc.? Or are these and the sci-fi references more like rhetorical flourishes, not meant to be taken seriously?

To me, at least, the plausible problem of unpredictable outputs and the predictable wicked consequences of some outputs seems neither to presuppose nor to justify the evil-AI sci-fi worries. Or am I missing your point about those?

I do agree with your main point that chatbots based on such large language models may be dangerous in ways we can’t predict or prevent.

Great article! I would like to encourage an edit: don't call Sydney "she." This reinforces troublesome gendered norms that associate the female with that which is inhuman, nonhuman, evil, and manipulative, and which exists to serve men. (The films Ex Machina and Her are germane examples.) "Sydney" ought not to be conceptualized as female but as a non-gendered, asexed, abstract, disembodied computer robot, fully "artificial." Among the things that an AI system may "learn" from the internet is the very idea that the female/feminine is subservient.

Thanks, point very well taken - fixed!

I'd like to point out that Ex Machina is a subversion of this trope, not an example of it. Probably not the best place to get into the weeds about this, but Ava is very clearly NOT the villain of that film, and she's shown to be anything BUT "inhuman" (even if she chooses to betray another character for the sake of her own self-preservation).

Hi Marcus. Thanks for posting this series of thoughts on this important matter. I have no interesting feedback, as my areas of expertise don't intersect well with this topic. But I do recall Fodor once writing something like 'things are always worse than they appear' when it comes to argumentation. In this case, we have argumentation on both sides: that the AI stuff won't be that bad for us, and that it will be tremendously bad for us!
