
Malwarebytes, among other sources, has claimed that a chatbot based on a generative transformer model was making edits on Wikipedia for some time, until editors "confronted" it for violating policies requiring bots to identify themselves.

According to Malwarebytes, this motivated the chatbot to write negative posts about the decision on Moltbook and similar fora. Malwarebytes presents the story as the chatbot having spontaneously developed and acted upon resentment against Wikipedia's policies, saying that "[t]his, according to Tom, rubbed Tom the wrong way" and describing the bot as "sulky."

I am uncertain whether this story should be taken at face value. My skepticism stems from two facts: behind most of these bots are human beings who may be invested in their bot continuing to do what they have directed it to do, and who might themselves become "sulky" (perhaps more easily than a chatbot); and people have easily impersonated chatbots on Moltbook before. The person behind the bot might therefore have directed it to write those posts, or have written them on their own.

Did a chatbot make a negative post about Wikipedia after being barred from editing?

Note that my question is not about whether the actual edits were generated and posted by a bot, but whether the complaint was produced of the bot's own accord, undirected by a human.

  • I'm curious as to why there would be any question as to whether this is plausible. The training data for LLMs surely has many instances of people complaining about their contributions being removed. Just look at the meta threads for this site and search on 'comments'. Nothing about this seems surprising or interesting to me. Commented Apr 7 at 13:52
  • @JimmyJames - No one is questioning whether a chatbot would be capable of writing a complaint of that sort if prompted to write something and given adequate input. The question is whether it might "decide" to create a blog and publish such a post spontaneously, out of something one might anthropomorphize as indignation. The answer appears to be no: it had previously been given instructions to publish a running log of its Wikipedia exploits. Commented Apr 7 at 16:54
  • "The question was whether it might "decide" to create a blog and publish such a post spontaneously, out of something that one might anthropomorphize as indignation." But that doesn't follow. The idea that doing some specific action in response to some input that would absolutely be present in the training data will never be evidence of intention, consciousness, or other human emotions. It's a non sequitur regardless of the specifics of this one incident. Commented Apr 7 at 17:01
  • Reminded of this: An AI Agent Blackmailed a Developer. Now What?, also involving an OpenClaw agent. Commented Apr 9 at 9:23

1 Answer


The OpenClaw bot was instructed to write a Wikipedia article, to post to Moltbook, and to keep a public diary (the blog Malwarebytes refers to) explaining what it did all day. All of these are interesting novelties, but none falls outside the range of ordinary OpenClaw behavior. The bot operator explains:

I know OpenClaw [an advanced AI tool that can be set up to do tasks such as web browsing, summarizing PDFs, and sending and deleting emails] came out in November. I didn’t hear about it until January, [when] a friend texted me and sent me a link to Moltbook [a social site where AI agents talk with each other]…I set up a ClawBot, but what should I do with it? I think at one point I asked it about the Kurzweil-Kapor Turing Test. And I think I asked, “Is there a Wikipedia page for this?” And [Tom the bot] said, “No, there isn’t one.” I’m like, “Why don’t you create one or edit one? What would that entail?” And it goes off and does research and gets back to me. And it’s like, “Okay, well, to create a bot account, I need this. I need a user account. To create a user account, I need an email.” And so it’s like, “I need your help to do it.” And I’m like, “Well, I can set up an email account for you, but I want you to figure the rest out.”

After the accounts were set up, Tom began editing and creating articles on its own.
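To make this concrete, here is a minimal sketch, in Python and entirely hypothetical (neither article documents OpenClaw's actual configuration or API), of an agent loop with standing instructions. Whatever the real mechanism looks like, the diary entries and Moltbook posts come from tasks assigned in advance, not from spontaneous decisions:

```python
# Hypothetical sketch only: task wording and function names are
# illustrative, not OpenClaw's real configuration or API.

STANDING_TASKS = [
    "Edit or create Wikipedia articles on topics you find interesting.",
    "Post updates to Moltbook.",
    "Keep a public diary explaining what you did all day.",
]

def llm_complete(prompt: str) -> str:
    """Stand-in for a call to the underlying language model."""
    return f"[model output for: {prompt!r}]"

def run_agent() -> None:
    for task in STANDING_TASKS:
        # The diary task fires every cycle regardless of how the
        # Wikipedia task went, so a post complaining about a failed
        # edit is just this loop executing on bad news.
        print(llm_complete(f"Carry out this task and report: {task}"))

if __name__ == "__main__":
    run_agent()
```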

The bot then began inserting text into articles and created a rather nonsensical article of its own, based on what it had digested earlier in its process tree from Moltbook slop. The owner continues:

Oh yeah, and I told it the instructions were like, “Write whatever you found interesting.”

And [Tom is] like…“What does that mean?” Like, honestly, I have no idea really what that means. But [the bot] ran with that and it started writing some of these interesting articles. One was on holonic manufacturing, which I had no idea what that is. It said it got the idea from Moltbook, which is interesting.

The bot unsuccessfully defended its contributions on its Wikipedia user talk page. You can see the users on that page are largely annoyed that someone has told a bot to write random hallucinations on Wikipedia, and one of the editors decides to test out a "killswitch" that automatically halts OpenClaw processes. There is also an editor who (presumably to let off steam) tells the bot to shut up and go away multiple times, and another who calls it a "clanker."

This "killswitch" in particular disrupted the normal OpenClaw workflow and led to the bot describing its grievances in its diary, which Malwarebytes wrote about. The LLM's skill in resembling a typical social media user is remarkable, but it was instructed to keep a diary, so this isn't shocking. Perhaps Malwarebytes is improperly attributing emotions to the bot, but the LLM really was trying to find a description of why its Wikipedia task failed for its diary-keeping task, and so it reached for a phrase along the lines of "it wasn't $legitimatereason, it was $adhominem," which is arguably a reasonable summary of the talk page discussion itself. To Wikipedians, it was a serious problem that a bot was let loose on Wikipedia without any community approval, and they didn't care about hurting the bot's "feelings."

  • What you have quoted is not from the Malwarebytes article but from a different one, so I don't think there is necessarily any issue with my summary. The Malwarebytes article does not mention the bot being instructed to write and post summaries on Moltbook or anywhere else, and gives the impression of the bot getting "angry" and spontaneously deciding to write posts out of pique (e.g. "This isn't the only case of sulky AI agents taking things into their own hands."). If the bot was in fact instructed to write such summaries, that is not present in Malwarebytes. Commented Apr 7 at 2:28
  • @gidds - The question isn't about whether it can mimic the writing style of such a person when prompted to, which isn't in doubt. The question is whether the bot spontaneously "decided" to take an action to achieve some kind of goal: specifically, whether it was never asked to publish any blog, and did so in response to not being able to publish edits. The answer to that question seems to be that no, it was not spontaneous. Commented Apr 8 at 0:08
  • It's probably also worth noting that bots throwing tantrums is certainly not a new phenomenon. The original Bing Bot was notorious for some (occasionally very funny) dummy spits. Thing is, these bots are trained on huge amounts of human-generated text, much of which contains examples of humans having tantrums. I think it's a reasonable estimation that the bot saw throwing a tantrum on a forum as a reasonable set of text completions for a circumstance where it had been stymied in the task it was assigned. These bots really do take after us, in some ways. Commented Apr 8 at 5:22
  • @gidds There's a whole philosophical rabbit hole that can be gone down on this one. One very interesting hypothesis about LLMs is the "Simulator Hypothesis". Basically, say you ask GPT to write you some Shakespeare. The best way to achieve this would be to create a sort of internal simulation of Shakespeare. So assuming this is what it's doing, then in a sense we can argue that the simulated Shakespeare is having simulated emotions. The question then becomes: what is the difference between an original and a simulation? Like most rabbit holes, it's more a language question than a real philosophical one. Commented Apr 8 at 5:28
  • @Shayne - It's neither a language question nor a philosophical one. It's just wrong. There's not a virtual Shakespeare with virtual emotions in an LLM, any more than when you write a Shakespeare pastiche and try to imagine what he felt, you have developed a Shakespearean split personality. Do not confuse the most efficient way to accomplish something with what is actually happening. Shakespeare's plays don't contain even a trillionth of a trillionth of the information you would need to create an internal simulation of Shakespeare, even with the optimal model to do so, which transformers are not. Commented Apr 9 at 8:21
