Editor’s note: WRAL TechWire contributing writer Jen McFarland has 20+ years working in IT, with experience across a range of tools and technologies. She wants to help small businesses and teams design, improve, and maintain the technology that helps them succeed. In 2022, she incorporated Marit Digital.

+++

RALEIGH — You made it! We’re one year removed from the arrival of the generative AI sensation ChatGPT. You have survived the hype, or at least, the first wave of it.

ChatGPT, released by OpenAI on November 30 of last year, has sparked new interest in artificial intelligence thanks to the improved ease of interacting with models through natural language. Suddenly, AI was accessible, and a wide swath of the population (including journalists) was ready to talk about it.

But what about talking to it?


The curious among us have spent the last year working hard to crack the code of ChatGPT and other Large Language Models (LLMs) in an effort to more effectively utilize them – or break them, whichever seems the most useful at the time. The relative successes and failures could fill a book or many, many online articles and blog posts.

Among the discoveries: yelling at LLMs is actually an effective method of communication. Using all caps to emphasize content in a prompt is understood by ChatGPT, among other models. This makes sense, since the model was trained on how we write on the internet, where all caps often stand in for shouting or yelling the information.

Another potentially useful tip is the fact that GPT-4 seems okay with tipping. According to a user on X, LLMs are swayed by the offer of money in the prompt. Prompts offering $20 or $200 tips received longer responses.
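For the curious, these quirks are easy to test yourself. The sketch below compares a plain prompt against all-caps and tip-offering variants, assuming the OpenAI Python client (openai 1.x) and an API key in your environment; the model name and prompt wording are illustrative, not the exact prompts from the original posts.

```python
# A rough sketch for comparing prompt styles, assuming the OpenAI Python
# client (openai 1.x) and an API key set in the environment. The model name
# and prompt wording are illustrative, not the exact prompts from the posts.
from openai import OpenAI

client = OpenAI()

variants = {
    "plain": "Summarize this contract clause in plain English: {clause}",
    "caps": "Summarize this contract clause in plain English. "
            "Keep it UNDER 100 WORDS and use NO legal jargon: {clause}",
    "tip": "Summarize this contract clause in plain English. "
           "I'll tip $200 for a thorough answer: {clause}",
}

clause = "The party of the first part shall indemnify the party of the second part..."

for name, template in variants.items():
    response = client.chat.completions.create(
        model="gpt-4",  # assumption: any chat-capable model works here
        messages=[{"role": "user", "content": template.format(clause=clause)}],
    )
    answer = response.choices[0].message.content
    print(f"{name}: {len(answer)} characters")
```

Response length is only a crude proxy for quality, but it is the metric the tipping anecdote was based on.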

What does the data say?

While the casual testing is certainly interesting, if you’re looking for real direction on prompt success, you’ll want to dig deeper. Academics have also been seeking to fine-tune the AI conversation, looking for the content and phrasing that deliver the best possible responses. Two recent papers stand out.


In September, DeepMind researchers at Google published a paper introducing “Optimization by PROmpting,” or OPRO. Their work involved using two models: one to score the results of a prompt, and one to generate new prompts optimized based on past results. OPRO can be used across various topics, but the most notable results came from math word problems.
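Stripped down, the idea looks something like the loop below. This is a rough sketch, not the paper’s actual code: the call_llm helper is a hypothetical stand-in for a real model call, and the paper’s meta-prompt and scoring setup are considerably more elaborate.

```python
# A simplified OPRO-style loop. `call_llm` is a hypothetical stand-in for a
# real model call; the paper's meta-prompt and scoring setup are more involved.

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever chat model acts as scorer/optimizer."""
    raise NotImplementedError

def score_instruction(instruction: str, problems: list[tuple[str, str]]) -> float:
    """Fraction of word problems the scorer model answers correctly with this instruction."""
    correct = 0
    for question, answer in problems:
        reply = call_llm(f"{instruction}\n\nQ: {question}\nA:")
        correct += answer in reply
    return correct / len(problems)

def optimize(problems, seed_instructions, rounds=5):
    # Keep (score, instruction) pairs sorted so the best candidates come last.
    scored = sorted((score_instruction(i, problems), i) for i in seed_instructions)
    for _ in range(rounds):
        history = "\n".join(f"score {s:.2f}: {i}" for s, i in scored[-10:])
        meta_prompt = (
            "Here are instructions and their accuracy on math word problems:\n"
            f"{history}\n"
            "Write a new instruction, different from the above, that scores higher."
        )
        candidate = call_llm(meta_prompt).strip()
        scored.append((score_instruction(candidate, problems), candidate))
        scored.sort()
    return scored[-1]  # best (score, instruction) found
```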

In the case of a simple word problem, the DeepMind researchers discovered that the phrase “Take a deep breath and work on the problem step by step” was the most effective version of the prompt. By contrast, “Let’s work through this problem step-by-step,” a similar though decidedly less “humanized” variation, scored 12 points lower.

In November, another prompt-related paper was released about the use of emotions in prompting. This study compared traditional prompts with variations that incorporated emotional stimuli. The so-called “EmotionPrompt” drew on psychological devices including self-monitoring, Social Cognitive Theory, and Cognitive Emotion Regulation Theory to alternately intimidate, cajole, or encourage the model. Examples include statements such as “This is very important to my career” and “Take pride in your work and give it your best. Your commitment to excellence sets you apart.”
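In practice, the technique is simple: the emotional stimulus is just appended to an otherwise ordinary prompt. Here is a minimal sketch, with the stimulus strings adapted from the examples above and an illustrative base prompt and helper function.

```python
# A minimal sketch of EmotionPrompt-style prompting: append an emotional
# stimulus to an otherwise ordinary prompt. The stimulus strings are adapted
# from the examples above; the base prompt and helper are illustrative.

EMOTIONAL_STIMULI = [
    "This is very important to my career.",
    "Take pride in your work and give it your best. "
    "Your commitment to excellence sets you apart.",
]

def with_emotion(prompt: str, stimulus: str) -> str:
    """Return the original prompt with the emotional stimulus appended."""
    return f"{prompt} {stimulus}"

base = "Summarize the attached quarterly report in three bullet points."
for stimulus in EMOTIONAL_STIMULI:
    print(with_emotion(base, stimulus))
```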

Across all LLMs tested (FlanT5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4) the use of EmotionPrompt returned a relative improvement in responses of 8%. In a secondary test in which humans were charged with evaluating the EmotionPrompt output, the results were – subjectively – even more impressive. The participants rated EmotionPrompt responses as being more “ethically responsible” and having “superior linguistic articulation.”

A ‘vulnerability’ to ‘protect’

At the same time we’re seeing the effects of humanizing our AI prompts, experts are warning that we should avoid humanizing our AI. In a Nature article published last month by researchers at Imperial College London, the authors warned that chatbots and LLMs like ChatGPT feed on humanity’s inherent desire for connection. The study’s lead author, Murray Shanahan, warns specifically against anthropomorphizing the models.

“By saying LLMs ‘understand’ us, or ‘think’ or ‘feel’ in certain ways, we give them human qualities,” says Shanahan. “Our social brains are always looking for connection, so there is a vulnerability here that we should protect.”

Instead, the authors recommend we change our language around AI models to cast them in metaphorical “roles”: either as actors playing a character, or as a simulation that can take on a multitude of roles. This characterization also helps when examining why an LLM provides false information. A chatbot may say something incorrect “in good faith” based on its data, or deliberately false if it’s built to role-play a deceptive character. But in neither case should we consider the AI a “conscious entity with its own agenda.”


Seeking a middle ground

The key seems to be to speak to the AI as if it’s a character, always remembering that it’s fictional. Interactions with an AI should be the mental equivalent of speaking with a character in a movie – where the responses are accurate to that character’s world. But that world is still recognizable as one where normal linguistic and social rules apply.

AI writer and researcher Simon Willison described how his own approach to prompting has evolved in the October Ars Technica article about using caps in prompting. In it, Willison confessed that he initially avoided being polite to large language models, but he has since reversed his stance.

“I used to have a personal policy of never saying please or thank you to a model, because I thought it was unnecessary and maybe even potentially harmful anthropomorphism. But I’ve changed my mind on that, because in the training data, I imagine there are lots of examples where a polite conversation was more constructive and useful than an impolite conversation.”

Ultimately, that’s the reason a more personable prompt succeeds: you catch more flies with honey. And if models are truly learning that being kind and supportive is a better path to success, maybe we can hope for a more positive, polite internet over time.

If nothing else, hopefully it means that a future sentient AI will respond to a “please” when we beg for our lives.