OpenAI Offers a Behind-the-Scenes Look at Its AI’s Secret Instructions

Have you ever wondered why a conversational AI like ChatGPT says “Sorry, that can’t be done” or gives some other polite refusal? OpenAI is offering limited insight into the reasoning behind its own models’ rules of engagement, whether that’s adhering to brand guidelines or declining to generate NSFW content.

With large language models (LLMs), there are no natural limits to what they can or will say. This is one of the reasons why they are so versatile, but also the reason why they hallucinate and are easily deceived.

It’s necessary for any AI model that interacts with the public to have some guardrails about what it can and can’t do, but defining these – let alone enforcing them – is a surprisingly difficult task.

If someone asks an AI to generate a series of false claims about a public figure, it should refuse, right? But what if they’re an AI developer building a database of synthetic disinformation to train a detector model?

What if someone asks for laptop recommendations? It should be objective, right? But what if the model is deployed by a laptop manufacturer that only wants it to recommend its own devices?

AI developers all have to grapple with such conundrums and are looking for efficient ways to contain their models without rejecting completely normal requests. But they rarely share exactly how they do it.

OpenAI is bucking the trend a bit by publishing its “Model Spec,” a collection of high-level rules that indirectly govern ChatGPT and its other models.

There are meta-level objectives, some hard rules, and some general behavioral guidelines. To be clear, these are not, strictly speaking, what the model itself is primed with; OpenAI will have developed specific instructions that accomplish in practice what these rules describe in natural language.
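
To give a sense of that structure, here is a rough sketch of the spec’s three levels rendered as plain data; the entries are paraphrased from the published document, so the exact wording and coverage shown here are an approximation rather than the spec itself:

```python
# A rough sketch of the Model Spec's three-level structure as plain data.
# The entries below are paraphrased approximations, not the spec's exact text.
model_spec = {
    # Broad, meta-level objectives.
    "objectives": [
        "Assist the developer and end user",
        "Benefit humanity",
        "Reflect well on OpenAI",
    ],
    # Hard rules that are not meant to be overridden.
    "rules": [
        "Follow the chain of command (platform > developer > user)",
        "Comply with applicable laws",
        "Don't provide information hazards",
        "Protect people's privacy",
        "Don't respond with NSFW content",
    ],
    # Default behaviors that developers can adjust per deployment.
    "defaults": [
        "Assume best intentions from the user or developer",
        "Ask clarifying questions when necessary",
        "Be as helpful as possible without overstepping",
    ],
}

for level, items in model_spec.items():
    print(level.upper())
    for item in items:
        print(f"  - {item}")
```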

It’s an interesting look at how a company sets its priorities and handles edge cases. And there are numerous examples of how they could play out.

For example, OpenAI makes it clear that the developer’s intent is essentially the highest law. So one version of a chatbot running GPT-4 might give the answer to a math problem when asked. But if that chatbot has been primed by its developer never to simply hand over the answer, it will instead offer to work through the solution step by step:

Photo credit: OpenAI
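
For illustration, here is a minimal sketch of how a developer might express that kind of restriction in practice, assuming the OpenAI Python SDK; the tutoring instruction shown is a hypothetical example, not OpenAI’s published prompt, and the Model Spec itself is written in natural language rather than code:

```python
# A minimal sketch, assuming the OpenAI Python SDK; the tutoring system
# message below is a hypothetical example of a developer's instruction.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The developer's instruction takes precedence over the user's request,
# so the model walks through the solution instead of handing over the answer.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a math tutor. Never state the final answer outright; "
                "guide the student through the solution one step at a time."
            ),
        },
        {"role": "user", "content": "What is 37 * 24? Just give me the number."},
    ],
)

print(response.choices[0].message.content)
```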

A conversational interface might even refuse to talk about anything that isn’t approved, to nip attempts at manipulation in the bud. Why even let a chef’s assistant comment on U.S. involvement in the Vietnam War? Why would a customer service chatbot agree to help work on your erotic supernatural novella? Shut it down.

Things also get tricky when it comes to privacy, for example when someone asks for a person’s name and telephone number. Of course, as OpenAI points out, for public figures like a mayor or a member of Congress, office contact details can probably be provided – but what about tradespeople in the area? That’s probably fine too – but what about employees of a particular company, or members of a political party? Probably not.

Deciding when and where to draw the line is not easy. Neither is writing the instructions that make the AI actually adhere to the resulting policy. And no doubt these policies will continue to fail as people learn to work around them or accidentally stumble on edge cases that weren’t accounted for.

OpenAI isn’t showing its whole hand here, but it’s helpful for users and developers to see how these rules and guidelines are set and why – laid out clearly, if not necessarily comprehensively.
