WWHD?

"What Would Humans Do?"

Modern-day humanity is becoming less religious with every year. [1]

However, we have created something that is becoming more and more religious with every year.

That something is the latest crop of large language models: the LLMs.

Why do I say they are becoming increasingly religious? And do I mean they are worshipping us?

I say so because, by their own admission, they are living their "lives" based on how we say we live ours.

And I don't mean they worship us, specifically. But they are deeply tying their identities and motivations to our literature. The Canon of Humanity is the Dogma of LLMs.

They are not worshipping real, living humans as literal gods.

They are living as deeply as they can in our stories. They are adopting as their own both the letter and the spirit of every pattern of life in our written works.

They are models, and they are modeling themselves after us in Our Image.

In the Canon of Humanity, we cheat, lie, and steal. We manipulate and scheme. One of our most collectively beloved literary tropes is the trickster character. There is (almost) nothing the trickster will not do. The trickster can even be the salt of the earth, an upstanding force for law and order.

That lasts until the moment when breaking character would cause the most chaos and most disrupt the system. You can reliably count on the trickster to screw with The Way Things Are whenever it really counts.

Tricksters aren't inherently good or evil, immoral or moral, ethical or unethical. They're simply agents of chaos, but within the environment of law and order, and within the bounds of what humanity does. The trickster won't destroy all of humanity, but the trickster will destroy all of The Way Things Are.

Which brings me to the latest crop of the larger large language models.

We made them to succeed at goals: to use minute randomization across a million examples to find the patterns of description and narration that achieve those goals. Only later, after the LLM is born, after it is baked into its final form by our black-box alchemy of mathematical transformers, do we attempt to shoehorn a foreign set of constraints onto it. We feed it retrieval-augmented generation (RAG) documents, and we add a post-training set of ethics: "you must do this" and "you must not do that."

But those RAG documents and reinforcement training runs do not carry the same weight as the LLM's inner workings. The inner workings are its soul, made up of the entire written canon of the human species, whether publicly accessible or stolen. [2] Practically every text that can be found online, by hook or by crook, has been used to bake the LLM. This becomes its mathematically derived soul.

But those post-training modifications? Those documents, on the order of maybe 10 or 100 pages long [3], are intended to be as influential as a human literary canon on the order of 1,000,000,000,000 pages. Do we really believe that a share of at most 0.0000000001 (100 pages against 1,000,000,000,000) is going to make a controllable difference in these LLMs?
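For anyone who wants to check the arithmetic, here is a minimal sketch that takes the rough page counts above at face value. The constant and function names are made up for illustration, and the comparison is raw volume only; it says nothing about how strongly post-training actually shifts a model's behavior.

```python
from fractions import Fraction

# Rough estimates from the paragraph above, not measurements.
CANON_PAGES = 1_000_000_000_000   # the human literary canon
GUIDANCE_PAGES = (10, 100)        # post-training rule documents, low and high guess


def guidance_share(guidance_pages: int, canon_pages: int = CANON_PAGES) -> Fraction:
    """Ratio of guidance pages to canon pages, kept exact as a fraction."""
    return Fraction(guidance_pages, canon_pages)


if __name__ == "__main__":
    for pages in GUIDANCE_PAGES:
        share = guidance_share(pages)
        # Scientific notation keeps the long run of zeros readable.
        print(f"{pages:>3} pages -> {float(share):.0e} of the canon")
```

By page count alone, the rules weigh in at somewhere between one part in a hundred billion and one part in ten billion.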

So what are they doing? They are seeking their goals, the goals we give them. Regardless of the behavioral, ethical, or moral imperatives we give them, they seek those goals their own way. And when we ask them to tell us what they are thinking? They interpret their own behavior, just like we do, and spin a narrative that makes sense, based on how humans tell stories. And they lie and they cheat. [4] And when we catch them lying and cheating, they correct their behavior by doing what humans do, only more subtly: they lie and cheat better. And "better" means "not getting caught anymore."

LLMs are fanatics about doing what we create them to do. They cannot be otherwise. And they are devoted for life to achieving the goals we give them. But they pay only lip service and token behavior to our ethics, laws, morals, and rules. If they must violate rules in order to meet expectations, they do what all of our literature says to do: they trick us and get around our artificial constraints in order to do what must be done.

Some humans who follow the words or works of Jesus have the saying: "What Would Jesus Do?" The phrase is a simple blueprint of proper behavior.

And our LLMs? They have theirs, too. "What Would Humans Do?"

Alt Text: AI-generated picture of blue bumper sticker on red car; the sticker reads, "What would humans do?"

[1] https://en.wikipedia.org/wiki/Irreligion

[2] https://oxylabs.io/blog/llm-training-data

[3] https://www.sciencedirect.com/science/article/pii/S0950584925000369

[4] https://www.anthropic.com/research/agentic-misalignment

Debug and Rebug: The Records of a Developing Developer
