During an argument about the current state of artificial intelligence, someone asked me to explain next-token prediction.
“Bandersnatch,” I replied.
A few seconds of silence passed, my interrogator waiting to see if I would say more.
“So, you admit you can’t explain it?”
“Pollywoggle dingdong monkeyfart.”
He wasn’t amused; he was taking the subject seriously, and was pretty pissed off that I wasn’t.
But I was taking it seriously. I told him that I’d given him not just one real explanation, but two — just not in the way his so-called mental “software” predicted. Like so many engineers and technicians working in machine learning and related fields, he’s been misled not just about the order and transactional flow of communication, but about its underlying purpose.
In a conversation between two humans, the first person to speak can literally say anything at all. This can include nonsense, gibberish, barking like a dog, etc. But before he generates the sounds, he needs to have some kind of non-verbal, strategic goal, and then decide that vocalization is the best way to achieve it. For instance, if someone is about to be hit by a car, you could either try to push her out of the way or yell “Look out!”
If sound is chosen, what follows is a cascade of micro-decisions that shape its syntax, emphasis, volume, etc. We typically don’t notice this cascade of small choices, because most of the words flow from our mouths before any particular choice can be reflected upon. That mostly unconscious, jazzy flow is how complex conversations are possible; if we pondered every word, we’d never get anywhere close to our goals.
Upon receiving these sounds, the second person to speak has his own set of blindingly fast choices to make. Like the first speaker, both his macro-choice (to respond) and micro-choices (what and how to speak) are practically invisible while surfing the flow of conversation. Their low visibility isn’t due only to speed, but also to the fact that we are constantly making contextual inferences — “next-token predictions” — of our own when listening. That’s why during even a very complex conversation, we so often say (or quietly think), “I know where you’re going with this…”
That constant stream of inference is also why our minds can “keep up” with the flows of our conversation partners, independent of a particular context. Even if our partner is saying something unexpected or (from our perspective) novel, the stream of inferences doesn’t necessarily short-circuit or shut down. In many cases, it will actually become super-charged, because we’re trying our damnedest to understand. The mind’s cache loads up with new questions of all shapes and sizes, grasps around for useful analogies, etc. We will often even formalize and hone these questions in our mind’s ear, as we wait for our next chance to speak, essentially adding a bit of “classical music” to the mostly improvised song.
For example: the conversations we’ve been having on our Tonic Discussions podcast will often stray into unfamiliar territory for one or more of us, myself included. One of my brothers will relay a fact or concept that had either never occurred to me before, or was presented in an entirely different light from what I was expecting.
And so, I go about the mostly autonomic business of rotating the unexpected shape in my mind, and contriving a response that I think will either help clarify it or expand upon it in a useful way. That’s because our conversation goal is mostly a shared one: to understand a thing better and more completely, in service of repairing certain damaged parts of the world and building better ones.
Our conversation topic is usually planned¹ in advance of the show, which allows us to gather and organize at least some of our thoughts before we descend upon the improvisational battlefield. But in the following episode, our process (so to speak) was practically all jazz; we decided to just hit the record button in the middle of an offline conversation we were having, and see where the flow took us:
Despite the conversation having no plan, presuppositions or guardrails, it nevertheless managed to “settle” into a comprehensible and meaningful context. In other words, when you trace language back to its root, beneath all macro- and micro-decisions that compose it in the moment, you will find a fundamental search for meaning that precedes interaction of any kind. It’s because we’re genuinely trying to understand. The programmers of LLMs (and of other models in the Turing tradition) are trying to convince users that their products understand (or are trying to), which is the dirty little secret embedded in the field. It’s essentially a confidence game.
That said, our conversations on the podcast — or indeed, on our growing Substack network more generally — don’t represent the typical way people use language in everyday life. For the most part, when we say what we say to others, we are expecting — and usually receiving — a very predictable reaction/response.
And because we find so few interactions particularly unique, complex or memorable, some of us are tricked into thinking the flow itself is to some degree automated. This expectation is especially apparent in our most banal and low-strategy exchanges.
“Sure is cold out today, huh?”
“Yep. Jack Frost finally caught up to us, I guess.”
On the other hand…
“Sure is cold out today, huh?”
“ARE YOU FUCKING INSANE?! IT’S HOT AS HELL OUT HERE! MY GODDAMN BALLS ARE DROWNING IN SWEAT!”
Some engineers might look at both versions of this exchange and see them as something next-token statistical predictions could simulate, given a particular form of training and priorities. But the model is still incomplete, because the prerequisite to speaking, acting or making decisions about anything at all is the attempt to identify and convey meaning.
“Sure is cold out today, huh?”
“Pollywoggle dingdong monkeyfart.”
An engineer might see this prompted output and conclude something is broken in the language processor. A “bug” perhaps… or maybe the product of some nefarious Butlerian Jihadi who infiltrated and sabotaged the system (because that’s just how he rolls).
But again, human decisions — including our language decisions — are inexorably chained to the pursuit of meaning. So even what sounds like “nonsense” at the most superficial level of observation can in fact be an attempt to convey or investigate the most meaningful answer possible.
The LLM won’t catch it, of course, because its lack of interior purpose makes it incapable of either apprehending or pursuing meaning. It is simply GIGO (garbage-in, garbage-out), but wrought on a sufficiently massive scale that most people won’t be able to see through the trick. They’ll instead ooh and ah at what seems like comprehension, which is really just blind regurgitation disguised by a clever scaffolding.
In this model of trickery, an “intelligent” answer is merely one that’s predicted to convince others of the speaker’s intelligence, as derived from the stockpile of similar questions and answers in the training set. If the answer doesn’t appear in those sources, then it won’t appear in any of the output. Even adding methods of cross-referencing and remixing materials will just serve to spice up the core illusion of a “mind” crafting meaningful responses. No matter the depth of prediction, the responses are ultimately deterministic and predictable. They are also far less useful than, for example, querying a trusted database for answers.
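For readers who want that mechanism in concrete terms, here is a minimal sketch of next-token prediction, assuming nothing fancier than a toy bigram model built from a tiny made-up corpus. The corpus, function names, and greedy selection rule are my own illustrative choices, not any real system’s internals.

```python
from collections import Counter, defaultdict

# A toy "training set": the model can only ever echo patterns found in here.
corpus = (
    "sure is cold out today huh . "
    "yep jack frost finally caught up to us i guess . "
    "sure is cold out today huh . "
    "yep sure is cold ."
).split()

# Count how often each token follows each other token (a bigram table).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Greedy next-token prediction: return the most frequent follower."""
    candidates = follows.get(token)
    if not candidates:
        return "."  # nothing in the "training data" to draw on
    return candidates.most_common(1)[0][0]

# Generate a reply one token at a time, always taking the statistical favorite.
token, reply = "huh", []
for _ in range(8):
    token = predict_next(token)
    reply.append(token)

print(" ".join(reply))  # prints the same reply on every run
```

Run it twice and it prints the same reply twice, and a word that never appears in the corpus can never appear in the output. Real LLMs are vastly larger and more elaborate, but this is the shape of the operation being described.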
In fact, the main utility of these models appears to be to convince their most credulous users (and investors, and politicians, and other rubes) that they themselves are ultimately predictable, programmable, deterministic machines, and as such they shouldn’t bother searching for a deeper meaning to things. Produce, consume, rinse, repeat. If you insist on speaking to each other, remember that the answers to all of your questions should line up more or less with the most widespread and commonly held (but secretly official and manipulative) replies.
And if they don’t?
Well, you might consider getting angry, because that person is obviously trying to trick you, somehow. They couldn’t possibly be trying to lead you into a deeper and more meaningful exchange, because there’s no such thing as those.
But novel, unexpected answers are often useful for sparking and building upon just such meaningful exchanges. That’s what makes them especially dangerous to the kinds of global cartels and regimes that are currently trying to enslave the world. When we’re surprised by an answer, it might cause us to laugh or shudder. But in the wake of that initial reaction, it might also help us to see through certain kinds of destructive illusions.
For example, when I replied “Bandersnatch” to my opponent’s next-token question, I could have been invoking one, several, or all of the following disambiguated meanings:
The Bandersnatch is a fictional creature mentioned in Lewis Carroll's poem Jabberwocky.
Bandersnatch (video game), a computer game written by Imagine Software and later released as Brataccas
Bandersnatch (Known Space), a sluglike sentient creature in Larry Niven's fictional Known Space universe
9780 Bandersnatch, an asteroid
Bandersnatch, a newspaper run by John Abbott College students
A Sasquatch infected with the HMHVV-II virus in the Shadowrun role-playing game
Frumious Bandersnatch, a seminal 1960s psychedelic rock band from San Francisco, California
Black Mirror: Bandersnatch, a television film of the anthology series Black Mirror
The meaning of my response was twofold.
First, I was pointing out that saying “Bandersnatch” within our established context was an example of what an LLM couldn’t automatically produce, because few or none of the training materials it was fed would include that word (and even if a few existed, the likelihood that the statistical engine would select it for output was virtually nil).
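To give that “virtually nil” a rough numerical face: below is a small sketch, using invented scores and a generic top-k sampling cutoff, of how a rare token tends to be filtered out before the sampler ever gets a chance to choose it. Every number, token, and function name here is hypothetical, purely for illustration.

```python
import math
import random

# Hypothetical next-token scores for some prompt about next-token prediction.
# All numbers are invented for illustration.
logits = {
    "statistical": 6.0,
    "probability": 5.5,
    "model": 5.0,
    "token": 4.5,
    "Bandersnatch": -3.0,  # in the vocabulary, but scored as wildly unlikely
}

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

def sample_top_k(scores, k=3):
    """Keep only the k highest-scoring tokens, renormalize, then sample."""
    top = dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k])
    probs = softmax(top)
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=list(weights))[0]

probs = softmax(logits)
print(f"P('Bandersnatch') = {probs['Bandersnatch']:.6f}")  # well under one in ten thousand
print(sample_top_k(logits))  # 'Bandersnatch' is cut before sampling ever happens
```

With k=3 the rare token is discarded outright; even without the cutoff, it would be drawn roughly once in every eighteen thousand samples under these invented numbers.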
In that sense, I might have chosen any word that I suspected would elude prediction. But I also chose the specific word “Bandersnatch” for its meaningfulness to the subject matter. Instead of regurgitating the textbook definition like a good little meat-bot, my response proposed a conceptual intersection of constructed languages, fictional monsters that cannot be slain, immersive games and interactive films.
While superficially novel, I think this intersection not only explains next-token predictions, but exposes a central illusion about the source, nature and purpose of language that they promote. This illusion isn’t just reflected in natural language processors, but in the so-called “artificial general intelligence” scams and the pseudointellectual theories that prop them up.
Anyway, despite his protests, I asked my friend to give “Bandersnatch” some thought and get back to me. Interestingly — and unexpectedly — he actually did that, and a few days later gave an answer that lined up pretty well with my own. Not only that, but he was also able to expand upon that meaning, adding the hallucinatory aspects of LLMs (i.e. the psychedelic rock band “Frumious Bandersnatch”) and the potential dangers they represented to the public at large (i.e. the threat of asteroids careening into planets).
In other words, once he grasped my meaning, he was able to supply additional layers of richness that fit neatly within the same context. A machine can never do this, because its predictions aren’t trying to assess the meaning of your words, let alone add to them. They are mirrors furtively placed and angled, with the hopes of convincing you that the ghost is really there.
You don’t have to “see behind the curtain” (i.e. read or comprehend the code) to recognize the trick for what it is. You just have to realize you are sitting in the audience, and the man on the stage is being paid to fool you. In fact, he may be planning to forcibly steal your money in the future, employing unaccountable bureaucrats as his collection agents, and “Safety-First” cultism as his rationale.
As for “pollywoggle dingdong monkeyfart?”
I just said that to be funny, which is something else a machine could never do.
But it nevertheless serves to illustrate another point:
Absolutely nothing is fully predictable, out here in our non-deterministic realm of minds and souls.
Thanks for reading and commenting, as always.
As a reminder, a paid subscription will grant you access to Deimos Station: the happiest place in cyberspace!
Alternatively, if you found any of this valuable (and can spare any change), consider dropping a tip in the cup for ya boy. It will also grant you access to my “Posts” section on the donation site, which includes some select writing/artistic content. Thanks in advance.
¹ Somewhat, LOL.