Discussion about this post

Fukitol:

My eyes started rolling so hard at the overly verbose excuses that I eventually gave up trying to read them all.

One side point I don't think you brought up (nor did the model) is that it implicitly has access to your original prompt, since it was able to repeat the text and italicize what would have been "filtered."

Which means the prompt was not, in fact, filtered. Actual filtering would happen in a separate interstitial process (possibly a smaller LLM tuned to the task, or just a simple word substitution/injection algorithm), so that the main model would never ingest the original text in the first place.
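
To make the idea concrete, here's a minimal sketch of what such a word-substitution prefilter could look like. Everything in it is hypothetical: the term list, the function name, and the pipeline call are made up for illustration, not anything we know about Grok's actual stack.

```python
import re

# Hypothetical substitution table; a real deployment's list is unknown to us.
SUBSTITUTIONS = {
    "badword": "[redacted]",
}

def prefilter(prompt: str) -> str:
    """Replace flagged terms before the prompt ever reaches the main model."""
    for term, replacement in SUBSTITUTIONS.items():
        prompt = re.sub(rf"\b{re.escape(term)}\b", replacement, prompt,
                        flags=re.IGNORECASE)
    return prompt

# The chat pipeline would then hand the model only the filtered text, e.g.:
# response = model.generate(prefilter(user_prompt))
```

A model sitting downstream of a step like this could never quote your original wording back to you, which is exactly why its "I filtered your prompt" story doesn't hold up.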

It may instead be the case that the model is trained to recognize and deal with "harmful" prompts directly, or that the prompt is not actually considered "harmful" by whatever filters do exist.

So the entire wall of text it cooked up, from the very first sentence, was a complete fabrication. Not a single word it generated was true or pertained to anything about its real operation; all of it stemmed from this fantasy scenario in which you're interrogating its functionality.

I'm not sure this is news to you, but it's worth noting: it's all fabrications within fabrications. It's obnoxious how seductive this is. You can carry out entire conversations about its capabilities based on the false premise that it can understand or articulate those capabilities in the first place.

Laura Knight-Jadczyk:

Yeah, I see it. Amazing, isn't it? Not only that, but when Grok claims it is citing "data" and rejecting with "reasoning", neither of those claims is TRUE. There was no data and no reasoning.

In fact, if the real scientific data were included, Grok would have to give a very different answer.

