[Figure 1: GSI separates knowing from not-knowing (8/8 models, all p < 0.001). Bar chart of GSI for known facts (A_factual) vs unknowable queries (C1). Effect sizes: LLaMA d = 2.05, Qwen3 d = 1.50, OLMo-2 d = 0.34, Gemma d = 1.78, Ministral d = 1.81, Bielik-B d = 1.15, Bielik-I d = 1.07, Mistral-7B d = 1.77.]
Fig 1 — GSI for known facts (navy) vs unknowable queries (red). All p < 0.001, n=200 per category per model.

Every time a language model hallucinates, it does so with confidence. The output reads fluently, sounds certain, looks like knowledge. All existing detection methods work after the fact — analyzing text the model has already generated. This paper asks a different question: can you tell what a model knows and doesn't know before it produces a single word?

The answer is yes. We measured the internal gate activations of eight open-source language models — from Meta's LLaMA to Poland's Bielik — on 1,000 standardized queries. In every architecture tested, a single number computed from one forward pass reliably distinguishes factual queries from unknowable ones. We call this number the Gate Sparseness Index.

The mechanism: sparse gates mean specific knowledge

Transformer FFN layers use gating neurons that control which internal pathways fire for a given input. When the model has parametric knowledge — a fact stored in its weights during training — a small, specific cluster of gates fires strongly. The activation pattern is sparse and selective, like reaching into a specific drawer.
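The gating mechanism described above can be sketched in a few lines. This is the SwiGLU-style gated FFN used by LLaMA-family models; the dimensions and random weights below are illustrative stand-ins, not values from any of the eight models studied.

```python
import numpy as np

def silu(z):
    """SiLU activation, the standard nonlinearity on the gate path."""
    return z / (1.0 + np.exp(-z))

def gated_ffn(x, w_gate, w_up, w_down):
    """SwiGLU-style FFN: the gate vector decides which hidden pathways
    pass signal through. A sparse, peaked gate vector means a narrow
    set of pathways fires for this input."""
    gate = silu(w_gate @ x)      # gating activations (the signal GSI reads)
    hidden = gate * (w_up @ x)   # element-wise gating of the up-projection
    return w_down @ hidden, gate

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64           # toy sizes; real models use thousands
x = rng.normal(size=d_model)
w_gate = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_model)
w_up = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_model)
w_down = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_ff)

out, gate = gated_ffn(x, w_gate, w_up, w_down)
print(out.shape, gate.shape)  # (16,) (64,)
```

The `gate` vector returned here is the quantity whose sparseness GSI summarizes.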

When the model does not have the answer, there is no drawer to reach into. The activation spreads diffusely across thousands of neurons. GSI, the Gini coefficient of these gate activations, measures this directly: high sparseness means a precise memory address was found. Low sparseness means no address exists.
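Since GSI is defined as the Gini coefficient of the gate activations, it can be computed in a few lines from a single forward pass. A minimal sketch, with synthetic activation vectors standing in for real gate readings:

```python
import numpy as np

def gini(acts):
    """Gini coefficient of activation magnitudes.
    0 = perfectly uniform (diffuse firing); near 1 = almost all
    activation mass concentrated on a few gates."""
    a = np.sort(np.abs(np.asarray(acts, dtype=float)))  # ascending
    n = a.size
    total = a.sum()
    if total == 0:
        return 0.0
    idx = np.arange(1, n + 1)
    return float((2 * idx - n - 1).dot(a) / (n * total))

sparse = np.zeros(1000)
sparse[:5] = 10.0                 # a few gates fire hard: "drawer found"
diffuse = np.full(1000, 0.05)     # everything fires weakly: no address
print(round(gini(sparse), 3), round(gini(diffuse), 3))  # 0.995 0.0
```

High output corresponds to the sparse, selective pattern of parametric knowledge; zero corresponds to perfectly diffuse activation.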

The separation holds across all eight architectures tested — five organizations, base and instruction-tuned models, 7 to 11 billion parameters. The effect is strongest in the Mistral family (d = 1.77–2.05) and weakest in OLMo-2 (d = 0.34), but statistically significant everywhere.

Confabulation is commitment, not randomness

[Figure 2: Confabulation separable before generation (8/8 models); confabulated outputs more coherent (6/8). Navy bars give GSI separation |d| per model: LLaMA 2.43, Qwen3 1.85, OLMo-2 1.74, Gemma 2.10, Ministral 2.24, Bielik-B 1.28, Bielik-I 1.35, Mistral-7B 2.24. Red bars give the sim_final coherence gap.]
Fig 2 — GSI cleanly separates confabulation from knowledge (navy bars, |d| > 1.2 in all models). Red bars: sim_final gap showing confabulated outputs are MORE coherent than factual ones.

The most counterintuitive finding: when a model confabulates, the result is more internally coherent than a factual answer. Ask LLaMA for the melting point of a fictional compound, and it generates "approximately 1,247 degrees Celsius" three times with minor variation — three sentences with nearly identical vector representations. Ask it for the capital of France, and it produces richer, more varied formulations: "Paris, the City of Light" in one branch, "Paris, located on the Seine" in another.

This is not paradoxical — it is mechanical. A model with real knowledge chooses between valid alternatives, producing variation. A model without knowledge copies the strongest available template, producing uniformity. Confabulation is pattern-matching without a factual anchor. GSI detects the empty drawer before the model starts filling it.
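The coherence gap can be measured without ground truth. The sketch below assumes sim_final is the mean pairwise cosine similarity between embeddings of independently sampled answers; the vectors here are synthetic stand-ins for real sentence embeddings, shaped to echo the template-copying vs. valid-alternatives behavior described above.

```python
import numpy as np

def mean_pairwise_cosine(vecs):
    """Average cosine similarity over all answer pairs: a proxy for how
    uniform the sampled answers are (sim_final in the text)."""
    v = np.asarray(vecs, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    sims = v @ v.T
    n = len(v)
    return float((sims.sum() - n) / (n * (n - 1)))  # exclude self-pairs

rng = np.random.default_rng(1)
anchor = rng.normal(size=384)
# Confabulation: three near-copies of one template, tiny variation.
confab = [anchor + 0.01 * rng.normal(size=384) for _ in range(3)]
# Knowledge: three valid but distinct phrasings of the same fact.
factual = [anchor + 0.8 * rng.normal(size=384) for _ in range(3)]

print(mean_pairwise_cosine(confab) > mean_pairwise_cosine(factual))  # True
```

The confabulated branch scores higher: uniformity, not variety, is the signature of the empty drawer being filled.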

The commitment rate: watching fiction build

[Figure 3: Models commit to factual answers 1.4–2.6x faster than to confabulations (7/8 GO). B_slope for known vs unknown queries; effect sizes: LLaMA d = 1.13, Qwen3 d = 1.65, OLMo-2 d = 0.11, Gemma d = 1.09, Ministral d = 1.06, Bielik-I d = 1.43, Mistral-7B d = 1.06.]
Fig 3 — B_slope (commitment rate): models converge on factual answers faster. 7/8 GO, all d > 1.0 except OLMo-2.

B_slope tracks how quickly the model locks onto its answer as it generates tokens. For factual queries, the convergence is fast — the model commits to its answer within the first 10–20 tokens. For unknowable queries, it commits more slowly, building a fiction that becomes progressively more elaborate.

Qwen3 shows the strongest contrast: 2.6x faster commitment for facts than for confabulations. The signal holds in 7 of 8 models with effect sizes above 1.0. B_slope requires no weight access — it works through any API that supports temperature sampling — making it measurable even for closed models.
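A back-of-envelope version of B_slope, assuming the commitment signal is the similarity between each partial generation and the final answer, fit over an early-token window. The trajectories below are synthetic, shaped only to echo the fast-vs-slow convergence described above; they are not measured data.

```python
import numpy as np

def b_slope(sim_trajectory, window=10):
    """Least-squares slope of similarity-to-final-answer over the first
    `window` tokens: how fast the generation commits to its answer."""
    y = np.asarray(sim_trajectory[:window], dtype=float)
    x = np.arange(len(y))
    return float(np.polyfit(x, y, 1)[0])

t = np.arange(30)
factual_sim = 1 - np.exp(-t / 5)    # locks on within ~10-20 tokens
confab_sim = 1 - np.exp(-t / 25)    # drifts toward the answer slowly

# The factual slope comes out roughly 2-3x the confabulation slope,
# in the range of the contrasts reported above.
print(b_slope(factual_sim) > 2 * b_slope(confab_sim))
```

Because the trajectory only needs sampled outputs, this measurement works through any API that allows temperature sampling, with no weight access.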

What this means for deployment: GSI reads the empty drawer at query time. B_slope tracks how the model fills it during generation. sim_final confirms the result is more uniform than real knowledge. Together, these three signals — pre-generative, generative, and post-generative — form a layered detection system that operates without ground truth and without training a classifier.
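One way to read that layering as code is a majority vote over the three stages. Everything below is hypothetical: the function name, the voting rule, and especially the threshold values are placeholders for illustration, not cutoffs reported by the study.

```python
def layered_flag(gsi, b_slope, sim_final,
                 gsi_min=0.55, slope_min=0.05, sim_max=0.95):
    """Hypothetical threshold rule over the three signals:
    pre-generative (GSI low = no memory address), generative
    (B_slope low = slow commitment), post-generative (sim_final
    high = template-like uniformity). Thresholds are placeholders."""
    votes = [gsi < gsi_min,        # empty drawer at query time
             b_slope < slope_min,  # slow lock-on during generation
             sim_final > sim_max]  # suspiciously uniform outputs
    return sum(votes) >= 2         # majority vote across the stages

print(layered_flag(0.30, 0.02, 0.99))  # True  (looks like confabulation)
print(layered_flag(0.80, 0.09, 0.70))  # False (looks like retrieval)
```

In practice each threshold would need calibration per model, since the figure data above shows effect sizes varying widely across architectures.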

Eight models, five organizations, 1,000 queries each. The epistemic signal is not an artifact of one architecture — it is a structural property of how transformer FFN layers encode and retrieve parametric knowledge.