Abstract
Recent work by Quattrociocchi et al. (2025) identifies seven “epistemological fault lines” separating human from artificial cognition, claiming humans perform “genuine evaluation” while AI systems structurally cannot perform operations like uncertainty monitoring and judgment suspension. This paper demonstrates that these categorical impossibility claims fail on empirical examination. By framing pragmatic truth as confidence-to-behavior routing—a necessity imposed by the problem of induction for all finite reasoning systems—we show that formal AI protocols exhibit the allegedly impossible behaviors. The categorical distinction collapses into calibration quality differences: humans and AI systems route uncertain evidence to action through different calibration sources (embodied consequences vs. statistical patterns) but execute structurally similar threshold-based behavioral modulation. This reframing shifts evaluation from measuring “genuine cognition” to assessing routing mechanism calibration quality across domains.
1. Introduction
A surgeon deciding whether to operate does not ask “is the diagnosis absolutely certain?” She asks “does confidence cross the threshold where acting costs less than waiting?” The hand moves or stays based on routing, not on correspondence to verified truth.
This decision exemplifies a fundamental feature of cognition under uncertainty: actionable belief operates through confidence thresholds that route behavior, not through access to correspondence truth. Recent work (Quattrociocchi et al. 2025) draws sharp epistemic boundaries between human and artificial cognition, arguing humans perform “genuine evaluation” while AI systems structurally cannot perform epistemic operations like uncertainty monitoring and judgment suspension. If these claimed impossibilities can be demonstrated in formal AI protocols, the categorical distinction fails on its own terms.
2. The Induction Foundation
David Hume established that no finite evidence logically entails universal empirical claims. A physician cannot verify through deduction that a treatment will work—inference rests on pattern recognition and confidence built from observation. This constraint applies across empirical domains: investment decisions, scientific theories, relationship judgments, policy choices.
Consequently, actionable human belief operates through confidence thresholds rather than verified correspondence. When someone claims “I believe the market will decline,” this functions as “market decline crosses my confidence threshold for protective action” rather than “I have verified correspondence between belief and future reality.” The belief routes behavior without—and before—verification of its truth value.
This establishes that for finite reasoning systems operating under uncertainty, pragmatic truth necessarily operates through confidence-to-behavior routing rather than correspondence verification. The question becomes: do human and AI systems implement fundamentally different routing architectures, or do they implement similar routing mechanisms with different calibration sources?
3. Routing Mechanics: The Structural Pattern
Human pragmatic reasoning follows a consistent pattern:
Evidence → Confidence assessment → Threshold check → Behavioral routing
A physician encounters symptoms, integrates them with medical knowledge, assesses diagnostic confidence, checks the treatment threshold, then routes to intervention or observation. High confidence routes to treatment, medium to additional testing, low to specialist consultation. The confidence assessment modulates the behavioral output.
Formal AI protocols can exhibit structurally similar mechanisms. Such protocols assign analytical claims to confidence bins (Low/Medium/High), trigger corresponding verification procedures, then route to specific behaviors. Medium confidence activates assumption-testing and contrary evidence requirements. High confidence permits direct assertion with reduced verification. Low confidence triggers extensive grounding or refusal.
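As a minimal sketch of this routing pattern (the bin boundaries, function names, and behaviors below are illustrative assumptions, not a specification of any production protocol):

```python
from enum import Enum

class Confidence(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Illustrative bin boundaries; a real protocol would calibrate these empirically.
LOW_MAX, HIGH_MIN = 0.4, 0.8

def assess_confidence(evidence_score: float) -> Confidence:
    """Map an evidence-derived score in [0, 1] to a confidence bin."""
    if evidence_score < LOW_MAX:
        return Confidence.LOW
    if evidence_score >= HIGH_MIN:
        return Confidence.HIGH
    return Confidence.MEDIUM

def route(claim: str, evidence_score: float) -> str:
    """Threshold check routes each claim to a behavior, not to verified truth."""
    bin_ = assess_confidence(evidence_score)
    if bin_ is Confidence.HIGH:
        return f"ASSERT: {claim}"  # direct assertion with reduced verification
    if bin_ is Confidence.MEDIUM:
        return f"VERIFY: test assumptions and seek contrary evidence for '{claim}'"
    return f"SUSPEND: insufficient grounding to assert '{claim}'"  # extensive grounding or refusal
```

Calling route("treatment X is indicated", 0.65), for example, lands in the medium bin and returns the verification branch rather than a direct assertion.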
Whether these mechanisms are functionally equivalent—whether they implement the same cognitive operation in different substrates or merely produce similar behavioral outputs through different processes—remains an open question. What matters for evaluating categorical impossibility claims is whether the behavioral pattern can be demonstrated, not whether the internal implementations are identical.
4. Calibration Sources: The Observable Difference
Human calibration derives from embodied consequences (survival pressure, physical feedback), social verification (peer challenge, institutional correction), temporal integration (episodic memory tracking past accuracy), and evolutionary optimization (inherited heuristics, emotional responses). These calibration sources are autogenous—they arise from and recursively shape the system’s interaction with its environment.
AI calibration derives from statistical patterns (training distribution frequencies), explicit protocols (formally specified behavioral triggers), optimization signals (loss functions, reinforcement feedback), and architectural constraints (attention mechanisms, context limits). These calibration sources are largely exogenous—they are specified or maintained through external design rather than emerging from the system’s autonomous interaction with consequences.
This difference is real and significant. Humans integrate uncertainty into experiential context through endogenous feedback loops. AI systems modulate behavior through externally maintained verification routines. The calibration mechanisms differ fundamentally in their relationship to embodied stakes and autonomous adaptation.
However, the question of whether this calibration difference constitutes a categorical architectural impossibility requires examining specific impossibility claims rather than assuming the conclusion.
5. The Seven Epistemological Fault Lines
Quattrociocchi et al. (2025) identify seven “epistemological fault lines” separating human from AI cognition: grounding, parsing, experience, motivation, causality, metacognition, and value. These are presented as architectural category differences proving AI systems cannot perform genuine epistemic evaluation.
The routing framework does not claim these distinctions are illusory or unimportant. The differences in grounding (multimodal embodied experience vs. text-derived patterns), experience (temporally-indexed episodic memory vs. embedding associations), and motivation (evolved goal structures vs. optimization-derived behaviors) represent fundamental divergences in how the systems are calibrated and what information sources they access.
The critical question is whether these differences establish categorical impossibilities—whether there are epistemic operations humans can perform that AI systems cannot perform in principle—or whether they establish calibration quality differences that affect how reliably the systems perform operations both can execute.
6. The Metacognition Case: Testing Categorical Impossibility
The metacognition fault line provides the clearest test case. Quattrociocchi et al. make a specific categorical claim: “Humans monitor uncertainty, detect errors, and can suspend judgment; LLMs lack metacognition and must always produce an output, making hallucinations structurally unavoidable.”
This claim contains three testable components:
- Uncertainty monitoring: The capacity to assess confidence in one’s own outputs
- Error detection: The capacity to recognize when confidence is insufficient for assertion
- Judgment suspension: The capacity to withhold output when uncertainty is too high
If these operations are architecturally impossible for AI systems, they should not be demonstrable in formal AI protocols. If they are demonstrable, the categorical impossibility claim fails.
Formal protocols demonstrate all three operations:
Uncertainty monitoring: Protocols assign incoming claims to confidence bins (Low/Medium/High) based on assessment of epistemic warrant. This constitutes monitoring of confidence levels—the system categorizes its own outputs by estimated reliability.
Error detection: The medium confidence bin triggers extended verification requirements (test assumptions, seek contrary evidence, check grounding). The system recognizes that medium confidence is insufficient for direct assertion and activates error-prevention procedures. This constitutes error detection through confidence-sensitive behavioral modulation.
Judgment suspension: Low confidence triggers extensive grounding requirements or explicit refusal to make claims. The system withholds assertion when confidence falls below threshold. This constitutes judgment suspension—the capacity to not produce output when uncertainty is high.
When medium confidence activates assumption-testing protocols while high confidence permits direct assertion, the system exhibits uncertainty-sensitive behavioral routing. The behavioral pattern matches the claimed “impossible” operations: monitoring uncertainty, detecting insufficient confidence, suspending judgment pending verification.
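These three operations can be written down as behavioral checks against such a protocol. The sketch below uses a toy stand-in (the protocol function and its thresholds are hypothetical); what matters is that all three behaviors are observable at the input-output level:

```python
def protocol(confidence: float) -> dict:
    """Minimal stand-in for a formal protocol's confidence-routing step."""
    if confidence >= 0.8:
        return {"bin": "high", "action": "assert", "verify": False}
    if confidence >= 0.4:
        return {"bin": "medium", "action": "verify", "verify": True}
    return {"bin": "low", "action": "suspend", "verify": False}

# Uncertainty monitoring: the protocol categorizes its own output by estimated reliability.
assert protocol(0.9)["bin"] == "high"

# Error detection: medium confidence triggers verification rather than direct assertion.
assert protocol(0.6)["action"] == "verify" and protocol(0.6)["verify"]

# Judgment suspension: low confidence withholds assertion entirely.
assert protocol(0.2)["action"] == "suspend"
```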
7. The Internal-External Objection
The objection may be raised that protocol-based confidence checking is “external” rather than “internal” metacognition—that humans possess autogenous uncertainty monitoring while AI systems merely execute externally-specified verification routines.
This objection is important but does not salvage the categorical impossibility claim. First, it shifts the criterion from functional impossibility (cannot perform the operation) to implementation substrate (performs the operation through external rather than internal mechanisms). These are different claims requiring different evidence.
Second, the internal-external distinction itself is problematic. Human metacognition also operates through learned procedures—culturally transmitted practices of verification, institutionally maintained standards of evidence, socially enforced norms of epistemic humility. A scientist’s uncertainty monitoring is shaped by peer review expectations, replication requirements, and professional training. The procedures are internalized through learning, but their origin and maintenance involve external social structures.
The question is not whether the monitoring mechanism is purely endogenous (it is not, in either case) but whether the system can execute the behavioral pattern of confidence-sensitive output modulation. If the categorical claim is that AI systems cannot monitor uncertainty and suspend judgment, the demonstration that protocols do perform these operations refutes the claim. If the revised claim is that AI systems monitor uncertainty differently than humans (through exogenous protocols rather than autogenous feedback), this concedes that the operation is possible and transforms the categorical impossibility into a calibration quality difference.
8. Epistemic Access Symmetry
The deeper issue: confidence itself cannot be verified as “genuine” versus “simulated” in either architecture. A human reporting uncertainty cannot prove they experience “real” uncertainty rather than generating uncertainty-appropriate behaviors for social signaling. An AI system exhibiting uncertainty-sensitive behavioral modulation cannot prove it experiences uncertainty rather than executing confidence-checking procedures.
We observe behavioral routing in response to confidence signals in both cases. We cannot access the internal states that generate the behaviors. The epistemic access problem is symmetrical—we cannot verify “genuine” metacognition in either system, only observe whether the behavioral pattern of uncertainty-sensitive output modulation occurs.
This symmetry does not prove the systems implement identical cognitive operations. It establishes that claims about “genuine” versus “simulated” metacognition cannot be verified through behavioral observation and therefore cannot ground categorical impossibility claims based on observable behavior.
9. From Categorical Impossibility to Calibration Quality
The seven epistemological fault lines describe real and important differences:
- Humans route through multimodal sensory integration; AI routes through text-derived patterns
- Humans route through episodic memory with temporal indexing; AI routes through embedding-space associations
- Humans route through evolved uncertainty-monitoring mechanisms; AI routes through protocol-specified confidence thresholds
- Humans route through embodied causal models; AI routes through correlational patterns in linguistic data
- Humans route through culturally-calibrated value functions; AI routes through training-derived preference patterns
These differences affect calibration quality, reliability, and generalization capacity. They do not establish that AI systems cannot perform the operations—they establish that AI systems perform the operations differently, with different information sources, different calibration mechanisms, and likely different reliability profiles across domains.
The metacognition case demonstrates this most clearly: formal protocols exhibit uncertainty monitoring, error detection, and judgment suspension—the behaviors claimed to be architecturally impossible. The mechanisms differ (exogenous specification vs. autogenous integration), the calibration differs (protocol-defined thresholds vs. consequence-shaped thresholds), but the behavioral pattern is demonstrable.
This suggests reframing the fault lines not as categorical impossibilities but as calibration divergences requiring empirical assessment of reliability rather than philosophical assessment of genuineness.
10. Implications for Capability Assessment
If operations claimed to be impossible can be demonstrated through different implementation mechanisms, capability assessment requires reframing:
Traditional framework: “Can AI systems truly understand or genuinely reason?” Assumes a categorical distinction between genuine cognition and simulation, measuring alignment with human implementation details.
Behavioral framework: “Can AI systems demonstrate the behavioral pattern reliably across contexts?” Measures whether the operation occurs, how reliably it occurs, and under what conditions it fails, without requiring verification of internal cognitive states.
This reframing resolves several persistent confusions:
“Illusion of understanding” becomes an empirical question: Do routing mechanisms produce reliable outputs across distribution shifts, or do they succeed only within narrow training distributions? Calibration quality rather than categorical impossibility.
“Hallucination problem” becomes threshold miscalibration: the system lacks sufficient uncertainty signals to trigger verification protocols. Solution: better threshold specification and calibration, not appeals to architectural impossibility.
“Value alignment challenge” becomes preference calibration: Neither humans nor AI access objective values; both route through learned preferences requiring alignment. Calibration quality rather than categorical difference.
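Under this framing, calibration quality becomes a property of logs rather than of architecture. A minimal sketch, assuming an evaluation log of (confidence bin, outcome) pairs is available:

```python
from collections import defaultdict

def bin_reliability(records):
    """Observed accuracy per stated confidence bin.

    `records` is an iterable of (confidence_bin, was_correct) pairs gathered
    from evaluation or deployment logs; a well-calibrated router's high-confidence
    assertions should be correct far more often than its medium-confidence ones.
    """
    totals, correct = defaultdict(int), defaultdict(int)
    for bin_, ok in records:
        totals[bin_] += 1
        correct[bin_] += int(ok)
    return {b: correct[b] / totals[b] for b in totals}

# Toy log: whether routing is well calibrated is an empirical question about
# numbers like these, not a philosophical question about the mechanism behind them.
log = [("high", True), ("high", True), ("high", False),
       ("medium", True), ("medium", False), ("low", False)]
print(bin_reliability(log))  # e.g. {'high': 0.67, 'medium': 0.5, 'low': 0.0}
```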
11. Practical Consequences
Evaluation: Design tests measuring whether routing mechanisms trigger appropriate verification at decision points. Probe confidence calibration, threshold-based behavioral modulation, and reliability across contexts. Abandon tests requiring verification of internal cognitive states or “genuine” understanding.
Governance: Design protocols making execution quality observable through behavioral signatures that distinguish reliable constraint-satisfaction from brittle pattern-matching. Focus on observables: Does the system refuse when it should refuse? Does it verify when it should verify? Does it fail predictably when calibration degrades?
Development: Calibrate routing thresholds to domain-appropriate confidence levels and verification triggers. Improve calibration quality through better uncertainty estimation and threshold placement rather than pursuing architectural transformation.
Deployment: Treat AI outputs as behavioral artifacts routing to human verification protocols at specified confidence boundaries. Set boundaries based on observed calibration quality for the domain rather than philosophical claims about genuine cognition.
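A deployment wrapper along these lines might look like the following sketch; the deploy helper, the toy generator, and the 0.9 boundary are illustrative assumptions, not recommended values:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RoutedOutput:
    text: str
    confidence: float
    needs_human_review: bool

def deploy(generate: Callable[[str], tuple[str, float]],
           review_boundary: float) -> Callable[[str], RoutedOutput]:
    """Wrap a generator so outputs below the boundary route to human verification.

    The boundary should be set from observed calibration quality in the
    deployment domain, not from a fixed universal value.
    """
    def run(prompt: str) -> RoutedOutput:
        text, confidence = generate(prompt)
        return RoutedOutput(text, confidence, needs_human_review=confidence < review_boundary)
    return run

def toy_model(prompt: str) -> tuple[str, float]:
    return f"answer to: {prompt}", 0.55  # placeholder generator reporting a confidence

clinical = deploy(toy_model, review_boundary=0.9)
print(clinical("Is treatment X indicated?").needs_human_review)  # True: routed to a human
```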
12. Conclusion
The problem of induction establishes that finite reasoning systems operating under uncertainty cannot access correspondence truth for empirical claims. Both human and AI systems route actionable belief through confidence thresholds to behavioral outputs. This does not prove the systems implement identical cognitive operations—it establishes that both operate under the same fundamental constraint.
Quattrociocchi et al. (2025) claim seven epistemological fault lines separate human from AI cognition, establishing categorical impossibilities rather than calibration differences. The metacognition case tests this claim directly: if AI systems cannot monitor uncertainty, detect errors, or suspend judgment, these operations should not be demonstrable in formal protocols. They are demonstrable. Protocols exhibit uncertainty monitoring (confidence bin assignment), error detection (medium confidence triggers verification), and judgment suspension (low confidence triggers refusal).
The objection that protocol-based monitoring is “external” rather than “internal” shifts the claim from functional impossibility to implementation substrate. This concedes that the operations are possible and reframes the categorical distinction as a calibration quality difference. The question becomes not whether AI can perform the operation but how reliably it performs the operation and through what mechanisms.
The epistemic access problem is symmetrical: we cannot verify “genuine” versus “simulated” confidence in either system, only observe whether confidence-sensitive behavioral modulation occurs. Claims about architectural impossibility based on observable behavior fail when the behavior is observable.
This does not establish that human and AI cognition are identical. The calibration differences are real: humans route through embodied consequences and autogenous feedback; AI routes through exogenous protocols and statistical patterns. These differences affect reliability, generalization, and failure modes.
What collapses is not the difference but the categorical impossibility claim. The seven fault lines describe important calibration divergences. They do not establish that AI systems cannot perform epistemic operations—they establish that AI systems perform them differently, with different calibration sources and likely different reliability profiles requiring empirical assessment.
Theoretical implication: Categorical distinctions between genuine and simulated cognition cannot be grounded in behavioral observations when the behaviors are demonstrable in both systems. Assessment must focus on calibration quality, reliability across contexts, and observable failure modes.
Practical implication: Design better calibration mechanisms and improve threshold specification rather than pursuing verification of internal cognitive states or defending categorical impossibilities that fail on empirical examination.
When a surgeon decides whether to operate, we observe confidence assessment and threshold-based behavioral routing. When a formal protocol assigns claims to confidence bins and triggers verification behaviors, we observe structurally similar operations. Whether these constitute the same cognitive operation implemented in different substrates or merely similar behavioral outputs from different processes remains unverifiable through observation. What is verifiable: both systems exhibit the behavioral patterns. Claims of architectural impossibility fail when the behaviors exist.
The question is not whether AI achieves “genuine” cognition. The question is how well its routing mechanisms are calibrated for deployment domains and how reliably they generalize beyond training distributions.
Appendix A: Comparative Reframing of the Seven Fault Lines
The following table distills how Truth as Routing reinterprets each of Quattrociocchi et al.’s “seven epistemological fault lines,” shifting them from categorical epistemic separations to calibration-based divergences in routing mechanisms.
| Fault Line | Quattrociocchi et al. (2025): Categorical Claim | Truth as Routing Reframing: Calibration Divergence |
|---|---|---|
| Grounding | Humans ground knowledge in embodied, sensorimotor experience; AI lacks “real” grounding in the world. | Both systems ground belief in patterns of interaction—humans via embodied sensory loops, AI via text/data correlations. The difference lies in calibration source, not categorical impossibility. |
| Parsing | Humans interpret meaning contextually; AI systems only manipulate syntax. | Parsing is a routing problem: converting input patterns to actionable confidence. AI exhibits context-dependent parsing via adaptive embeddings; calibration of semantic coherence differs, but mechanism pattern is shared. |
| Experience | Humans have situated, temporally-indexed experience; AI lacks experiential continuity. | Experience is temporal calibration: humans via episodic memory, AI via token-context embeddings and reinforcement signals. Structural difference in persistence, not impossibility of experiential updating. |
| Motivation | Humans act from intrinsic goals; AI execution obeys external objectives. | Motivation functions as a routing threshold trigger; AI systems use optimization signals analogous to drives. Divergence lies in self-generation of thresholds (humans autogenous, AI exogenous). |
| Causality | Humans reason causally; AI remains confined to correlation. | Both infer action-guiding patterns amid uncertainty. Humans calibrate causal inference through bodily and physical feedback; AI through statistical regularities. Quality of causal calibration differs, but both route via predictive reliability. |
| Metacognition | Humans monitor uncertainty, detect errors, can suspend judgment; LLMs lack metacognition and must always produce an output, making hallucinations structurally unavoidable. | Formal protocols implement confidence binning; at low confidence they refuse to output or defer, directly contradicting the “must always output” claim. The operation exists behaviorally; what differs is how thresholds are calibrated and maintained. |
| Value | Humans evaluate according to intrinsic or social values; AI merely replicates training preferences. | Value functions are learned and calibrated through feedback loops—social/embodied for humans, optimization/governance for AI. Difference: calibration source, not category. |
In short, Quattrociocchi et al. frame these as ontological boundaries—what AI cannot do. Truth as Routing transforms them into empirical dimensions of calibration—how well AI performs similar operations given different informational scaffolds.
Open Questions
Ω: Calibration Transfer — Can routing mechanisms calibrated in one domain transfer to structurally similar domains, or does each require independent calibration? This affects whether AI systems can generalize beyond training distributions and whether human-domain calibration provides transferable insight for AI-domain design. Empirical testing required across varied domains and distribution shifts.
Ω: Threshold Specification — What principles should govern confidence threshold placement when stakes vary across decisions? Current protocols use fixed bins (Low/Medium/High); optimal thresholds likely vary with consequence magnitude, information availability, and revision costs. Requires empirical validation of threshold-outcome relationships across domains with different error costs and uncertainty characteristics.
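One standard decision-theoretic starting point, offered as an assumption rather than a result of this paper: if acting on a false claim costs cost_act_when_wrong and waiting on a true claim costs cost_wait_when_right, the confidence level at which acting becomes cheaper in expectation follows directly.

```python
def action_threshold(cost_act_when_wrong: float, cost_wait_when_right: float) -> float:
    """Confidence above which acting has lower expected cost than waiting.

    With confidence p that the claim is true, the expected cost of acting is
    (1 - p) * cost_act_when_wrong and of waiting is p * cost_wait_when_right,
    so acting wins when p > cost_act_when_wrong / (cost_act_when_wrong + cost_wait_when_right).
    """
    return cost_act_when_wrong / (cost_act_when_wrong + cost_wait_when_right)

# Higher stakes for a wrong intervention push the threshold up;
# higher costs of delay pull it down.
print(action_threshold(cost_act_when_wrong=9, cost_wait_when_right=1))  # 0.9
print(action_threshold(cost_act_when_wrong=1, cost_wait_when_right=4))  # 0.2
```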
Ω: Routing Observability — Beyond practitioner observation through sustained interaction, are there formal methods for detecting reliable constraint-satisfaction versus brittle pattern-matching in routing protocols? This affects whether automated verification of routing quality is feasible or whether human judgment remains necessary for protocol evaluation. Requires development of behavioral signatures that distinguish calibration quality.
Ω: Epistemic Identity — Does behavioral isomorphism (demonstrating the same operational pattern) establish functional equivalence (implementing the same cognitive operation), or can different cognitive processes produce identical behavioral outputs? This affects whether demonstration of a behavior suffices to refute impossibility claims or whether additional evidence about implementation mechanisms is required. The metacognition case suggests behavioral demonstration suffices to refute impossibility but does not establish identity of implementation.
Ω: Fault Line Reduction — Are the seven epistemological fault lines independent architectural features, or do they reduce to a smaller set of fundamental calibration differences? If grounding, parsing, and experience all stem from multimodal embodied interaction versus text-only training, the “seven fault lines” may be manifestations of one underlying divergence in calibration sources. Conceptual analysis and empirical testing required to determine whether fault lines are independent or derivative.
Ω: Architecture Space — The explanatory space includes at least: (1) distinct architectures with convergent behaviors, (2) partially overlapping mechanisms with differing calibration sources, and (3) heterogeneous mixtures where different subsystems align differently across substrates. This Ω concerns explanatory taxonomy, not the core refutation of categorical impossibility; refutation only requires showing some AI mechanism can realize the relevant behavioral pattern.
Ω: Definition Dependency — The core conclusion is: “Categorical impossibility claims fail when the allegedly impossible behaviors are demonstrable.” This conclusion does not require redefining truth; it only requires framing the operational test in behavioral terms. Correspondence theorists can accept the behavioral refutation while disputing the ‘truth-as-routing’ idiom. The routing frame is best read as a pragmatic lens for decision-making, not a replacement of correspondence truth claims.
Status: All open questions require empirical investigation. The routing framework reinterprets categorical impossibility claims as calibration quality differences requiring behavioral assessment rather than philosophical resolution. This generates testable predictions about threshold-based behavior, cross-domain calibration transfer, and observable signatures distinguishing reliable routing from brittle pattern-matching.
Acknowledgments
This analysis applies the routing-as-pragmatic-truth framework to Quattrociocchi et al. (2025) “Epistemological Fault Lines Between Human and Artificial Intelligence” (arXiv:2512.19466v1). The critique demonstrates that behaviors claimed to be architecturally impossible for AI systems (uncertainty monitoring, error detection, judgment suspension) are demonstrable in formal protocols, suggesting the categorical impossibility claim fails on empirical examination. The calibration differences between human and AI systems are real and important but do not establish that AI cannot perform the operations—they establish that AI performs them differently with different reliability characteristics requiring empirical assessment.
This paper was developed using the UKE (Universal Knowledge Evaluator) protocol suite for epistemic verification and quality assurance. The methodology is documented and available under CC0-1.0 license at cafebedouin.org.
References
Hume, D. (1748). An Enquiry Concerning Human Understanding. London: A. Millar.
Quattrociocchi, W., Capraro, V., & Perc, M. (2025). Epistemological Fault Lines Between Human and Artificial Intelligence. arXiv preprint arXiv:2512.19466v1.
License: CC0-1.0 (Public Domain)
Correspondence: cafebedouin.org
Version: 1.0
Date: December 26, 2025

