KYield Insights · Special Paper · 2026
Physics & Information Theory · Deep Learning · Semantic Web · Neurosymbolic AI · Behavioral Economics

The Science of Uncertainty

And the Reliability Premium

In his 1933 Herbert Spencer Lecture at Oxford, Einstein said "Pure logical thinking can give us no knowledge whatsoever of the world of experience; all knowledge about reality begins with experience and terminates in it."

In a series of talks at the University of Washington in 1963 titled "The Uncertainty of Science", Richard Feynman said "if you look closely enough at anything, you will see that there is nothing more exciting than the truth, the pay dirt of the scientist, discovered by his painstaking efforts." Niels Bohr wrote that fundamental components of the material world, like electrons and photons, are not particles as thought previously, but rather part particle and part wave.

Many of the great minds have agreed that perfect accuracy is impossible even as they struggled their entire lives to pursue it, motivated by the beauty of the voyage and the implications of their discoveries. However, the degree of accuracy can and often has determined the course of history for individual lives, businesses, nations, and entire civilizations. The "Bombe" developed by Alan Turing and his team during WWII was sufficiently accurate in breaking messages enciphered by Germany's Enigma machine that it helped prevent an even worse catastrophe in Great Britain and helped win the war.

Given the degree of inaccuracy in Large Language Models (LLMs), now used in chatbots by approximately 1 billion people and by most enterprises, it is imperative to understand the evolution of uncertainty as well as the options that now exist to reduce and manage it.

Figure: The Science of Uncertainty timeline, from Boltzmann (1872) to NSAI convergence. The evolving science of uncertainty across six disciplines: physics, information theory, deep learning, learning theory, networks and web, and behavioral economics. KYield · Mark Montgomery

Physics

A fundamental concept in modern physics is the uncertainty principle, proposed by Werner Heisenberg in 1927. Heisenberg initially framed it as a measurement-disturbance result, but it was quickly reformulated by Kennard (1927) and Robertson (1929) as a statement about intrinsic statistical variance in any quantum system — uncertainty as a property of reality, not of our instruments. John Bell's 1964 theorem, confirmed experimentally by Aspect, Clauser, and Zeilinger (Nobel 2022), made the result even sharper: no local hidden-variable theory can reproduce quantum mechanics. Uncertainty in nature is not epistemic ignorance awaiting better instruments; it is structural. The same is true of LLM uncertainty — intrinsic, not a measurement artifact to be engineered away.

But the uncertainty thread in physics begins earlier than Heisenberg, in the second law of thermodynamics. Ludwig Boltzmann's 1872 H-theorem and his 1877 statistical interpretation of entropy (S = k log W) showed that macroscopic irreversibility emerges from microscopic statistics — that the arrow of time is itself a probabilistic statement. When Shannon defined information entropy in 1948 as H = -Σ p log p, he was, mathematically, restating Boltzmann in a different domain. Edwin Jaynes proved the connection in 1957, showing that statistical mechanics can be derived from information theory through the maximum entropy principle, which yields the least-biased probability distribution consistent with available knowledge. He revealed the equivalence of thermodynamic entropy and information entropy.
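The Boltzmann and Shannon formulas above can be compared numerically. The following is a minimal sketch, assuming base-2 logarithms (bits) and illustrative distributions that are not taken from any of the cited papers. It shows that a uniform distribution over W outcomes has entropy log W, which mirrors Boltzmann's form, and that any more concentrated distribution has lower entropy.

```python
import math

def shannon_entropy(probs):
    """H = -sum(p * log2 p), in bits; outcomes with p == 0 contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distributions (assumptions made for this sketch only).
W = 8
uniform = [1.0 / W] * W                 # W equally likely "microstates"
skewed = [0.7, 0.1, 0.1, 0.1]           # some outcomes far more likely
certain = [1.0, 0.0, 0.0, 0.0]          # no uncertainty at all

h_uniform = shannon_entropy(uniform)    # equals log2(W), the Boltzmann-like case
h_skewed = shannon_entropy(skewed)      # about 1.36 bits
h_certain = shannon_entropy(certain)    # zero bits

print(h_uniform, h_skewed, h_certain)
```

The uniform case is also the maximum-entropy distribution when nothing is known beyond the number of outcomes, which is the simplest instance of the principle Jaynes formalized.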

Often overlooked during the scaling theory era, Philip Anderson's 1972 essay "More Is Different" argued that emergent properties at scale are not reducible to the laws governing the parts. New laws appear at each level of complexity. Anderson, a Nobel laureate with no stake in the AI debate, was making in 1972 a precise counter-argument to "scale is all you need" reasoning: more is different, but more is not the same as better, and emergence does not guarantee the emergence you want. The complex-systems tradition that grew from this insight — through Murray Gell-Mann, Stuart Kauffman, John Holland, and the broader Santa Fe Institute, where I was a frequent visitor from 2009 to 2015 — remains underutilized in current AI discourse.

Entropic uncertainty is more familiar to computer scientists today, though its roots are in physics, or more precisely, data physics: the recognition that information, energy, and uncertainty are bound together by laws older and harder than any architecture choice.

Information Theory

The concept of information entropy was introduced by Claude Shannon in his 1948 paper "A Mathematical Theory of Communication". Shannon defined information as the reduction of uncertainty. Modern computing is substantially based on the work of Shannon, who is considered the founder of information theory. Benefiting from advances in physics and thermodynamics, Shannon introduced the bit as the unit of information and laid the groundwork for data management, enabling the sub-discipline in AI we call knowledge systems, and paved the way for broader artificial intelligence, including LLMs. Among other key contributions, Shannon mathematically demonstrated that messages can be sent over a noisy channel essentially error-free, at any rate below the channel's capacity, by employing error-correcting codes.
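Shannon's result about noisy channels can be made concrete with a small calculation. The sketch below assumes a binary symmetric channel that flips each bit with probability p, and uses a simple repetition code with majority voting as the correction method. Both are illustrative choices for this example, not the constructions in Shannon's paper. Repetition lowers the error rate only by lowering the transmission rate, whereas Shannon's theorem says better codes exist for any rate below capacity.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity in bits per channel use of a binary symmetric channel."""
    return 1.0 - binary_entropy(p)

def repetition_error(p, n):
    """Probability that a majority vote over n (odd) noisy copies is wrong."""
    if n % 2 == 0:
        raise ValueError("n must be odd so the majority vote cannot tie")
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

flip_probability = 0.1  # assumed noise level for illustration
capacity = bsc_capacity(flip_probability)
errors = {n: repetition_error(flip_probability, n) for n in (1, 3, 5)}

print(f"capacity: {capacity:.3f} bits per use")
for n, err in errors.items():
    print(f"{n} copies: error probability {err:.5f}, rate {1 / n:.2f}")
```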

Shannon's work did not emerge in isolation. He was part of a remarkable interdisciplinary community gathered at the Macy Conferences on Cybernetics (1946–1953), a series of meetings that brought together Norbert Wiener, John von Neumann, Warren McCulloch, Walter Pitts, Margaret Mead, Gregory Bateson, and Herbert Simon, among others. Their shared project was cybernetics — what Wiener defined in his 1948 book of the same name as "control and communication in the animal and the machine." Information, feedback, and emergence were treated as a single domain spanning biology, engineering, psychology, and the social sciences. The cybernetic tradition fragmented in the decades that followed, as its constituent questions matured into separate disciplines — information theory, artificial intelligence, neuroscience, control systems, complexity science. Neurosymbolic AI is in a sense the mature realization of what the Macy Conference participants were trying to build.

Perhaps no one understood the danger of ignoring these limits better than Shannon himself. In his brief but potent 1956 piece, "The Bandwagon", Shannon issued a public warning to his peers, cautioning against treating information theory as a magic panacea that could be blindly applied across all disciplines without strict scientific rigor. Today, his critique reads like a prophecy regarding modern LLM discourse. Scaling neural networks is an impressive tool for identifying patterns and managing probabilities, but it is bound by the uncomputable limits of uncertainty. The bandwagon of scale alone cannot deliver reliable enterprise intelligence.

Deep Learning

If information theory and statistical learning provided the mathematical foundations, deep learning (DL) is what turned those foundations into applied systems. The neural tradition is as old as the algorithmic-probability tradition Solomonoff was building, and its history of hype rhymes uncomfortably with the current era.

Frank Rosenblatt's perceptron, introduced in 1958 at Cornell Aeronautical Laboratory, was the first trainable neural network — a machine that learned to classify patterns by adjusting weights through experience rather than explicit programming. In his press appearances and his 1962 book Principles of Neurodynamics, he projected machines that would eventually walk, talk, see, and reproduce themselves. The resemblance to current AGI rhetoric is not a coincidence. It is the same hype curve, with sixty-five years of compounding computation in between.

Rosenblatt's promises ran into the limits of his architecture. In 1969, Marvin Minsky and Seymour Papert published Perceptrons, proving that single-layer networks could not compute XOR, marking the beginning of what is now called the first AI winter. The pattern is cyclical: a hyped technology hits a mathematical limit, the field retreats, then resurges.
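The XOR limit is easy to reproduce. The following is a minimal sketch, assuming a single threshold unit trained with a Rosenblatt-style update rule. The learning rate, epoch limit, and hand-picked hidden-layer weights are assumptions made for this illustration. The unit learns AND, which is linearly separable, but never reaches zero errors on XOR, while a network with one hidden layer can represent XOR directly.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Train one threshold unit; return (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - out
            if delta != 0:
                # Nudge the decision boundary toward the misclassified point.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)

def step(z):
    return 1 if z > 0 else 0

def two_layer_xor(x1, x2):
    """Hidden units compute OR and AND; the output fires for OR but not AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)

print("AND learned:", and_converged)
print("XOR learned:", xor_converged)
print("two-layer XOR:", [two_layer_xor(a, b) for (a, b), _ in XOR])
```

The hidden-layer weights here are set by hand. Finding such weights automatically for multi-layer networks is the problem that backpropagation later addressed.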

The way around the Minsky-Papert limit was backpropagation. David Rumelhart, Geoffrey Hinton, and Ronald Williams published "Learning representations by back-propagating errors" in Nature in 1986, giving multi-layer networks an efficient algorithm for learning from their mistakes.

Eleven years later, Sepp Hochreiter and Jürgen Schmidhuber solved a different limit. (I had the privilege of interviewing both for the Applied AI column at Computerworld in 2014, when LSTMs were still ascendant — the conversations remain among the most technically substantive of my career.) Their 1997 paper introduced the Long Short-Term Memory (LSTM) architecture, which used carefully designed gating mechanisms to preserve gradients across hundreds of timesteps. LSTMs became the dominant architecture for sequence modeling — machine translation, speech recognition, language modeling — for nearly twenty years, and are still used today in many applications.
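The gating idea can be sketched with a single scalar cell. This is a minimal illustration, assuming the now-common formulation with a forget gate, which was added to the architecture after the 1997 paper (the original kept the cell's self-connection fixed at 1). The parameter names and values are hypothetical. With the forget gate held near 1, the cell state survives 200 steps almost intact; with it at 0.5, the state decays to practically nothing, and the gradient along the cell state is scaled by the same per-step factor.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One timestep of a scalar LSTM cell; p maps names to weights and biases."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g       # gated update of the cell state
    h = o * math.tanh(c)         # gated exposure of the cell state
    return h, c

def run(p, c0, steps):
    """Feed zeros for `steps` timesteps and return the final cell state."""
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, p)
    return c

names = ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]
zero = dict.fromkeys(names, 0.0)

remember = {**zero, "bf": 10.0, "bi": -10.0}  # forget gate near 1, input gate near 0
leaky = dict(zero)                            # forget gate at 0.5

c_remember = run(remember, 1.0, 200)
c_leaky = run(leaky, 1.0, 200)

print(c_remember, c_leaky)
```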

The empirical inflection in DL arrived in 2012. Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton trained a deep convolutional network on GPUs to win the ImageNet visual recognition challenge by a wide margin. AlexNet demonstrated that the missing ingredient in neural network research had been scale, in roughly the sense Banko and Brill had argued eleven years earlier for natural language.

At the 2017 NeurIPS conference, Google researchers introduced the transformer architecture in "Attention Is All You Need," ushering in the modern LLM era, the multi-trillion-dollar AI arms race we observe today, and the prevailing industry mantra: scale is all you need.

Scale Theory: The Limits of Probability

However, the scientific debate surrounding scaling laws and information theory long predates the modern GPU era. Sixteen years before "Attention Is All You Need," Michele Banko and Eric Brill at Microsoft Research published a prescient paper demonstrating that, for natural language tasks, raw scale was actually more important than algorithmic elegance for removing the uncertainty of meaning. Using a then-staggering one-billion-word training set, they showed empirically that for a prototypical natural language classification task, the performance of learners can benefit significantly from much larger training sets.

To understand why massive scale works, and where it breaks down, we must look deeper into the statistical bedrock of the 20th century. In 1968, Vladimir Vapnik and Alexey Chervonenkis established the theoretical framework (VC theory) for how machine learning models generalize, mathematically describing how the frequencies of events in training data converge to their underlying probabilities as sample sizes grow. This statistical convergence is the unseen engine driving the power laws of modern language models.
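The flavor of this convergence can be seen in a simpler, related result. The sketch below uses Hoeffding's inequality, which bounds how far the observed frequency of a single fixed event can stray from its true probability. It is not the VC bound itself, which extends this kind of guarantee uniformly over a whole class of hypotheses. The tolerance and confidence values are assumptions chosen for illustration.

```python
import math

def hoeffding_bound(n, eps):
    """Upper bound on P(|observed frequency - true probability| >= eps)
    after n independent samples of one fixed event."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps**2))

def samples_needed(eps, delta):
    """Smallest n for which the Hoeffding bound is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

# Illustrative targets: within 1 point of the truth, 95% of the time.
n_coarse = samples_needed(eps=0.01, delta=0.05)
# Halving the tolerance roughly quadruples the data requirement.
n_fine = samples_needed(eps=0.005, delta=0.05)

print(n_coarse, n_fine)
print(hoeffding_bound(n_coarse, 0.01))
```

The bound shrinks toward zero as n grows but never reaches it for any finite sample, which is the statistical counterpart of the limit discussed in the next paragraph.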

Yet, if we continue to dig into the foundational roots of information theory, we hit a mathematical wall that shatters the modern illusion that LLMs will eventually achieve perfect intelligence through scale alone. In 1964, Ray Solomonoff published "A Formal Theory of Inductive Inference," establishing an uncomfortable reality for today's AI scaling proponents: optimal prediction is provably uncomputable. Solomonoff proved that while you can scale compute and data infinitely to get closer to a perfect probabilistic guess, a machine relying on inductive inference can never achieve absolute deterministic certainty. The gap between 99.9% probability and 100% factual truth cannot be crossed with more servers; it is a mathematical impossibility.

The Internet

The mathematics needed an infrastructure to run on, and the data to run on it. Both arrived from an engineering tradition running parallel to information theory — one organized around a different question. Where Shannon asked how much information a channel could carry, network engineers asked how a global system could remain reliable when its components were not.

The answer was decentralized, probabilistic reliability. Paul Baran's 1964 RAND papers described packet-switched networks explicitly designed to survive partial failure, including nuclear attack. The first operational packet-switching network, ARPANET, followed in 1969, built under Lawrence Roberts at ARPA on Leonard Kleinrock's queueing-theory foundations. In 1974, Bob Kahn and Vint Cerf introduced what became TCP/IP, allowing ARPANET to interoperate with other networks and giving every connected device a unique address. The Internet's core design philosophy (best-effort delivery, end-to-end resilience, robustness to component failure) is the same engineering pattern that neurosymbolic AI (NSAI) later applies to AI: probabilistic systems made reliable through architectural constraint rather than perfect components.

World Wide Web: From Hypertext to the Architecture of Certainty

Although the Internet created the physical network of global computing, it initially lacked the rich interactivity and applications we rely on today. Tim Berners-Lee, with the help of Robert Cailliau, created the World Wide Web at CERN. Interestingly, the Web's original proposal was designed to resolve "the problems of loss of information about complex evolving systems" through a distributed hypertext system. While the original Web was brilliant for human-to-human information sharing, its content was largely opaque to machines. To address this, Berners-Lee began articulating a vision for a "Semantic Web," formalized in his May 2001 Scientific American article with James Hendler and Ora Lassila.

To turn this vision into reality, computer scientists built the Resource Description Framework (RDF), a W3C Recommendation since 1999 and the standard model for data interchange on the Web. RDF works by breaking complex information into simple three-part statements known as "triples" (Subject → Predicate → Object). Resources within this structure are identified by Uniform Resource Identifiers (URIs), so that no two concepts are confused.
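The triple structure can be illustrated without any RDF library. The following is a minimal sketch using plain Python tuples. The `http://example.org/` namespace, the resource names, and the `match` helper are hypothetical and exist only for this example; the `rdf:type` URI is the standard one. A production system would use an RDF store and a query language such as SPARQL.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

EX = "http://example.org/"  # hypothetical namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph = {
    Triple(EX + "ClaudeShannon", RDF_TYPE, EX + "Person"),
    Triple(EX + "ClaudeShannon", EX + "authorOf", EX + "MathematicalTheoryOfCommunication"),
    # The object of a triple may also be a literal value rather than a URI.
    Triple(EX + "MathematicalTheoryOfCommunication", EX + "publishedIn", "1948"),
}

def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )

about_shannon = match(graph, s=EX + "ClaudeShannon")
authored = match(graph, p=EX + "authorOf")

for t in about_shannon:
    print(t.subject, "->", t.predicate, "->", t.obj)
```

Because each resource is named by a full URI, two graphs that mention the same URI can be merged by simple set union without ambiguity about which entity is meant.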

While RDF provides the foundational grammar for linking data, its basic structure lacks the vocabulary to describe complex constraints. This is where the Web Ontology Language (OWL), a W3C Recommendation since 2004, steps in. Built directly on top of RDF, OWL introduces a rich, standardized vocabulary capable of defining advanced hierarchies, equivalencies, and strict rules. Together, the links of RDF, the logical rules of OWL, the critical context of ontologies, and the strict tracking of provenance elevate data from simple text into actual machine reasoning.
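Two of the simplest kinds of ontology reasoning are class hierarchies and disjointness constraints. The sketch below imitates them in plain Python, loosely modeled on `rdfs:subClassOf` and `owl:disjointWith`. The class names and helper functions are hypothetical, and a real deployment would rely on an OWL reasoner rather than hand-written code.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls by following subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def inferred_types(asserted, subclass_of):
    """Asserted classes plus everything they imply through the hierarchy."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls, subclass_of)
    return types

def disjoint_violations(types, disjoint_pairs):
    """Pairs of classes declared disjoint that an individual belongs to at once."""
    return [(a, b) for a, b in disjoint_pairs if a in types and b in types]

# Hypothetical ontology fragment for illustration.
subclass_of = {
    "Underwriter": ["Employee"],
    "Employee": ["Person"],
    "Contractor": ["Person"],
}
disjoint_pairs = [("Employee", "Contractor")]

consistent = inferred_types({"Underwriter"}, subclass_of)
conflicted = inferred_types({"Underwriter", "Contractor"}, subclass_of)

print(sorted(consistent))
print(disjoint_violations(conflicted, disjoint_pairs))
```

The second individual is never asserted to be an Employee. The contradiction surfaces only after the hierarchy is applied, which is the kind of deterministic check a purely statistical model does not perform.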

Behavioral Economics: The Architecture of Human Choice

If the mathematical limits of scale theory prove that machines cannot achieve absolute certainty, then behavioral economics proves an equally humbling reality about the enterprise: perfect machine logic is useless if it is rejected by humans. For decades, classical economics operated on the assumption of Homo economicus — the idea that humans are perfectly rational actors who always process information flawlessly to maximize their utility. But as enterprise systems and greater society grew more complex, it became undeniable that this assumption was fatally flawed.

The foundation for this field was laid in the 1950s by Herbert Simon, who introduced the concept of Bounded Rationality. He argued that human decision-making is strictly limited by the tractability of the problem, the available information, and the time available to make the decision. Rather than perfectly optimizing every choice, humans "satisfice": we accept the first option that meets our immediate threshold of acceptability. In a modern enterprise drowning in data, a purely mathematical AI system that demands perfect human optimization will inevitably be ignored or bypassed by an overwhelmed workforce.

This understanding of human limitation was radically expanded in the 1970s by Daniel Kahneman and Amos Tversky. They showed that humans are driven by deep cognitive biases, most notably loss aversion: the psychological reality that the pain of losing is roughly twice as powerful as the pleasure of an equivalent gain. Building directly on Kahneman and Tversky's foundation, Richard Thaler formalized the mechanics of Choice Architecture and the "Nudge," showing that because humans are inherently biased, the way information is presented dramatically alters the outcome.
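Loss aversion has a standard mathematical form. The sketch below uses the prospect theory value function with the parameter estimates Tversky and Kahneman reported in 1992 (curvature 0.88 and a loss-aversion coefficient of 2.25). It deliberately omits probability weighting, so the gamble evaluation is a simplification, and the dollar amounts are illustrative.

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of a gain or loss x relative to a reference point."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)

gain = prospect_value(100.0)
loss = prospect_value(-100.0)
loss_to_gain_ratio = -loss / gain

# A fair coin flip between +100 and -100 has zero expected monetary value,
# yet its subjective value is negative, so most people decline it.
# Simplification: probabilities are used directly, without weighting.
coin_flip = 0.5 * gain + 0.5 * loss

print(f"ratio of pain to pleasure: {loss_to_gain_ratio:.2f}")
print(f"subjective value of fair coin flip: {coin_flip:.2f}")
```

The same asymmetry is why presentation matters: an outcome framed as avoiding a loss carries more weight than the identical outcome framed as a gain.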

Neurosymbolic AI: The Unification of Logic and Scale

The evolutionary tracks of information theory, deep learning, scale theory, network architecture, semantic structure, and behavioral economics converge on a single, necessary paradigm for the modern enterprise: Neurosymbolic AI (NSAI). NSAI is the inevitable synthesis of rigorous semantic logic (the "symbolic") with the massive pattern-recognition capabilities of deep learning (the "neural"). It recognizes that neither approach is sufficient alone. Scaling LLMs provides incredible probabilistic horsepower, but as Solomonoff proved, it cannot reach factual certainty. Conversely, semantic ontologies provide deterministic truth, but lack the natural language fluidity and rapid pattern synthesis of an LLM. By fusing the two, NSAI allows an enterprise to combine the speed and scale of generative AI with rules-based semantics, accurate ontologies, and precise provenance.

Examples of NSAI hybrids in production include DeepMind's Alpha series, demonstrated most clearly in AlphaGeometry's near-gold-medal performance on International Mathematical Olympiad geometry problems (Nature, 2024), and the KOS, the first enterprise AI OS. The Alpha results matter because they signal that the largest research labs are quietly hedging away from "scale is all you need." The 2017 transformer paper from Google triggered the scaling era; the 2024 Nature paper from Google DeepMind demonstrates the alternative. The center of gravity is shifting.

For readers who want a structured introduction to the field, five recent works cover the foundations: Garcez and Lamb's Neurosymbolic AI: The 3rd Wave (2023), Velasquez et al.'s Neurosymbolic AI: Foundations and Applications (2026), Colelough and Regli's Neuro-Symbolic AI in 2024: A Systematic Review, Sheth, Roy, and Gaur's Neurosymbolic AI — Why, What, and How (2023), and Hitzler and Sarker's Neuro-Symbolic Artificial Intelligence: The State of the Art (2022). NSAI publications grew roughly eighty-fold between 2015–16 and 2025–26.

The Reliability Premium: An Enterprise Operating System

Understanding this scientific evolution reveals why we designed the KYield Operating System (KOS) the way we did. Our architecture was developed with rigor, following the scientific method across disciplines for three decades.

Rather than deploying a probabilistic "black box" that merely estimates facts, the KOS provides an end-to-end data management system governed by the Chief Knowledge Officer (CKO) module. This module establishes strict corporate governance settings, specific authorizations, and the semantic architecture necessary to safely deploy and manage autonomous AI agents. This governance is operationalized across the enterprise through DANA (Digital Assistant with Neuroanatomical Analytics), by means of prescient search, captured preventions, data valves enforcing semantic partitioning, and an incentive program aligning behavioral economics with knowledge creation. DANA is self-tailored to each entity in simple natural language within the confines of regulatory and corporate governance, and is currently integrated with top-tier LLM chatbots. Certain elements of our second-generation system, the Synthetic Genius Machine (SGM), are also integrated with the current KOS v3.

As generative models train on synthetic data without human grounding, model collapse becomes a real risk. In a 2024 Nature paper, Shumailov et al. found that "indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear." A combination of synthetic and real data can mitigate this risk, but without a system to incentivize humans to build new knowledge, the well of enterprise intelligence will dry up. The KOS features a built-in incentive program explicitly designed to recognize, protect, and compensate the human experts who generate new intellectual capital.

The audit trail is the structural differentiator. If an underwriter, a government regulator, or an internal compliance officer needs to understand exactly why the system made a specific decision, the semantic architecture allows them to trace the logical path backward — to the precise RDF triple that triggered a rule, the OWL constraint that governed it, and the indisputable provenance of the source data. If an employee or the KOS itself identifies a potential disaster, the experts and managers with responsibility over the impacted areas are notified, with incentives aligned, executable options provided, and follow-up tracked. This verifiable, step-by-step audit trail is the structural mechanism that transforms an enterprise AI system from an uninsurable liability into a fully compliant, strategically governed asset.
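The general shape of such a trace can be sketched in a few lines. The example below is a hypothetical illustration only: the `Fact`, `Rule`, and `Decision` types, the threshold rule, and the sample data are invented for this sketch and do not describe the KOS implementation. It shows how a decision that carries references to its triggering triple, governing rule, and source can be unwound step by step.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Fact:
    triple: Tuple[str, str, str]  # (subject, predicate, object)
    source: str                   # provenance: where the fact came from

@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    predicate: str
    threshold: float

@dataclass(frozen=True)
class Decision:
    outcome: str
    rule: Rule
    fact: Fact

def evaluate(fact: Fact, rule: Rule) -> Optional[Decision]:
    """Apply one threshold rule; return a Decision only if the rule fires."""
    _, predicate, value = fact.triple
    if predicate == rule.predicate and float(value) > rule.threshold:
        return Decision("flag_for_review", rule, fact)
    return None

def audit_trail(decision: Decision) -> list:
    """Walk backward from the outcome to the rule, the triple, and its source."""
    return [
        f"outcome: {decision.outcome}",
        f"rule: {decision.rule.rule_id} ({decision.rule.description})",
        f"triple: {decision.fact.triple}",
        f"source: {decision.fact.source}",
    ]

# Hypothetical data for illustration.
rule = Rule("R-17", "claims above 50000 require review", "claimAmount", 50000.0)
fact = Fact(("claim:8841", "claimAmount", "72000"), "claims-db export, 2026-01-15")

decision = evaluate(fact, rule)
trail = audit_trail(decision) if decision else []
for line in trail:
    print(line)
```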

Selected References & Citations

1. Shannon, C.E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal.
2. Shannon, C.E. (1956). The Bandwagon. IRE Transactions on Information Theory, 2(1), 3. doi:10.1109/TIT.1956.1056774
3. Solomonoff, R. (1964). A Formal Theory of Inductive Inference. Information and Control, 7(1–2), 1–22 & 224–254.
4. Anderson, P.W. (1972). More Is Different. Science, 177(4047), 393–396.
5. Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
6. Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.
7. Banko, M. & Brill, E. (2001). Scaling to Very Very Large Corpora for Natural Language Disambiguation. ACL 2001.
8. Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American.
9. W3C. (1999/2014). Resource Description Framework (RDF). W3C Recommendation.
10. W3C. (2004/2012). OWL 2 Web Ontology Language. W3C Recommendation.
11. Vaswani, A. et al. (2017). Attention Is All You Need. NeurIPS 2017.
12. Shumailov, I. et al. (2024). AI models collapse when trained on recursively generated data. Nature, 631, 755–759.
13. Garcez, A. & Lamb, L.C. (2023). Neurosymbolic AI: The 3rd Wave. Artificial Intelligence Review.
14. Kaplan, J. et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.

Mark Montgomery is the inventor of the patented Business AI OS (KOS) and the Founder and CEO of KYield. Drawing on three decades of experience at the nexus of knowledge engineering, AI systems, and organizational psychology, Mark has advised leadership teams at hundreds of the world's most complex enterprises. He developed the foundational "yield management of knowledge" theorem in 1997 while running GWIN, the era's leading knowledge network. An author of over 100 publications, his macro-architectural approach to enterprise AI was shaped by founding and operating several companies, including a VC firm and consulting firm, as well as the Thunderbird Global Leaders program and six years of active contribution at the Santa Fe Institute.

Cite as Montgomery, M. (2026). The Science of Uncertainty: And the Reliability Premium. KYield. https://kyield.com/insights-science-of-uncertainty.html

Research synthesis and editorial assistance from Claude (Anthropic). Factual claims, citations, KYield-specific content, and final editorial judgments are the author's responsibility.