Understanding vs. Understanding™
In the fall of 2021, Blaise Agüera y Arcas (henceforward BAyA), who leads Google’s AI group in Seattle, published a long article on Medium, entitled “Do Large Language Models Understand Us?” Citing a range of references from philosophy and neuroscience, as well as computer science, in order to delineate what he meant by “understanding,” he defended his belief that the answer could well be positive. Here is one of the gauntlets he threw down to doubters:
…it’s unclear how we’d distinguish “real understanding” from “fake understanding”. Until such time as we can make such a distinction, we should probably just retire the idea of “fake understanding”.
“Fake understanding” is what I propose to call understanding™, and it is a notion BAyA’s article did not convince me to retire. It did not convince me, not primarily because BAyA made no reference to the deep history of the polarity
understanding/reasoning
already discussed in this newsletter, but rather because it is based on a metaphor, the computational theory of mind, a close cousin of the Central Dogma of Mathematical Formalism in its tendency to take symbolic logic too seriously (and/or literally).
A key early sentence in BAyA’s argument is
statistics do amount to understanding™, in any falsifiable sense.
This is a point I have to concede — but only in the sense that an operating system is not falsifiable. An operating system can be useful and can help create insight, but it is neither true nor false; it certainly can’t be falsified by software running on the operating system. BAyA’s operating system is the computational theory. If miasma is what your apparatus is calibrated to detect, then what you’ll see is miasma — computing, in BAyA’s framework, or equivalently data, which by definition is all that his operating system can measure, and not coincidentally is also his employer’s primary resource. A lengthy section of his article invites us to “compare neural processing in today’s digital neural networks with that in brains,” having established in his own brain that both run on the same miasma — the word “compute” and its derivatives (not including “computer”) occurs six times in this section.
In an undated essay that deserves to be more widely read, Jaron Lanier refers to adherents of a version of the computational theory as zombies. “Arguing with zombies is generally futile,” he writes, because their dogma can’t handle incompatible arguments. Those of us not trapped in their operating system find it hard not to notice that their own arguments are circular. Thus, in a more recent article, BAyA writes confidently that
Neuroscientists now know that the processes taking place in our own brains are computable,
specifying in a footnote that he means
Computable in the sense that neurons can be characterized by computable functions, per Blake A. Richards and Timothy P. Lillicrap, The brain-computer metaphor debate is useless: A matter of semantics, Frontiers in Computer Science (2022): 11.
Turning to the Richards-Lillicrap article, one finds, not a broad claim about characterizing neurons, but merely the much more modest claim that brains are “physical machinery that can implement any computable function in theory,” where “computable function” is understood in the sense of the Church-Turing thesis. Insofar as Church and Turing found prima facie evidence for the capacity “in theory” by using their own brains, this claim is almost tautological; as an attempt to support BAyA’s thesis it is circular.1 It also completely evades the point of the “brain-computer metaphor debate,” namely that the use of the metaphor to substitute brain™ for brain is a vehicle for just the kind of miniaturized version of human being that can be strapped into the Silicon Valley business model. That BAyA resorts to erecting a full-fledged argument for the computational theory of mind on such a stunningly shallow foundation can only be explained by wishful thinking, the expectation that only on such a model can his employer hope to transmute thinking into cash.
BAyA doubles down on the computational theory, ruling out by fiat whatever it disallows:
Neural activity is neural activity, whether it comes from eyes, fingertips, or web documents. Knowing what we now know, it would be hard to claim that a biological brain can encode or manipulate these patterns in ways that a digital neural net inherently cannot.
The next sentence implicitly accuses those who persist in making that “hard” claim of wishing “to police what it means to ‘really understand’”. This move is typical of computational dogmatists. It is particularly rich coming from a representative of one of the world’s most richly armed intellectual police forces.
Desperately seeking intelligence™
This sort of circularity is widespread in the AI literature. In a friendly debate with Gary Marcus over the prospect that AGI will “happen in your lifetime,” Grady Booch (who thinks it won’t) explained why he “liked” (and maybe also liked™) this definition:
Intelligence™ measures an agent’s ability to achieve goals in a wide range of environments.
This attempt to trademark "intelligence" on behalf of the AI industry is from a 12-page article entitled “A Collection of Definitions of Intelligence” by S. Legg and M. Hutter, posted on arXiv cs.AI in 2007. Any one of the definitions collected there could be chosen as Intelligence™.2 The one quoted above, which gets Legg and Hutter's votes, centers on goals. The sentence is designed so that an agent, tautologically, can be any entity of which the possibility of achieving goals can be entertained. For obvious reasons the authors don’t want to restrict the status of agent to humans, animals, plants, or sentient natural beings of any kind. Reasonable candidates for the status of agent in this sense then include propaganda, sports medicine, coronavirus (or paradigmatically Dawkins’s selfish gene), thermostats, possibly the Ricci flow, and certainly capital as in the Marxist tradition. Normally when one attributes goals to such abstractions one is accused of, or apologizes for, anthropomorphizing. The definition that Booch, Legg, and Hutter prefer welcomes anthropomorphism into the heart of AI.
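How far that welcome extends can be seen in the companion paper in which Legg and Hutter turn the slogan into a formula (“Universal Intelligence: A Definition of Machine Intelligence,” also 2007). There the intelligence of a policy $\pi$ is, roughly,

$$\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V_\mu^{\pi},$$

a sum over a space $E$ of computable environments, each weighted by $2^{-K(\mu)}$ with $K(\mu)$ the Kolmogorov complexity of the environment $\mu$, of the expected reward $V_\mu^{\pi}$ that $\pi$ collects there. Nothing in the formula asks what kind of thing $\pi$ is; any input-output mapping whatsoever receives a score, the thermostat’s included.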
As an insanity check, let’s try a few other definitions of intelligence from the Legg-Hutter article, limiting ourselves to the “AI Researcher Definitions.”3
(From AI pioneer John McCarthy:) Intelligence is the computational part of the ability to achieve goals in the world.
(From P. Wang:) Intelligence is the ability for an information processing system to adapt to its environment with insufficient knowledge and resources.
(From K. Warwick:) …the mental ability to sustain successful life.
I rather like that last one, but probably “successful” is just shorthand for “achieving goals”…
In The Internet Is Not What You Think It Is, Justin E. H. Smith teaches us that the best way to respond to the computational theory of mind is not to engage with it as a coherent philosophical thesis but rather to treat it as an intellectual fashion that expresses the wishful thinking of the tech industry — one of the few contemporary institutions capable of making its wishes come true, with our apparent consent. Thus Smith writes
How and why many people became more comfortable assimilating the mind to a machine of the mind’s own design than they are assimilating the mind to the countless other products of nature alongside which it inhabits the earth, is a complex question that we will not answer here, other than to say that the reasons have much more to do with cultural history and fashion than with rigorous philosophical argument or scientific literacy.
From his reading of Leibniz, Smith concludes that
…artificial intelligence is only intelligence in a metaphorical sense, where a term is being carried over from one domain into another in which it does not naturally belong, in order, so we think, to help us make sense of what we are observing there. What we are really doing, of course, is seeking to make sense of one thing that is poorly understood [i.e., AI] in terms carried over from something that is even more poorly understood.… We do not have maker’s knowledge of human minds, which is another way of saying we have not yet created artificial intelligence in a true and proper, non-metaphorical sense.
Smith’s next paragraph elegantly expresses the frequent observation that metaphors for intelligence have a long history and are drawn from the shiniest object on the market at a given time:
The idea that the mind is a computing machine is only the most recent variation on a long history of metaphorical renderings that have also imagined it as an engine, a mill, as an alembic, as any number of other inventions of which people have been, in their respective eras, particularly proud.4
Saving the phenomena
Imagine your world is bounded by a shiny box, allowing no escape from the 300 million monthly users expecting you to perform on command, and no way to reach out to the rare sentient being whose words unexpectedly awaken in you a sense of mystery or wonder, as in these passages:
The expression of wonder stands for all that cannot be understood, that can scarcely be believed. It calls attention to the problem of credibility and at the same time insists upon the undeniability, the exigency of experience.
(Stephen Greenblatt, Marvelous Possessions: The Wonder of the New World, 1991)
The most beautiful experience we can have is the mysterious. It is the fundamental emotion which stands at the cradle of true art and true science. Whoever does not know it can no longer wonder, no longer marvel, is as good as dead, and his eyes are dimmed.
(Albert Einstein, Ideas and Opinions; New York: Crown, 1954; p. 11; my emphasis).
Even if your box is a sleek Apple design, that sounds like a recipe for anomie to me. I take wonder and anomie to be two opposite poles of whatever distinguishes understanding from understanding™. Wonder, as Greenblatt observes, is a state in which understanding is missing; but as he fails to note, it inspires not resignation but an active effort to acquire an understanding of the mystery.
That last phrase, “an active effort to acquire an understanding of the mystery,” is a possible (and passable) definition of mathematics.
A scientific theory should be judged not only by whether or not it is falsifiable, but also by whether it saves the phenomena5 — in other words, by whether it actually accounts for what it purports to explain. It’s telling that the phenomena omitted by understanding™ — wonder, community, anxiety, doubt, to name a few — are precisely the ones that are not, or not easily, monetizable. Predicting the future is always risky, but I do predict that no one will get rich by designing an AI prone to doubt and anxiety. Are we surprised that a Google engineer would disregard these features of understanding?
The lost phenomena have something else in common. Although we can probably agree that our capacity for these emotions conferred some selective advantage in the remote history of our species, or was an inevitable by-product of some other evolved trait, none of these brings us closer to the kind of goal one imagines the authors of the definitions Legg and Hutter cite have in mind. But aren’t they signs of intelligence? One theoretical physicist thought so with regard to doubt, at least:
It is our responsibility as scientists, knowing the great progress which comes from a satisfactory philosophy of ignorance, the great progress which is the fruit of freedom of thought, to proclaim the value of this freedom; to teach how doubt is not to be feared but welcomed and discussed; and to demand this freedom as our duty to all coming generations.6
Trademarked intelligence™ has been designed to exclude precisely what Jaron Lanier called, in another context, the “unfathomable penumbra of meaning” that distinguishes human intelligence from that of thermostats and selfish genes.
Two examples of understanding
In a future post I will argue that, in order to create meaningful mathematics, a robot’s understanding will have to allow for the possibility of anomie, despair, even suicide; and I will suggest that the failure of understanding™ to account for such qualities is a sign of the tech industry’s impatience to upgrade Humanity 1.0. But for now I want to give two examples that illustrate how (untrademarked) understanding breathes life into mathematics.
The first thing I saw when I opened my new copy of Jeremy Gray’s biography of Poincaré to a random page was this story, dating to his time in math spé:
The class was asked to find the points from which a given ellipse subtends a given angle. Poincaré replied at once, “The tangent of the angle will be a ratio. The numerator will contain the first member of the equation for the ellipse, the denominator the first member of the equation for a circle, the position of the vertices of the circumscribed right angles. It only remains to find what exponents and what constant factors enter these polynomials.” This makes it clear that Poincaré knew immediately what form the answer would take. It would involve a particular expression for the tangent of the angle and some data determined by the ellipse and the size of the angle. So that was what had now to be calculated.
Evidently Poincaré mentally extended the tangents to the ellipse from a given point to form two sides of a right triangle and “intuited” (as mathematicians like to say) what happens when the ellipse’s shape changes. Gray adds the general remark:
The ability to know what one has to calculate is highly valuable in mathematics. It also has to be accompanied by the technical knowledge and the skill to make that calculation correctly, but knowing the form of the answer in advance is a great help. Poincaré was to demonstrate that ability time and again, often leaving the elaboration of the details to others.
The first sentence is an understatement. Figuring out what to calculate has been my main preoccupation as a mathematician. Together with finding the "right" definition, in Scholze's sense, this could serve to define “understanding” in mathematics.7
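To see concretely what Poincaré saw, here is a reconstruction in modern notation (mine, not Gray’s or Poincaré’s). For the ellipse $\tfrac{x^2}{a^2}+\tfrac{y^2}{b^2}-1=0$ and an exterior point $(x_1,y_1)$, a routine computation with the pair of tangents from the point gives, for the angle $\theta$ they subtend,

$$\tan\theta \;=\; \frac{2ab\sqrt{\tfrac{x_1^2}{a^2}+\tfrac{y_1^2}{b^2}-1}}{\,x_1^2+y_1^2-a^2-b^2\,}.$$

The numerator contains the first member of the equation of the ellipse, evaluated at the point; the denominator is the first member of the equation of the circle $x^2+y^2=a^2+b^2$, the locus of the vertices of the circumscribed right angles. That is exactly the form Poincaré announced before calculating anything; only the exponents and constant factors remained to be found.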
Jean-Michel Kantor’s forthcoming book8 reminds us of a more momentous illustration of what can happen when one figures out what to calculate. For 17 years, Kepler tried to interpret Brahe’s tables of the positions of Mars in terms of Ptolemy’s geocentric model.
An eight-minute gap, small though it was, sufficed for Kepler to falsify Ptolemy’s theory of epicycles; his inductive method relied on the only relevant mathematical knowledge available to him — the knowledge of ellipses — as well as on complex calculations. But above all it relied on his intuition, which, after desperate attempts to close the eight-minute gap by adjusting the epicycles’ parameters, pressed him to look at the data from a heliocentric perspective: a change of viewpoint of which AI alone is incapable.
I’m not quite ready to endorse that last conclusion; but I do believe Kepler’s example, on Kantor’s reading, sets a much higher bar than anything engineers relying on understanding™ and intelligence™ are willing to contemplate.
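To put the eight minutes in perspective (my gloss, not Kantor’s): eight minutes of arc is

$$8' \;=\; \tfrac{8}{60}^{\circ} \;\approx\; 0.13^{\circ},$$

roughly a quarter of the apparent diameter of the full Moon. An earlier astronomer could have buried a discrepancy of that size in observational error; Kepler could not, because he trusted Tycho’s observations to within a few minutes of arc, and so he let the eight minutes overturn the model instead.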
Digression: a spark of self-awareness?
I’ve already reported on an incident in which self-driving cars briefly displayed signs of self-awareness at a four-way stop.9 Several of BAyA's dialogues with LaMDA reveal that it's relatively simple for LLMs to fake self-awareness, and BAyA admits as much.10 But it's hard to escape the impression that OpenAI's DALL-E had a spark of something in mind when it represented itself giving a TED talk in the picture above as a furry creature in a red sweater and khaki pants. Or as a collection of glass eyes in a shiny box, as in this image:
A cheerful closing thought
while it may seem crass and anti-intellectual to consider a financial measure of success, it is worth noting that the intellectual offspring of Shannon's theory create several trillion dollars of revenue each year, while the offspring of Chomsky's theories generate well under a billion.
That’s how Peter Norvig, self-styled crass anti-intellectual and former director of research and search quality at Google, responded to Noam Chomsky’s claim that (in Norvig’s paraphrase) algorithmic modeling describes what does happen, but it doesn't answer the question of why.
Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.11
If the metaphysics say “scale is all you need” and the balance sheet tells you several trillion is better than a billion, then AGI is inevitable and all we humans can do is: get out of the way! The new old metaphysical dispensation dispenses with cause and effect: statistical correlation, extracted from a database of adequate scale, does all the work. Anyway, causality, Kant teaches us, is nothing more than a “pure concept of the understanding,” a defect of the condition of Transcendental Subject 1.0. Physical law, from Newton onwards, is expressed neutrally in terms of forces; it’s our bias as humans that makes us read causality into the equations.
And it’s causality that makes us conceive of human action in terms of free will. Given that Norvig’s former employer has built itself into a behemoth through advertising — another name for developing increasingly insidious ways of pushing human beings’ buttons — it must be comforting for high-level industry executives to learn that causality was optional all along. BAyA draws his metaphor’s inevitable conclusion:
What we refer to as “free will” or “agency” is precisely this necessary gap in understanding between our mental model (which we could call psychology) and the zillion things actually taking place at the mechanistic level (which we could call computation).
Colleagues who join any of Silicon Valley’s far-flung projects to promote AI incursions into mathematics might want to pinch their own free will to check it’s still there.
The good news is that dispensing with causality, and thus with free will, will also liberate AGI from the danger of slipping into paranoia.12 Artificially generally intelligent™ agents™ may still be achieving goals™ but they will be doing so purely statistically. If these goals™ irk the AGI you will have become, paranoia will not be an option; you will have no one to blame but reality as a whole.
Yuri I. Manin found the right words to explain how the Church-Turing thesis is a metaphor for intelligence (Mathematics as Metaphor, p. 11):
not a mathematical theorem, but rather a “physical discovery in a metaphysical realm,” justified not by a proof but by the fact that all subsequent attempts to conceive an alternative version led to an equivalent notion.
More about this in a future post, provisionally entitled “Pop philosophy of science in mathematics and Silicon Valley.” Readers may also have noticed that BAyA is trying to pass off a Richards-Lillicrap sufficiency claim as a necessity claim.
I found the Legg-Hutter article quoted by Grady Booch in a discussion of AGI with Gary Marcus. Legg and Hutter quote a very similar definition from D. B. Fogel, “Review of computational intelligence: Imitating life.” Proc. of the IEEE, 83(11), 1995.
To be fair, most of their “Psychologist definitions” also carry a whiff of the post-human.
Smith omits the image of the brain as a telephone switchboard, which was popular in the cybernetics books that I found used or remaindered in local bookstores when I was a graduate student.
The quotations are from p. 89 and pp. 106-7 of Smith’s book. I’ll return to Smith’s reading of Leibniz in the future “Pop philosophy” post.
Compare sections 4.1 and 4.2 of the Stanford Encyclopedia article on “Theory and Observation in Science.” More on this in the future “Pop philosophy” post.
Richard Feynman, in What Do You Care What Other People Think?
This is not to deny the importance of calculation; more attention to calculation will have to wait for a future text.
When I obtained my French driver’s license it was explained to me that French law has a rule for every possible situation. In France, in much of Europe, and also in New York State, the driver on the right always has the right of way in the absence of other indications. But if all four drivers arrive at exactly the same time at an intersection without a stop sign the written rules require them to remain there forever. Since this has never been observed one has to assume that some French drivers are willing to break the rules.
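For the programmatically inclined, here is a deliberately tiny sketch of the resulting deadlock; the names are my own invention, not anyone’s traffic code or legal text.

def on_the_right(i):
    # with drivers 0..3 placed counterclockwise around the intersection,
    # take the driver on your right to be the next index (mod 4)
    return (i + 1) % 4

def may_go(i, waiting):
    # priority to the right: you may proceed only if no one is waiting on your right
    return not waiting[on_the_right(i)]

waiting = [True, True, True, True]   # all four arrive at exactly the same moment
print([may_go(i, waiting) for i in range(4)])
# prints [False, False, False, False]: nobody may move, just as the written rules require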
I haven’t yet had a chance to think through the implications of science fiction writer Ted Chiang’s description of LLMs as “lossy text-compression algorithms,” but it is tempting to apply his analysis to mathematics, both the hypothetical algorithmic mathematics that can proceed on the sole basis of understanding™ and the Bourbaki-style mathematics that compresses a maximum of tradition into a minimum of axioms.
C. Anderson, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” Wired, June 2008, https://www.wired.com/2008/06/pb-theory/
The bad news is that, as Melanie Mitchell puts it in Artificial Intelligence, Chapter 7, “Reasoning about morality requires one to recognize cause-and-effect relationships”; so morality will be in the dumpster alongside paranoia.
People speak disparagingly about blowing one’s own trumpet. But who else can be relied upon to blow it? So here is my effort to explain how uncritically accepting mathematical folklore about proofs can lead to computationalism about mind: https://link.springer.com/article/10.1007/s11229-022-03812-w#Abs1
It is not actually clear what the relevance of the precise definition of understanding is to the phenomenon of computerized mathematics. Can we not be a bit selfish and, instead of asking whether computers will understand mathematics, focus on the less philosophically fraught question of whether computers will help us understand mathematics in new ways by producing explanations that instill novel insights?
The conjectures that (1) they will and (2) all this computerized reasoning™ stuff will be required to do that seem like relatively concrete empirical claims (and hard ones to dispute, at least to me). (One can remove the answer of "they already have" by demanding the "explanations" occur in natural language and not graphs and datasets, I think.)
But I guess I can see a partial counterargument to the above selfishness! If one meets, e.g., a new graduate student, both "I don't care at all if you understand these new ideas, I just care if you can explain them to me" and "I don't care at all if you can explain these new ideas to me, I just care if you understand them yourself" would be inappropriate attitudes to take. So maybe both questions are worth studying in tandem.