Life as Everybody Knows It. Book Review: ‘Life As No One Knows It’ by Sara Imari Walker
I was looking forward to reading Sara Walker’s book, Life As No One Knows It: The Physics of Life’s Emergence, given the extensive media coverage, a masterclass in marketing, that turned it into an instant bestseller. The title certainly evokes that of a controversial paper by Karl Friston, Life as We Know It, perhaps as a way to establish a difference. Yet their greatest similarity is that both were widely advertised and pushed as marketing exercises while having little substance, something now conceded even by Friston, who describes his Free Energy Principle as more of a principle than something that can be proven or tested.
I was particularly interested in Sara Walker’s ideas on algorithmic probability and open-endedness, a topic on which we briefly collaborated, even co-authoring a paper with Alyssa Adams (a student we shared, today a successful researcher) and with the celebrated author (and Walker’s thesis advisor) Paul Davies. She also invited me to contribute to a book of hers on Life and Information, before we could no longer find common ground in the way we approach science.
However, Walker’s new book surprised me for the wrong reasons. Rather than focusing on her own work, Walker devotes most of the book to discussing the ideas of another person, the chemist Leroy (Lee) Cronin, and a hypothesis about life that has been disproven multiple times by multiple groups, including ours. Cronin is another publicity-stunt-driven author who conducts science as a marketing activity.
Walker introduces Assembly Theory (AT for convenience), a controversial hypothesis suggesting that the only feature that matters in characterising life is its ability to make numerous copies of itself, or to utilise numerous copies of the resources it requires, copies which, according to the authors, can be quantified within the theory by the assembly index. This is, however, life as we already knew it: highly self-assembling processes drive, or are the consequence of, living systems, and this has long been recognised as one of the hallmarks of life, namely reproduction (of itself, hence copies). The same author, Lee Cronin, also thinks this same idea behind Assembly Theory solves everything from the expansion of the universe to cosmic inflation, drug discovery, chaos and even quantum randomness.
Sara Walker, however, spends nearly 250 pages explaining this well-established fact about reproduction, and about matter for that matter (matter accrues and accumulates, forming structures), known for over a century. This makes the book trivially accurate but unable to deliver on its promise to explain life ‘as no one knows it’: Assembly Theory nowhere shows why, and has no better explanation of the how than the self-assembly community that investigated it long before. Instead, all across the book, the strategy is to present how things introduced centuries or even millennia ago differ from how Assembly Theory defines them today, as if that represented modern science.
For example, when it comes to defining ‘objects’, Assembly Theory proposes that an ‘object’ is finite, distinguishable, breakable, and able to exist ‘more than once’, as opposed to immutable, unbreakable and indistinguishable, as (Greek) science would have it, conflating the primitive concept of the atom and its etymological root with modern science.
According to Walker, this ‘new’ account of what an ‘object’ should be is unique to Assembly Theory and a revolutionary way to characterise objects, capable of unifying biology and physics just as James Clerk Maxwell unified the magnetic and electric forces and Albert Einstein unified space and time, placing herself and Cronin at the same level. Indeed, you are not missing anything: there is no more to the theory than the idea that things repeat themselves and accumulate to form structures. Cronin calls his hypothesis very simple, but it is empty, with no substance, only hot air.
The book gives the impression that Assembly Theory (AT) offers something new in applying this simple idea of counting copies in biology, from selection to evolution. Unfortunately, it seems that the authors of AT were either unaware of, or chose to overlook, the fact that this idea had already been explored for decades, even with the same tools. In 2005, for example, the computer scientists Rudi Cilibrasi and Paul M.B. Vitányi reported in the journal IEEE Transactions on Information Theory that by counting copies of nucleotides using a compression algorithm they could accurately reconstruct an evolutionary tree from mammalian mtDNA sequences.
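For readers unfamiliar with how counting shared copies can recover a phylogeny, here is a minimal sketch of the general idea behind Cilibrasi and Vitányi’s approach, the Normalized Compression Distance (NCD). It is my own illustration, not their code: it uses zlib (an LZ77-based compressor) as a stand-in compressor and made-up toy sequences rather than real mtDNA.

```python
import zlib

def c(s: bytes) -> int:
    """Compressed size in bytes using zlib (a DEFLATE/LZ77-based compressor)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: small when two sequences share many repeated blocks."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy sequences (not real mtDNA): related 'species' share more repeated substrings,
# so their pairwise NCD is smaller.
seqs = {
    "species_A": b"ATGGCGCCTTAGGCGCCTTAGGCGCCTTA",
    "species_B": b"ATGGCGCCTTAGGCGCCTTAGGCGCATTA",
    "species_C": b"TTACCAGGATTACCAGGATTACCAGGACC",
}
for a in seqs:
    for b in seqs:
        if a < b:
            print(a, b, round(ncd(seqs[a], seqs[b]), 3))
```

Hierarchically clustering the resulting distance matrix is what then yields the evolutionary tree.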
Indeed, there are already several established statistical tools that can count copies in biological and chemical data, and that have long been used to do so, such as Shannon Entropy and compression algorithms in the Lempel-Ziv family, like LZW (Lempel-Ziv-Welch, building on the algorithms Lempel and Ziv introduced in the late 1970s), variants of which underpin formats like ZIP and PNG and were designed expressly for this purpose.
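To make concrete what ‘counting copies’ with Shannon Entropy amounts to, here is a minimal sketch (my own illustration, not taken from any of the papers discussed): estimate the empirical entropy from the frequencies of repeated blocks; a string dominated by copies of the same block has low entropy.

```python
from collections import Counter
from math import log2

def block_entropy(s: str, block: int = 1) -> float:
    """Empirical Shannon entropy (bits per block) from the frequencies of
    non-overlapping length-`block` substrings; low values mean the string
    is dominated by copies of the same blocks."""
    blocks = [s[i:i + block] for i in range(0, len(s) - block + 1, block)]
    counts = Counter(blocks)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(block_entropy("ABABABABABABABABABAB", block=2))  # one repeated block -> 0 bits
print(block_entropy("ABBABAABBAABABBBAABA", block=2))  # more distinct blocks -> higher
```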
That compression, or counting copies, works does not come as a surprise. In molecular biology it is well established that counting G and C nucleotides reveals relationships between species: two species that are evolutionarily close will have roughly the same proportion of G and C nucleotides, which is referred to as their GC content. My group has conducted research showing how information theory and algorithmic complexity can be combined with GC content, with applications to transcription and DNA nucleosome research, in order to find regions of high genomic interest, research we published in the Oxford University Press journal Nucleic Acids Research.
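For the unfamiliar reader, GC content is simply the proportion of G and C bases in a sequence. A minimal sketch, with made-up sequences for illustration only (not data from our Nucleic Acids Research paper):

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# Toy sequences: evolutionarily close species tend to show similar GC content,
# while a more distant one can differ markedly.
print(gc_content("ATGGCGCCTTAGGCGCC"))  # relatively GC-rich
print(gc_content("ATGGCGCATTAGGCGCC"))  # similar value
print(gc_content("ATTATTAAATTTAATAA"))  # AT-rich, much lower
```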
This early landmark result by Cilibrasi and Vitányi was completely omitted from the book’s background discussion, alongside pretty much every other relevant piece of work necessary to properly introduce the reader to a long-studied field. Instead, the book gives the uninformed reader the impression that the authors (Cronin and Walker) originated all these ideas themselves.
Fast forward 15 years, and we showed how to reconstruct a human developmental tree (a Waddington landscape) of organ cells, from stem cells to fully mature differentiated cells, by combining classical information theory, algorithmic complexity and perturbation analysis for causal inference, all validated against the independent genomics literature (GO, KEGG and EcoCyc).
Not only does Walker’s book fail to cite any of this previous work (and more), it also fails to address the multiple criticisms of AT, simply sweeping them under the carpet. One might at least expect the proposed assembly index to count the number of copies in data in a novel fashion, for example in a manner different from compression algorithms or from a simple application of Shannon Entropy. However, it does not.
In a recent paper, we formally and mathematically proved that the assembly index, which the authors of AT consider revolutionary, converges to Shannon Entropy via an LZ grammar. What is an LZ grammar? It refers to the Lempel-Ziv family of compression schemes, which includes LZW and whose variants underlie formats like ZIP and PNG. Therefore, the ‘new’ ideas under the umbrella of the hypothesis called Assembly Theory are equivalent to those followed in the above-mentioned paper, ideas that our group and others have been using for a decade and a half, building upon previous work rather than rehashing concepts and reinventing the wheel.
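To illustrate what such an LZ-style ‘copy count’ looks like in practice, here is a minimal LZ78-style parse (my own sketch, not the assembly index algorithm itself and not the construction used in our proof): the string is broken into phrases, each a previously seen phrase extended by one symbol, so the number of phrases directly reflects how much of the string is made of reused copies, and it is this phrase count that connects copy counting to Shannon Entropy and to compressed length.

```python
def lz78_phrases(s: str) -> list[str]:
    """Greedy LZ78-style parse: each phrase extends a previously seen phrase by one symbol.
    Highly repetitive strings are covered by few phrases (many reused copies)."""
    seen: set[str] = set()
    phrases: list[str] = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # leftover suffix at the end of the string
        phrases.append(current)
    return phrases

# A string made of copies of one symbol needs far fewer phrases
# than a less regular string of the same length.
print(len(lz78_phrases("AAAAAAAAAAAAAAAAAAAA")))  # few phrases
print(len(lz78_phrases("ABBABAABBAABABBBAABA")))  # more phrases
```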
In another paper, recently published in Nature’s npj Systems Biology and Applications, we demonstrated that the results touted as unique by the authors of AT can be replicated, and even surpassed, using very simple and widely used statistical tools. This provides evidence that their assembly index offers no advantage over existing methods, especially considering that they all converge to the same values (Shannon Entropy).
Moreover, other research groups have shown that AT’s idea of a copy-number threshold is flawed. Geology and planetary researchers, including some from NASA, recently published a paper in the Journal of the Royal Society Interface demonstrating that the AT authors were mistaken. The authors of Assembly Theory responded by claiming that their threshold was only a guesstimate for Earth and that any molecular data would need to be filtered and pre- or post-processed. This significantly undermines the claim, advanced in the AT papers and in Walker’s book, that their tool is an agnostic life-detection method for Earth and beyond (for detecting alien life on other planets, for instance).
Furthermore, our group reported the separation of organic from non-organic compounds years before the AT authors did, conducting the experiment over a full database of chemical compounds rather than the roughly 100 compounds they selected. This separation is the ‘empirical evidence’ used to validate Assembly Theory. We showed that there was nothing special about it, as it could easily be reproduced using a variety of tools and different representations of chemical data, something noteworthy that we reported and published without ever claiming to have discovered a Theory of Everything, as the author of this book does.
We have also shown how algorithmic information theory may explain certain aspects of evolution and selection through formal experiments and biological data, never concealing our sources or appropriating the work of others without proper attribution. We used Shannon Entropy, compression, and tools we designed ourselves, building upon the work of complexity science and evolutionary biology to demonstrate our method’s effectiveness.
Life is highly modular, but we have known this for nearly two centuries, since the discovery of the cell, DNA, the genetic code, and more. From a computational perspective, the earliest questions about how to characterise life were explored by pioneers like Claude Shannon (whose PhD thesis was on genetics), Alan Turing, John von Neumann, and Nils Barricelli, founders of digital computation, mostly in relation to pattern formation and self-replication (i.e., making copies). This was also a topic explored by authors like John Conway, Chris Langton and Stephen Wolfram, and not only with Turing machines but with a range of tools, from genetic algorithms to P systems (membrane computing), all exploiting hierarchical modularity based on having an abundance of ‘Lego copies’ (e.g. proteins) with which to build things, from genetic sequences to cellular compartments. This is not to mention those from whom the authors of AT have ‘borrowed’ (to be generous) their ideas, such as Andrey Kolmogorov, Gregory Chaitin, Ray Solomonoff, Leonid Levin, and Charles Bennett. Dozens of books and countless articles have been published connecting information theory and life. Does Walker cite any of this prior literature, or any literature that has laid the foundations for what AT aims to do but cannot deliver? No.
When Walker does reference some of this work, it is misrepresented; take, for example, the work of Alan Turing. Indeed, the book’s treatment of computation illustrates a deep misunderstanding of the subjects the author covers. Walker claims, for example, that the concept of Turing universality is based on a peculiar and abstract type of ‘machine’, which completely misses the point of computational universality. What Turing did was adopt the most basic model of computation in order to prove that all machines and models are equivalent. Turing machines are invoked only to prove mathematical theorems; they are by no means fundamental to the phenomena that computation explains.
Turing was interested in capturing the concept of mechanical inference or manual derivation, a strong form of causality that he believed could be carried out by humans with pencil and paper. (Early in the last century, human calculators were called computers, before the advent of digital or electronic computers.) The author confuses the peculiarities of a specific model with the broader picture. The big picture is Turing’s result, which proves that the underlying substrate is irrelevant: every mechanical procedure can be translated into a Turing machine, or into any other physical realisation of one, and vice versa. Walker is very proud of Assembly Theory’s step-by-step approach, which she believes can provide a definitive measure. However, she is essentially describing a Turing machine, a computer program that she mistakenly thinks is fundamentally different from Turing machines.
When Walker briefly mentions Kolmogorov-Chaitin complexity, referring to it as ‘algorithmic compression’, she fails to distance AT from it. According to her, trying to find computer programs that explain objects like the mathematical constant pi is very difficult. Yet when the authors of AT defend their assembly index, they do so by claiming that its calculation is intractable (very difficult to compute) and that they can only propose heuristics for it. This is correct: limiting the space of computer programs to those that produce an object by finding its repetitions or assembly blocks still leaves a space impossible to explore in full, but that space is a proper subset of the programs considered by algorithmic complexity itself, making the assembly index an algorithmic compressor. This brings them full circle, back to what they tried to distance themselves from in the first place: algorithmic compression.
The other argument is that the assembly index is worse at compressing than algorithms like LZW because it may take longer to explore the set of shortest assembly paths, a set they do not exhaust anyway, so their defence amounts to saying that they are better because they are worse. Regardless, Walker correctly defines the assembly index as the smallest number of elements needed to build the object, which is exactly the definition of algorithmic (Kolmogorov) complexity, or K, and they approximate it by counting the number of copies, which is a Shannon Entropy estimation by way of an LZ compression algorithm (an upper bound on K). In other words, almost every page frustratingly advances an argument that logically contradicts another argument on the same page or on the page right before or after. In this case, while they try to convince the reader that they have nothing to do with compression or algorithmic complexity, everything they advance has to do with it.
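Written schematically, with notation chosen here purely for illustration (x the object, n its size, K(x) its Kolmogorov complexity, ℓ_LZ(x) the length of an LZ encoding of x, and H(x) the empirical Shannon entropy rate estimated from block counts), the chain described above reads:

```latex
% Schematic only; the notation is mine, not the book's or the AT papers'.
% An LZ encoding is itself a computable description of x, hence an upper bound on K,
% and for long sequences its length is governed by the empirical entropy rate.
K(x) \;\le\; \ell_{\mathrm{LZ}}(x) + O(1)
\qquad \text{and} \qquad
\ell_{\mathrm{LZ}}(x) \;\le\; n\,H(x) + o(n)
```

Any copy-counting scheme of the LZ type, including, per the convergence result mentioned above, the assembly index, therefore sits inside this chain: it is an entropy-grade statistic and an upper bound on K, not a measure independent of compression or algorithmic complexity.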
We know that many physical processes produce many copies of the same objects without being alive, such as basaltic rock formations (like the Giant’s Causeway in Northern Ireland), snowflakes, and more. The claim that AT can find the separation between life and non-living matter by counting the number of copies in an object, and hence gauging their abundance, is blatantly incorrect: simply making copies, even in the physical world, is not in any way sufficient to define life. This has been understood since the times of Mendel, Turing, Darwin and Schrödinger, and indeed at almost any other time in history since the Greeks proposed that atoms formed matter and physicists and chemists later found that all atoms are of a limited number of types and that everything else emerges from them through copying and combination. While copying is a property of self-replication and a requirement for life and evolution, Assembly Theory incorrectly proposes that it is the only feature that counts (pun intended) in defining life. That this fabrication is inserted and repeated a hundred times, surrounded by sometimes trivially correct assertions, does not make it true.
I find it unfortunate that Walker has become something of a publicist for the demonstrably flawed ideas of someone else, as the theory she presents in this book is not hers but Lee Cronin’s. It seems as though they are feeding off each other’s limitations and weaknesses.
Walker even managed to irritate Cronin’s former collaborators by attributing to herself the idea of a Turing test for life (renamed a ‘test for agency’ or a ‘test for directionality’), an idea she was not involved in and that is not hers.
While I would like to believe that the shortcomings of this book are due to plain ignorance of the subject it covers, I cannot help but suspect a willingness to deceive, given such behaviour and the fact that Sara Walker is familiar with fields like algorithmic probability, which she has written about before, and is very well aware of our own work as a former collaborator (I was a guest at her group in Arizona). The fact that the book does not even properly mention or cite information theory, algorithmic complexity, or logical depth, concepts the authors of AT clearly use and draw inspiration from for their simplistic ideas, let alone cite our work and that of many others, suggests a degree of intentional dishonesty.
Unfortunately, I was left with a feeling that this book is yet another exercise in self-promotion rather than a scholarly contribution of any real substance.
To learn more about the many problems with Assembly Theory, and the authors’ modus operandi, visit my post on the 8 Fallacies of Assembly Theory.
Dr. Hector Zenil
Associate Professor/Senior Lecturer
School of Biomedical Engineering & Imaging Sciences
Faculty of Life Sciences and Medicine
& King’s Institute for Artificial Intelligence
King’s College London