Assembly Theory’s Five New Fallacies: Endless Deception in Complexity Theory and Computer Science

Dr. Hector Zenil

--

In recent years, Assembly Theory (AT) has garnered significant attention for its grandiose claims as a “Theory of Everything.”

Introduced by Leroy Cronin and Sara Walker, the theory promised groundbreaking insights into molecular complexity, proposing a novel metric for identifying life while ambitiously claiming to unify physics and biology.

The authors even suggested that AT could explain fundamental concepts like matter, time, and the universe itself.

However, rigorous examination has revealed that AT’s assembly index is formally equivalent to Shannon Entropy, as calculated by simple and well-known statistical compression algorithms from the LZ family, which were specifically designed to count repetitions in data. Far from delivering on its promises, AT has introduced more conceptual fallacies into computer science and complexity theory than any hypothesis in recent memory, offering a deeply flawed and redundant framework. While it resonates with individuals unfamiliar with the field, AT ultimately rebrands existing ideas in a manner that borders on sophisticated plagiarism and definitely merits the charge of pseudoscience.

Here’s a new article from KAUST Discovery covering one of our latest papers.

Here we examine five new fallacies recently perpetuated by the authors of Assembly Theory in their response to some of our latest criticisms:

We have already published two papers in peer-reviewed journals proving Assembly Theory wrong. First, in a paper published in the Nature journal npj Systems Biology, we proved that all their main experiments could be reproduced with Shannon Entropy, LZ complexity and other algorithms. Then, in a second paper published by PLOS, we proved mathematically that their metric, the assembly index, their assembly pathways, and all the definitions based on ‘copy number’ at the core of Assembly Theory (AT) are equivalent to Shannon Entropy via an LZ encoding. All of this remains unchallenged: the authors of AT have been unable to explain why their groundbreaking theory can be replicated by another algorithm, or to show where the mathematical proof of full equivalence to Shannon Entropy and traditional statistical data compression goes wrong.

Instead, in response to our third paper on the topic, where we show how the authors of Assembly Theory (AT) have misled their readers with incomplete experiments and red herrings, in what amounts to gross scientific dishonesty, they have mounted a new defence based on a new series of fallacies. We will be updating our own paper in response. Here we offer a preview, which we do not mind releasing because it debunks the grandiose claims made by the authors of Assembly Theory about their alleged Theory of Everything, a theory built on a metric in its infancy that is barely distinguishable from 60-year-old measures of complexity such as Shannon Entropy and LZ compression, which reproduce exactly the same results.

This is not a new tactic of Leroy Cronin’s, simply the latest in a string of such sleights of hand. For example, by coining the name ‘Chemputing’ for what is widely known as synthetic chemistry, digital chemistry, radiochemistry or chemoinformatics, he attempted to pass off as novel what literally every drug company has been doing for the last 50 years: exploring new compounds by automated means, following computer instructions in well-defined chemical computer languages. About 15 years ago he made the tall claim of being on the verge of creating life in the lab, a claim now tied to this new Assembly Theory and seemingly used to mislead people and further advance a career built on what I call Cronin’s DPB modus operandi: deception, plagiarism and bullying. The University of Glasgow does nothing to stop this because of Cronin’s success at bringing in taxpayer money in the form of millions in research grants, plus attention for Glasgow. We, however, believe in accountability, and so we undertake to disclose all this as a service to the community.

Here are the five new fallacies from their most recent reply to our most recent criticisms (which is yet another ruse, as they don’t disprove or even mention our mathematical proof or the fact that all their results can be trivially reproduced by naive statistical tools):

1. The New “We-Are-Different-Because-We-Are-Worse” Fallacy

One of the most baffling arguments presented by the authors of AT, and one we have identified previously, is their insistence that their algorithm is different because it operates in a different time complexity class. The authors of Assembly Theory now double down on this argument.

While time complexity is an important consideration in computer science, for scientific purposes the strength of a method lies in its explanatory power and predictive accuracy, not in its inefficiency. For example, there is no definitive count of sorting algorithms because new ones can always be created, but there are around 30 to 40 well-known ones. Quicksort is generally considered one of the fastest practical sorting algorithms, while Bubble sort is one of the slowest, yet both produce exactly the same output for the same input and both can sort the same types of data. From the point of view of input and output they represent the same function, but their implementations vary significantly in efficiency: the average time complexity of Bubble sort is O(n²) while that of Quicksort is O(n log n). They are in different time complexity classes but produce exactly the same results.
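To make the point concrete, here is a minimal sketch, not tied to any code by either group, showing two textbook sorting algorithms from different time complexity classes computing exactly the same function:

```python
# Two sorting algorithms in different time complexity classes compute the
# same input-output function; only their efficiency differs.
import random

def bubble_sort(xs):
    """O(n^2) on average: repeatedly swap adjacent out-of-order elements."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def quicksort(xs):
    """O(n log n) on average: partition around a pivot and recurse."""
    if len(xs) <= 1:
        return list(xs)
    pivot = xs[len(xs) // 2]
    return (quicksort([x for x in xs if x < pivot])
            + [x for x in xs if x == pivot]
            + quicksort([x for x in xs if x > pivot]))

data = [random.randint(0, 100) for _ in range(50)]
assert bubble_sort(data) == quicksort(data) == sorted(data)
print("Same input, same output, different time complexity classes.")
```

Being slower does not make Bubble sort a different theory of sorting; it makes it a worse implementation of the same function.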

AT produces the same results as Shannon Entropy or Lempel-Ziv (LZ) compression, and we have demonstrated their equivalence. However, AT achieves these results less efficiently and offers no additional explanatory power beyond what a Shannon Entropy-based explanation already provides.

Shannon Entropy was designed to quantify uncertainty, and in practice computing it over an object amounts to counting how often its elements repeat. Everything that AT does with its assembly index, Shannon Entropy can do too. So why apply Shannon Entropy with so many extra steps and then call it something different?
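As a minimal illustration (a toy sketch, not the authors’ pipeline), the empirical Shannon entropy of a sequence is driven entirely by how often its building blocks repeat, whatever those blocks happen to be (characters here, but bonds, network vertices or spectral peaks work the same way):

```python
# Empirical Shannon entropy from repetition counts of arbitrary symbols.
from collections import Counter
from math import log2

def shannon_entropy(blocks):
    """Entropy in bits per block, computed from how often each block repeats."""
    counts = Counter(blocks)
    n = len(blocks)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(shannon_entropy("ABABABABABAB"))   # highly repetitive: 1.0 bit per symbol
print(shannon_entropy("AQZJXKPMWBTR"))   # 12 distinct symbols: about 3.58 bits per symbol
```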

In a futile exercise of fake rigour, the authors go to great lengths to provide proofs of the NP-completeness of their assembly index (AI or Ai), as if computational complexity attested to their theory’s novelty or power. The core measure — identifying low-entropy patterns such as repeated elements in data — is something that complexity science has explored since the 1940s, starting with Erwin Schrödinger’s What Is Life? and culminating in many decades of work by other authors, from Shannon to Turing, Chaitin, Wolfram and our own work of over a decade on applying measures based on both Shannon Entropy and algorithmic complexity in areas ranging from genetics to selection to cell and molecular complexity, and beyond. In fact, we reported the same results as AT did but years before, and without claiming to have solved selection and evolution or to have unified biology and physics, because we did not.

If anything, this fallacy highlights AT’s redundancy rather than supporting its claims. The argument is akin to suggesting that a slower formula for calculating gravity among masses — producing results identical to Newtonian mechanics — somehow constitutes an alternative theory of gravitation, even in the presence of a better one such as Einstein’s General Relativity. For the authors of AT, it seems that being worse, slower, and redundant can be considered strengths, simply by virtue of believing very hard and claiming, with no justification, that AT could characterise selection and evolution, like someone focusing really hard to achieve telekinesis.

2. The Fallacy of Marginal Difference and Scatter Correlation

Another key flaw in AT lies in its reliance on inconsistent heuristics, which change from paper to paper. These heuristic tweaks, designed to maintain marginal differences from Shannon Entropy or LZ compression, amount to shifting goalposts. Despite theoretical and empirical proofs demonstrating that the assembly index is fundamentally equivalent to Shannon entropy, the authors persist in claiming novelty by pointing to small scatter deviations in data. These deviations, however, arise not from theoretical innovation but from arbitrary algorithmic inefficiencies.

For instance, they argue that a few outliers among thousands of data points validate the uniqueness of their approach. Yet, as datasets grow larger, these outliers diminish proportionately, converging inevitably on the same results as Shannon entropy or LZ compression (see Table below). The authors even misuse statistical concepts, failing to recognise that a Spearman correlation approaching 1 signifies convergence, not divergence, as per their reality-bending arguments.

The authors of AT suggest that a Spearman correlation value of 1 is not real convergence.
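A Spearman correlation measures precisely this kind of convergence: it approaches 1 whenever two measures rank the same objects in (almost) the same order, regardless of small scatter between them. The toy sketch below is illustrative only: synthetic strings, with an LZ78-style factor count and zlib’s compressed size standing in for the measures compared in our papers, and it assumes SciPy is installed.

```python
# Two off-the-shelf statistical measures computed on synthetic strings of
# varying redundancy; small scatter between them does not prevent a Spearman
# rank correlation close to 1, which is what convergence means here.
import random, zlib
from scipy.stats import spearmanr

def lz78_factors(s):
    """Count the phrases in a simple LZ78-style parsing of s."""
    seen, phrase, count = set(), "", 0
    for ch in s:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)

random.seed(0)
strings = ["".join(random.choice("AB" * k + "ABCDEFGH") for _ in range(200))
           for k in range(1, 60)]          # increasing redundancy with k
a = [lz78_factors(s) for s in strings]
b = [len(zlib.compress(s.encode())) for s in strings]
rho, _ = spearmanr(a, b)
print(f"Spearman rho = {rho:.3f}")          # typically close to 1: the measures converge
```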

The false argument about a different time complexity class

Their insistence that AT is different because it operates inefficiently within a different time complexity class does not rewrite the rules of computer science, data science, mathematics, statistics or complexity science. Convergence still means convergence, and a mathematical proof (Supp. Inf.) still holds even if the authors of AT, preferring to ignore it, wish it did not.

Scientific theories that diverge in foundational substance are expected to produce different results, superior explanations or better predictions, none of which AT delivers. It may have some pedagogical value, but it lacks any fundamental or methodological value. AT fails, and for promotional purposes its authors rely on arbitrary heuristics that mimic well-established statistical algorithms.

Unfortunately, we do not think AT has any value when it comes to evolution and selection (or life, time, matter and the universe), or as a Theory of Everything unifying physics and biology, despite the authors’ claims. It only muddles things, because people may believe it does what the authors say it does, despite it adding nothing to the discussion of evolution and selection that was not already known or that could not be derived by means of Shannon Entropy alone. We already knew that matter accumulates and reuses a small number of building blocks again and again, that evolution and selection reuse components too, from genes to proteins to cells, and that cells are highly modular. Yet nothing about AT equips it to define life, because there are equally sophisticated physical processes that produce objects it would identify as alive. Life needs to be defined relative to an environment, as do selection and evolution themselves.

Epitomising one of the many contradictions in their work, their original paper, which lays the foundations for everything else they have done and published, features beer as the molecular compound with the highest assembly index. When challenged, they say that beer is the product of human technology and therefore plausibly possessed of a high Ai. However, in this very paper, whiskey figures as the molecular compound with the lowest Ai. This type of contradiction is found all over the authors’ papers, and every time they come out with another argument in their defence, they introduce yet another inconsistency that leaves those trying to understand their contributions simply astonished. Yet instead of conceding a single point, they always double down.

3. The Metrology Fallacy: A Misunderstanding of Metrics and Lack of Evidence

The authors of Assembly Theory suggest that we do not understand metrology because their assembly index is a measure. This seems to imply that Shannon entropy is not a measure, and that compression ratios, widely used to quantify complexity, are not metrics either. Shannon entropy, Lempel-Ziv (LZ) compression, and similar measures have been used in metrology for decades to quantify complexity in systems ranging from spam detection to phylogenetic reconstruction. These metrics are well established, rigorously tested, and have contributed significantly to scientific understanding. The claim that AT introduces something novel in this domain is not supported by any evidence, or even by common sense.
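One concrete, long-established example of compression used as a metric is the Normalized Compression Distance of Cilibrasi and Vitányi, which has been used for clustering and phylogenetic reconstruction for two decades. The sketch below is illustrative only: zlib stands in for any real compressor and the sequences are synthetic.

```python
# Normalized Compression Distance (NCD): a compression-based metric long used
# to cluster sequences without any domain-specific features.
import random, zlib
random.seed(1)

def csize(x: bytes) -> int:
    """Compressed size in bytes; zlib stands in for any real compressor."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)

base = "".join(random.choice("ACGT") for _ in range(2000))
related = "".join(c if random.random() > 0.05 else random.choice("ACGT") for c in base)
unrelated = "".join(random.choice("ACGT") for _ in range(2000))

print(ncd(base.encode(), related.encode()))    # lower: the pair shares most of its structure
print(ncd(base.encode(), unrelated.encode()))  # close to 1: nothing shared beyond statistics
```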

In their defence, the authors claim that they are not only proposing a measure of complexity but one that measures selection and evolution. However, as explained before, they have no evidence whatsoever that their assembly index quantifies selection or evolution in a way that is different or superior to existing measures, particularly Shannon entropy — or in any other way for that matter. Repeating ad infinitum that they can measure selection and evolution does not make it true. They have no evidence for this.

This is especially damning given that we have demonstrated both empirically and mathematically that their results can be fully reproduced using naive statistical tools such as Shannon entropy and LZ compression. The authors have consistently refused to conduct control experiments that would compare their measure to established alternatives. Across all their papers, they fail to provide any such comparisons, leaving the burden of proof unaddressed.

Instead of defending their measure at all costs through hand-waving arguments, the authors should demonstrate what their assembly index can achieve that other measures cannot, particularly in relation to selection and evolution. They should clarify the specific advantages of their approach and provide empirical evidence connecting their metric to these biological processes in a way that the other measures cannot reproduce. It is false to claim that AT offers better or even different explanations of evolution, selection or life than Shannon Entropy. So far AT has offered neither a different explanation nor better predictions, as the authors refuse to compare, in their papers, the application of Shannon Entropy or traditional compression to their data. Instead, we have been forced to do so for them, in order to complete experiments that would otherwise be not just incomplete but dishonestly incomplete.

What they should be focused on is addressing the mathematical proof we presented, which shows that their assembly index (Ai), assembly pathways, and all associated ideas based on counting identical copies in experimental data are mathematically equivalent to Shannon entropy. Ignoring this equivalence while continuing to claim uniqueness does a disservice to the scientific community.

Ultimately, AT’s inability to demonstrate superiority over existing measures, combined with its refusal to engage in rigorous comparisons, undermines its validity. Its claims to measure complexity in a novel way remain unsupported, and its dismissal of well-established metrics such as Shannon entropy and LZ compression, on the basis that AT alone explains selection and evolution, a claim backed by zero evidence, is staggering. Far from being innovative, AT is a redundant rebranding of concepts that have been explored and refined for decades, concepts these authors do not cite, refusing to acknowledge the work of my team, which, as responsible scientists, has not courted such intense media attention.

Its authors supply no evidence that Assembly Theory and its index are more related to evolution or selection than any other measure. Their measure turns out to be weaker, slower and redundant when Shannon Entropy or LZ compression is applied to their own data, at whatever level of description they may wish to apply it. Indeed, they complain that Shannon Entropy or LZ compression can only be applied to bits, but this is wrong; they can be applied to any object and to any basic units: vertices in a network, peaks in a mass spectral file, and so on.

4. Turning the Tables Fallacy (similar to the Straw man)

The authors of AT have suggested that the statistical algorithms we utilised in our last paper only succeed because we are not controlling for molecular length. However, this is precisely our criticism of their approach, as demonstrated in this paper. Our findings show that their measure is entirely dominated by molecular length, which is what their alleged separation between organic and non-organic compounds was actually measuring. It defies logic that the authors now appear to be turning the tables by accusing us of the very oversight we drew their attention to.

Specifically, they failed to control for the most basic features and failed to compare their index to other measures of complexity. Had they done so, they would have immediately discovered that their groundbreaking theory was unable to produce any new results, and that those results were entirely driven by Shannon Entropy and by compression techniques, reported long before, which are able to separate organic from non-organic compounds.
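To illustrate what such a control looks like (a toy sketch on synthetic strings, not the authors’ data), note that the raw value of any compression-based score grows with object size even when the underlying structure is identical, which is why a raw, unnormalised score must always be checked against length alone:

```python
# Strings with identical statistics at every length: the raw compressed size
# scales with length, while the length-normalised rate stays roughly constant.
import random, zlib
random.seed(0)

def raw_and_rate(s: str):
    """Raw compressed size (bytes) and length-normalised rate (bytes per symbol)."""
    raw = len(zlib.compress(s.encode(), 9))
    return raw, raw / len(s)

for n in (500, 5000, 50000):
    s = "".join(random.choice("ACGT") for _ in range(n))
    print(n, raw_and_rate(s))
# Any organic/non-organic separation obtained with a raw, unnormalised score
# therefore has to be checked against length alone before claiming it measures
# anything beyond molecular size.
```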

They did the same when we accused them of using a popular statistical compression algorithm disguised as an ‘assembly index’: they turned the tables and accused us of suggesting that statistical data compression was adequate, or better than Ai, at characterising life, insisting that such compression is trivial and unable to characterise life. But this is exactly our argument: Ai cannot characterise life because it is simply a traditional statistical compression algorithm, and one that is too trivial. While such algorithms have been used before to characterise life and life processes, the field of complexity has moved on, the idea having been exhausted, or understood well enough to know that it characterises nothing beyond Shannon Entropy.

Yet somehow the authors managed to reverse the criticism, pretending that their Ai measure is different, and hence dismissing Shannon Entropy and ZIP data compression as trivial and unable to characterise life, when we have proven that Ai, at the core of AT, is Shannon Entropy and statistical data compression.

In another example, I have been told that Sara Walker has even suggested on her X social media account that scientists should use their agency to think outside the box, because something as ‘trivial’ as classical Information Theory cannot characterise life while Assembly Theory can. However, since Assembly Theory is equivalent to Shannon Entropy, the foundational measure of classical Information Theory, Sara Walker is in effect suggesting that Assembly Theory, with its assembly index, is unequipped to deal with life. Somehow they obscure half of the logical argument and use the other half in their defence, the half where we are saying the same thing. They believe Assembly Theory is different from classical Information Theory, but we have proven that it is not, and they have failed to show any empirical or fundamental difference other than their flawed time-complexity argument, namely, that their measure is too slow to be Shannon Entropy.

5. The Data Representation Fallacy

They also claim that Shannon Entropy and LZ compression can only be applied to bits, or that applying them would destroy the units they consider fundamental, such as chemical bonds. This is totally false: Shannon Entropy, LZ, or any data compression algorithm can be used on any dictionary without destroying any basic units. Indeed, Shannon Entropy can be, and has been, applied to the edges of networks or to any other basic units, just as compression algorithms have been widely used as complexity measures.

Shannon, in his foundational work on information theory, showed that:

1. Information content (entropy) is invariant under lossless transformations: If you encode data using a different representation or “vocabulary” (e.g., from text to binary), the information content remains unchanged as long as the transformation is reversible. This is always the case when data has a computer representation, such as a file ultimately stored as binary on a hard disk. This means that patterns in the original data are preserved under transformations if they are reversible.

2. Patterns and redundancy in data are key to compression: Shannon’s theory demonstrated that patterns in data (redundancy) can be exploited to reduce the size of data representations without losing information, provided the encoding is efficient; but the converse also holds: a different representation cannot erase a pattern without adding or deleting data. No vocabulary mapping destroys the data. In other words, even if chemical bonds are decomposed into a binary representation (e.g., InChI or SMILES codes), the patterns (or lack thereof) remain, as long as there is a way to convert one representation into the other (a minimal sketch illustrating this follows below).
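Here is that minimal sketch: empirical Shannon entropy computed over arbitrary units (hypothetical bond labels in this toy example), and its invariance under a reversible change of representation into binary codewords.

```python
# Entropy over arbitrary units, and its invariance under a lossless recoding:
# mapping each bond label to a 2-bit codeword (read back in blocks) neither
# creates nor destroys patterns.
from collections import Counter
from math import log2

def entropy(seq):
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

bonds = ["C-C", "C=O", "C-C", "C-H", "C-C", "C=O", "C-H", "C-H"]   # any units work

code = {"C-C": "00", "C=O": "01", "C-H": "10"}          # reversible encoding
bits = "".join(code[b] for b in bonds)                   # 'binary' representation
two_bit_blocks = [bits[i:i + 2] for i in range(0, len(bits), 2)]

print(entropy(bonds))            # entropy over chemical-bond symbols
print(entropy(two_bit_blocks))   # identical: the recoding is lossless
```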

Fano, one of Shannon’s collaborators, developed the Fano coding scheme, a precursor to more efficient coding methods like Huffman coding (which is why we have suggested from day one, many years ago, that AT was equivalent to Huffman coding).

The principle that patterns cannot be destroyed hinges therefore on the reversibility and preservation of structure in transformations:

· When a transformation (or mapping to another vocabulary) is lossless, the patterns in the data are preserved, as the transformation retains all original information.

Shannon’s information theory underpins this idea because it establishes that:

1. Entropy is a fundamental measure of information, unaffected by how the data is represented.

2. Transformations that retain entropy preserve the data’s inherent patterns, regardless of the vocabulary or representation used.

Therefore, the claims that only AT and its assembly index can deal with chemical data at the right level of description, that the other measures cannot, or that the other measures yield different results when applied at different levels of description, are all false and fallacious.

We have proven that an object, such as a chemical compound, that has low Shannon entropy will have a low Ai, and that an object with high Ai will have high Shannon entropy, and vice versa. This is because Ai is fully and trivially equivalent to Shannon entropy, unable to distinguish or quantify any pattern that Shannon entropy is not already able to distinguish and quantify; the two will also fail in exactly the same cases, producing false positives and false negatives whenever a case goes beyond trivial pattern-matching statistics.
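To see what failing in the same cases means, consider the sketch below. It is illustrative only and uses two stand-in statistical measures rather than the authors’ code: the output of a tiny pseudo-random generator is produced by a few lines of code, hence algorithmically simple, yet any purely statistical measure scores it as near-maximally complex, and any measure equivalent to such statistics inherits the same blind spot.

```python
# A statistically 'random' but algorithmically simple object: the output of a
# linear congruential generator. Empirical entropy and zlib both score it as
# near-maximally complex, a false positive for any purely statistical measure.
import zlib
from collections import Counter
from math import log2

def entropy_bits_per_byte(data: bytes) -> float:
    counts, n = Counter(data), len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def lcg_bytes(n, seed=42, a=1664525, c=1013904223, m=2**32):
    """A few lines of code generate the whole sequence."""
    out, x = bytearray(), seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x >> 24)           # keep the high byte of each state
    return bytes(out)

data = lcg_bytes(10_000)
print(entropy_bits_per_byte(data))               # near 8 bits per byte
print(len(zlib.compress(data, 9)) / len(data))   # ratio near 1: 'incompressible'
```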

Alternative approaches better founded than AT: The Coding and Block Decomposition Methods

Where AT falters, methods like the Block Decomposition Method (BDM), introduced about a decade earlier, offer a more rigorous and grounded approach. Here is a paper fully exploring many of its features and placing it appropriately in the hierarchy of complexity measures in relation to Shannon entropy, compression and algorithmic complexity.

By integrating principles of algorithmic complexity, which is semi-computable and therefore implementable, this method surpasses traditional correlation metrics, enabling a deeper understanding of complexity and causation. Unlike AT, which remains mired in redundancy and inefficiency, the Block Decomposition Method builds on established theories to deliver meaningful insights into the dynamics of complex systems, in a way better grounded in causal inference by way of perturbation analysis and algorithmic probability. In contrast to AT, BDM openly admits to standing on the shoulders of Shannon Entropy (its worst-case behaviour) and on the principles of algorithmic probability (its best-case behaviour, when enough computational resources are devoted to generating increasingly better estimations of causal content). Methodological approaches like the Block Decomposition Method provide formal and robust foundations better equipped to understand the very phenomena that AT pretends to explain, such as selection and evolution, and they are grounded in science (causality, algorithmic complexity, information theory), not media spectacle.
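For readers who want a feel for how BDM aggregates local estimates, here is a schematic sketch of its aggregation rule only: a Coding Theorem Method (CTM) estimate for each distinct block plus a logarithmic term for its multiplicity. The real method looks up precomputed CTM values; the placeholder lookup below is a toy stand-in, so the numbers illustrate the bookkeeping, not actual CTM estimates.

```python
# Schematic BDM aggregation: sum over distinct blocks of CTM(block) + log2(multiplicity).
from collections import Counter
from math import log2

def placeholder_ctm(block: str) -> float:
    """Stand-in for a precomputed CTM table (hypothetical values, NOT real CTM)."""
    return float(len(set(block)))      # toy proxy only

def bdm(s: str, block_size: int = 4) -> float:
    blocks = [s[i:i + block_size] for i in range(0, len(s), block_size)]
    counts = Counter(blocks)
    return sum(placeholder_ctm(b) + log2(m) for b, m in counts.items())

print(bdm("ABABABABABABABAB"))     # repeated blocks: one CTM term plus a log term
print(bdm("ABCDQRSTWXYZJKLM"))     # all-distinct blocks: a CTM term for each
```

The design choice is explicit: repetition is charged only logarithmically, while the irreducible part of each block is charged by an algorithmic-probability estimate rather than by frequency alone.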

The Triviality of Assembly Theory

At its core, Assembly Theory collapses under the weight of its own triviality. The notion that life is related to redundancy and low entropy has long been explored by complexity scientists; indeed, the field arose as an attempt to tackle precisely these questions. AT has done nothing that was not reported before, and reported with due scientific decorum and honesty.

The authors of AT have made demonstrable high-school-level errors in their earlier work and have compounded them with further misunderstandings of computability and complexity theory. Their inability to reconcile these errors has led to a staggering number of corrections per argument, a phenomenon aptly captured by Brandolini’s law: the effort needed to refute misinformation is an order of magnitude greater than the effort needed to produce it. Every time they try to build another argument in their defence, they only end up adding more self-inconsistencies to an endless list of fallacies that is difficult to keep up with.

If you wish to know more about the many fallacies and dishonest practices of the group behind Assembly Theory, read my previous posts:

· The 8 Fallacies of Assembly Theory

· Assembly Theory: Pseudoscience Masquerading as Revolutionary Insight Exposed

· Life as Everybody Knows it. Book Review: ‘Life As No One Knows It’ by Sara Imari Walker

Crackpot theories need four things to thrive: sophistication (jargon, fake rigour), authority (fancy credentials), attention (good and bad press), and charisma (confident rebels). Assembly Theory and its authors have them all.

Addendum: In a recent interview with Closer to Truth, Sara Walker now says that Assembly Theory can explain free will. This reflects a fundamental mistake: taking a model so seriously as to believe it corresponds to reality. They do not even qualify their claims with ‘if Assembly Theory is correct.’

Even if it were correct, it would mean little, as I have written before, because it is essentially a rebranding of Shannon Entropy with experiments that have already been performed and reported. Now, in addition to time, matter, life, and everything else, we must add free will to the list of things Leroy Cronin and Sara Walker claim Assembly Theory, as a New Theory of Everything, is capable of explaining, based on the utterly trivial definition of counting copies of the same thing, which can explain concatenation and join operations but nothing else. This is, according to them, the supposed key to everything.

Pseudo-academics like Lee Cronin and Sara Walker are damaging science and scientific journalism, contributing to a high-tech dark age rife with fake news and disinformation that has unfortunately permeated science in a post-truth era.
