Assembly Theory’s Five New Fallacies: Endless Deception in Complexity Theory and Computer Science

Dr. Hector Zenil
16 min read · Dec 11, 2024


In recent years, Assembly Theory (AT) has garnered significant attention for its grandiose claims as a “Theory of Everything.”

Introduced by Leroy Cronin and Sara Walker, the theory promised groundbreaking insights into molecular complexity, proposing a novel metric for identifying life while ambitiously claiming to unify physics and biology.

The authors even suggested that AT could explain fundamental concepts like matter, time, and the universe itself.

However, rigorous examination has revealed that AT’s assembly index is formally equivalent to Shannon Entropy, as calculated by simple and well-known statistical compression algorithms from the LZ family, which were specifically designed to count repetitions in data. Far from delivering on its promises, AT has introduced more conceptual fallacies to computer science and complexity theory than any hypothesis in recent memory, offering a deeply flawed and redundant framework. While it resonates with individuals unfamiliar with the field, AT ultimately rebrands existing ideas in a manner that borders on sophisticated plagiarism and definitely falls into pseudoscience.

Here is a new article from KAUST Discovery covering one of our latest papers:

Here we examine five new fallacies recently perpetuated by the authors of Assembly Theory in their response to some of our latest criticisms:

We have already published two papers in peer-reviewed journals proving Assembly Theory wrong. First, we showed that all of their main experiments could be reproduced with Shannon Entropy, LZ complexity and other algorithms, in a paper published in the Nature journal npj Systems Biology. Then, in a second paper published by PLOS, we proved mathematically that their metric, the assembly index, their assembly pathways and all the definitions based on 'copy number' at the core of Assembly Theory (AT) are equivalent to Shannon Entropy via an LZ encoding. All of this remains unchallenged: the authors of AT have been unable to explain why their supposedly groundbreaking theory can be replicated by other algorithms, or to show how a mathematical proof of full equivalence to Shannon Entropy and traditional statistical data compression could be wrong.

Instead, in response to our third paper on the topic, in which we show how the authors of Assembly Theory (AT) have misled their readers with incomplete experiments and red herrings in what amounts to gross scientific dishonesty, they have mounted a new defence based on a new series of fallacies. We will be updating our own paper in response. However, we offer here a first view, which we do not mind releasing because the authors of Assembly Theory have been cornered, retreating from grandiose claims about their alleged Theory of Everything to defending a metric in its infancy that can barely be distinguished from 60-year-old measures of complexity such as Shannon Entropy and LZ compression, which reproduce exactly the same results.

This is nothing new in the old, well-known tactics of Leroy Cronin, who has done it on multiple occasions before. For example, he coined the name 'Chemputing' for what is widely known as synthetic chemistry, digital chemistry, radiochemistry or chemoinformatics: what literally every drug company has been doing for the last 50 years to explore new compounds by automated means, following computer instructions in well-defined chemical computer languages. He also made a similar claim 15 years ago about being on the verge of creating life in the lab, a claim now tied to this new Assembly Theory, seemingly used to mislead people and further advance a career built on what I call Cronin's DPB modus operandi: deception, plagiarism, and bullying, which the University of Glasgow does not stop because of Cronin's success at bringing taxpayers' money, in the form of multi-million research grants, and attention to Glasgow. Yet we believe in accountability and disclose all this as a service to the community.

Here are the five new fallacies from their most recent reply to our latest criticisms (a reply which is itself yet another red herring, as they do not disprove, or even mention, our mathematical proof or the fact that all their results can be trivially reproduced by naive statistical tools):

1. The new “We-Are-Different-Because-We-Are-Worse” Fallacy

Previously identified, one of the most baffling arguments presented by the authors of AT is their insistence that their algorithm is different because it operates in a different time complexity class. The authors of Assembly Theory double down on this argument.

While time complexity is an important metric in computer science, for scientific purposes the strength of a method lies in its explanatory power and predictive accuracy, not in its inefficiency. For example, there is no definitive count of sorting algorithms because new ones can always be created, but there are around 30 to 40 well-known ones. Quicksort is generally considered among the fastest sorting algorithms, while bubble sort is one of the slowest, yet both produce exactly the same output for the same input and both can sort the same type of data. From the input-output point of view they compute the same function, but their implementations differ significantly in efficiency: the average time complexity of bubble sort is O(n²), while that of quicksort is O(n log n). They are in different time complexity classes but produce exactly the same results.
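As a minimal sketch of this point (the routines and data below are illustrative, not taken from any of the papers discussed), here are two sorting implementations in different time complexity classes that compute exactly the same function:

```python
# Two sorting routines in different time complexity classes that nevertheless
# compute exactly the same input-output function.

def bubble_sort(xs):
    """O(n^2) on average: repeatedly swap adjacent out-of-order elements."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def quicksort(xs):
    """O(n log n) on average: partition around a pivot and recurse."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))

data = [5, 3, 8, 1, 3, 9, 2]
assert bubble_sort(data) == quicksort(data) == sorted(data)
```

Being in a different complexity class does not make bubble sort a different theory of sorting; it only makes it slower.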

AT produces the same results as Shannon Entropy or Lempel-Ziv (LZ) compression, and we have demonstrated their equivalence with AT. However, AT achieves these results less efficiently and offers no additional explanatory power beyond what a Shannon Entropy-based explanation already provides.

Shannon Entropy was designed to deal with uncertainty and, in effect, to count how often elements are repeated in an object; everything that AT does with its assembly index, Shannon Entropy can do. So why apply Shannon Entropy with so many extra steps and then call it something different?
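As a small hedged illustration (the strings below are made up, and this is plain symbol-frequency entropy, not the authors' algorithm), Shannon entropy is computed directly from the counts of repeated elements, which is exactly the 'copy number' information AT relies on:

```python
# Shannon entropy computed from the repetition (copy) counts of the symbols in a sequence.
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """Entropy in bits per symbol, computed from how often each element repeats."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# The same building block reused many times gives low entropy;
# few repetitions give high entropy.
print(shannon_entropy("ABABABABABAB"))  # 1.0 bit/symbol (highly repetitive)
print(shannon_entropy("ABCDEFGHIJKL"))  # ~3.58 bits/symbol (no repetition)
```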

The authors go to great lengths to provide proofs of the NP-completeness of their assembly index (AI or Ai), in a futile exercise of fake rigour, as if computational complexity justified their theory's novelty or power. The core measure, identifying low-entropy patterns such as repeated elements in data, is something that complexity science has explored since the 1940s, starting with Erwin Schrödinger's What Is Life? and continuing through decades of work by authors from Shannon to Turing, Chaitin and Wolfram, as well as our own work of more than a decade applying measures based on both Shannon Entropy and algorithmic complexity to areas ranging from genetics to selection to cell and molecular complexity, and more. In fact, we reported the same results AT did, years earlier, and without claiming to have solved selection and evolution or to have unified biology and physics, because it does no such thing.

If anything, this fallacy highlights AT's redundancy rather than supporting its claims. The argument is akin to suggesting that a slower formula for calculating gravity among masses, producing identical results to Newtonian mechanics, somehow constitutes an alternative theory of gravitation, even in the presence of a better one such as Einstein's General Relativity. For the authors of AT, being worse, slower and redundant seems to count as a virtue, on the strength of believing very hard, and claiming without evidence, that their index can characterise selection and evolution, like someone concentrating really hard in the hope of achieving telekinesis.

2. The Fallacy of Marginal Difference and Scatter Correlation

Another key flaw in AT lies in its reliance on inconsistent heuristics, which change from paper to paper. These heuristic tweaks, designed to maintain marginal differences from Shannon Entropy or LZ compression, amount to shifting goalposts. Despite theoretical and empirical proofs demonstrating that the assembly index is fundamentally equivalent to Shannon entropy, the authors persist in claiming novelty by pointing to small scatter deviations in data. These deviations, however, arise not from theoretical innovation but from arbitrary algorithmic inefficiencies.

For instance, they argue that a few outliers among thousands of data points validate the uniqueness of their approach. Yet, as datasets grow larger, these outliers diminish in proportion, converging inevitably on the same results as Shannon entropy or LZ compression (see the Table below). The authors even misuse statistical concepts, failing to recognise that a Spearman correlation approaching 1 signifies convergence, not divergence, in their reality-bending arguments.

The authors of AT suggest that a Spearman correlation value of 1 is not real convergence.
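As a hypothetical illustration (random data, not the authors' molecules), consider two measures that agree on almost every item except for a fixed handful of scattered outliers. As the sample grows, those outliers matter less and less and the Spearman correlation climbs towards 1, which is precisely what convergence means:

```python
# Two measures that agree everywhere except on a fixed number of outliers:
# their Spearman (rank) correlation approaches 1 as the dataset grows.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
for n in (100, 1_000, 10_000):
    x = rng.random(n)        # stand-in for one complexity measure
    y = x.copy()             # a second measure that tracks it
    y[:10] = rng.random(10)  # a fixed handful of scattered outliers
    rho, _ = spearmanr(x, y)
    print(n, round(float(rho), 4))  # rho climbs towards 1 as n grows
```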

Their insistence that AT is different because it operates inefficiently, within a different time complexity class, does not rewrite the rules of computer science, data science, mathematics, statistics or complexity science. Convergence still means convergence, and a mathematical proof (in the Supplementary Information) still holds even if the authors of AT wish it did not and decide to ignore it.

Scientific theories that diverge in foundational substance are expected to produce different results, better explanations or better predictions. AT does none of this, except perhaps for some pedagogical value, which is neither fundamental nor methodological; it fails, relying instead on arbitrary heuristics that mimic well-established statistical algorithms for the purpose of personal promotion.

Unfortunately, we do not think AT has any value when it comes to evolution and selection (or life, time, matter and the universe), or as a Theory of Everything unifying physics and biology, as the authors have claimed. It only muddles things, because people may think it does what the authors say it does, but in no way does it add anything to the discussion of evolution and selection that was not already known or could not be derived by means of Shannon Entropy alone. We already knew that matter accumulates and reuses a small number of building blocks again and again, that evolution and selection also reuse components, from genes to proteins to cells, and that cells are highly modular. Yet nothing in AT can define life, because there are equally sophisticated physical processes that produce objects AT would identify as alive. Life needs to be defined relative to an environment, as selection and evolution themselves require.

To exemplify one of their many contradictions: according to their original paper, which builds the foundations for everything else they have done and published, beer is the molecular compound with the highest assembly index. When challenged, they say that beer is the product of human technology and is therefore justified in having a high Ai. However, whiskey is the molecular compound with the lowest Ai in their own paper. This type of contradiction is found multiple times throughout the authors' papers, and every time they come up with another argument in their defence they introduce another inconsistency, leaving those trying to understand their contributions simply astonished; yet, instead of trying to save face, they always triple down.

3. The Metrology Fallacy: A Misunderstanding of Metrics and Lack of Evidence

The authors of Assembly Theory suggest that we do not understand metrology because their assembly index is a measure. This seems to imply that Shannon entropy is not, and that compression ratios, widely used as metrics, are not metrics for quantifying complexity. Shannon entropy, Lempel-Ziv (LZ) compression, and similar measures have been used extensively in metrology for decades to quantify complexity in systems ranging from spam detection to phylogenetic reconstruction. These metrics are well established, rigorously tested, and have contributed significantly to scientific understanding. The claim that AT introduces something novel in this domain is not supported by any evidence, or even common sense.

In their defence, the authors claim that they are not only proposing a measure of complexity but one that measures selection and evolution. However, as explained before, they have no evidence whatsoever that their assembly index quantifies selection or evolution in any way that is different from, or superior to, existing measures, particularly Shannon entropy. Repeating ad infinitum that they can measure selection and evolution does not make it true. They have no evidence for this.

This is especially damning given that we have demonstrated both empirically and mathematically that their results can be fully reproduced using naive statistical tools such as Shannon entropy and LZ compression. The authors have consistently refused to conduct control experiments that would compare their measure to established alternatives. Across all their papers, they fail to provide any such comparisons, leaving the burden of proof unaddressed.

Instead of defending their measure at all costs with hand-waving arguments, the authors should demonstrate what their assembly index can achieve that other measures cannot, particularly in relation to selection and evolution. They should clarify the specific advantages of their approach and provide empirical evidence connecting their metric to these biological processes in a way the other measures cannot reproduce. It is false that AT offers a better, or even a different, explanation of evolution, selection or life than Shannon Entropy. AT has so far offered neither a different explanation nor better predictions, since the authors refuse to compare, in their own papers, the application of Shannon Entropy or traditional compression to their data. Instead, we have been forced to do it for them, in order to complete experiments that are otherwise not just incomplete but dishonestly incomplete.

What they should be focused on is to address the mathematical proof we presented, which shows that their assembly index (Ai), assembly pathways, and all associated ideas based on counting identical copies in experimental data are mathematically equivalent to Shannon entropy. Ignoring this equivalence while continuing to claim uniqueness does a disservice to the scientific community.

Ultimately, AT's inability to demonstrate superiority over existing measures, combined with its refusal to engage in rigorous comparisons, undermines its validity. Its claims to measure complexity in a novel way remain unsupported, and its insistence on dismissing well-established metrics such as Shannon entropy and LZ compression, while claiming with no evidence that its own index explains selection and evolution, is staggering. Far from being innovative, AT is a redundant rebranding of concepts that have been explored and refined for decades, concepts whose authors they do not cite, refusing to acknowledge the work of my team, which has not attracted such media attention only because we are responsible scientists.

There is no evidence to suggest that Assembly Theory and its index are more related to evolution or selection than any other measure; on the contrary, their measure is weaker, slower and redundant compared to Shannon Entropy or LZ compression applied to their own data, at whatever level of description they may wish to apply it. Indeed, they complain that Shannon Entropy or LZ compression can only be applied to bits, but this is wrong: they can be applied to any object and to any basic units, be they vertices of a network, peaks from a mass spectral file, and so on.

4. Turning the Tables Fallacy (similar to the Straw man)

The authors of AT suggested that the statistical algorithms we utilised in our last paper only succeed because we are not controlling for molecular length. However, this is precisely our criticism of their approach, as demonstrated in this paper. Our findings show that their measure is entirely dominated by molecular length, which is what their alleged separation between organic and non-organic compounds was actually measuring. It defies logic that the authors now appear to be turning the tables by accusing us of the very oversight we have demonstrated they failed to address.

Specifically, they failed to control for the most basic features and failed to compare their index to other measures of complexity. Had they done so, they would have immediately found that their whole groundbreaking theory was unable to produce any new results, and that those results were entirely driven by Shannon Entropy and compression techniques previously reported to separate organic from non-organic compounds.

They did the same when we accused them of using a popular statistical compression algorithm disguised as an 'assembly index': they turned the tables again, accusing us of suggesting that statistical data compression was sufficient, or better than Ai, at characterising life, and dismissing it as trivial and unable to characterise life. But this is exactly our argument: Ai cannot characterise life because it is simply a traditional statistical compression algorithm, one that is too trivial, that has been used before to characterise life and life processes, and that the field of complexity has moved on from because the idea has been exhausted, or is well understood in the sense that it characterises Shannon Entropy.

Yet somehow the authors managed to reverse the criticism. The authors of AT pretend their Ai measure is different, and dismiss Shannon Entropy and ZIP data compression as trivial and unable to characterise life, when we have proven that the Ai at the core of AT is Shannon Entropy and statistical data compression.

In another example, I have been told that Sara Walker has even suggested on her X social media account that scientists should use their agency to think outside the box, because something as 'trivial' as classical Information Theory cannot characterise life while Assembly Theory can. However, since Assembly Theory is equivalent to Shannon Entropy, which is the foundational measure of classical Information Theory, Sara Walker is effectively suggesting that Assembly Theory and its assembly index are unequipped to deal with life. Yet somehow they keep only half of the logical argument and use it in their defence, when we are saying the same thing; the only difference is that they believe Assembly Theory is different from classical Information Theory, which we have proven it is not, and they have failed to show any empirical or fundamental difference (other than their flawed time complexity argument, that is, that their measure is too slow to be Shannon entropy).

5. The Data Representation Fallacy

They also claim that Shannon Entropy and LZ compression can only be applied to bits, or would destroy the units they consider fundamental, such as chemical bonds. This is totally false: Shannon Entropy, LZ, or any data compression algorithm can be applied over any dictionary without destroying any basic units. Indeed, Entropy can be, and has been, applied to the edges of networks or to any other basic units, and the same goes for compression algorithms, which have been widely used as complexity measures.
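As a hedged sketch of this point (the bond labels and edges below are toy examples of my own choosing), here is an LZ78-style parsing that works over any sequence of units, whether bond types, network edges or spectral peaks, never touching the units themselves:

```python
# An LZ78-style dictionary parsing over an arbitrary alphabet of hashable units.
# The number of phrases is a standard repetition-based complexity estimate.
def lz78_phrases(units):
    """Return the LZ78 phrases built from a sequence of arbitrary units."""
    dictionary, phrase, phrases = set(), (), []
    for u in units:
        phrase = phrase + (u,)
        if phrase not in dictionary:
            dictionary.add(phrase)
            phrases.append(phrase)
            phrase = ()
    if phrase:
        phrases.append(phrase)
    return phrases

bonds = ["C-H", "C-H", "C-C", "C-H", "C-H", "C-C", "C=O"]   # chemical bond labels
edges = [(1, 2), (2, 3), (1, 2), (2, 3), (3, 4)]            # network edges

print(len(lz78_phrases(bonds)))  # fewer phrases = more repetition = lower complexity
print(len(lz78_phrases(edges)))
```

Nothing here requires converting bonds or edges into bits; the units are kept intact as dictionary entries.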

Shannon, in his foundational work on information theory, showed that:

  1. Information content (entropy) is invariant under lossless transformations: If you encode data using a different representation or “vocabulary” (e.g., from text to binary), the information content remains unchanged as long as the transformation is reversible. This is always the case when data has a computer representation, such as a file ultimately stored as binary on a hard disk. This means that patterns in the original data are preserved under transformations if they are reversible.
  2. Patterns and redundancy in data are key to compression: Shannon’s theory demonstrated that patterns (redundancy) in data can be exploited to reduce the size of data representations without losing information, provided the encoding is efficient; the converse also holds, in that a different representation cannot erase a pattern without adding or deleting data. A vocabulary mapping does not destroy the data. In other words, even if chemical bonds are encoded into a textual or binary representation (e.g., InChI or SMILES codes), the patterns (or lack thereof) remain, as long as there is a way to convert one representation back into the other (a short sketch follows this list).
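As a small hedged illustration of these two points (the strings and the hexadecimal re-encoding are arbitrary choices for demonstration), a reversible re-encoding does not erase patterns: a repetitive sequence remains far more compressible than an irregular one under either representation:

```python
# Compressed sizes before and after a reversible re-encoding (to hexadecimal):
# the repetitive data stays highly compressible, the irregular data does not.
import random
import zlib

repetitive = b"CCCCCCCCCCOCCCCCCCCCCO" * 20  # many repeated copies of one block
irregular = bytes(random.Random(0).randrange(256) for _ in range(len(repetitive)))

for label, s in [("repetitive", repetitive), ("irregular", irregular)]:
    direct = len(zlib.compress(s))                    # compress original bytes
    reencoded = len(zlib.compress(s.hex().encode()))  # compress reversible hex form
    print(label, direct, reencoded)
```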

Fano, a collaborator of Shannon, developed the Fano coding scheme, a precursor to more efficient coding methods such as Huffman coding (which is why we suggested from day one, many years ago, that AT was equivalent to Huffman coding).

The principle that patterns cannot be destroyed therefore hinges on the reversibility and structure-preservation of transformations:

  • When a transformation (or mapping to another vocabulary) is lossless, the patterns in the data are preserved, as the transformation retains all original information.

Shannon’s information theory underpins this idea because it establishes that:

  1. Entropy is a fundamental measure of information, unaffected by how the data is represented.
  2. Transformations that retain entropy preserve the data’s inherent patterns, regardless of the vocabulary or representation used.

Therefore, the claims that AT or its assembly index can only deal with chemical data at the right level of description, that the other measures cannot, or that applying the other measures at a different level of description changes the outcome, are all false and fallacious.

We have proven that an object, such as a chemical compound, with low Shannon entropy will have a low Ai, and that an object with high Shannon entropy will have a high Ai, and vice versa. This is because Ai is fully and trivially equivalent to Shannon entropy, and is unable to distinguish or quantify any pattern that Shannon entropy is not already able to distinguish and quantify; both will also fail in exactly the same cases, producing the same false positives and false negatives whenever anything beyond trivial pattern-matching statistics is required.

Alternative approaches better founded than AT: The Coding and Block Decomposition Methods

Where AT falters, methods like the Block Decomposition Method (BDM), introduced about a decade earlier, offer a more rigorous and grounded approach. Here is a paper fully exploring many of its features and placing it exactly in the hierarchy of complexity measures, in connection with Shannon entropy, compression and algorithmic complexity.

By integrating principles of algorithmic complexity, which is semi-computable and therefore approximable in practice, this method goes beyond traditional correlation metrics, enabling a deeper understanding of complexity and causation. Unlike AT, which remains mired in redundancy and inefficiency, the Block Decomposition Method builds on established theories to deliver meaningful insights into the dynamics of complex systems, better grounded in causal inference by way of perturbation analysis and algorithmic probability. In contrast to AT, BDM openly admits to standing on the shoulders of Shannon Entropy (its worst-case behaviour) and on the principles of algorithmic probability (its best-case behaviour, when enough computational resources are devoted to increasingly better estimations of causal content). Methodological approaches like the Block Decomposition Method provide formal and robust foundations better equipped to understand the very phenomena that AT pretends to explain, such as selection and evolution, grounded in science (causality, algorithmic complexity, information theory) and not in media spectacle.
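As a schematic sketch of how BDM combines the two (this is a toy illustration: real CTM values come from large precomputed tables of small Turing machine outputs, for instance those distributed with the pybdm package, and the tiny `toy_ctm` dictionary below is only a placeholder):

```python
# BDM(x) = sum over unique blocks b of CTM(b) + log2(multiplicity of b):
# algorithmic-probability estimates for local blocks, plus Shannon-style
# counting of how often each block repeats.
from collections import Counter
from math import log2

def bdm(string, block_size, ctm_table):
    """Block Decomposition Method over non-overlapping blocks of a string."""
    blocks = [string[i:i + block_size]
              for i in range(0, len(string) - block_size + 1, block_size)]
    counts = Counter(blocks)
    return sum(ctm_table[b] + log2(m) for b, m in counts.items())

# Placeholder CTM values for 2-symbol blocks (illustrative only, not real data).
toy_ctm = {"00": 2.0, "11": 2.0, "01": 3.0, "10": 3.0}

print(bdm("0000000000", 2, toy_ctm))  # one block repeated: low BDM
print(bdm("0110010011", 2, toy_ctm))  # varied blocks: higher BDM
```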

The Triviality of Assembly Theory

At its core, Assembly Theory collapses under the weight of its triviality. The notion that life is related to redundancy and low entropy was explored by complexity scientists for decades and is, in fact, what gave rise to the field. AT has not done anything that was not reported before, by others who reported it with a minimum of scientific decorum and honesty.

The work of the authors of AT has led to demonstrable high-school-level errors in their earlier work, and to further misunderstandings of computability and complexity theory. Their inability to reconcile these errors has led to a staggering number of required corrections per argument, a phenomenon aptly captured by Brandolini's law: the effort needed to refute misinformation is an order of magnitude greater than the effort needed to produce it. Every time they build another argument in their defence, they only end up adding more self-inconsistencies to an endless list of fallacies that is difficult to keep up with.

If you wish to know more about the many fallacies and dishonest practices of the group behind Assembly Theory, read my previous posts:
