The 8 Fallacies of Assembly Theory

Dr. Hector Zenil
115 min read · May 1, 2023


Regarding the alleged misunderstandings reflected in our critique of ‘Assembly Theory,’ wherein we drew attention to the many serious issues undermining its foundations and methods.

Click here to learn about the latest developments and how we have proven Assembly Theory wrong in three new papers, two of which have now been published by Nature and PLOS journals.

Or here for another list of new fallacies by Assembly Theory in their attempt to defend their previous mistakes.

Original post:

Assembly Theory is a hypothesis that claims to explain selection and evolution and to redefine time, life, and the universe. The authors and some media have gone as far as calling it a Theory of Everything…

In this contribution, you will learn that we reported findings that matched or outperformed those of Assembly Theory (AT) using traditional statistical measures on the same spectral data that the authors of AT used as empirical evidence for their claims, and on which they built a whole narrative in support of their hypothesis. We also found an exact equivalence between Assembly Theory (and its algorithms) and Shannon Entropy, as AT is only concerned with keeping track of how many identical copies are in an object of interest. This is exactly the definition and purpose of Shannon Entropy and no different from algorithms such as ZIP and PNG.

You will also learn that many years before they did (in 2017 vs. 2021) we reported separating organic from nonorganic compounds by applying compression algorithms to different types of chemical data, including InChI and molecular distance matrices containing the same kind of information as the spectral data used by AT (a result later also confirmed using their data). We did so without making bold, unjustified claims, employing a sound and responsible approach that included testing against a complete molecular database with more than 15,000 compounds (as against the 131 compounds picked by the authors of AT), a proper literature review, control experiments (comparisons to other measures), and indexes more deeply connected to causality, which include but go beyond those that count only identical copies of blocks in data, as AT does. We called our method the Block Decomposition Method; it was introduced in the mid 2010s, building on methods we developed in the early 2010s.

Additionally, you will learn that in our 2017 Royal Society paper, which the AT authors failed to cite (knowing about it), just as they have failed to cite all other relevant work, five years ahead of their 2023 paper on this topic, we connected complexity, including copy number (the only type of process considered by AT), to selection and evolution using our Block Decomposition Method. We did so using empirical data, including genetic data drawn from the clinical literature on cancer, quite unlike the synthetic illustrations used by the AT authors in their Nature paper. Our work stands in contrast to the basic statistical methods that AT defines and employs (their assembly index and assembly number), which we have proven are an approximation to algorithmic complexity equivalent to LZ compression. Both their theory and their algorithm describe the same compression algorithm.

Indeed, based on Assembly Theory’s definition of the shortest path, which in turn is based upon the high and low copy number criterion driving its pathway complexity, the assembly number, and the assembly index, we have proven that an object with a low assembly index will have low Shannon Entropy and high LZ compressibility and an object with a high assembly index will have high Shannon Entropy and low LZ compressibility, all in direct proportion and driven by their copy-counting feature tested on the elements of their choice (chemical bonds or other entities).

In other words, by applying Shannon Entropy or LZ (they never make the comparison) to molecular or chemical data, including the mass spectral data from their 2021 paper, the same results are obtained (link to code and data published in the paper for everyone to reproduce). No properties other than those already captured by Shannon Entropy or statistical compression algorithms as approximations to algorithmic (Kolmogorov) complexity are captured by the theory or the algorithms of Assembly Theory. This means that Assembly Theory (AT), Shannon Entropy, and the LZ compression algorithm are indistinguishable from one another, despite the authors’ attempts to distance themselves from these other measures and their claim that these other approaches are incapable of dealing with their type of data or the type of properties they capture, such as ‘structure’ or anything that purportedly cannot be characterised by Shannon Entropy or LZW.
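The copy-counting behaviour described here can be tried on toy data. The following is a minimal Python sketch, not the code published with the paper: it uses `zlib` (DEFLATE, an LZ77-plus-Huffman scheme) as a stand-in for the LZ family, and the two input strings are illustrative assumptions rather than molecular or spectral data.

```python
import random
import zlib


def compressed_size(s: str) -> int:
    """Size in bytes of the DEFLATE-compressed string (LZ77 plus Huffman coding)."""
    return len(zlib.compress(s.encode("ascii"), 9))


# Illustrative inputs (assumptions, not data from any paper):
# many identical copies of one block versus a seeded pseudo-random
# string of the same length over the same two symbols.
patterned = "01" * 256
rng = random.Random(0)
disordered = "".join(rng.choice("01") for _ in range(512))

if __name__ == "__main__":
    for name, s in (("patterned", patterned), ("disordered", disordered)):
        print(f"{name:>10}: {len(s)} symbols -> {compressed_size(s)} bytes")
```

The object built from many identical copies compresses to a handful of bytes, while the one with few exact repetitions does not, which is the high versus low copy-number distinction at issue.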

This is in contrast to other indexes, such as our Block Decomposition Method (BDM) which takes into account local patches of a non-statistical nature, which can be traced back to our first papers published in the period 2007–2012 connecting complexity and life, evolution and selection and demonstrating how to decompose physical and biological data into causal blocks based on the number of repetitions of low algorithmic probability patches, and thus how to use BDM for biosignature detection.

Note that before making any of this information public, we kindly and respectfully approached the authors of AT on multiple occasions, as early as 2017 and earlier, even offering advice gratis, with no expectation of credit. Yet not only did they decline, but having corrected just a few errors in their 2017 paper that became too obvious once I’d pointed them out (correspondence available upon request), we were met with insults. We have tried to focus our criticisms on the content of AT and the way authors conduct science or communicate their results, which we think are valid and legitimate concerns.

We have argued that the authors’ marketing and promotional activities, deployed in service of what we think is a fallacious concept and a poorly examined methodology, are unfortunate and scientifically irresponsible.

One of the main thrusts of our critique and our primary take-home message may be gleaned from this figure (MA is the Assembly Theory measure; all others are traditional statistical measures that the authors of AT never tested against MA as alternative complexity measures):

Taken from our paper, available online, this figure shows how basic statistical measures applied to the same experimental spectral data do precisely what the authors of Assembly Theory (AT) claim they have done for the first time. MA is their molecular assembly measure. The Y axis is on a renormalised log scale for visualisation purposes, allowing side-by-side comparison among the different indexes. As AT suggests, any of the separating cut-off values on the Y axis can serve as thresholds for life. This figure demonstrates the need for basic control experiments in the papers on AT, and the absence of methodological originality (contra the authors’ public claims) as opposed to perhaps pedagogical value (provided they accept and embrace the ideas and methods they are instantiating instead of baselessly attacking them). They have advanced a theory that sounds sophisticated because it relies heavily on definitions of fundamental ideas from complexity theory, used without attribution. Their theory and method amount to counting the exact repetitions in a piece of data, which is the basis of all compression algorithms. As we have shown before, and as others have argued, simple things like a pile of coal or certain minerals (now proven by a recent Royal Society paper, see below) will satisfy Assembly Theory’s definition of life. This plot comes directly from applying different measures to the mass spectral data they made available.

The misunderstanding on the part of the authors of Assembly Theory (AT) of concepts such as Shannon Entropy and Kolmogorov complexity stems from and can be summarised in this figure with which they opened one of their main papers published in the journal Entropy and is the basis of all their arguments in favour of AT as a novel measure:

In the paper, the authors of Assembly Theory (AT) open their arguments with this figure to explain how AT differs from Shannon Entropy and Kolmogorov complexity. However, this figure shows exactly the opposite and summarises our position and one of our main criticisms. On the one hand, when it comes to Shannon Entropy, the authors applied it incorrectly. They did not take the Entropy rate of the object, which would have found the pair of blocks that minimises the Entropy and provides the same information as their Assembly Theory example (and even a threshold for ‘life’ in the number of bits equivalent to the one they have proposed elsewhere based on AT, see previous figure, above). On the other hand, Kolmogorov complexity also provides the same information (+2) and fully codes their assembly index algorithm in a single line of code. However, the paper suggests that this template piece of code (the same for all Assembly Theory applications as it only counts copies, thus never changing) that fully captures all the content of Assembly Theory is ‘not computable’, yet it is fully computable when implementing AT. In reality, these three cases are exactly the same: they are all computable and unambiguous, and they provide exactly the same information. We show this formally in our most recent paper.

AT is, therefore, equivalent to statistical compression and Shannon Entropy and, mathematically speaking, a strictly weak approximation of Kolmogorov complexity (subsumed by and a proper subset of it), which the authors have always refused to accept and disclose, instead introducing their measure as completely independent of and even a ‘generalisation’ of Entropy and Kolmogorov complexity, which they claim are measures unable to deal with their objects (according to their figure above, for example) or reproduce their results. This, however, is false. Let’s take the example of their figure above. According to them, a string with a repetitive pattern like 010101010101 would be indistinguishable from a (pseudo-)random string like 011101001011.

According to the authors, because both strings have the same number of 1s and 0s, they must have the same Shannon Entropy. However, this is only true if one takes the weakest and, indeed, the wrong version of Shannon Entropy. Shannon Entropy can be measured on any unit; if one takes bits as single units, their claim is true, but it is not true for the general case of Shannon Entropy. Shannon himself, in the definition with which he began his original paper, specified that the units could vary in length. This is what happens when one applies the correct version of Shannon Entropy to these two bitstrings per block:

The Shannon Entropy of the patterned string is upper bounded by the Shannon Entropy of the random one. Only when taking individual bits as single units is what the authors of AT say true, but as soon as Shannon Entropy traverses the sequence just as Assembly Theory does, and the block size used as a unit (such blocks are also sometimes called n-grams) is gradually increased from 1 to the length of the string, their claims collapse. Notice that when taking single units as the bits that compose the strings, it is indeed the case that the two strings display the same Shannon Entropy when n=1 (X axis), but just looking for larger blocks (here up to blocks of length 6, when the string is partitioned into two blocks), one can immediately tell the strings apart and even find the shortest and longest repetitive identical copies. Notice how the patterned string spikes downwards at blocks of lengths 2, 4 and 6, where the repetitive blocks are found in the patterned string, with blocks of 2 units maximising the number of identical copies and finding the assembly index by counting the resulting blocks. Taking the min, max, and mean of Shannon Entropy applied to both strings, one can distinguish one from the other, which makes false their claim that Shannon Entropy cannot distinguish these strings or find the blocks that their assembly index finds.

As proven empirically and theoretically, everything that AT does is reproducible with Shannon Entropy or lossless compression algorithms like LZ (as shown in the first figure above). The same goes for the alleged advantage — the life test cutoff value — that the authors of AT claim it affords, a claim that the new paper on the Royal Society Interface has debunked. That these approaches produce the same results is not a surprise because the process in the middle column for AT represents exactly how compression works, based on Shannon Entropy, and has been used for decades to approximate Kolmogorov complexity.

This means that, effectively, Assembly Theory can be reduced to a single line of code despite its authors’ half a dozen papers and extraordinary claims. Here is a simplified but fully convergent version written in Mathematica/Wolfram Language code that feels embarrassing to write down because it is painfully trivial.

Assembly Theory in a single line of code:

Table[Entropy[Partition[StringSplit[s, ""], n]], {n, 1, StringLength[s]}]

From here, one just takes the value that minimizes Entropy and maximizes the number of blocks, and you will get what the authors call the assembly index. It is false, therefore, that Shannon Entropy cannot distinguish such a patterned string from the disordered one and that Shannon Entropy cannot provide the same information as Assembly Theory without resorting to designing a new whole ‘theory’. Not only that, but the program on the right of their figure above, claiming Kolmogorov complexity to be uncomputable, is, paradoxically, the computable program that fully captures their assembly index and that they claim to be completely different from. In reality, the authors found three equivalent representations of the same algorithm.
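For readers without Mathematica, here is a hedged Python rendering of the one-liner and of the selection step just described. The function names are illustrative and not from any paper; entropy is reported in bits, whereas Wolfram's `Entropy` defaults to natural logarithms, so values differ by a constant factor. As with `Partition`, a trailing remainder shorter than n is dropped.

```python
from collections import Counter
from math import log2


def block_entropy(s: str, n: int) -> float:
    """Shannon entropy (bits) over the non-overlapping length-n blocks of s."""
    blocks = [s[i:i + n] for i in range(0, len(s) - n + 1, n)]
    total = len(blocks)
    return sum(c / total * log2(total / c) for c in Counter(blocks).values())


def entropy_profile(s: str) -> list:
    """Block entropy for every block size from 1 to len(s), like the Table[...] call."""
    return [block_entropy(s, n) for n in range(1, len(s) + 1)]


def best_block_size(s: str) -> int:
    """Block size that minimises entropy, breaking ties by the largest number of blocks."""
    return min(
        range(1, len(s) + 1),
        key=lambda n: (block_entropy(s, n), -(len(s) // n)),
    )


if __name__ == "__main__":
    for s in ("010101010101", "011101001011"):
        print(s, [round(h, 3) for h in entropy_profile(s)], best_block_size(s))
```

For the patterned string the selected block size is 2, giving six identical copies of `01` at zero entropy, while the disordered string has no block size below 7 at which the entropy drops to zero.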

Of course, different objects will need different representations, but s in the code above does not need to be a binary string, as indeed it is not in the thousands of applications of Shannon Entropy to images, tensors, audio, video, chemical networks, molecules, or anything else, deterministic or not. Comparing their measure to the special case of Shannon Entropy with unit size n=1 is equivalent to comparing a ‘new’ machine learning method against a neural network defined in the 1960s and then claiming it is better than Deep Learning today, with the difference that the above application of Shannon Entropy for n different from 1 was introduced by Shannon himself in 1948, right at the beginning of his paper, essentially as its first definition. It is, therefore, false that Shannon Entropy cannot find repetitive blocks. In fact, this is exactly what it was designed to do.

It is true, however, that n is of variable length in Assembly Theory, just as it is in compression algorithms like LZ, which also converge to Shannon Entropy and implement a dictionary-based, variable-length block, hence being identical to assembly. The innovation of LZ algorithms lay precisely in the inclusion of these features in the 1970s that helped the classical definition of Shannon Entropy converge faster for shorter objects.
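To make the dictionary-based, variable-length idea concrete, here is a minimal sketch of LZ78-style phrase parsing in Python. It is a textbook-style illustration under my own naming, not the assembly index code and not the implementation used in our papers; the test strings are illustrative assumptions.

```python
import random


def lz78_phrases(s: str) -> list:
    """Parse s in a single left-to-right pass into LZ78 phrases.

    Each new phrase is the shortest prefix of the remaining input
    that is not yet in the dictionary, so block length is variable.
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in s:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # trailing phrase that was already in the dictionary
        phrases.append(current)
    return phrases


# Illustrative inputs (assumptions): a copy-rich string and a seeded
# pseudo-random string of the same length.
patterned = "01" * 256
rng = random.Random(0)
disordered = "".join(rng.choice("01") for _ in range(512))

if __name__ == "__main__":
    print(lz78_phrases("0101010"))
    print(len(lz78_phrases(patterned)), len(lz78_phrases(disordered)))
```

Fewer phrases means more reuse of previously seen blocks: the patterned string needs markedly fewer dictionary entries than the disordered one of equal length.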

What Shannon Entropy is unable to deal with are strings such as 011010011001… (the digits of the so-called Thue-Morse sequence), which may arise from a causal generating mechanism but may look random. But neither is Assembly Theory able to deal with them. To handle these more meaningful causal cases, in the early 2010s we introduced what we called the Block Decomposition Method, which combines Shannon Entropy (its fully functional version, not a strawman version with no legs) with local approximations of algorithmic complexity (which may deal with cases such as the Thue-Morse sequence). It hence goes beyond Shannon Entropy, unlike Assembly Theory and its indexes. With it we also demonstrated that we could separate organic from inorganic molecules, and we made meaningful connections to evolution and selection rather than merely media-driven ones.
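The generating mechanism behind the Thue-Morse digits is very short: the digit at position i is the parity of the number of 1s in the binary expansion of i. The Python sketch below (the function name is illustrative) only generates the sequence and checks its symbol balance; it does not implement BDM or any complexity estimate.

```python
def thue_morse(length: int) -> str:
    """First `length` digits of the Thue-Morse sequence.

    Digit i is the parity of the number of 1s in the binary expansion of i.
    """
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


if __name__ == "__main__":
    t = thue_morse(64)
    print(t[:12])
    # Symbol counts are perfectly balanced on this prefix, so a
    # single-symbol (n=1) entropy sees nothing but a fair coin.
    print(t.count("0"), t.count("1"))
```

A rule that fits in one line produces a string whose single-symbol statistics are indistinguishable from coin tosses, which is the kind of causal structure that pure copy counting is not designed to recover.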

The fact that the above figure on the Assembly Theory paper made it all the way through the review process used to advance such erroneous arguments means that it was never seen by someone who could have been more knowledgeable in the field. It is an example of how some authors can break the system of checks and balances only by having the resources to do so.

How these papers make it through the review process is a mystery. However, the current working hypothesis (based on Cronin’s own description of how some editors tell him he wastes their time) is that, with an almost 100-person research group, Cronin simply keeps trying, inundating journals with his group’s output, fuelled by public tax money, until they find the weakest possible set of reviewers who will let their papers go unchecked, contributing to breaking science and fooling colleagues and the public.

Instead of finding all these contradictions unbearable and redressing them or disclosing them as a limitation of AT, they triple down, and the system keeps letting them go as if nothing had happened.

I recently found a picture I took in Northern Ireland in 2008 of the Giant’s Causeway, a formation of basaltic rocks that exemplifies why and how Assembly Theory fails at characterising life based on their simplistic idea of counting copies of the same object in a system. The basaltic rock system would have very large assembly index because it would have as many columns as required by any assembly tradeoff. According to them, on Earth the evidence is that above 15 assembly steps, objects are likely alive, and the greater the number of copies, the more alive. If they come up with a larger number on Earth or beyond, just add more basaltic column numbers:

Basaltic rocks, an example of a highly complex living organism on the shores of Northern Ireland according to Assembly Theory and their assembly index. Picture taken by Dr. H Zenil in December 2008, Giant’s Causeway, Northern Ireland.

UPDATE (9 July 2024)

The authors of Assembly Theory have posted a manuscript in response to our papers that show that their assembly index is redundant and a compression algorithm (of the LZ family) that converges to Shannon Entropy rate. Their strategy, in effect, is to claim that their index is best because it is worse (this is what we meant when we characterised it as ‘suboptimal’, not that we expected it to be an optimal compression algorithm, a position that they have erroneously attributed to us and proceeded to attack). So, while they do implement a compression algorithm, they claim it is not a compression algorithm because they are not good at compressing.

They have failed to:

  1. Address the demonstration that their algorithm is a compression algorithm of the LZ family. Instead, they have focused on proving that it is (slightly) different from a peculiar implementation of LZW for a short fixed-length string that they have permuted several times.
  2. Address the mathematical proof that we provided showing that their assembly index is an LZ grammar that converges to the Shannon entropy rate. Instead, they have presented a single fixed-size short string as proof that their index is slightly different from such a compression algorithm at the string length in question.
  3. Say anything of consequence about how we reproduced all their results in our paper (now accepted by a Nature subjournal) using very simple statistical measures, hence debunking their claim that they had evidence supporting their suggestions that Assembly Theory explained selection or evolution better than any other coding scheme. One of the authors merely commented that ‘it was interesting’ (see link to their blog post below).
  4. Acknowledge that their main result (separation of organic from non-organic matter), on which everything else is based, since they claim it as empirical evidence, was reported by our group five years before they did in the journal Parallel Processing Letters (a preprint of the article is available here). We used a 100x larger set of chemicals than the cherry-picked set they used. We did this with — no surprise — compression algorithms. This also means they have zero evidence that their compression scheme is better than any other, that it is closer to ground truth, or that they can explain selection or evolution.
  5. Acknowledge that their ideas on selection and evolution were poorly developed compared to the careful connections we established between compression, information theory, selection and evolution (also five years before they did) in our Royal Society paper on the same grounds and with the same motivation but without any unfounded claims that we could explain selection, evolution or life, as they have claimed multiple times in their Press Releases and public appearances.

And much more.

However, we celebrate the fact that the authors have come to some sense and have backed off from previous claims made (by their senior author in his interview with Lex Fridman cited below) claiming that Assembly Theory would be proven to be a generalisation of Algorithmic Information Theory. Indeed, they have now confined themselves to a small conceptual space with a niche example of how to — less efficiently — compress a single fixed-length string by counting the identical copies that it contains (which is the definition of Shannon Entropy rate…)

We will be replying in a new paper showing how carelessly their empirical experiment was conducted and how mistaken their conclusions are.

In the meantime, the good news is that our last two papers critical of Assembly Theory have been accepted by our first choice journals. This despite the attempts of Cronin et al. to hijack the review process of the Nature subjournal. Indeed, the editor of the Nature subjournal where our work was under consideration caught them red-handed and decided to remove them as reviewers after they failed to disclose their competing interests (as is required by Nature and is standard academic practice) and, moreover, against all rules, made their review public on arXiv while our paper was still under consideration. The paper, accompanied by all the code for reproduction, was accepted with almost no modifications, and we are grateful to the other 4 honest reviewers who fully supported its publication. We will provide a pointer to the paper as soon as it is made available.

The second paper, that proves Assembly Theory and the assembly index to be an LZ grammar implementing one of the simplest traditional statistical compression algorithms and converging to the Shannon entropy rate, has received good reviews, with only minor changes requested.

Instead of addressing the fundamental issues, the authors of Assembly Theory keep publishing papers which reinvent the wheel using a measure that implements Shannon Entropy, and profess surprise when confronted with results that have been reported before or can be reproduced by simple statistical means, continuing to insist that their method is different from anything ever attempted before.

UPDATE (26 March 2024)

Lee Cronin has been caught engaging in academic misconduct, again: he has been found not to have disclosed a conflict of interest in contravention of the guidelines of a Nature journal. He attempted to block our first paper critiquing his Assembly Theory, significantly delaying its publication. Among five reviewers, his review was the only negative one. He was caught when the journal’s editor dug further into the review’s authorship and when the review was found posted publicly under Cronin’s name just days later (following a recent withdrawal by Cronin), in the middle of the review process, before the journal’s decision was made, and against the journal’s rules.

According to the journal’s guidelines, a reviewer must disclose any competing interests. Yet Cronin not only did not, on account of which the journal decided to drop him as a reviewer, but he also released his ‘review’ during the review process, hence committing not one but two clear breaches of the journal’s guidelines as well as of academic codes of conduct. Cronin also thanked his team members by name, as well as those from Walker’s group, meaning he has involved collaborators and students in academic dishonesty and misconduct.

Note the definition of academic dishonesty and misconduct is any action which gains, attempts to gain, or assists others in gaining or attempting to gain unfair academic advantage.

This did not come as a surprise to us, given that when we posted our paper on the preprint server ArXiv a year ago or so, Cronin threatened to contact journals to block us (emails available).

Needless to say, the review (and its version posted online) has been rebutted in its entirety, including its misleading claims that we should have included all the 131 molecules they used (we did) or that they could not reproduce our results (we made the code available and they could have compared their results with compression in a couple of hours, but they refuse to do so).

UPDATE (10 March 2024)

Two new papers critical of Assembly Theory have been published in the last month, saliently citing and using arguments offered in this blog post and our papers:

1. “Molecular assembly indices of mineral heteropolyanions: some abiotic molecules are as complex as large biomolecules”, published in the Royal Society journal Interface, shows that the alleged trade-off proposed as a measure for life in Assembly Theory (AT) does not behave as its authors intended (see below for more details), as we anticipated in this blog post years ago when we pointed out that the assembly index would be unable to distinguish mineral crystal-like molecules and, therefore, would not be able to characterise life or merit being considered a serious contender in the search for a measure to find or detect life in the universe, despite the claims made on its behalf to multiple media outlets by the authors of AT, who exaggerated their results, as confirmed by the author of the second paper,

2. “Assembly Theory: What It Does and What It Does Not Do”, published in the Journal of Molecular Evolution (Springer Nature), citing and agreeing with all our arguments regarding the hype surrounding Assembly Theory and the embellishment of its results. It reads: “A blog post by Hector Zenil (https://hectorzenil.medium.com/test-8f0be54817c4) identifies no less than eight fallacies of assembly theory. Scroll to the section “Original Post” for the actual beginning of the article. A video essay by the same author (https://www.youtube.com/watch?v=078EXZeS8Y0) summarizes these fallacies, and highlights conceptual/methodological limitations, and the pervasive failure by the proponents of assembly theory to acknowledge relevant previous work in the field of complexity science.”

The author writes: “It [Assembly Theory] certainly does not provide any new explanation of biological evolution or natural selection, or a new grounding of biology in physics. In this regard, the presentation of the paper is starkly distorted by hype”. (my parenthesis).

The paper continues: “… these metrics are not particularly novel. They are special cases of classical measures of algorithmic or computational complexity (especially Huffman 1952; Bennett 1988)”

It adds: “This highlights well-known weaknesses and failures in the editorial and peer-review processes of high-profile scientific journals. Exposure in high-profile journals still counts for far too much social capital in today’s scientific career and funding market, and selling assembly theory for what it really is was evidently not sexy enough to get that kind of exposure.”

It concludes: “All this reflects rather unfavorably both on the authors and the scientific publication system in general. Unfortunately, failures like this one abound in our field and beyond…”

All this is aligned with and takes into account the arguments made in our papers and this blog post. The author thinks there is a minor potential positive aspect to Assembly Theory related to quantifying ‘constraint bias’. However, he was probably unaware of our previous work or of the many papers in the area of simplicity bias based on the concept of Algorithmic Probability (and using LZ) that draw heavily upon our work.

These new papers together lead to the conclusion that:

1. We were the first to publish and report — in 2018 — that organic compounds could be separated from non-organic compounds using various indexes of complexity, including compression (of which the assembly index is one and indistinguishable from LZW, proposed for this same purpose in 2021), both from nomenclature and also from ‘physical’ data, directly from molecular distance matrices. This was done following good scientific practice (with comparisons and on a 100X larger dataset rather than the cherry-picked hundred compounds used by AT), and properly reported (without the hyperbole); and

2. We made (meaningful) connections between complexity, evolution and selection using the more robust concept of ‘block decomposition’ in a 2018 paper published by a journal of the Royal Society, years before Assembly Theory (but with proper control experiments and actual (clinical) empirical validation using genetic pathways from the literature), which the authors were aware of. The authors of AT recently falsely claimed that this precedence argument is not true because, according to them, they published a paper on pathway complexity in 2017. However, in that paper they never mentioned anything about selection or evolution, and it contains no experimental data either, just like their more recent 2023 Nature paper, which contains only artificial sketches for illustration purposes. So, the argument does not hold.

Shannon Entropy and traditional statistical compression with extra steps

Not only does our first critical paper (under consideration by a journal) show that traditional statistical indexes can separate the mass spectral data that the authors of Assembly Theory thought was of a very special nature and hence particularly useful in separating organic from non-organic entities, but in a new paper submitted to a journal and posted on the arXiv preprint today, we prove that Assembly Theory, its pathway complexity, assembly index, and assembly number are, mathematically and methodologically speaking, strictly a weak version of algorithmic complexity and equivalent to LZW compression:

This paper now submitted to a journal and posted on the preprint server arXiv for rapid communication, puts an end to the authors’ claims that their theory and algorithm are not related to algorithmic complexity or have nothing to do with statistical compression. Moreover, it shows that they cannot explain selection or evolution beyond the connections already made and that we helped establish ourselves.

In this new paper, we show the full equivalency of LZW and their assembly components (pathway complexity, number, and index), and point out the circular reasoning and inconsistencies in their claims about selection and evolution.

According to the authors of AT, the assembly index would assign a very high assembly value to a molecule composed of all the elements of the periodic table without repetitions because of their low copy number. This is how LZ works and is what it would retrieve as a computable statistical approximation to algorithmic Kolmogorov complexity. The ‘counter-examples’ that the authors of AT have provided, in fact, confirm the assembly index’s equivalency with LZ compression.

The authors insist that their measure is not equivalent to Shannon Entropy, providing examples such as 01010101 and 11001001 as having the same Shannon Entropy because there is the same number of 0s as 1s in each of those two bitstrings. However, this is not the correct way to apply Shannon Entropy. Here, we provide two screenshots of Shannon’s original 1948 paper, followed by an example applied to letter strings of exactly the type that Cronin et al. provide as examples of how AT is different from Shannon Entropy (when actually showing the opposite). We pick Shannon’s least mathematical formulation, in the hope of aiding understanding, in Shannon’s own words (example D, end of page 6):

The authors of AT believe Shannon Entropy only applies to independent bits or independent letters/objects. Here it is explained by Shannon himself in his landmark paper introducing Shannon Entropy.

Shannon proceeds to introduce n-order approximations from n-gram probabilities:

Page 7 of Shannon’s original paper. The authors are submitting papers to several journals reinventing Shannon Entropy and failing at basic things like performing basic comparisons, telling how exactly they improve upon it, or citing basic references or people that have done what they introduce as new.

Notice, again, that this is a landmark paper from 1948, one with which the authors of AT are evidently not familiar.
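The point about block size can be checked directly. What follows is a minimal Python sketch, not code from any of the papers discussed; the function name `block_entropy` and the choice of sliding (overlapping) windows are assumptions made for illustration. It estimates Shannon Entropy from the empirical n-gram frequencies of the two bitstrings mentioned above.

```python
from collections import Counter
from math import log2


def block_entropy(s: str, n: int) -> float:
    """Shannon entropy (bits per block) of the empirical n-gram distribution of s."""
    # Slide a window of width n over the string, one symbol at a time.
    blocks = [s[i:i + n] for i in range(len(s) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Compare the two example strings at block sizes n = 1, 2, 3.
for n in (1, 2, 3):
    print(n, block_entropy("01010101", n), block_entropy("11001001", n))
```

With n=1 the two strings are indistinguishable (four 0s and four 1s each), whereas for n=2 and n=3 the periodic string 01010101 has the lower block entropy. The difference only disappears if n is artificially fixed at 1.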

The authors of AT decided to use the word ‘assembly’ in their first 2017 paper for what we called “de/composition” in the early 2010s. The difference is that BDM looks for patches of causality in addition to statistical repetitions (identical copies), as Shannon already did, trying to improve over Shannon. This means that Assembly Theory is equivalent to other copy-counting algorithms like LZW, which are all forced to converge to Shannon Entropy (when n goes to infinity on a variable n window, the same n of Shannon’s n-grams and n-orders). In contrast, BDM considers how an object may have been composed/assembled by looking at the set of possible underlying generating rules that may have generated the object and its components in the first place. This led to Algorithmic Information Dynamics, a field we introduced that incorporates the latest knowledge on causality theory and combines it with information and algorithmic information theories.

The authors of AT have now written half a dozen papers on Assembly Theory and threaten to continue doing so, reinventing the field of Shannon Entropy and compression algorithms used to classify objects (including chemical molecules as we did but better). Their main argument is that Shannon Entropy is not Shannon Entropy rate, which means that they fix n=1 for Shannon Entropy and do not allow n to take any other value. In other words, when they compare it (theoretically) to Assembly Theory, they use a restricted version of Shannon Entropy in an attempt to distance themselves from it. They also claim their n is variable, which is also the novelty of dictionary-based algorithms introduced in the 60s and 70s, with LZ78 even requiring only traversing an object once. However, that advantage only makes them converge faster, because they still converge to Shannon Entropy (rate). No expert in information theory or algorithmic complexity would ever dare to claim that Shannon Entropy is restricted to objects of block size n=1, setting it up for failure, but the authors of Assembly Theory do.
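To make the copy-counting idea concrete, here is a minimal sketch of LZ78-style parsing in Python. It is a textbook illustration under simplifying assumptions (string input, the hypothetical helper name `lz78_phrases`), not the assembly index algorithm and not the code used in our papers. The object is traversed once, and each new dictionary entry is a previously seen phrase extended by one symbol, so the block length n grows on its own as more copies are found.

```python
def lz78_phrases(s: str) -> list[str]:
    """Parse s into LZ78 phrases in a single left-to-right pass."""
    dictionary = set()   # phrases seen so far
    phrases = []         # the parse, in order of appearance
    current = ""
    for symbol in s:
        current += symbol
        if current not in dictionary:
            # Shortest extension not seen before: record it and start over.
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing fragment that repeats an earlier phrase.
        phrases.append(current)
    return phrases


print(lz78_phrases("01010101"))
print(len(lz78_phrases("01" * 512)))
```

The number of phrases grows slowly for repetitive objects because long identical copies are absorbed into ever longer dictionary entries. This is the sense in which dictionary methods count copies with a variable block size.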

Disassembling Assembly Theory

This new paper published in the journal Interface of the Royal Society by a group of chemists and planetary scientists, including an author affiliated with NASA, has just proven exactly what we anticipated:

Hazen Robert M., Burns Peter C., Cleaves H. James, Downs Robert T., Krivovichev Sergey V. and Wong Michael L. 2024. Molecular assembly indices of mineral heteropolyanions: some abiotic molecules are as complex as large biomolecules. J. R. Soc. Interface 21: 20230632.

The following paragraph in the Introduction to the paper saliently cites this blog and several of our papers:

“AT has not gone unchallenged. Some reviewers question whether a scalar assembly index can be employed to adequately discriminate between living and nonliving systems [9,10]. Other critics note similarities of the AT approach to uncited prior efforts to distinguish biotic from abiotic molecular suites by statistical or algorithmic measures [11]. In particular, Hernández-Orozco and colleagues [12] may have anticipated key conclusions of assembly theory by exploring connections among causal memory, selection, and evolution. This hypothesis has also received criticism based on a variety of concerns, including ambiguities in the numbers of molecular copies that constitute ‘high abundance’, the optimal algorithm to calculate ‘pathway complexity’, the disconnect between the proposed theoretical assembly pathways and actual chemical processes, and the absence of kinetic and thermodynamic factors in assessing the probability for a complex molecule’s formation [9,10,13].”

References 9 to 12 are to this blog and our papers.

They conclude:

“…We have demonstrated that abiotic chemical processes have the potential to form crystal structures of great complexity — values exceeding the proposed abiotic/biotic divide of MA index = 15”

“…we conclude that significant structural complexity of molecules is not the unique province of biochemistry and that natural inorganic chemistry has the potential to generate significant local populations of molecular structures with MA indices greater than 15. In that case, the possibility should be entertained that abiotic processes might also produce organic crystals of great complexity on a carbon-rich, abiotic planet or moon, especially given billions of years of abiotic organic mineral evolution in the absence of life.”

“In conclusion, while the proposal of a biosignature based on a molecular assembly index of 15 is an intriguing and testable concept, the contention that only life can generate molecular structures with MA index ≥ 15 is in error. Furthermore, in spite of amusing speculations in the literature of science fiction [49], it is unlikely that the definition of any universal phenomenon as complex and diverse as ‘life’ can be reduced to a scalar.”

Exactly as we predicted in our first blog post, which we published in 2022 (and which is included at the very bottom of this post)!

The authors of the paper reported that the response of the authors of Assembly Theory to their findings when contacted was that for the assembly index cut-off to work, one had to manually filter out the offending molecules and discard them a priori. In other words, it is necessary to identify and filter out the non-organic compounds in order to feed the assembly index with only the organic ones. To which the authors of the paper, debunking the claims that AT can detect and classify life, responded that “…. such an exclusion would seem to violate the very premise of assembly theory’s claims regarding unambiguous biosignatures based on molecular complexity.” In doing so, they point out yet another contradiction in Assembly Theory. So much for a measure that, according to the authors, ‘does not require any computation or mathematics’ (whatever that means) to work. In fact, it even requires a significant pre-processing step that was supposed to be done on its own and was its original greatest value.

Lee Cronin’s new ad-hominem smear campaign on social media:

In a new video interview with a YouTuber, Lee Cronin has made new false claims and embraced a new tactic, casting doubt on my credentials and affiliations. Here is some fact-checking:

1. In the YouTube video, Lee Cronin and his interviewer claim I engaged with a creationist. Let’s recall why I accepted the invitation of Prof. James Tour to appear on his podcast, Prof. Tour being the one the YouTube video host and Lee Cronin call a creationist. Dr. Tour is a professor at Rice University in Houston, one of the best universities in the world. Cronin speaks highly of Prof. Tour as an accomplished chemist in this interview. Prof. Tour has also been awarded multiple prizes, including one by the Royal Society, the most reputable organisation in the field in which both Tour and Cronin work — and the same society that suspended Cronin for misconduct. Furthermore,

a. I met Tour because of Lee Cronin’s interactions with James Tour going back years. Cronin has accepted invitations to Tour’s events. Lee Cronin has appeared on podcasts featuring Prof. Tour and Cronin recently accepted an invitation to Harvard University from Prof. Tour.

b. At one such event, both Tour and Cronin referred to me by name, and it was the first time I had heard of him. My intervention was, therefore, the result of Lee Cronin’s own engagement with James Tour. I was literally dragged into their discussion.

On Prof. Tour’s podcast, I opened and closed my interventions by pointing out that I was not a creationist and did not agree with the use of creationist ideas in Origin of Life arguments, also condemning intelligent design in science on my last slide, where I also condemned scientific deceivers.

I am sympathetic to the fight against creationism and intelligent design in science, but this problem is peculiar to the US. Nowhere else is creationism a challenge or a serious argument in science or education. Furthermore, no one else should have to care whether or not someone approves of the decision of an academic to talk to another academic, in this case, Prof. Tour, who I haven’t ever seen using religious tenets to make a scientific argument, though some claim he attacks the field of Origin of Life (OoL) because of his convictions. This third-grade argument has nothing to do with science or scientific practice. Under this logic, institutions such as Rice University, The Royal Society of Chemistry, and Harvard University support creationism and have lost their credibility for hiring, inviting, and engaging with Prof. Tour.

Some people do cancer research because they have a relative with cancer; perhaps indeed, Prof. Tour has chosen OoL because of his convictions, but we all have a source of motivation to enter a field, and that is not a problem. Every one of Tour’s criticisms of Cronin’s work is factually correct. Prof. Tour’s online religious content, which I was made aware of in the process, makes me uncomfortable, but this is not a reason not to speak to someone about science; on the contrary. It is only Prof. Tour’s business what he chooses to believe and say online, as long as his scientific arguments are sound and befitting his status as an academic, and as long as he is not personally disrespectful to others in my presence, in which case I would have refused to participate (unlike Cronin’s engagement with mocking YouTubers online). This does not mean that I endorse all of Prof. Tour’s activities, beliefs, or claims about other people’s research. I joined Prof. Tour’s podcast specifically to confirm his scientific thoughts about Assembly Theory.

With creationism, people know they are taking a leap of faith, but with some researchers, who are trusted by virtue of holding highly regarded academic positions, people may not know they are being deceived, which leads to science losing its credibility and legitimacy, which is more dangerous than creationism itself. Science is in deep crisis, and people who exaggerate the significance of their work in this way for personal gain do a profound disservice to science (see a recent report in The Guardian on how science is in trouble).

2. Lee Cronin did not address our actionable criticisms, despite claiming he was waiting for them, namely that his Assembly Index results can be replicated by every other possible trivial algorithm tested on all possible chemical input data taken into consideration.

a. Cronin insists he is not using a compression algorithm. I am not obsessed with calling the assembly index a compression algorithm. The main problem is that the results their papers show are the result of compressing data, which in turn depends on the principles of algorithmic (Kolmogorov) complexity, which we reported years before their group. The efforts of the authors of Assembly Theory to distance themselves from Shannon Entropy (of which their assembly index is an equivalent, just like LZ), compression algorithms, and algorithmic (Kolmogorov) complexity are beyond comprehension and are so wrong that first-year students in computer science would have rejected most of their papers published to date, proving that peer-reviewing is in serious crisis.

b. In our paper published in 2018 (for open access to our paper, you can also use this link to the preprint), we also showed how to separate biological from non-biological molecules based on various data features, including ‘physical’ data such as molecular distance matrices. After their use of mass molecular spectra data, we also obtained exactly the same results using their own data with decades-old algorithms. In other words, no special claims can be made about the data input or the assembly index other than the fact that the latter is a compression algorithm that (re)produces the results reported before, where we followed proper scientific methodology (comparison to other algorithms) and good scientific practice (no hyperbole). However, when cornered, the authors of Assembly Theory resort to claims related to the data’s ‘special nature’, an incremental contribution, if anything, and a far cry from their grandiose claims (see below). Incremental because even if it were true that the data is of a very special nature (which it is not), their results can be reproduced using any other index and data input, as we have explored, thus hardly adding anything new to the discussion, let alone justifying the public claims they have made.

c. The authors of AT keep spreading misinformation about algorithmic complexity and its ‘requirement to use Turing machines’ to make AT look different from it, with Cronin even going so far as to say, in his interview with Lex Fridman, that algorithmic complexity will be proven to be part of Assembly Theory (here is my reply cited in the paper published in the Journal of Molecular Evolution by Springer Nature). In our new paper, we prove that Assembly Theory is, mathematically speaking, a strictly weak approximation to algorithmic (Kolmogorov) complexity, of which there are others with which it is fully equivalent too, including LZ algorithms, of which the assembly index is one, as we have proven.

Neither the assembly index nor the LZ compression algorithms explicitly invoke or require any Turing machines. Turing machines are just algorithms, and algorithms are Turing machines. By virtue of the assembly index and Assembly Theory being computable, they are also algorithms and, therefore, ‘Turing machines’.

d. It is false that the data of our preprint paper showing how we reproduced every result from Assembly Theory with traditional statistical and compression algorithms is not available, but even if it were not, the experiment could be conducted in a couple of hours. The source code is here and was made available to the journal reviewers. The authors refuse to replicate the experiments themselves (an evening’s project for which they would not need any code from us, even if it were unavailable) or to offer an explanation. In turn, the data in their 2017 paper is incomplete and not reproducible (we took their results at face value to compare to ours, which are fully reproducible). Nowhere else have the authors of AT offered any experimental data.

3. Lee Cronin’s practice of science is to regularly post controversial comments about science and false statements about others’ work on X (previously Twitter) in order to trigger them. He has built an online cult by doing this in a bubble he has created. Lee Cronin seems to believe that the more views he gets online, including views of the YouTube videos hosted by amateur educators and their followers, the more legitimate he and his views are. This shows a disconnect from academic reality.

a. Lee Cronin offered the idea that I do not get invited to podcasts hosted by non-academics as an argument that I am less scientifically valuable than him. This speaks strongly to what Cronin takes to be a measure of success in academia. Cronin is very proud of his interviews with popular influencers, just as he is of the many tweets he gets. Cronin has become the academic version of social radicalization in science: he promotes division and controversy as a tool for his personal gain.

It seems to me that the reason Lee Cronin gets invited to online podcasts is that he runs ‘science’ as a marketing campaign and speaks with people who declare themselves ignorant of the topics he speaks about, so he always goes unchallenged and appears as if he were not academically disputed on every claim he makes. So, it is a win-win for both sides: podcast producers and Cronin’s fake building of credibility. Lee Cronin once claimed to the media and at conferences that he was only one to two years away from creating life in the lab (he made these claims in multiple media outlets). I believe Cronin’s next publicity stunt is likely to declare that his lab has created artificial life, applying this ad hoc flawed measure to the cell experiments that he calls ‘salad dressing’ because they consist of dropping oil on water, with the oil droplets moving around as if they were alive because of their hydrophobic properties. This will be the mother of all deceits planned meticulously for at least a decade.

b. Assembly Theory and its compression algorithm show no evidence of being better than any other explanation based on compression and, therefore, Shannon Entropy. Its compression algorithm has exactly the same classificatory and prediction power as others available: they are all dictionary-based and define a dictionary tree (renamed a ‘pathway’ by the AT authors), and are sequential like most other popular compression algorithms, including LZ78, which was designed to be optimal at traversing objects only once while still converging to Shannon Entropy by counting identical copies (just as the assembly index does). Compression algorithms are not ‘obscure’ schemes with ‘instantaneous’ access to memory, as Cronin describes them, showing his lack of knowledge of a field fundamental to his own subject. But even if they were not sequential, that may well be the way nature assembled some or most objects.

Conflating causality and assembly object

The authors of AT confound causal directionality with the way an object may have been assembled, as we have shown in our new paper. In fact, we already have evidence and know with certainty that nature is not sequential (for example, genomes do not get updated from one end only), as the authors claim an algorithm has to be in order to explain the causal origin of objects. Most of their causal empirical arguments are based on separating organic from inorganic compounds (that we reported years before AT) based on complexity (including copy number). However, as we have shown, other algorithms are equally good or even better at doing so and, under their own arguments, this would mean that those algorithms correspond better to the causal physical processes behind the object by virtue of better predicting the properties of the object (that is, to be organic or not), which is the test that AT used to declare their theory valuable (a circular argument they use in their favour when they use it but not so when we do).

Lee Cronin often says that the data supporting our first critical article is not available; it is. Paradoxically, the only dataset missing is in their 2017 paper, so we had to take their reported results at face value as they are not reproducible (when all the while their argument has been that we haven’t released ours!). The rest of their papers do not include any empirical data, only speculative illustrations. When they make data available, we have shown that their assembly index performed like any other compression algorithm on precisely the same data they used, obtaining the same or better results at separating organic from non-organic molecular compounds (their result on which everything else they claim is based).

Not only did we do correctly what AT did incorrectly (using a dataset with over 15,000 compounds as opposed to the 131 AT hand-picked), but we connected algorithmic complexity to evolution and selection more meaningfully, with actual experiments and even tested on a cancer pathway. We did so years before Cronin and his group suggested their trivial connection (See our papers published since 2012 in journals such as Entropy, Royal Society Open Science, Scientific Reports, iScience, Parallel Information Processing, and Nature Machine Intelligence). Yet we did not mislead anyone by exaggerating the value of our findings, findings that Cronin and Walker rediscovered and presented in an embellished fashion for self-promotion. We never claimed that our results unified physics and biology, redefined the concept of time, characterised life for the first time, or lent themselves to being used in any meaningful way to the search for extra-terrestrial life, because they do not, and because, unlike Cronin and Walker, we are responsible scientists.

4. Cronin claims that we are mixing theory and algorithm. First, both theory and algorithm are subsumed in algorithmic complexity, as proven by our latest paper, but their main feature is the idea of this over-representation of exact identical copies in living systems. The theory is a restricted, strictly weak version of the concepts of algorithmic complexity as applied to living systems before and, in our new paper, we prove that both theory and algorithm are equivalent to a basic compression algorithm (LZ). In the new paper, we explain how Assembly Theory missed developments in complexity theory that happened between the 1980s and today, and where their main ideas and methods come from:

Assembly Theory misses all developments in complexity science from the last decades. It reproduces the most basic ones developed in the 60s and 70s against which we compared their methods, showing they perform similarly or outperform AT in the only area where its authors offered empirical evidence, which was in the separation of organic from non-organic molecules, that we, nevertheless, reported years before without all the hype. The domain of application is also not new; not only have we established sound connections between complexity and selection and evolution (abiotic and Darwinian), but the field of chemical and molecular complexity based on Shannon Entropy (what LZ implements) has been around for almost two decades.

The new paper explains how Assembly Theory attempts to explain selection and evolution using a circular argument, how their theory is contained within the theory of algorithmic (Kolmogorov) complexity and that both the theory and their algorithms are exactly equivalent to LZ compression, used to approximate Kolmogorov complexity.

A: The authors of AT have suggested that algorithmic complexity would be proven to be contained in AT (Lex Fridman interview). This Venn diagram shows how Assembly Theory (AT) is connected and subsumed within algorithmic complexity or K and even within statistical compression, as our paper proves (see Sup. Inf.). B: Causal transition graph from a Turing machine with machine number 3019 (in Wolfram’s enumeration scheme (Wolfram, 2002)) with an empty initial condition found by using a computable method (e.g. CTM, (Zenil, 2011)) to explain how it is false that Turing machines cannot deal with assembling objects (of any type, including networks, spectral data, or any object that can have a representation). Here is an example showing the block-patterned string was assembled step-by-step based on the principles of algorithmic complexity, describing the state, memory, and output of the process as a fully causal mechanistic explanation. By definition, this is a mechanistic process and as physical as anything else, not an ‘abstract’ or ‘unrealisable’ process, as the authors of AT have suggested. Turing’s motivation was to explore what could be automated in the causal mechanisation of operations, for example, stepwise by hand using paper and pencil. The shortest among all the computer programs of this type is an upper bound approximation to K. In other words, K cannot be longer than the length of this diagram. However, you do not have to care about Turing machines or shortest programs to instantiate an algorithmic complexity analysis of data as many papers have done using LZW, of which Assembly Theory is a particularly weak version and not a generalisation. This Turing machine is simply an algorithm and shows how it is related to causality and even Assembly Theory.
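As a rough illustration of that last point, the compressed length of an object under any lossless compressor gives a computable upper bound (up to an additive constant for the decompressor) on its algorithmic complexity, with no explicit Turing machine in sight. The sketch below uses Python's standard `zlib` module, which implements DEFLATE (LZ77 plus Huffman coding) rather than LZW, so it stands in for the LZ family in general and is not the procedure from our paper. The helper name `compressed_length` and the sample inputs are assumptions chosen for illustration.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of zlib-compressed data: a crude upper-bound proxy for K."""
    return len(zlib.compress(data, 9))


patterned = b"01" * 500                 # highly repetitive, 1000 bytes
rng = random.Random(0)                  # fixed seed so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # pseudo-random, 1000 bytes

print(compressed_length(patterned), compressed_length(noisy))
```

The patterned input compresses to a small fraction of its size while the pseudo-random input does not compress at all, which is the kind of separation a copy-counting index can also produce.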

5. Regarding whether I have called Cronin a charlatan: I didn’t in the video with Prof. Tour; I called him a ‘sophisticated deceiver.’ But I think the authors’ behaviour, the embellishment of the properties someone attributes to their product, is consistent with the definition of charlatanry. Let us recall the claims that Cronin and Walker have made publicly about Assembly Theory in their University Press Releases (which they authorised or wrote), their documented interviews online, and content they have let the media have free rein with. Here’s a sampling:

· “Assembly Theory unites physics and biology to explain the universe”

· “Life: modern physics can’t explain it — but our new theory, which says time is fundamental, might” [Wasn’t time already fundamental in natural selection?]

· “How a radical redefinition of life could help us find aliens”

· “Bold ‘New Theory of Everything’ could unite physics and evolution”

· “A new theory of matter may explain life”

· “Assembly Theory may revolutionise drug discovery”

The senior author of AT has raised millions for a drug discovery start-up, and claims Assembly Theory may be very relevant to it in a University of Glasgow public press release given to the media. However, the Assembly Theory papers make a meaningful connection to drug discovery only in the design of chemical compounds that are removed from random (their only comparison point). How those papers make it through peer-reviewing is baffling.

However, when challenged or in the presence of more knowledgeable people, Lee Cronin switches to a humbler mode:

  • “This is only a hypothesis”
  • “We may be wrong”
  • “I know nothing about mathematics, philosophy, complexity, biology, physics…”
  • “I am an amateur on all these topics”

6. Regarding Cronin’s claims that I wanted to collaborate with him. This is partially true. I met him in Italy in 2013 in Taormina, at the Artificial Life conference, where he made his claims about creating life in his lab. He said in 2013 that he was about one or two years away from creating artificial cells out of non-organic molecules in a lab. He even sometimes said he had already created cells in his lab, implying he had created artificial life, saying “And just a few months ago in my lab, we were able to take these very same molecules and make cells…” in TED talks starting as early as 2011. After the ALife event, I told him in writing (by email) that if he wanted to claim that he was creating life, he would need to find a measure to characterise life and capture the behaviour of his artificial ‘cells’ as they approached life-like behaviour. I told him that measures of algorithmic complexity were the best equipped for this, being the accepted mathematical measures of randomness, simplicity, and structure. I think this is where the idea for Assembly Theory came from, but it is a shame that it was so poorly executed.

Sara Walker and I published a paper as co-authors before I openly called her and Cronin out for the way they have gone about misappropriating and overselling Assembly Theory. Lee Cronin invited me on a few occasions to give a talk to his group (emails available), which I never did. Sara Walker did, too, a request which I honoured on April 11, 2016, when I presented my Block Decomposition Method and its applications to causality and living systems to their group at Arizona State University in Tempe. Here is their seminar announcement/flyer as was shown on her group website:

This flyer was circulated by the group of Sara Walker, senior co-author of Assembly Theory, before Assembly Theory. The announcement and this flyer are from the Internet Archive, showing how they were hosted on the web server of Arizona State University, as published by Sara Walker’s group. Please read the description of the content of my talk: causal networks, complexity, living systems, causality, Shannon entropy, dynamical systems, genetics and biology.

I participated in Sara Walker’s book From Matter to Life: Information and Causality by her invitation with an article under the title “Causality, Information and Biological Computation” (a version online is available here). The abstract reads:

“Biology has taken strong steps towards becoming a subarea of computer science aiming at reprogramming nature after the realisation that nature herself has reprogrammed organisms by harnessing the power of natural selection and the digital prescriptive nature of replicating DNA. Here, we further unpack ideas related to computability, algorithmic information theory, and software engineering, in the context of the extent to which biology can be (re)programmed, and with how we may go about doing so in a more systematic way with all the tools and concepts offered by theoretical computer science in a translation exercise from computing to molecular biology and back. These concepts provide a means to a hierarchical organization thereby blurring previously clear-cut lines between concepts like matter and life…”

and here is an interesting excerpt of ours:

“AIT is the subfield that characterises lossless compression, and has as its core Kolmogorov complexity. It is the bridge between computation and information and it deals with an objective and absolute notion of information in an individual object, such as a cell in a network representation [46]. We have proposed ways to study and quantify the information content of biological networks based on the related concept of algorithmic probability, and we have found that it is possible to characterise and profile these networks in different ways and with considerable degrees of accuracy [44,45]. This shows that the information approach may open a new pathway towards understanding key aspects of the inner workings of molecular biology causality-driven rather than correlation-driven.”

The similarities between our work and Assembly Theory (AT) are striking, both in the underlying concept and in the specific wording, except that ours followed good scientific practice and moved the field forward rather than being merely bombastic. This may explain why the authors of AT insist AT has nothing to do with Kolmogorov complexity or compression or any of our work…

Adopting a new tactic, Lee Cronin has begun attempting to discredit me personally, questioning my credentials, career choices, and affiliations on the following grounds:

Having multiple (two) PhDs (one in Computer Science and another in Logic and Epistemology),

Having been formerly affiliated with the universities of Oxford and Cambridge, moving between them, and

Having published papers in what Cronin thinks are ‘obscure journals’

are somehow red flags for Lee Cronin.

a. This makes no sense to me in light of the high value Lee Cronin places on online discussions of scientific matters with people who have no PhDs, no papers published in any journal, no academic credentials or affiliations, and who are self-declared amateurs on the topic.

Regarding the journals in which I have published before, I have published some of my papers in top journals in the areas of Physics (Phys Rev E, Physica A), Biology (Nucleic Acids Research, PLoS Biology, iScience (Cell), Seminars of Cell Biology, Frontiers in Oncology), Computer Science (Appl Math, Theo. Comp. Sci, Frontiers in AI, etc.) and in all the best journals in my area of Complexity Science (Complex Networks, Complexity, Bifurcation and Chaos, Complex Systems, Natural Computing, Cellular Automata, etc.). I have also published in several of the outlets where Lee Cronin has published, including Entropy (of which I am an associate editor), the Royal Society journals, and several Nature subjournals such as Nature Machine Intelligence, Scientific Reports, and so on. In fact, Nature produced a video on their own initiative (unpaid) in 2018 explaining our research on Algorithmic Information Dynamics to promote our paper in the Nature journal.

If Lee Cronin meant these papers are in obscure places as an excuse not to have cited our work, this is not only false, but by his own admission, he has known about me and our work since 2013. Way before his Assembly Theory, he and his other main AT co-author, Sara Walker, invited me to give talks at their respective groups. We have published in the same volumes on the same topics, and I even was a co-author with his closest collaborator on this topic, Sara Walker. Again, while in 2017, they published their first paper on pathway complexity (a rehash of algorithmic complexity approximated by LZW), nowhere had they made any connection to selection or evolution as we did in 2017, nor had they reported the separation of organic from non-organic compounds that we reported years before their 2021 paper (in 2017).

The way Prof. Tour introduced me was his choice. I haven’t ever pretended to be more than I am or embellished my research, and indeed, I do not have access to the marketing engine that Cronin has built, paid for by the UK taxpayer. If anything, I have been told I usually undersell my research and come up with factual, technical titles for my publications that do not inflate their significance and do not call too much attention to themselves.

b. What I believe Lee Cronin is very disappointed about is that what their paper announced as the discovery of the century, their method for separating organic from non-organic compounds, was reported by us in a paper published years before theirs, and moreover in a journal less well known than those in areas of chemistry and biology that enjoy higher impact numbers. The said journal, however, is put out by World Scientific Press/Imperial College Press, probably the second or third most reputable journal and scholarly book publisher in the world. Cronin does not know (by his own admission) about the fields of logic, computer science, or mathematics, and I can understand how some journals in those fields may appear obscure to him. These arguments, however, have nothing to do with science; many landmark papers have been published in the most obscure journals, and this again speaks to what Cronin values the most, which is not substance.

Lee Cronin also accuses me of publishing ‘too many book chapters’. These book chapters are traditionally invitations and are fully peer-reviewed, most of them published by Springer Nature or World Scientific/Imperial College Press, often of greater value than publications in high-impact journals, especially in the areas of computer science, logic, and mathematics. In fact, I cannot name a single landmark paper in these three areas that had been published in Nature or Science journals, and it only shows how ill-informed the author of AT is in this regard. Paradoxically, it has been Sara Walker, the other senior co-author of Assembly Theory together with Cronin, who has organised some of these book compilations to which I have been invited to contribute (by her) before I called out the many problems with Assembly Theory. In the same volume, which featured a contribution of mine, Cronin introduced one of his first ideas of Assembly Theory — in a book chapter.

c. In the event that anyone may be taken in by attempts to discredit me, the webpage for when I was a Senior Researcher and faculty member at the Department of Computer Science at the University of Oxford is here (Structural Biology Group), and my current Oxford affiliation is here. I left the Department of Computer Science at the University of Oxford to lead and raise funds for my Oxford University spin-out, in which the University has equity. I remain associated with Oxford in my role as a CDL Mentor on AI at the Saïd Business School. Altogether, I have been associated with Oxford for about ten years. My University of Cambridge affiliation can be seen listed here. I recently left the University of Cambridge, though they wanted me to stay for many more years, to join King’s College London as an Associate Professor/Senior Lecturer in Biomedical Engineering. It is one of the most reputable universities in the world (ranked 6th in life sciences, above Glasgow and ASU by several dozen to hundreds of positions in all subjects, according to the World QS ranking). Lee Cronin’s attempt to make me look ‘mediocre’ (his own words online) and sow confusion while I was moving between institutions is also inconsistent with the conduct expected of a University Professor, who should not engage in non-scientific controversy and gossip.

Of the 12 years following the completion of my first postdoctoral position at the Behavioural and Evolutionary Theory Lab, University of Sheffield, I was affiliated for about six with the Karolinska Institute in Stockholm, Sweden, as an Assistant Professor and Lab Leader. Karolinska is the highest-ranked university in Sweden, one of the world’s top ten life science institutions (QS), and the institution that awards the Nobel Prize in Physiology or Medicine every year to the best science in the field. Then I spent about six years evenly split between Cambridge and Oxford (ranked 2nd and 3rd among the world’s top universities, with the latter often 1st in Computer Science worldwide) as a faculty member and senior researcher, and was also affiliated with The Alan Turing Institute (the UK’s national institution for AI and data science).

Lee Cronin also suggests I am no longer affiliated with The Alan Turing Institute. Even if true, I don’t see how this would be relevant. He should be focused on the merits of our scientific arguments and nothing else. However, as of today, I still have an official affiliation with The Alan Turing Institute. I remain paid by and affiliated with the Alan Turing Institute in an official capacity as an external Innovate UK AI advisor, one of only ten external advisors serving the institute to help accelerate AI adoption in the UK. I have added my institutional email addresses (Oxford, Cambridge, Turing, and KCL) at the bottom of my personal webpage as of today in case Lee Cronin continues auditing me and sowing seeds of doubt even when it should not be his business other than the scientific arguments by their own merit.

Regarding the YouTube host who disrespectfully called me ‘mediocre’ in the description of his YouTube video, endorsing everything Lee Cronin said despite knowing nothing about me, I would probably care if he were in a position to evaluate my career or research. I think he took it personally that I said on James Tour’s video that science communicators and some YouTubers would be able to call out charlatans such as flat Earthers or creationists but would find it difficult to call out sophisticated academic deceivers like Lee Cronin for their science as alleged domain experts. I am surprised he took it personally, having publicly declared himself unable to evaluate Assembly Theory, which is the only subject I have talked about all along.

Since Lee Cronin is happy to engage with a YouTuber who refers to academics as ‘soiling themselves,’ as ‘dumb as rocks’ in the best case, or as ‘shit’ in the worst one, it is strange to have Cronin hold me to so much higher standards. This is a small sample of four comments by “ProfessorDaveExplains” in the Comments Section of the video interview with Cronin, mirroring how he conducts himself in his channel, enabled by Lee Cronin. These are only four of his latest comments as of today, posted within a day of each other; he writes the f* word in about 1 out of 2 of his messages to his audience on a very regular basis:

The level of conversation of podcasts where Lee Cronin participates and gets involved.

This is the person who hosts Cronin regularly as his guest, with Cronin joining and enabling him to mock and insult academics and people online. Cronin is right that I will never engage or accept an invitation to hate/bullying shows of this level or make frivolous social media postings as Cronin does on a regular basis. Rather than feeling uncomfortable with those, Cronin promotes them and is very proud of them, as per his own words in the same video. This is, again, inconsistent with the stature of a University Professor, who is trusted to inform, be truthful, and behave respectfully.

While Lee Cronin goes above and beyond what is wrong with academia, Assembly Theory makes evident some more general issues and is an example of what is wrong with academia today:

  • universities keeping academics because of how much money and publicity they bring to their institutions and not because of their science,
  • academics distorting and embellishing their results to get published and get media attention,
  • media and journals accepting or promoting unfounded bold articles to get more citations and impressions to sell more,
  • universities and PIs exploiting young researchers as exchange currency, most of them with no career progression opportunities, serving PIs’ egos and personal agendas,
  • funding agencies letting all this happen by continuing to support celebrity groups that produce little substance,
  • other researchers keeping quiet, rightly fearing retribution.

All this is with little regard for academic honesty or scientific integrity.

While many of these problems are very complex, some areas of opportunity may present themselves. I have no easy answers for solving the wrong incentives driving journals, reviewers, universities, and academics, but I do have a few recommendations to consider and a call for action:

  • Science writers should think twice about University Press Releases and their motivations; they are literally unregulated advertisements with a conflict of interest. You would do a much better service to science by focusing on researchers who precisely have no access to these marketing resources, the opposite of those on Press Releases. Finding good science and making it exciting to the large public and readers from less well-digested sources may be more challenging, but it makes things fairer and more honest.
  • I encourage academics to speak out when they see wrongdoing instead of staying silent and becoming accomplices, especially those who are more senior and may have less to lose yet remain quiet.
  • I also want to call on funders such as the UKRI to regulate university PR departments and centralise marketing efforts across research departments to level the playing field for all researchers, especially young ones, and moderate those with excessive exposure.

d. There is one last thing the host in Cronin’s YouTube video was wrong about when claiming that no one else seemed to call out Cronin for their flawed arguments in Assembly Theory. This is not true. At least 12–14 academics and authors of papers published (some cited above) concur in the criticism of Assembly Theory, finding it wrong or at fault on the grounds that we reported in this blog post, which they cite prominently alongside our papers.

UPDATE (Thursday, 11 January 2024):

Disassembling Assembly Theory

Lee Cronin continues to claim (as of January 7, 2024) that the assembly index is not a compression algorithm even when it is precisely that. The examples they show are exact implementations of LZ77/LZ78. The core of AT demonstrably makes use of an algorithm of the LZ family, basic compression algorithms introduced in 1977/78 and implemented in popular modern file formats such as ZIP, GIF, etc.
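For readers unfamiliar with the LZ family, here is a minimal textbook sketch of LZ78 parsing in Python. It is an illustration only, not the code used by the authors of AT or in our own papers, and the function names `lz78_parse` and `lz78_decode` are hypothetical. It shows the principle under discussion: each new phrase is built by reusing a phrase already seen and extending it by one symbol, so an object made of repeated copies is described with fewer phrases than one with no repetition.

```python
import string


def lz78_parse(s: str) -> list[tuple[int, str]]:
    """Return the LZ78 factorization of s as (phrase_index, next_char) pairs.

    Index 0 stands for the empty phrase; index k refers to the k-th phrase
    added to the dictionary so far.
    """
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for ch in s:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending while the candidate is a phrase seen before.
            phrase = candidate
        else:
            # Emit a pointer to the known phrase plus the one new symbol.
            pairs.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as prefix + last symbol.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the original string from the (phrase_index, next_char) pairs."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


repetitive = "ab" * 16                 # 32 symbols built from repeated copies
diverse = string.ascii_letters[:32]    # 32 symbols, no repetition at all

print(len(lz78_parse(repetitive)), "phrases for the repetitive string")
print(len(lz78_parse(diverse)), "phrases for the non-repetitive string")
```

The number of phrases (or the size of the encoded output) is the quantity that shrinks as more identical blocks are reused, which is exactly what a compression algorithm measures.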

The authors of AT are unwilling to acknowledge that what defines an algorithm is not the use to which it is put but rather its structure and function. Leaving their unjustified claims of novel findings aside, if an applied algorithm results in compression of data, it is a compression algorithm. The AT theorists are grandstanding on a cardboard stage. The main issue is that the results they are reporting are the result of that compression algorithm in action, approximating Shannon Entropy rate, known and reported before and not novel at all. The authors deny that theirs is an ordinary compression algorithm in order to exaggerate the novelty and impact of their analytic models. Why engage in such scientific sophistry?

Because what they are calling Assembly Theory is, in point of fact, a strictly weaker approximation of algorithmic (Kolmogorov) complexity, of the kind obtained by traditional compression implementing Shannon Entropy. They refuse to acknowledge any connection with it because doing so would show that they have reinvented a field (continue reading).

Striking similarities exist between Assembly Theory, which pretends to explain selection and evolution, and our Coding Theorem and Block Decomposition Methods (introduced in the early and mid 2010s). Our method, however, includes but goes beyond identical copy detection, is deeply connected to causality as we understand it today after 25 years of research on causation, and has been tested with actual data and clinical data (a cancer pathway).

The authors’ new narrative focuses now on the claim that ‘Assembly Theory explains selection and evolution’ (biological and beyond). Unfortunately, just as with their claims of distinguishing organic from non-organic compounds, we also connected algorithmic complexity to selection and evolution in 2018 with our paper published by the Royal Society Open Science under the title “Algorithmically probable mutations reproduce aspects of evolution, such as convergence rate, genetic memory, and modularity”. However, unlike them, we did it correctly, with control experiments, and a measure that was not equivalent to one of the simplest compression algorithms, ultimately Shannon Entropy convergent.

The similarities of their definition of evolution in terms of Assembly Theory are strikingly close to (but mathematically strictly weaker than) ours. Their paper in Nature makes grandiose claims for which there is neither experimental nor empirical evidence.

Both of these claims, which the authors of AT present as if they comprise novel innovations, were previously reported in our own research from 2018. Contrary to the activities of the authors of AT, our research is carefully formalized, our arguments and interpretations are clearer, and we did experiments and presented supporting data in keeping with the methods of modern science. Conversely, they have appropriated and deployed theory and algorithms from the 1960s and 70s, and when faced with our published research, they deliberately declined to cite it.

Instead, they claimed that their theory was unrelated to the concept of Kolmogorov complexity.

This is patently untrue.

Their claims of novel methods are based upon two methods previously introduced by our groups:

1. An application to separate organic from non-organic chemical compounds, and

2. An application to explain selection and evolution as emergent phenomena, including specific forms of modularity (the ‘copies’ in Assembly Theory)

As shown in the following papers:

H. Zenil, N.A. Kiani, M-M. Shang, J. Tegnér, Algorithmic Complexity and Reprogrammability of Chemical Structure Networks, Parallel Processing Letters, vol. 28, 2018. (a complete version is also available on Arxiv). We here demonstrate how organic compounds can be separated from non-organic compounds using LZW and other algorithms with high accuracy. We show this on an exhaustive database of over 15,000 compounds. Compare this to the five compounds that the authors of Assembly Theory say were provided by NASA (or a hundred compounds considered by their ‘calibration’ procedure). Our described method not only detects and counts identical copies but also explores small causal structures that, in sequence, explain how a process may have unfolded or an object may have been assembled.

In their Nature Communications paper, the authors of AT posed their results as new and revolutionary, claiming in a broad array of public media to have found a theory of everything and to have unified biology and physics. We did not mount a similar media campaign because we found that these analyses were neither a theory of everything nor the bridge between physics and biology that the authors of AT irresponsibly claim. We did find it very interesting, however, that classification was possible with any representation of the chemical data, including mass spectral data as well as InChi and molecular distance matrices, as shown in our 2018 paper.

(note: our complexity calculator with which you can replicate the experiments can now be found at https://complexity-calculator.com/ as we lost the domain without the hyphen to a squatter)

In our next paper:

S. Hernández-Orozco, N.A. Kiani, H. Zenil, Algorithmically Probable Mutations Reproduce Aspects of Evolution, such as Convergence Rate, Genetic Memory, and Modularity, Royal Society Open Science, 5:180399, 2018.

we investigated the application of algorithmic complexity to explain and quantify selection and evolution as emergent phenomena, including the modularity (that Assembly Theory calls identical copies), with proper control experiments such as comparing evolutionary convergence rates and several indexes and methods against each other. Results on both synthetic and biological examples indicate that our theory could explain an accelerated rate of mutations that are not statistically uniform but are algorithmically uniform which may demonstrate how evolution finds shortcuts through selection as an emergent property of algorithmic information theory.

In the paper, we show that algorithmic distributions can evolve modularity and genetic memory (by the preservation of structures when they first occur) from very basic computational processes, leading to both the accelerated production of diversity and population extinctions. This approach is promising for explaining naturally occurring phenomena such as diversity explosions (e.g. the Cambrian) and massive extinctions (e.g. the End Triassic) whose causes are currently the subject of broad debate. The natural approach introduced in this paper appears to be a better approximation of biological evolution than models based exclusively upon random uniform mutations, and it also approaches a formal version of open-ended evolution predating biological evolution as an example of how selection emerges. These results validate some speculations that computation (as compared to mutation) may be an equally important driver of evolution.

We validated our approach on a cancer pathway, finding the most likely oncogenes. The measure we introduced is called the Block Decomposition Method (notice the similarities, again). It counts blocks of causal content: not merely identical copies, as AT trivially does, but additionally patches of simple transformations that may provide explanatory information relating to the history of a process and the sequences of steps that produce it. The similarities between our work and Assembly Theory stop there. Our method is not simply the application of LZW compression; it can do what a compression algorithm does (and we compare it against compression) but also locate the small causal structures that sequentially assemble the original object.
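For reference, the Block Decomposition Method is usually written as follows in the published BDM literature. This is a summary sketch; the symbols $r_i$, $n_i$ and $\mathrm{Adj}(X)$ are the notation assumed here, and the formal treatment is in the papers cited in this post.

```latex
\mathrm{BDM}(X) \;=\; \sum_{(r_i,\, n_i)\, \in\, \mathrm{Adj}(X)} \Big[ \mathrm{CTM}(r_i) + \log_2 n_i \Big]
```

Here $X$ is decomposed into blocks, $r_i$ ranges over the distinct blocks, $n_i$ is the number of times block $r_i$ occurs, and $\mathrm{CTM}(r_i)$ is the Coding Theorem Method estimate of the algorithmic complexity of that block. The $\log_2 n_i$ term is the part that accounts for identical copies; the $\mathrm{CTM}(r_i)$ term is what goes beyond copy counting.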

In our next paper, published in iScience (a journal of Cell) in 2019, we were able to reconstruct an epigenetic Waddington landscape validated with the three genetic databases in accordance with literature in cell biology on how stem cells differentiate. We also showed how our algorithms (based on algorithmic complexity) can reconstruct the causal mechanics of dynamical systems. This proves that the authors of Assembly Theory are wrong regarding their many false claims involving algorithmic (Kolmogorov) complexity, their numerical approximations, and their claim that this is not, or cannot be, related to causality.

H. Zenil, N.A. Kiani, F. Marabita, Y. Deng, S. Elias, A. Schmidt, G. Ball, J. Tegnér, An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems, iScience, S2589–0042(19)30270–6, 2019.

Other researchers have also used LZ algorithms to characterise all sorts of objects, including biological and chemical examples, finding a strong simplicity bias of the type captured by LZ, such as the repetitions that AT claims to have found for the first time. But we were the first to report these phenomena in a paper titled “On the Algorithmic Nature of the World” published in 2010 in a scholarly volume titled Information and Computation, World Scientific Publishing Company, and later in another publication in PLoS ONE under the title “Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines” in 2014. Both publications took place in the context of my Ph.D. theses (Lille and Sorbonne). Notably, Greg Chaitin, one of the founders of the theory of algorithmic complexity, was on my thesis committee. Unlike Cronin’s, the papers cited above are supported by empirical data as well as systematic and exhaustive physical experiments with proper controls.

The above papers, published with my colleagues, are years ahead of what the Cronin and Walker groups have authored. Their work contains no actual evidence, experiments, solid (original) theory, or novelty in their methods and index. The three papers above are only a sample of the dozens of papers that we have published in the field.

Cronin and Walker et al. wrongly claim that algorithmic complexity is unsuitable for the complex analysis of causality. They further claim, erroneously, that their theory contributes to the study of causality, independently of algorithmic complexity.

Regarding the contributions of algorithmic complexity to the analysis of causality, we published this paper in Nature Machine Intelligence in 2019:

Unlike Assembly Theory, our framework takes into account and integrates developments in the study of causality over the previous 5 decades, including perturbation analysis, counterfactuals, multivariate simulation, etc.

Nature, the journal, produced this video to explain our approach:

Nature described our research and methods in 2019 as follows: “One group of scientists are trying to fix this problem with a completely new kind of machine learning. This new approach aims to find the underlying algorithmic models that interact and generate data, to help scientists uncover the dynamics of cause and effect. This could aid researchers across a huge range of scientific fields, such as cell biology and genetics, answering the kind of questions that typical machine learning is not designed for.”

One can see how Assembly Theory mirrors all our work, taking the ideas from algorithmic complexity (without acknowledging it) but doing so in a rather pedestrian fashion, with an incredibly simplistic and (strictly mathematically) weaker method that has little substance but hugely exaggerated claims.

Hopefully, the astute reader will now understand that the authors of AT have appropriated ideas from Algorithmic Complexity without acknowledgment in order to produce a pedestrian, over-simplified, and weak method that has no real scientific substance… yet somehow results in staggeringly unjustified claims of novelty, not to mention presumptions of universal applicability.

Based on the body of scholarly literature that we have produced over the last 15 years, in the mid and late 2010s we founded the field of Algorithmic Information Dynamics, of which Springer Nature and Cambridge University Press published these two books in application to causality and living systems:

Notice the subtitle of the volume on the left: “Beyond Statistical Lossless Compression” (that is, beyond simplistic techniques like AT), and the subtitle of the second book on the right: “In application to Causality and Living Systems” (what the authors of AT claim algorithmic complexity is not equipped to do).

If you missed from one of my videos the glossary of terms that Assembly Theory has misappropriated (plagiarised?), here it is:

One strategy employed by the authors of AT is to cite previous research as examples of ‘what they are not doing’, because, according to them, the previous research is flawed or useless. Another strategy they employ is to trivially vary an original definition by making it weaker than the original in order to claim their version as novel. With each successive change, their claims become more fallacious and misleading. Cronin also says he wants to apply his Assembly index (equivalent to LZ77/LZ78) to natural language text to classify languages and other objects, such as species, using an algorithm in the LZ family (based on the same principles). This had already been done by Cilibrasi and Vitányi in 2005 in their landmark paper on Clustering by Compression, published in IEEE Transactions on Information Theory. Following that publication, text compression has been widely employed for detecting fraud, authorship, and, paradoxically, plagiarism. Cronin will say his assembly index is not a compression algorithm, but it is.
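To make the idea of clustering by compression concrete, here is a minimal sketch of the normalized compression distance (NCD) in Python. It assumes `zlib` (an LZ77-based compressor from the standard library) as the stand-in compressor; the helper names and the two sample byte strings are illustrative assumptions and are not taken from the 2005 paper or from any AT code.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of data after zlib (DEFLATE, LZ77-based) compression."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between x and y.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Values near 0 mean the two
    objects share most of their structure; values near 1 mean they
    share very little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Two toy "objects": one built from a repeated sentence, one from repeated digits.
sample_a = b"the quick brown fox jumps over the lazy dog " * 20
sample_b = b"0123456789" * 88

print("NCD(a, a) =", ncd(sample_a, sample_a))
print("NCD(a, b) =", ncd(sample_a, sample_b))
```

Computing this distance for every pair of objects and feeding the resulting matrix to any standard clustering method is the whole technique: no domain knowledge is needed beyond an off-the-shelf compressor.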

Regarding my interactions with Prof. James Tour:

Some have inquired why I decided to participate in Prof. James Tour’s podcast even though I oppose religious ideas that he believes in. I opened my interview by saying I was the opposite of a creationist and not a religious person, and closed my interview by calling out Intelligent Design on my last slide. Throughout the interview I was very clear about my convictions and I am happy his audience had access to it, allowing them to be exposed to what I think is a healthy and balancing counterview.

My duty is not to judge the beliefs of a person who wants to discuss science with me. My duty is to give the most objective answer I can come up with, as a scientist, on a subject of my expertise to whoever asks or reaches out to me. Not reaching out across the aisle is what promotes greater division and often radicalisation, creating small isolated spheres of information where James Tour’s audience would perhaps never have heard someone like me. So the fact that the video has, as of now, more than 150K views since its first week alone is for me a great result and a double mission accomplished: calling out creationism and Lee Cronin.

Moreover, Prof. James Tour is one of the most reputed synthetic chemists (acknowledged by Cronin himself in one of the many podcasts he has participated in). He has been awarded multiple prizes by organisations such as the Royal Society of Chemistry for his work (the same society that, in contrast, suspended Cronin for misconduct widely covered by multiple media outlets). Doesn’t it say something about how things work that Tour, a professor who happens to have a podcast for his own reasons, reached out to me rather than science communicators like Lex Fridman and others? I shared the same information with Philip Ball, a popular science communicator, who ignored much of what I said and misrepresented me by including only a single watered-down sentence in his article on Assembly Theory, quoting me as saying that I thought ‘Assembly Theory was yet another measure of complexity’, in his Quanta article, which Wired then parroted. They do not want to undo what they, the science media, did, because they would be accepting they were grossly fooled. I wrote about this in another blog post.

UPDATE (Saturday, 6 January 2024): Videos and Interview Disputing Assembly Theory

To help my readers and others who may have been misled by Assembly Theory, I have decided to support my arguments with a video posted on YouTube, as well as my interview with Prof. James Tour, who kindly invited me onto his podcast, after seeing me at last year’s Origin of Life round table with Lee Cronin at Harvard University.

Professor Tour is a Professor of Chemistry, Materials Science, and Nanoengineering at Rice University in Houston, Texas. He is a world-renowned authority on nanotechnology and has pioneered the field of nanomedicine and the application of nanorobots in drug delivery. One example is his lab’s research into a graphene scaffolding gel capable of repairing the spinal cord of paralyzed mice.

As per his Wikipedia page, Prof. Tour has about 650 research publications and over 200 patents, with an H-index > 170 with total citations of over 130,000 (Google Scholar). Prof. Tour was awarded the Royal Society of Chemistry’s Centenary Prize for innovations in materials chemistry with applications in medicine and nanotechnology (the same society that, in contrast, suspended Cronin for misconduct widely covered by multiple media outlets). Tour was inducted into the National Academy of Inventors in 2015. He was named among “The 50 Most Influential Scientists in the World Today” by TheBestSchools.org in 2014.

Unlike me, Prof. Tour is a deeply religious man (not uncommon among scientists). Prof. Tour’s beliefs are a matter of personal inclination and, though I do not share these beliefs, I respect Professor Tour as a scientist and researcher.

These videos show that the work of Lee Cronin and Sara Walker borders on blatant plagiarism. The only reasonable explanation for this is ignorance of the fields of research they believe they are contributing to. If the explanation is not ignorance, then it is dishonesty, and they have already crossed the line into deception and plagiarism, see:

Assembly Theory is identical to LZ77/LZ78 and algorithmic (Kolmogorov) complexity. Every aspect of Assembly Theory is disputed by Dr. Zenil.
Dr. Hector Zenil interviewed by Prof. James Tour (Rice University) after the Harvard debate on the Origin of Life with Lee Cronin. It debunks Assembly Theory as introduced by Lee Cronin and Sara Walker.

Offer to Cronin’s and Walker’s group members (some of you have already contacted me expressing your support; I admire your courage): I know how difficult it may be for postdoctoral researchers in Lee Cronin’s group to face the reality that they have been grossly misled by people who look more like online influencers than scientists. Three of them have already reached out to me, asking me not to disclose their names, in agreement with my assessment of Assembly Theory and in disagreement with their principal investigators and supervisors, Cronin and Walker. Please do not be discouraged. You should not be forced to waste your valuable time or career on their personal agendas.

UPDATE (Wednesday, 29 November 2023): Assembly Theory loaded claims with no substance inundating journals and media

James Tour and Lee Cronin had a public debate hosted at Harvard University. In refuting Cronin’s unfounded claims that he had basically ‘solved life’ (and almost everything else in the universe, according to the authors’ claims to the media, see below), our work, and some excerpts from this blog pointing out the many problems with Assembly Theory, were cited. Dr. James Tour is a Professor of Chemistry, Materials Science, and Nanoengineering at Rice University in Houston, Texas.

These are the titles and Press Releases approved by the authors of Assembly Theory feeding the media.

While I may not share all of Prof. Tour’s beliefs regarding religion (he did not use any religious arguments to refute Cronin’s claims), I think he did a service to science and scientific practice by pointing out the many false claims made by Cronin and his group. While I may also not have dismissed a paper by Cronin as ‘garbage’ without elaborating, I understand the huge discrepancy between their many false claims and their actual contribution. This bears out the Bullshit Asymmetry Principle, also known as Brandolini’s Law: the amount of energy needed to refute bullshit is an order of magnitude larger than that needed to produce it.

A noteworthy feature of the debate and what follows is how Cronin’s attitude is diametrically different from his public profile and bold claims. Before an audience of Harvard and MIT scholars, he claims to know almost nothing; he says he is an amateur in almost every field of knowledge and that he may be completely wrong (as we have shown he is). This humility is merely a strategy to avoid further grilling when facing an audience other than laymen.

Contrast this to his public claims (made in contexts where no one can refute him), including claims that he has ‘unified physics and biology’, redefined ‘time’ as a physical process (he says so on the video), and so on, none of which is remotely true. I was expecting a technical defence of AT by Cronin. Instead, he presented a general talk appropriate for a lay audience to an audience of Harvard scholars.

According to Cronin (and this was one of several false statements made in this debate), “no one is disputing the science” (of AT). We have, and only a few minutes previously, James Tour had, but Cronin seems to specialise in sophisticated academic deflection.

Since the introduction of cell biology over a hundred years ago, we have known about the highly hierarchical structure of living systems, including how systems reuse components. Systems biology has studied for decades the Lego blocks that build living systems (which the AT authors want us to believe they have discovered).

At the Harvard University event, Cronin compared himself and the situation with AT to Galileo Galilei and the heliocentric model, despite insisting that he was no Galileo, as if Assembly Theory were so disruptive that he had annoyed the establishment. This is not the case; there is nothing concrete about AT for us to oppose because it lacks the substance required to invite articulate opposition.

This is our main criticism: that there is, in AT, no actual science to criticize; AT is not saying or proving anything that existing theories and methods have not already demonstrated. We have demonstrated that it is equivalent to algorithmic complexity and instantiates a version of a popular compression algorithm. We demonstrated, five years before them, that all sorts of old and new indexes separate organic from nonorganic compounds (which Cronin and Walker conveniently confound with ‘life’), and we showed that, using exactly (and only) the same mass spectral data that the authors of AT used, these indexes perform similarly to or better than AT.

More papers disputing the science of AT are coming out, showing how inert (natural, experimental, and synthetic) minerals can have a high assembly index (hence being ‘alive’ according to Cronin’s and Walker’s Assembly Theory) — as we showed theoretically in our paper and predicted would be the case.

Their modus operandi does not belong to science. Rather it resembles a complex publicity stunt that steals authority from existing research while simultaneously avoiding the necessity of citing it. This behaviour does a disservice to science and the general public.

Bad as this self-promotion is, it is equally dishonest to:

• Claim to be doing science in the absence of basic control experiments

• Ignore or dismiss the related or original work of others

• Appropriate ideas without proper attribution

• Conceal conflicts of interest

• Misframe reality to garner agreement

• Create a public persona in order to appear more credible

This behaviour is not scientific. It contributes to a terrible ecosystem of false science that incentivises scientific misbehaviour and further undermines the process of scientific research at a time when researchers are already grappling with problems such as in-group academic promotion, the necessity of acquiring taxpayer grant money, and competition with attention-seeking ‘public’ science figures.

Of course, these problems are not exclusive to Cronin and Walker, but their behaviour is a prime example of all that is currently wrong in the practice of self-promoting ‘science’. Their work is a hyperbolic example of the kind of science that reinforces dubious incentives. The fact that they have managed to mislead colleagues and the public, even their own students, and are rewarded in some contexts for doing so (with promotions, high-impact articles, media attention, and government grants) in an atmosphere of little to no accountability is, of course, a reflection of broad-scale problems in the scientific community and in society in general.

UPDATE (Saturday, 25 November 2023): The authors of Assembly Theory utilise the ever-moving goalpost strategy

The current narrative of the authors of Assembly Theory, contrary to our criticisms, seems to boil down to the following:

1. They claim they are not implementing a compression algorithm, or anything related to algorithmic complexity because they are taking a ‘physical signal’ and processing it directly under a hypothesis about how a molecular compound may assemble; moreover, their algorithm is (slightly) different. This does not make sense.

Coding schemes and lossless statistical compression algorithms like LZW are totally mechanistic: they offer a step-by-step framework from which a plausible physical unfolding process may be derived, and they are formally preferable because they have been proven to be optimal in a universal sense (they asymptotically converge).

Conversely, the AT hypothesis about how a molecule may have assembled is pure speculation, as it is not grounded in any physical evidence (evolutionary or otherwise). Hence there is no state-to-state correspondence between their theory and reality. AT has already been shown to face many elementary problems and challenges even in justifying its chemical layer, let alone other layers. In other words, there is no indication that their molecular compounds are assembled in the way they think they are (a form of suboptimal reverse Huffman process), or indeed in any other way. That is, their hypothesis is only as good as Huffman’s when applied to the same (physical) data. Actually, the evidence favours other algorithms, because they separate the abiotic from the non-abiotic classes in a more statistically significant fashion than AT does, implying that they may correspond better to the ground truth that AT is trying to capture under its own hypothesis. The speculations of the AT authors are, thus, not related to any evidence of physical assembly and are, therefore, as good (in the best case) as Huffman’s assembly and disassembly processes. Statistical compression has been applied to many other physical biological signals, and the AT measure (marked as MA on the plots) is nothing but a restricted version of a compression algorithm that evaluates a piece of data represented in a computable fashion, just as any other measure would, including lossless statistical compression.

2. They also claim that other indexes ‘require a Turing machine’ in order to be applied, which does not make sense either, as I have explained extensively below. This is like saying that their measure requires a Dell computer because they carried out their calculations using a Dell computer. A Turing machine is just a computer program or an algorithm (and a universal Turing machine is just a program that can run any algorithm, program, grammar or rewriting system, no need for ‘Ifs’ or ‘Whiles’).

3. They also claim that we have not applied our indexes to the same spectral data. This is false; we have applied our measures to exactly the same mass spectral data they used (our Fig. 4, extending Fig. 4 in their Nature Communications paper). We did use other data in some of our other figures, only to show that there was nothing special about mass spectral data and that all our indexes (out)performed MA when applied to molecular distance matrices or InChI descriptions (which we reported almost five years before AT was introduced, see below) to distinguish the same classes (organic versus nonorganic) as they did.

The problem is that the authors of AT inject so many fallacious statements per claim that it is impossible to have a rational scientific conversation with them regarding their theory. This is a quintessential example of the so-called Bullshit Asymmetry Principle. The Bullshit Asymmetry Principle, less well known as Brandolini’s Law, states that the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it. More than perhaps any other individual, I have remained committed to scientifically demonstrating to the world the falsehood of every statement the AT authors have made. Fortunately, more papers on the fallacious claims of AT are coming out in the wake of ours.

UPDATE (Thursday, 19 October 2023): On the unscientific and unethical behaviour of the senior authors of Assembly Theory

We’ve been reporting what I think is the unscientific and unethical behaviour of the senior authors of AT for years now. I was the first to report on the false and sensational claims of the authors, as well as the complicity of the journals in which they have published. However, the science writers and magazines responsible for amplifying the claims of the AT authors should also be held accountable, particularly for giving little to no bandwidth to neutral or critical views that would comprise more balanced and scientifically valid reportage.

With the publication of their recent paper in Nature, the situation has acquired an almost comical degree of deceptive grandstanding. The collaboration of the journal, the senior authors’ university PR departments and public media has produced a monster whose tentacles can no longer be obscured by Cronin’s articulate sophistry. Many people are now beginning to see through the hype, including some of the authors’ closest collaborators.

Engaging prose notwithstanding, their paper verges on full-blown plagiarism, appropriating core elements of algorithmic information theory and my own group’s work, so that ‘Assembly Theory’ (AT) becomes indistinguishable from the principles of algorithmic complexity (including elements of algorithmic probability and Bennett’s logical depth — part of the same body of knowledge). This paper is even more shallow than the last one, though it makes even bolder claims, with the authors and the media now widely reporting AT as the theory that unifies biology and physics.

All this hoopla is about a ‘theory’ whose method is a weaker form of Shannon Entropy applied to counting exact copies of molecular sequences, a method that has been used (and abused) in every popular lossless compression algorithm since the 1960s (like GZIP) to approximate algorithmic complexity, despite the authors going to great lengths to distance themselves from compression and algorithmic complexity. In fact, what they implement is none other than a compression algorithm (a simplistic and mathematically weak resource-bounded version of Kolmogorov complexity, and a trivial version of our own Block Decomposition Method), even openly adopting the language of compression (using terms like ‘shortest paths’, ‘smallest number of steps’ and so on), as they have often done before. Why did the authors not compare their resource-bounded measure with any other measure? Because they arbitrarily decided that the others were ‘uncomputable’. This is formally impossible; compression algorithms are, by their very nature, computable.

Yet the authors’ hubris was apparently unsatisfied. They went on to claim that their theory could accomplish all sorts of astonishing things, from detecting life on other planets to their ‘breakthrough’ of making time ‘physical’ using AT — a claim that no one can understand (time was already fundamental in the theory of evolution). Their ‘assembly time index’ is an absolute quantity trivially defined without evidence of any physical state correspondence or causal connection to actual objects, events or situations in the physical world.

Now they claim that the same theory can explain selection and evolution, unify biology and physics, and explain all life, while experts in selection and evolution don’t think they are even engaging in a serious discussion of the basics of biology. The authors are not falling far short of falsely declaring AT the Grand Unified Theory of Everything, when behind the magician’s curtain, we find nothing other than a weaker version of one of the simplest algorithms known in computer science, LZW (Shannon Entropy) that counts repetitions in strings of data (though, unlike AT, LZW has been proven to be optimal and actually count repetitions correctly).

Not surprisingly, my group achieved the same results with a 50-year old algorithm, and improved upon them showing we can separate organic from non-organic molecules better and with a 10x larger sample than the one cherry-picked by the authors of AT. We reported this (almost five years before AT) using molecular distance matrices with simple algorithms.

Rather than address these facts, and many other criticisms levelled at their work, the authors of AT have gone into ‘full plagiarism mode’ and appropriated, without attribution, key features of the research we published in a Royal Society paper. Specifically: linking resource-bounded (i.e., computable) algorithmic complexity with ideas from evolution theory, and genetic modularity as an emergent phenomenon of algorithmic probability.

They went on to make wildly unfounded claims that are ten orders of magnitude bolder than any we would have ever dared to make.

The authors have enough talent to appear to be describing a profound and important new theory while actually presenting something that is mostly hollow and naïve. Their actual theory is far from commensurate with their methods, let alone the grandiose claims made on its behalf.

Unfortunately, they have made a marketing exercise of science, and they command a giant, well-oiled machine operating at full throttle that scores publicity stunt after publicity stunt, with the latest one calling AT the theory that has unified physics and biology. They operate unchecked by either their universities or the media, with the latter neglecting to feature critical views that would make for fair and balanced reportage. The University of Glasgow titled their press release “Assembly Theory Unifies Physics and Biology to Explain Evolution and Complexity” while ASU titled theirs “New ‘assembly theory’ unifies physics and biology to explain evolution, complexity”. Together they triggered everything that followed. I think this business has the potential to explode in everyone’s hands and end badly for those involved, as recently happened with Integrated Information Theory, but far worse because these claims are 99% marketing and only 1% some thinking.

Scientists and researchers are taken aback by the enthusiasm with which such shallow ideas have been promoted. This work has already received a huge wave of negative reviews from experts in all relevant areas, including complexity, computer science, evolution, chemistry, physics, and biology, testifying to just how rotten the state of scientific publishing and scientific practice is these days. I am glad I have been calling out this kind of unethical practice for years. I feel vindicated for bringing to light the vicious ways in which the system, in collusion with authors intent on self-promotion, enables such scientific scams.

In the wake of the scandal surrounding Integrated Information Theory and now Assembly Theory, I am more sceptical than ever of the quality and integrity of some of the highest-impact journals, especially Science and Nature.

UPDATE (Monday, 4 September 2023): How media communicators fail us

To learn more about how the Quanta article on AT (parroted by Wired) and, to a greater extent, the article in New Scientist failed to follow the basic principles of objective journalism and scientific journalism, please go here, as I do not wish to distract readers from the primary issues at stake with Assembly Theory.

UPDATE (Saturday, 8 July 2023): Assembly Theory spreads misinformation across science journals and science media outlets

Unfortunately, the authors of Assembly Theory (AT) continue spreading misinformation in a podcast online despite having been publicly corrected about algorithmic information theory and complexity science. They have said, once again, that Kolmogorov complexity ‘requires a Turing machine’ and is all about ‘Turing machines and Oracles’ (Oracles being a type of formal Turing machine that helps mathematicians prove theorems).

This makes no sense. The founding fathers of complexity and algorithmic information theory, who were mathematicians, may have unduly emphasized Turing machines and Oracles, thereby giving the impression that their main features were negative results preventing their application, but in point of fact, this is not the case. In this paper, for example, we defined algorithmic complexity as a tool for causal discovery and causal analysis based on a regular computer language, without the requirement of Turing machines or shortest programs.

Such abstractions of a computer as Turing machines were merely useful in proving the theorems (but it is the theorems that are fundamental). These theorems, which the authors of AT may disregard or ignore, endow the indexes based upon them with more formally reliable foundations than those underlying ad hoc indexes based on the AT authors’ personal choices of what they take to be a fundamental property of an object such as an organism. The status of such indexes as estimations is no different from that of any other tool used to approximate the scientific explanation of an observation or a natural phenomenon. The belief that the Turing machine, or its way of operation, is fundamental indicates a lack of knowledge of algorithmic complexity, a fundamental theory that demands scientific rigour and understanding, given the many elements the AT authors appropriate from it (and the criticisms they level at it).

The proponents of AT also allege that the field of complexity science has ‘not settled’ on any formal measure of complexity and is incapable of resolving anything. Not only is this false, but even if it were true, AT doesn’t answer this problem. As an illustration, let’s explore some of the fundamental indexes in complexity theory:

(Variable window length) block Shannon Entropy (or Entropy rate): This measure counts the number of repetitions of variable length in a piece of data according to a mass probability distribution. Does it sound familiar? Indeed, the assembly index is a version of this, but drops the mass probability distribution. This means AT always assumes the simplest uniform distribution case and is, therefore, a weaker version of Shannon Entropy (as demonstrated in our paper).
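
To make the comparison concrete, here is a minimal Python sketch of block Shannon Entropy for a fixed window length. It is only an illustration under stated assumptions: the function names `block_entropy` and `uniform_block_entropy` are hypothetical, the window is fixed rather than variable, and this is neither the code from our paper nor the assembly index algorithm itself. The second function shows what is left when the empirical probability distribution is dropped and every distinct block is treated as equally likely.

```python
from collections import Counter
from math import log2


def _blocks(data: str, block_len: int) -> list:
    """Return all overlapping blocks (substrings) of length block_len."""
    if not 1 <= block_len <= len(data):
        raise ValueError("block_len must be between 1 and len(data)")
    return [data[i:i + block_len] for i in range(len(data) - block_len + 1)]


def block_entropy(data: str, block_len: int) -> float:
    """Shannon entropy, in bits per block, of the empirical block distribution."""
    blocks = _blocks(data, block_len)
    total = len(blocks)
    counts = Counter(blocks)  # how many copies of each block occur
    return -sum((c / total) * log2(c / total) for c in counts.values())


def uniform_block_entropy(data: str, block_len: int) -> float:
    """Entropy if every distinct block were equally likely (distribution dropped)."""
    distinct = set(_blocks(data, block_len))
    return log2(len(distinct))


if __name__ == "__main__":
    for text in ("ABAB", "ABRACADABRA"):
        print(text, block_entropy(text, 1), uniform_block_entropy(text, 1))
```

For a balanced string such as ABAB the two values coincide. For a skewed string such as ABRACADABRA the uniform version can only overestimate the empirical entropy, because it throws away the frequency information.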

Huffman coding: It optimally looks for nested repetitions in a piece of data, sorting the most frequent ones to minimize the number of steps necessary to reconstruct its original form. The result is a step-by-step tree with the instructions to assemble the original object. Does this sound familiar? The authors of AT echoed this word for word in defining AT and their complexity index, apparently unaware of this extensively used algorithm in computer science, which was introduced in the early 1950s. They created an algorithm that turns out to be a suboptimal version of Huffman’s coding scheme, because their assembly index does not account for copies correctly. In this sense, their algorithm is a variation of Huffman or of RLE, an even simpler algorithm introduced in the 1960s.
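
For readers who have never seen it, a textbook Huffman construction fits in a few lines of Python. This is a generic sketch, not the implementation used in our paper: the function name `huffman_codes` is hypothetical, symbols are single characters, and ties between equal frequencies are broken by insertion order, which is an arbitrary choice that affects the individual codes but not the total encoded length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(data: str) -> dict:
    """Build a prefix-free binary code by repeatedly merging the two rarest subtrees."""
    freqs = Counter(data)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Each merge is one step of the tree: prefix a bit to every code below it.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    text = "ABRACADABRA"
    codes = huffman_codes(text)
    print(codes, sum(len(codes[ch]) for ch in text), "bits")
```

On ABRACADABRA the most frequent symbol, A, receives a one-bit code and the whole string is encoded in 23 bits, against 33 bits for a fixed three-bit code over its five symbols.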

LZW: LZ or LZW (and other statistical lossless compression/encoder algorithms) implements a dictionary-based approach of which Huffman and LZ77/LZ78 are examples. It is widely used in compression algorithms, appearing in familiar file formats such as GIF and ZIP. It has been used in many applications, ranging from genetics to spam and, paradoxically, plagiarism detection. A landmark paper using physical (or experimental) data directly was published in 2005 and has over 1500 citations. It is indistinguishable from Assembly Theory. This is why the authors of Assembly Theory refuse to simply accept they are using a compression algorithm. Doing so would require the admission that they have been doing all along what they categorically claim not to be doing, namely: employing a carbon copy of algorithmic complexity approximated with a resource-bounded measure they call ‘the assembly index’ that is very limited compared to other more sophisticated measures that are also resource-bounded, computable and approximations of algorithmic complexity.
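
The dictionary-based idea is easy to see in code. The sketch below implements plain LZ78 parsing (not LZW, and not the assembly index), which is the simplest member of the family; the function names `lz78_parse` and `lz78_decode` are hypothetical. Each output pair says "take an earlier phrase and append one new symbol", which is exactly a step-by-step recipe for rebuilding the object from reused parts.

```python
def lz78_parse(data: str) -> list:
    """Parse data into LZ78 (phrase_index, next_symbol) pairs."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending while the phrase has been seen before
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with no new symbol.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs: list) -> str:
    """Rebuild the original string from the (index, symbol) pairs."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    pairs = lz78_parse("ABRACADABRA")
    print(pairs)  # phrases: A, B, R, AC, AD, AB, RA
    print(lz78_decode(pairs))
```

On ABRACADABRA this parse produces seven phrases (A, B, R, AC, AD, AB, RA), four of which are built by reusing an earlier phrase.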

Algorithmic complexity (Kolmogorov, Chaitin): This is the universally accepted measure of randomness, settled on in the late 1960s or early 1970s. It is a generalization of all the above indexes and any similar ones, including Assembly theory (AT). It makes AT possible. Indeed, AT is a loose upper bound of algorithmic complexity, because simplistic statistical measures like the assembly index cannot deal with simple transformations of data, such as reverse copies. They would fail the most basic non-trivial tests and would not be able to scale up to any real-world scenario beyond trivial cases with perfect representations, which (block) Shannon Entropy could have resolved anyway. The authors of AT claim that Kolmogorov complexity is uncomputable. This is not entirely correct; it is semi-computable, which means that estimations are possible, of which AT is one (mathematically speaking, a weak statistical one). Unless the AT authors think their assembly index is not an estimation of life but the ultimate measure of life, it is as much an estimation as estimations of algorithmic complexity are. In fact, we claim it is an inferior one because AT is a loose upper bound of algorithmic complexity and, as we argue below, it cannot capture very basic cases (like simple reverse copies); as soon as it tries to, it gets closer to actual implementations of algorithmic complexity.
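
The reverse-copy point can be checked with any off-the-shelf compressor. In the sketch below, zlib (DEFLATE) stands in for a statistical, copy-counting measure; it is an assumption made purely for illustration and is not the assembly index, CTM or BDM. The compressed length is a computable upper bound on algorithmic complexity, up to a constant for the decompressor. The helper name `compressed_bits` and the seeded pseudo-random test block are hypothetical choices.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data: a computable upper-bound proxy."""
    return 8 * len(zlib.compress(data, 9))


def pseudo_random_block(n_bytes: int, seed: int = 0) -> bytes:
    """Reproducible block with no exact repeats for the compressor to exploit."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_bytes))


if __name__ == "__main__":
    block = pseudo_random_block(2000)
    print("block alone          :", compressed_bits(block))
    print("block + exact copy   :", compressed_bits(block + block))
    print("block + reversed copy:", compressed_bits(block + block[::-1]))
```

One would expect the exact copy to cost very few extra bits, while the reversed copy costs roughly as much as a second unrelated block, even though a short program ("print the block, then print it backwards") describes both objects almost equally well. That gap is what makes a copy-counting measure a loose upper bound.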

Algorithmic probability (AP) (Solomonoff): Deeply related to algorithmic (Kolmogorov) complexity (proportionally inverse), this measure is the accepted mathematical definition of inductive inference. Indeed, algorithmic probability inaugurated the field of Artificial Intelligence when Solomonoff presented it at what is today considered a landmark event, a workshop at Dartmouth College where all AI researchers accepted it as the final solution to the problem of inference. At an event a few years ago in NYC, Marvin Minsky, one of the fathers of AI, urged everybody to study algorithmic probability and algorithmic complexity, adding that he wished he had been able to devote his life to it. AP addresses why some object configurations are more probable than others. Does this sound familiar? It was fully appropriated by the authors of AT and presented as their novel discovery.

Logical Depth (Bennett): This measure separates complexity from randomness and simplicity (contradicting Walker’s claim that this had not been dealt with before, which created the context for the (false) claim that AT solved this problem). It measures the time, in number of steps, needed to unfold an object from its set of (approximated shortest) possible origins. Sound familiar? The AT authors claimed this as a feature of AT, word for word. Bennett also talks about memory size and time in the causal evolution of an object based upon the fundamental measure of none other than Kolmogorov (algorithmic) complexity. All of these concepts that make AT sound robust have been taken from these other measures. However, it is crucial to understand that AT implements a weak version of Shannon Entropy and is indistinguishable from LZ77/LZ78 (up to at most a small constant in the number of steps), so the distance between the claims made for the theory (appropriated from the work of others and introduced as a Theory of Everything) and what the simplistic assembly index actually does is staggering.

Resource-bounded Kolmogorov complexity: This is a computable version of Kolmogorov complexity that the authors of AT pretend does not exist — or suggest is impossible. They pretend AT is not an approximation of the real world. There are several resource-bounded approximations of Kolmogorov complexity, such as Minimum Description Length methods, which function by limiting time or space in order to produce computable approximations. All the computable measures above are resource-bounded versions of Kolmogorov complexity or upper bounds (AT is such a measure). We introduced our resource-bounded versions in 2010 with great success, garnering over 2000 citations from all sorts of groups despite not having the marketing engine of AT. The first of our algorithms is called the Coding Theorem Method (the coding theorem formally relates algorithmic complexity and algorithmic probability) or CTM. CTM, like any other measure, is an approximation or estimation of what the thesis in the theory suggests and conforms well with the expectations of Kolmogorov complexity. It is also the basis of the field of Algorithmic Information Dynamics, the subject of a book that has just been published by Cambridge University Press.

Block Decomposition Method (BDM): A measure we introduced that basically integrates all the above measures. BDM has been applied to ‘physical’ (or ‘experimental’) data of the same type that the authors of AT deal with (contradicting their claim that they have done so for the first time). We’ve employed it with physical data ranging from DNA folded in nucleosomes, to find regions of high genetic encoding (research that has been published in the top journal of nucleic acid research and uses real ‘physical’ genomic data), to the same mass spectral data that AT used in their original paper, producing similar or better results and clearly separating organic from inorganic molecules/compounds (almost five years before AT; a preprint version of our results is available here). BDM is based upon the history of complexity science over the last 60–70 years, and incorporates, in one way or another, all the knowledge from the measures above, properly attributing each feature to its rightful source. It counts the number of repetitions of different lengths for long-range correlations using Shannon Entropy, but it also looks for small patches at local short ranges of algorithmic complexity, thus combining the best of both worlds: a statistical measure for scalability and quick computation, and an algorithmic symbolic measure with a short range that provides more solid approximations of local randomness and complexity.

None of the above algorithms are forced to work on only one-dimensional bit strings. They can operate on any digital object. Ultimately, AT only ingests digital objects. In fact, just recently, we applied Entropy and our own indexes on multi-dimensional objects, even sound, to prove that non-random information encodes its own geometry and topology.

Other indexes measuring features that some authors find interesting may include concepts like ‘self-organization’, ‘emergence’, ‘synergy’, etc. They are indeed not settled because their authors advance such indexes based on the beliefs or assumptions of their theories. This is no different from AT. The AT authors propose to focus on how many copies of the same type are being reused in an object to drive their index. So, it is not agnostic as they claim, and does nothing to address the purportedly unsettled status of the field. In fact, it enlarges the number of unsettled and controversial indexes. However, these indexes are not the things that are attacked by the authors of AT, but rather, out of ignorance, the foundations, which are, in fact, settled. This is the main divergence from algorithmic complexity. Algorithmic complexity does not carry any bias or author baggage. It does not focus on any particular feature but considers all of them together to find the unfolding causal and mechanistic explanation of an observation. This is why it is ultimately more difficult to estimate — because nature and biology are complex, and life cannot be defined simplistically. This demonstrates how AT fails — in theory and in practice.

Why are the above measures fundamental in complexity theory? They have all been proven to be optimal or fundamental in various ways. For example, LZW has been proven to converge to Shannon Entropy, and other lossless compression algorithms can converge faster, though they are not fundamentally better in the limit. Measures with a * mark are measures that have been proven to be universal or invariant. In the 1960s, mathematicians were trying to define randomness in various ways, in terms of all possible statistical patterns, unpredictability, and compressibility. It turns out that each of these characterisations is equivalent to the others, which stabilised the field and resolved once and for all questions related to randomness, simplicity, and optimal induction. Every other, weaker definition of randomness would turn out to be contained in algorithmic complexity or to actually be algorithmic complexity. AT is indeed contained in algorithmic complexity and does not distinguish randomness from complexity.

The authors of AT also suggest that their theory contributes to finding the minimum number of steps needed to define or create life. This, too, is false. In fact, one can create an object to fool AT both in theory and in practice, as demonstrated in our paper, and nothing prevents such an object from existing in nature. In fact, such objects do exist: complex crystals and piles of coal are examples provided in this other blog post, another critique of AT based upon the basics of organic chemistry. As previously noted, AT predicted that beer was the most complex living organism on Earth. The assembly index would characterise complex crystal-like structures as living systems. In fact, all the other measures can do the same. Shannon Entropy on a uniform probability distribution can find the number of bits needed to encode the number of steps that AT claims is ‘the magic number’ for life (which we don’t think it is and therefore, responsibly, we refrained from making unfounded claims about it); LZ77/LZ78 and Huffman can find the number of tree vertices in their statistical ‘causal’ graph, just as AT can, albeit optimally (because AT does not count correctly, as we proved on their own example, ABRACADABRA; see the figure below. There are plentiful online examples using LZ77/LZ78 on ABRACADABRA). Lossless compression can provide a compression ratio, a threshold equivalent to the 15 steps, and it has always done this in all applications. See, for example, a characterisation of complexity that formalises Wolfram’s classes by compression, published in the Journal of Bifurcation and Chaos.

Measures such as Shannon Entropy are used everywhere in the world. For example, every time someone takes a night picture with their phone and is asked to remain steady and not move their hands, it is because the camera software seeks to maximise the mutual information of overlapping pictures using Shannon Entropy. Can things get more physical, experimental, or applied than this? Hardly. AT is clearly one of many complexity indexes that take data from the real world and process it like any other. LZW is also used everywhere; logical depth has been used before to characterise ecosystems (hence life, and biosignatures on Earth), and to classify human-made objects into simple and random objects, in a paper published by our group under the title ‘Physical Complexity’, using real-world physical data. Algorithmic probability has also been used in DNA and protein research, and CTM and BDM have been used in areas ranging from psychometrics and visual cognition to animal behaviour and medicine, all utilising real-world, physical data.

Doesn’t the fact that optimal mechanistic inductive inference (AI and causality) is the other side of the algorithmic complexity coin sound totally fundamental, more so than anything else advanced by AT? Indeed, an upper bound of Kolmogorov complexity is a lower bound of algorithmic probability. These two fundamental concepts are inextricably intertwined. This may be news to some people, thanks to researchers like the authors of AT and their widely touted misconceptions. It is one of the most important scientific results in history (even called miraculous and highly praised by Marvin Minsky) and the cornerstone of complexity theory, which the authors of AT ignore (or pretend to have come up with themselves), grossly misrepresenting it while advancing a simplistic measure in its place.
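
For readers who prefer the formal statement, the relation can be written as follows. This is the standard textbook form of the coding theorem for prefix-free complexity, not a new result. The notation is chosen here for illustration: U is a universal prefix-free machine, |p| is the length in bits of program p, m(s) is the algorithmic probability of s, K(s) is its algorithmic complexity, and \tilde{K}(s) stands for any computable upper bound such as a compressed length.

```latex
m(s) = \sum_{p \,:\, U(p) = s} 2^{-|p|},
\qquad
K(s) = -\log_2 m(s) + O(1)

% Any upper bound on complexity yields a lower bound on algorithmic probability:
\tilde{K}(s) \ge K(s)
\;\Longrightarrow\;
2^{-\tilde{K}(s)} \le 2^{-K(s)} \le m(s)
```

The last inequality holds because the shortest program for s is itself one of the terms in the sum that defines m(s).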

The authors also claim that their assembly index is not a compression algorithm and has nothing to do with shortest programs (see fallacies below). This is not true. The definition of Kolmogorov complexity uses the concept of the shortest computer program, but computable approximations look only for upper bounds and can find all sorts of programs, each of them a causal and mechanistic explanation that can be represented in a graphical fashion explaining data.

Does this sound familiar? Yes, again. It makes an appearance in the mix of concepts that the authors of AT pitch to the reader. The purpose of science is to find short explanations/models that explain observations. If they think Kolmogorov’s complexity is flawed in pursuing this goal, then so is science. Kolmogorov complexity took the idea of the simplest model to the extreme in order to generalise over all cases; AT aims to do the same by finding short explanations (or shortest paths as they now call them), but does so in a trivial manner, using a badly written algorithm that was introduced in the 1960s and is known to be so simplistic as a characterisation of life that researchers in the field long ago abandoned it and moved on.

So, what do we prove in our paper criticising AT? Among other things, including theoretical and fundamental issues, we show that, effectively, any trivial statistical algorithm can do the job as well as or better than AT does, despite the claims of its authors, some true believers, and their friends, that it is unique and, moreover, new and radical.

By ‘complete’ molecular description we mean a description that allows, in principle, reconstruction of the molecule/compound with little information of fixed size (i.e., that does not depend on the molecule/compound itself). Examples include InChi, molecular distance matrices like mol files, or mass spectral data such as the authors of AT used for their assembly index. In all cases, the alternative indexes were able to ingest ‘physical’ data, just like their assembly index, or any other processed data such as InChi codes or distance matrices, which also derive from the physical and chemical properties of the molecules/compounds.

Now, the authors of AT attempt to distinguish their measure as unique, saying it is the only one that ingests ‘physical’ data (if Cronin is a materialist, as he claims, shouldn’t all data be physical?). This is also false. Not only have the above measures been widely applied to data that comes directly from physical observations; their own index takes a computable representation of physical data, not physical data itself (they do not take atoms or molecules through their algorithm, instead, they parse a matrix with numerical values using a computer).

Resource-bounded measures of Kolmogorov complexity (of which Shannon Entropy, for example, is one) are the object of broad investigation and widely successful application. These measures make all sorts of applications possible, from video compression to fraud and spam detection. Resource-bounded computable versions, such as CTM and BDM developed by our group, have been proven effective when applied to physical data (a long list of relevant publications may be found here).

Assembly Theory itself is only possible because of algorithmic information (it is a weak copy of algorithmic information, with the authors failing to cite those whose fundamental ideas they have drawn upon). Their assembly index is another computable resource-bounded weak version of statistical encoders (such as compression algorithms, which count copies as AT does as their most basic step).

Their misconception that algorithmic complexity ‘requires a Turing machine’ is equivalent to saying that the assembly index requires a finite automaton, because that is what they would need to run their simplistic algorithm (yes, they run it on a weak version of a Turing machine called a finite automaton).

To justify the authority of their theory, they say that nothing is settled in complexity theory. This is also false. In all its versions, Kolmogorov complexity is the absolute accepted definition of mathematical randomness. If there are other measures of complexity (Assembly Theory itself is such a measure) it is because different authors believe their measure captures a specific fundamental property that is of particular interest to them — just as Assembly Theory does.

Again, the greatest weakness of AT lies in its lack of control experiments — which we performed for them, albeit without recognition. These control experiments would have told them that almost any old statistical index (e.g., Huffman, RLE, lossless compression like LZ77/LZ78/LZW, etc.) + almost any description of a molecule/compound that can reconstruct the molecule/compound (invariance theorem) can produce the same results as AT.
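A control experiment of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `rle_pairs` and `control_indexes` are hypothetical, zlib (a DEFLATE implementation) stands in for the LZ family, and the two byte strings are placeholders, not molecular descriptions or data from any paper.

```python
import random
import zlib
from itertools import groupby


def rle_pairs(data: bytes):
    """Run-length encoding: a list of (byte value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def control_indexes(description: bytes) -> dict:
    """Generic statistical indexes computed on one encoded description."""
    return {
        "length": len(description),
        "rle_runs": len(rle_pairs(description)),
        "zlib_bytes": len(zlib.compress(description, 9)),
    }


if __name__ == "__main__":
    # Placeholder inputs (assumption): one highly repetitive string and one
    # pseudorandom string of the same length.
    rng = random.Random(0)
    repetitive = b"AB" * 500
    noisy = bytes(rng.getrandbits(8) for _ in range(1000))
    print(control_indexes(repetitive))
    print(control_indexes(noisy))
```

In an actual control experiment, the placeholder strings would be replaced by the same descriptions fed to the index under test, and every index would be scored on the same classification task.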

Therefore, AT has little or no value to science, despite the huge marketing campaign the authors are conducting in their attempt to garner credibility by associating AT with various indisputable notions (such as the recursive nature of nature, modularity and the nestedness of living systems at reusing resources).

In the podcast referenced above, Cronin revealed what I take as an acknowledgment of the mistake they had made — thinking that they could count ‘physical copies’ and scale that up. According to him, it is a feature of their theory (every weakness is a feature for them) that not every variation from a perfect copy of a molecular configuration would be picked up by their measure. They say this is good because bad copies of molecules create bad entities. This is true, but this argument works only at the very microscopic scale, perhaps a few nanometers, because at any larger scale, these imperfections will start to appear, making their index irrelevant (this is why complexity theory moved on from such simplistic measures about 40 years ago). This means that anything larger than a few molecules — not even nucleosomes, let alone cellular or multi-cellular life — will be beyond the effective scope of their assembly index.

It is a pity that the authors of Assembly Theory feel they have to ignore or trash everybody else’s work (based on false premises) in order to pretend they have created novel science from scratch, advancing their work not by proof, argument or experiment, but self-promotion. Their exploitation of international media and social networks comes at the expense of a great deal of time and taxpayer funding. The authors seem inclined to capture the fascination of the general public, rather than submit to examination or refutation by the broad population of scholars in the fields appropriate to the branches of research they are claiming to have suddenly encompassed, integrated, and superseded. Perhaps as problematically, many non-specialists may be misled by their bold and unjustified proclamations. Though the authors may eventually be forced to end their media campaigns, our intention is to educate the scholars that might otherwise be deceived by the charade presently on display in the media.

And all this is just the tip of the iceberg as regards the misleading claims of Assembly Theory (see the rest of this blog post). For a well-deserved criticism of their simplistic assumptions from the perspective of organic chemistry, read this Harvard researcher’s blog post.

Though the task is rather thankless, I will do my best to keep up with and provide some balance to the uninformed views of the authors of AT. It is more difficult to debunk false theories for the sake of justice and good science than to advance them in the first place, and does little for one’s professional reputation. It is rather like dealing with fake news, except, in this case I am competing against two experts in self-promotion who already have about 10 podcasts and online interviews to their name, and many more written pieces in public media.

UPDATE (26 June 2023 + update 7 July): Assembly Theory is indistinguishable from Algorithmic Complexity and their index is a compression algorithm

The ever-changing definition of Assembly Theory keeps evolving in circles, with the authors in full self-promotion mode as they ignore the necessity of addressing their theory’s many challengers and critics.

In their recent marketing material, Assembly Theory looks more than ever like algorithmic information and bears a strong resemblance to Bennett’s logical depth. Cronin and Walker now say Assembly Theory is a measure of ‘memory’ size. This is the definition of algorithmic information, and/or of unfolding computing history (or time) from a causal origin. Bennett’s logical depth is based on algorithmic complexity, thus not an original idea either. However, their compression and decompression steps are the same number and therefore their concept of shortest assembly path is identical to a (weak) approximation of algorithmic (K) complexity.

They also claim that their motivation is to explain how some configurations are more likely than others (which is known as ‘algorithmic probability,’ also not their idea). In what amounts to a masterly publicity stunt staged with the help of their friends and colleagues, they have appropriated all the seminal ideas from complexity science by simply renaming them. But it gets worse: their measure does not do what they think it does (or does it very poorly); they do not need ‘physical’ data as they claim they do, and existing algorithms introduced in the 1960s perform as well as or better than their basic assembly index.

For more on this, see our detailed technical work below, reproducing their results without all the absurd hubris involved in their claims.

The most recent media covering AT appeared in a bulletin from the Santa Fe Institute, Aeon, and in the New Scientist, introducing new terms into an intrinsically flawed and simplistic theory, such as ‘memory,’ which they adapt to identify by-products of life. This surprising new addition (which does not match what their indexes actually do) renders AT identical to algorithmic complexity and Bennett’s logical depth (introduced in the 80s) in spirit, but AT remains ill-defined and incomplete, as they are unable to instantiate it even with the equivalent of a simple statistical algorithm of 1960s vintage — which actually outperforms AT (see below).

In their new efforts to make this theory and its measure appear novel, despite not making any substantial contribution to the field, the authors claim that the measure is not just about identifying life but the by-products of life, clearly an attempt to backpedal from the fact that their measure predicted that beer was the most alive product on earth. This new development makes Assembly Theory indistinguishable from algorithmic information and Bennett’s logical depth, introduced in the 1980s. Moreover, their actual measure falls very short of performing even the most basic tasks easily accomplished by other existing measures.

In the never-ending exercise of rehashing a simplistic idea of life to make it more credible and immune to criticism, the authors of Assembly Theory have decided to change the narrative around the measure and introduce yet another term, this time ‘memory,’ and ‘computational history,’ or similar terms, thereby making their AT approach a carbon copy of algorithmic complexity, as it is about the length of the model that generates the object’s history (including time). In focusing on ‘time,’ which they introduce in a grandiose fashion as if it had never been considered before, they succeed in making AT a carbon copy of these decades-old and powerful theories of life, while failing to make the appropriate attributions. The simplistic measure of Assembly Theory does not match their grandiose description, so not only does AT turn out to be a carbon copy of an existing theory of life, but a profoundly suboptimal carbon copy.

Instead of seizing an opportunity to justify their claims or explaining how other measures can reproduce and even outperform what they claim to be outstanding results (see Figures below in our original post), they have continued to make grandiose and completely unfounded claims.

Assembly Theory has gained traction because the authors highlight a property of life nobody can disagree with: life is highly hierarchical, reuses resources, and is heavily nested. We have known this for decades; these notions are integral to our understanding of biological evolution, genetics, self-assembly, etc. The problem is that pointing out such an obvious feature, which is common knowledge to most experts, has earned them undue credit, despite their propensity to appropriate other researchers’ ideas (without attribution) and their habit of continually introducing vaguely defined concepts and measures.

In a previous version of Assembly Theory (AT), the authors suggested that the number of ‘physical’ copies of an element used to assemble an object measured how ‘alive’ the object was. Builders use the same bricks made of the same materials in all possible configurations to construct walls and rooms on similar floors in multiple buildings in a highly modular fashion. Are these buildings alive according to Assembly Theory? It seems to suggest that these objects, just like Lego constructions and natural fractals, are ‘alive’ because they are highly modular and have a long assembly history, just as beer does. The AT authors have now realised they were wrong because their measure designated beer as the ‘most alive’ element on earth, even more so than yeast, so they have adapted their revised version to include the products of living systems, including beer. Yet this revision doesn’t address the fact that their measure ‘counts wrong’. All they have done is modify their theory yet again in an attempt to accommodate unexpected results without backing down from previous unfounded claims. These new claims actually make the theory more vacuous, because they take all the theoretical arguments from algorithmic probability and information theory as their own. Update: the authors now exclude all sorts of objects, like 3D crystals or anything that does not fulfill their definition of applicability, to avoid contradictions, making their measure highly ad hoc even though they advertised it as universal. We predicted that Assembly Theory would fail at these simple tests, because clearly non-living objects could be found that AT would characterise as likely alive. A new paper published by the Royal Society has since demonstrated this, putting an end to this side of the alleged properties of AT, as we anticipated.

Our paper shows that we don’t require any of the ‘physical data’ the authors of AT refer to in order to separate organic from non-organic compounds.

We are convinced that what characterises life is the agency evinced in its interaction with the environment and not any single intrinsic (and rather simplistic) property.

Let us also address what its authors claim is another exclusive property of Assembly Theory, namely that their theory captures ‘physical’ copies and is the only one to do so, not excluding even AIT. They also claim that their measure is the only experimentally validated one.

The leading authors of Assembly Theory seem to suggest that their measure has some mystical powers that enable it to capture the concept of ‘physical copy,’ even though it is encoded in a computable description (data) and fed to their measure, just as it is fed to the measures we used to reproduce their results step-by-step, regardless of whether it is ‘physical’ or not, as any algorithm has to be computably represented/encoded to be read.

The input data to their assembly index is an MS2 spectra file (a type of distance matrix). In other words, it all comes down to a mathematical and computable representation derived from observations and fed to a complexity measure, no different from what has been done for decades. This is exactly how every other complexity measure is deployed (except when used on simulated data). For example, this is how LZW was used by Li and Vitanyi to define their normalised information/compression distance applied to genomic data. Is not ‘genomic data’ physical for the authors of Assembly Theory? Moreover, we have shown that InChi ID strings and distance matrices separately are enough to classify compounds as organic and inorganic. InChi codes may be extracted from distance matrices but are as ‘physical’ as other descriptions, and distance matrices are also directly physically derived. If almost any other statistical measure of complexity + any compound description (such as InChi codes or distance matrices) produces the same or better results as AT on spectral data, why would one need AT in the first place, and what is it enabling that was not possible before (over and above what was shown to be possible in the paper we published almost five years before the first paper on AT)?

The input data to the assembly index are MS2 spectra files (also a type of distance matrix). The authors claim that their Assembly Theory represents the first time in the history of science that an index takes ‘physical’ data from observations, effectively claiming that they have invented science. mol files are matrices, not the actual compounds, just as InChi codes are strings. Their prime example is how their measure (wrongly, see figure below) counts for the letter repetitions that can reconstruct the words BANANA or ABRACADABRA. Do they think letters are physical while the entities all other researchers have worked with are not? The main question one has to answer is: if you get the same or better results by applying almost any other complexity measure on any molecular representation, why would you need AT and their ‘physical’ data? Our results show that distance matrices that should count as physical for the AT authors, or InChi codes, can trivially separate organic from inorganic (Fig labelled as 2 below, and main figures reproducing the AT results further, below). (July 7: A previous version of this figure wrongly suggested that the Assembly Index was reading InChi codes, irrelevant to arguments made before, nothing changes)
In a paper predating Assembly Theory, published in the journal Parallel Processing Letters, we demonstrated that by only taking InChi nomenclature IDs, sometimes enriched with distance matrices, classification into different categories was possible, including organic and inorganic, something the authors of Assembly Theory have rediscovered five years later, claiming to require spectral data to do so. They would have discovered that their special data was not needed if they had not neglected to include a basic control experiment in their work. We have applied complexity measures to observable physical data in the field for decades. My group and I have been doing so since at least 2010, including the application of algorithmic probability on physical sources (section 1.3.3), as in this paper published in G. Dodig-Crnkovic and M. Burgin (eds), Information and Computation, World Scientific Publishing Company; and, more recently, in this other paper published in the journal Nucleic Acids Research (in 2019), showing how an application of complexity indexes on nucleosome data (quite physical) can contribute to solving the second most important challenge in cell and molecular biology (after protein folding), which is the problem of nucleosome positioning, where we, as was right and proper, compared our indexes to the gold standard in the field as well as to several other complexity indexes.
In a landmark paper in the field of complexity science published in 2005 that uses compression algorithms, the authors took individual whole-mitochondrial genomic sequences from different species and correctly reconstructed an evolutionary mammalian phylogenetic tree corresponding to current biological knowledge. According to the authors of Assembly Theory, genomic sequences would not qualify as ‘physical,’ as they claim to be the first and only authors to define and validate a measure of complexity with real ‘physical’ data. This is pretty much what every complexity theorist has done in the last 50 years (including ourselves) — taken observable data with a representation and fed it to a measure to classify or extract information from it. For example, in 2018, based on our work, a group from Oxford University published an application of measures of algorithmic complexity and the coding theorem to RNA secondary structures (SS). These phenotypes specify the bonding pattern of nucleotides, which the authors of Assembly Theory would not regard as ‘physical data’.
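The compression-based comparison of sequences described above can be sketched with the standard normalised compression distance formula, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The Python sketch below is an illustration only: zlib is an assumed stand-in compressor (the compressors used in the cited work are not reproduced here), and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy sequences (assumption): placeholders, not real genomic data.
    a = b"ACGTACGTTTGACCA" * 40
    b = b"ACGTACGTTTGACCT" * 40
    print(ncd(a, a), ncd(a, b))
```

A matrix of pairwise NCD values can then be handed to any standard clustering method to build a tree. Note that zlib only looks back 32 KB, so for long sequences a compressor with a larger window would be the more sensible choice.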

What seems reprehensible is the authors’ open ‘fake it until you make it’ Silicon-Valley approach to science that, unfortunately, often pays off with some science journalists and science enthusiasts, who seem to be Cronin and Walker group’s main audience. Enthusiasts and science writers are sometimes drawn to such grandiose stories because the motivation of their employers is to sell more, which negative results in science hardly do (making them into something like tabloids of science), just as grant agencies seek (media) impact by rewarding senior researchers with cheap labour in the form of so-called ‘postdocs,’ exploited underpaid researchers who execute most of the research and are often misled by employers who seek personal and professional gain. Every time a public space is used to peddle bad science, attention is deflected from sound science, and a disservice is done to young researchers who do not have the deep pockets and marketing minds of these kinds of researchers and groups.

It is wrong to misappropriate ideas from others without attribution and to ignore results that predate one’s own. The community should repudiate these practices, as they turn the practice of science into a marketing exercise. Deft prose that makes one’s work look rigorous and deep when it is so fundamentally wrong and shallow is in no sense commendable.

In summary, this is what is wrong with Assembly Theory, why it needs fixing, and why the authors should stop promoting it:

  • The authors take criticism in the wrong spirit, doubling down on their false claims instead of course-correcting.
  • The theory and papers lack appropriate control experiments and introduce other people’s work as de novo ideas without attribution. Had they performed any control experiments, they would have found that they didn’t need any extra information to classify their compounds, or any new measure for that matter, as they could have used almost any other measure of complexity that already counts copies (from Huffman to RLE, LZW, you name it).
  • They have blatantly misappropriated concepts and ideas from others, knowingly and without crediting anyone.
  • They have created a pseudo-problem for which they have manufactured a pseudo-solution.
  • Their theory is inconsistent with their method.
  • They have chosen to take a marketing approach to science.
  • They commit all eight of the fallacies below, from constructing a strawman against algorithmic information to reinventing, misappropriating, and embracing an algorithm that does not do what they say it does.

UPDATE (Friday 5 May 2023): In a recent popular article on Quanta by the science writer Philip Ball (which we won’t link here because we don’t wish to draw what we think is undeserved attention to this theory), the authors of Assembly Theory seem to suggest that the idea of considering the entire history of how entities come to be is original to Assembly Theory (AT). This is, again, incorrect; this idea was explored in the 1980s and was Charles Bennett’s, one of the most outstanding computer scientists and complexity theorists. Roughly, Bennett’s logical depth measures the computational time (number of steps) required to compute an observed structure. This is “the number of steps in the deductive or causal path connecting a thing with its plausible origin”.

Bennett’s motivation was exactly that of Cronin, Walker, and their groups, but his work predated theirs by about 35 years. He was interested in how complex structures evolve and emerge from the large pool of possible (random) combinations, which is also the main subject of interest in Algorithmic Information Theory. AIT has resource-bounded measures of which AT is, properly speaking, a weak version, not only because it is computable but because it is trivial, as proven in our paper and this blog post. The authors of Assembly Theory seem to keep jumping from one unsubstantiated claim to another. Unfortunately, most of the people interviewed about Assembly Theory in the Quanta article who were positive are too close to Sara Walker (co-author of AT) to be considered entirely objective (with one of them being one of Walker’s most prolific co-authors), and should not have been chosen to comment as if they were neutral. Either the journalist was misled, or he failed to uphold journalistic principles.

Also, we found it unfortunate that the mistaken idea that Kolmogorov (algorithmic) complexity is too abstract or theoretical to be applied was put forward again. And the claim that Kolmogorov complexity ‘requires a device’ (which is no different from AT requiring a computer algorithm to be instantiated) relates to the fallacy below concerning what appear to be some ‘mystical’ properties that the authors attribute to AT.

Algorithmic Information Theory (AIT), or Kolmogorov complexity (which is only an index of AIT), has been applied for almost 80 years and makes all compression algorithms used daily for digital communication possible. It has also found applications in biology, genetics, and molecular biology. Yet one does not even need Kolmogorov complexity to prove Assembly Theory incorrect because it does not do what it says it does, and what it does, it does no better than almost any other control measure of complexity. It only takes one of the simplest possible algorithms known in computer science to prove Assembly Theory redundant, since the Huffman coding scheme and better compression algorithms, such as the full LZW and many others, can count copies better than Assembly Theory. Counting copies has been the basis of all old statistical lossless compression algorithms since the 1960s. It has been used (and sometimes abused) widely in the life sciences and complexity to characterise aspects of life. Nothing theoretical or abstract makes such applications impossible, though this is yet another common fallacy parroted with high frequency.

Original Post:

We have identified at least eight significant fallacies in the rebuttal by the proponents of Assembly Theory to our paper criticising the theory, available at https://arxiv.org/abs/2210.00901:

In a recent blog post (https://colemathis.github.io/blog/2022-10-25-SalientMisunderstandings), one of the leading authors of a paper on Assembly Theory suggested that our criticism of Assembly Theory was based on a misunderstanding. At the end of this response, we have included screenshots of this rebuttal to our critique for the record.

By the time you reach the end of this reply, you will have learned how the main results of Assembly Theory can be reproduced, and even outperformed, by some of the simplest algorithms known to computer science, algorithms that were (correctly) designed to do exactly the same that the proponents of Assembly Theory set out to do. We could reproduce all of the results and figures from their original paper, thus demonstrating that Assembly Theory does not add anything new to the decades-old discussion about life. You can go directly to the MAIN RESULT section below should you want to skip the long list of fallacies and cut to the chase (thus focusing exclusively on the empirical demonstration and skipping most of the rest of the foundational issues).

Fallacy 1: Assembly Theory vs AIT

According to the authors’ rebuttal, “[We] contrasted Assembly Theory with measures common in Algorithmic Information Theory (AIT)” and ‘AIT has not considered number of copies’.

This is among the most troubling statements in their reply as it shows the degree of misunderstanding. The number of copies is among the most basic aspects AIT would cover and is the first feature that any simple statistical compression algorithm would look for, so the statement is false and makes no sense.

Furthermore, in our critique, we specifically covered measures of classical information and coding theory unrelated to AIT, which they managed to disregard or distract the reader’s attention from. We showed that their measure was fundamentally and methodologically suboptimal under AIT, under classical Shannon information theory, and under basic traditional statistics and common sense. As discussed in this reply, Assembly Theory and its proponents’ rebuttal of our critique of it mislead the reader into believing that the core of our criticism is predicated upon AIT or Turing machines — an example of a fallacy of origins.

AIT plays little to no role in comparing Assembly Theory with other coding algorithms. As discussed under Fallacies 2 and 4, Assembly Theory proposes a measure that performs poorly in comparison to certain simple coding algorithms introduced in the 1950s and 1960s. These simple coding algorithms are based on entropy principles and traditional statistics. Yet, the authors make unsubstantiated and disproportionate claims about their work in papers and on social media.

This type of fallacious argument continues to appear in the text of the rebuttal to our critique, which suggests a lack of formal knowledge of the mathematics underpinning statistics, information theory, and the theory of computation; or else represents a vicious cycle in which the authors have been unwilling to recognise that they have seriously overstated their case.

To try to distinguish AIT from Assembly Theory in hopes of explaining why our paper’s theoretical results also do not serve as a critique, their text keeps mischaracterising the advantages and challenges of AIT as well as attributing false mathematical properties to AIT, for example, those to be discussed under Fallacies 2, 5, and 6 below.

One of the many issues we pointed out was that their molecular assembly index would fail at identifying as a copy any variant, no matter how negligible the degree of variation, e.g., resulting from DNA methylation. This means the index would need to be tweaked to identify small variations in a copy, meaning it would no longer be agnostic. For example, even linear transformations (e.g., change of scale, reflection, or rotation) would not be picked up by Assembly Theory’s simplistic method for counting identical copies, from which complexity theory largely moved on decades ago. Given that one cannot anticipate all possible transformations, perturbations, noise, or interactions with other systems to which an object may be susceptible, it is necessary to have recourse to more robust measures. These measures will typically be ultimately uncomputable or semi-computable, because they will look for all these non-predictable changes, large or small. So indeed, there is a compromise to be made, yet that they are uncomputable or semi-computable does not mean we have to abandon them in favour of trivial methods or that such measures and approaches cannot be estimated or partially implemented.

But when it comes to testing trivial algorithms of the kind Assembly Theory proposes, algorithms such as RLE, LZ, and Huffman, introduced in the 1960s, are special-purpose coding methods that were designed to count copies in data and have been proven to be optimal at counting copies and minimising the number of steps to reproduce an object, unlike the ill-defined Assembly Theory indexes.
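As an illustration of how little machinery such copy-counting codes need, here is a minimal Huffman sketch in Python. It is a textbook-style construction written for this post, not the implementation used in our paper; the function names are hypothetical, and only the code lengths (not the bit patterns) are computed.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict:
    """Map each symbol to the length in bits of its Huffman codeword."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encoded_bits(text: str) -> int:
    """Total number of bits needed to encode text with its Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[s] * c for s, c in Counter(text).items())


if __name__ == "__main__":
    print(huffman_code_lengths("ABRACADABRA"))
    print(huffman_encoded_bits("ABRACADABRA"))  # 23 bits
```

The most frequently repeated symbol receives the shortest codeword, which is exactly the sense in which the code ‘counts copies’.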

Below, we compare Assembly Theory (AT) and its molecular assembly (MA) index against these and other more sophisticated algorithms, showing that neither AT nor MA offers any particular advantage and that both are in fact suboptimal, both theoretically and in practice, at separating living from non-living systems, using their own data and taking at face value their own results.

Figure taken from our paper (v2) showing how algorithms like Huffman coding or LZ77 do what the authors meant to do (count copies) but failed to do (later you can also see how Huffman coding performs comparably to or better than AT at classifying their own molecules without recourse to any structural data). This is a classical word problem in mathematics, often a first-year course problem in a computer science degree, that can be solved with a very simple algorithm like LZ77 or Huffman coding as implemented by a finite automaton (the authors mock Turing machines because they think they are too simple, but their measure runs on a strictly weaker version of a Turing machine, a finite automaton, and assumes life can be defined by processes that behave that way). Subfigure C was taken from https://codeconfessions.substack.com/p/lz77-is-all-you-need, as a simple Google search retrieves dozens of examples of people showing how LZ77 works on the word abracadabra, from which a dictionary-based graph like the one in B (optimal) or A (suboptimal, from AT) can be derived. In other words, neither the measure, nor the data, nor the application can be considered new or as making any contribution.
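For readers who want to try the abracadabra example themselves, here is a minimal sketch of an LZ77-style factorisation. It is a simplified textbook variant under stated assumptions (an unbounded search window, a greedy longest match, and triples of the form offset, length, next symbol), not the exact procedure behind the figure; the function names `lz77_factorize` and `lz77_decode` are our own choices for this illustration.

```python
def lz77_factorize(s):
    """Greedy LZ77-style parse of s into (offset, length, next_symbol) triples.

    Assumption: the search window is the whole prefix seen so far.
    """
    i, triples = 0, []
    while i < len(s):
        best_len, best_off = 0, 0
        for j in range(i):  # candidate start of an earlier copy
            length = 0
            # Always leave one symbol to emit as the literal "next symbol".
            while i + length < len(s) - 1 and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        triples.append((best_off, best_len, s[i + best_len]))
        i += best_len + 1
    return triples


def lz77_decode(triples):
    """Rebuild the string by replaying the copy instructions."""
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])  # copy one symbol from `off` positions back
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    parse = lz77_factorize("abracadabra")
    print(parse)
    print(lz77_decode(parse))
```

On `abracadabra` the parse ends with the triple `(7, 3, 'a')`, meaning that the earlier copy `abr` is reused in a single step. That is the kind of copy bookkeeping at issue here.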

When we say that AT and MA are suboptimal compared to Huffman or LZW, we don’t mean that we expect Assembly Theory to be an optimal compression algorithm (as the authors pretended we were suggesting, in a straw-man attack). LZW and Huffman coding are not optimal statistical compressors, but they are optimal at doing what Assembly Theory claimed it was doing. This is another point the authors seem to get wrong, naively or viciously repeating ad infinitum that AT and MA are not compression algorithms, hoping that such a claim would make them immune to this criticism.

In no way do we expect Assembly Theory to be like AIT in attempting to implement optimal lossless compression. In other words, the above (and the rest of the paper and post) compares Assembly Theory to one of the most basic coding algorithms for counting copies, a method every compression algorithm has taken for granted since the 1960s but nothing else. The bar is thus quite low, to begin with.

Given that the authors seem to take ‘compression’ as a synonym of AIT, we have decided to substitute the term ‘compression’ for ‘coding’ when appropriate (in most cases) in the new version of our paper (https://arxiv.org/abs/2210.00901), so the authors, and readers, know that we are talking about the properties attributed to AT and MA in other algorithms, regardless of whether these algorithms are seen or have been used in the field of compression.

Ultimately, their (molecular) assembly index (MA) is an upper bound of the algorithmic complexity of the objects it measures, including molecules, notwithstanding the scepticism of the proponents of Assembly Theory. Hence their MA is, properly speaking, an estimation of AIT, even if it is a basic or suboptimal one compared to other available algorithms.

Fallacy 2: ‘We are not a compression algorithm,’ so Assembly Theory is immune from any criticism that may be levelled at the use of compression algorithms

Interestingly, in the view of computer science, Assembly Theory’s molecular assembly index falls into the category of a compression algorithm for all intents and purposes, to the possible consternation of its proponents. This is because their algorithm looks for statistical repetitions (or copies, as they call them), which is what characterises any basic statistical compression algorithm.

Compression is a form of coding. Even if the authors fail to recognise it or name it as such, their algorithm is, for all technical purposes, a suboptimal version of a coding algorithm that has been used in compression for decades. Even if they only wanted to capture (physical) ‘copies’ in data or a process, which is exactly what algorithms like RLE and Huffman do, and confine themselves to doing, albeit optimally, their rebuttal of our critique fails to recognise that what they have proposed is a limited special case of an RLE-Huffman coding algorithm, which means their paper introduces a simpler version of what was already considered one of the simplest coding algorithms in computer science.

By ‘simple’, we mean less optimal and weaker at what it is designed to do, meaning that it may miss certain very basic types of ‘copies’ that, for some reason, the assembly index may disregard as nested in a pathway, hence not even properly counting (physical) ‘copies’ — which the Huffman Coding algorithm does effectively, outperforming AT in practice too (see figure below).
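To make concrete what RLE and Huffman coding do, here is a minimal sketch using only the Python standard library. It is an illustration under our own assumptions rather than the code used for the figures in our paper: objects are plain strings, and the function names (`rle`, `huffman_code_lengths`, `huffman_total_bits`) are hypothetical.

```python
import heapq
from collections import Counter
from itertools import groupby


def rle(s):
    """Run-length encoding: consecutive identical symbols become (symbol, count)."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def huffman_code_lengths(s):
    """Code length in bits assigned to each symbol by Huffman's algorithm."""
    freq = Counter(s)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker ensures the dictionaries are never compared.
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_total_bits(s):
    """Total length in bits of s under its own Huffman code."""
    lengths = huffman_code_lengths(s)
    return sum(lengths[ch] for ch in s)


if __name__ == "__main__":
    print(rle("aaabccdddd"))
    print(huffman_code_lengths("abracadabra"))
    print(huffman_total_bits("abracadabra"))
```

For `abracadabra`, the most frequent symbol `a` receives a one-bit code and the whole word costs 23 bits: the more copies of a symbol there are, the shorter its code.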

Fallacy 3: Assembly theory is the first (experimental) application of biosignatures to distinguish life from non-life

The claim to be the first to have done so is misleading. The entire literature from the complexity theory community is about trying to find the properties of living systems. The community has been working on identifying properties of living systems that can be captured with and by complexity indexes for decades, perhaps since the concept of entropy itself was formulated. Here is one from us published a decade before Assembly Theory: https://onlinelibrary.wiley.com/doi/10.1002/cplx.20388. The problem has even inspired models of computation, such as membrane computing and P systems, as introduced in the 1990s, that exploit nested modularity.

We could also not find any evidence in favour of the claim about the alleged experimental nature of their index, given that all other measures could separate the molecules as they did without any special experimental data, mostly based on molecular nomenclature. Thus, the defence that their claims about their measure are experimentally validated does not make sense. The agnostic algorithms that we tested, and that should have served as control experiments in their original exploration, take the same input from their own data source and produce the same output (same separation of classes) or better.

We have updated our paper online (https://arxiv.org/abs/2210.00901) to cover all their results reproduced by using measures introduced in the 1960s, showing how all other measures produce the same results or even outperform the assembly index.

Actually, we had already reported that nomenclature could drive most of the complexity measures (especially simple ones like AT and MA) into separating living from non-living molecules, which seems to be what the authors of Assembly Theory replicated years later. For example, in this paper published in 2018, we showed how organic and inorganic molecules could be classified using existing complexity indexes of different flavors, based on basic coding, compression, and AIT (a preprint is available here), predating the Assembly Theory indexes by four years.

In 2018, before Assembly Theory was introduced, we showed, in this paper, that complexity indexes could separate organic from inorganic molecules/compounds.
In 2018, in the same paper, we showed that repetitions in nomenclature would drive some complexity indexes and that some measures would pick up the structural properties of these molecules/compounds. However, the authors of Assembly Theory disregarded the literature. They rehashed the work done in complexity science and information theory over the last 60 years and failed to cite it correctly (even when told to do so before their first publication). They have attracted much attention with an extensive marketing campaign and PR engine that most labs and less social-media-oriented researchers cannot access or spare resources for.

We are convinced that it is impossible to define life by looking solely at the structure of an agent’s individual or intrinsic material process in such a trivial manner and without considering its relationship and exchange with its environment, as we explored here, for example, or here, where, by the way, we explained how evolution might create modularity (something the author of the Quanta article on Assembly Theory says, wishfully, that perhaps Assembly Theory could do).

What differentiates a living organism from a crystal is not the nested structure (which can be very similar) but the way the living system interacts with its environment and how it extracts or mirrors its environment’s complexity to its own advantage. So the fact that Assembly Theory pretends to characterise life by counting the number of identical constituents it is made of does not make scientific sense and may also be why Assembly Theory suggests beer is the most alive of the living systems that the authors considered, including yeast.

Fallacy 4: Surprise at the correlation

The authors say they were surprised and found it interesting that the tested compression algorithms, introduced in the 1950s, produce similar or better results than Assembly Theory, as we reported in https://arxiv.org/abs/2210.00901. This should not be a surprise, as those algorithms implement an optimal version of what the authors meant to implement in the first place, with the results conforming with the expectation implicit in the formal and informal specifications.

In this case, to effectively counter the counterargument would be to argue that although theoretically and empirically relevant from the statistical perspective, the correlation is not obtained due to structural similarities (i.e., that the assembly index is a particular type of coding algorithm) within the measure itself. However, the latter is obviously false. Furthermore, if a statistically significant classification task is being performed with equal or greater capacity than a measurement algorithm and method, then this fact per se would require a further explanation — as to why an equal or superior performance should be disregarded. This would entail exposing the measures’ foundational structural characteristics, bringing the reader face-to-face with the other fallacies in the text.

The authors must now explain how we could reproduce their main result with every other measure. They would have toned down their claims if they had performed basic control experiments. Neither the foundational theory nor the methods of Assembly Theory offer anything not explored decades ago with these other indexes that could separate organic and inorganic compounds just as MA and AT did.

Fallacy 5: Assembly Theory vs Turing machines (or computable processes)

The authors wrote:

“They have not demonstrated how those algorithms would manifest in the absence of a Turing machine, how those algorithms could result in chemical synthesis, or the implications of their claims for life detection. Their calculations do not have any bearing on the life detection claims made in Marshall et. al. 2021, or the other peer-reviewed claims of Assembly Theory. Despite the alternative complexity measures discussed, there are no other published agnostic methods for distinguishing the molecular artifacts of living and non-living systems which have been tested experimentally.”

Coming back again to the same straw-man fallacy, they seem to conflate a Turing machine with simplicity and proceed to disparage the model; they do not realise that their index is a basic coding scheme widely used in compression since the 1960s even if they claim that they do not wish to compress (which they effectively do), and thus is also related to algorithmic complexity as an upper bound. They proceeded to ask “how those algorithms would manifest in the absence of a Turing machine,” misconstruing our arguments and dismissing the results, saying they were ‘surprised’ by them. We could not make sense of this statement.

Later on, they say: “MA is grounded in the physics of how molecules are built in reality, not in the abstracted concept of Turing machines.” We could not make any sense of this either. We can only assume that they think we are suggesting that AIT implies that nature operates as a Turing machine. This is an incorrect implication. What AIT does imply is that a measure of algorithmic complexity captures computable (and statistical, as they are a subset) features of a process. There is nothing controversial about this. Science operates on the same assumption, seeking mechanistic causes for natural phenomena. If the implication is that AIT assumes nature to operate as a Turing machine because AIT is typically defined in terms of Turing machines (as an oversimplification of the concept of an algorithm), then they are implying that their assembly index assumes nature to operate as an even simpler and much sillier automaton, given that the assembly index can be defined in terms of, and executed by, a bounded automaton (an automaton more basic, equally mechanistic, and ‘more simplistic’ than a Turing machine). However, we are not even invoking any Turing machine argument. The use of AIT is only to support the logical arguments and a small part of the demonstration of the many issues undermining Assembly Theory.

Algorithms such as LZW, RLE and Huffman do not make any particular ontological commitments. They can be implemented as finite automata, just as the assembly index does; they do not require Turing machines. This, again, shows a lack of understanding on the part of the authors of basic concepts in computer science and complexity theory. In other words, if we were to construct a hierarchy of simplicity, with the simpler machines being inferior, their assembly index would occupy a very lowly position in the food chain of automata theory, counting as one of the simplest algorithms possible that does not even require the power of sophisticated automata like Turing machines or a general-purpose algorithm.

The claim that a Turing machine is an abstract model unable to capture the subtleties of their measure defies logic and comprehension because their measures can run on an automaton significantly simpler than a Turing machine; it does not even require the power of a universal Turing machine. And ultimately, we could not find their measure to be grounded in physics or chemistry, despite their suggestion that this is what makes their measure special. For example, when they say that “the goal of assembly theory is to develop a new understanding of evolved matter that accounts for causal history in terms of what physical operations are possible and have a clear interpretation when measured in the laboratory” and “assembly spaces for molecules must be constructed from bonds because the bond formation is what the physical environment must cause to assemble molecules,” they disclose the limitations of Assembly Theory, because it cannot handle other more intricate environmental (or biological) catalytic processes that can increase the odds of a particular molecule being produced as a by-product, while other more capable compression methods can. What they do accept is that their algorithm counts copies, and as such, their algorithm can run on a very simple automaton of strictly less power than a Turing machine.

This misunderstanding is blatantly evinced in the rebuttal’s passage in which they call the process of generating a stochastically random string an “algorithm.” The authors fail to distinguish between the class of computable processes and the class of the algorithms run with access to an oracle (in this case, access to a stochastic source). Then, to construct a counterexample against our methods, they implicitly assume that the generative process of the string belongs to the latter class while the coding/compression processes for the string belong to the former class. Despite these basic mistakes, the authors later argue that this is one of the reasons that AIT fails to capture their notion of “complete randomness”, with Assembly Theory being designed to do just that. These mistakes suggest an en passant reading of the theoretical computer science and complex systems science literature (see also Fallacy 6). Similarly, their rebuttal claims that our results cannot handle the assembly process as their method can. However, their oversimplified method that generates the assembly index is feasibly computable (and advertised as such by the authors) and can easily be reproduced or simulated by decades-old compression algorithms, let alone by other more powerful computable processes.

In most of our criticism, the use of a Turing machine is irrelevant, regardless of what the authors may think of Turing machines. Their assembly index, RLE, and the Huffman coding do not require any description or involvement of a Turing machine other than the fact that they can all be executed on a Turing machine. This holds because our empirical results do not require AIT (see Fallacies 2 and 4), and our theoretical results do not require Turing machines.

Note that even if completely different from an abstract or physically implemented Turing machine, a physical process can be either computable, capable of universal computation, or both. They are trying to depict the proofs against their methods as if our position was that physical or chemical processes are Turing machines, which makes no sense (here, they seem to be employing a type of clichéd-thinking fallacy). Moreover, they ignore the state-of-the-art ongoing advances in complexity science on hybrid physical processes, that is, processes that are partially computable and partially stochastic.

The authors also fail to see that any recursive algorithm (like theirs) is equivalent to a computer program running on a Turing machine or to a Turing machine itself, including the methods of Assembly Theory, and thus the use of algorithms or Turing machines to make a point is irrelevant. The only option for making sense of their argument is to assume they believe that their Turing-computable algorithm can capture non-Turing computable processes by some mystical or magical power.

Fallacy 6: Assembly index has certain “magical” properties validated experimentally, including solving a problem that it was demonstrated decades ago could not be solved by the likes of it.

The authors claim that if we do not create a molecule that the assembly index fails to characterise, we cannot disprove their methods. In addition to being an instance of an appeal to ignorance fallacy, this misrepresents the core of our arguments by ignoring what our results imply. As also discussed under Fallacies 2 and 4, we showed that the assembly measure could be replaced by simple statistical compression measures that do a similar or better job both of capturing their intended features and of classifying the data (the alleged biosignatures) while also offering better (optimal) foundations. We have also proposed and tested computable approximations to semi-computable measures and shown that they perform better than their index. Even the computable versions are better than Assembly Theory, both in principle and in practice.

Furthermore, they presented a compound as a counter-example that they modestly called Marshallarane (after the lead author of their paper), defined as “[producing] a molecule which contains one of each of the elements in the periodic table.” This is an instance of a combined straw-man and false analogy fallacy, in which an oversimplified explanatory or illustrative example that is supposed to elucidate the target argument is employed to concentrate the discussion around that oversimplified example.

If the authors find that not saying anything about a compound like the ‘Marshallarane’ algorithm (which I’d rather call Arrogantium or Gaslightium) is an advantage, then RLE and Huffman, and indeed any basic Shannon Entropy-based algorithm qualify equally since they wouldn’t be able to find any copy or repetition, as we explain in depth in this paper. In fact, RLE and Huffman coding schemes are limited and among the first and simplest forms of coding schemes to ‘count copies’. This would be different from algorithmic complexity, but it means that we do not need algorithmic complexity to perform like MA, so there’s no need to invoke algorithmic complexity if we can replace Assembly Theory and its molecular assembly with simplistic algorithms such as RLE or Huffman (as introduced in the 1960s) that appear to outperform Assembly Theory itself.
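The observation that an object with one of each constituent contains no copies for any of these measures to find can be checked in a few lines. The sketch below is a toy illustration under our own assumptions: the "molecule" is reduced to a bare list of element symbols (a hypothetical stand-in, not a chemical representation), and the helper names are ours.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of the empirical distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def repeated_symbols(symbols):
    """Symbols that occur more than once, i.e. the 'copies' a counter could find."""
    return {s for s, c in Counter(symbols).items() if c > 1}


# Hypothetical toy object: one of each of the first eight elements.
one_of_each = ["H", "He", "Li", "Be", "B", "C", "N", "O"]
# Same length, but built from repeated constituents.
with_copies = ["C", "H", "H", "C", "H", "H", "C", "H"]

if __name__ == "__main__":
    for name, obj in [("one_of_each", one_of_each), ("with_copies", with_copies)]:
        print(name, repeated_symbols(obj), round(shannon_entropy(obj), 3))
```

With no repeated symbol, the entropy sits at its maximum of log2(n) bits per symbol and there is nothing to count, so a basic entropy-based measure is exactly as silent about such an object as the assembly index is claimed to be.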

This strategy fails to address our criticism because such a counterexample of building a molecule first poses a contradiction internal to their own proposition. They have provided a Turing-computable process to build their new compound since, according to the authors, “describing a molecule is not the same as causing the physical production of that molecule. Easily describing a graph that represents a molecule is not the same as easily synthesising the real molecule in chemistry or biology”. Yet, their algorithm requires a computable representation that is no more special than any other computable representation of a process. In addition to possibly being an instance of a clichéd-thinking fallacy, here again, given the use in the literature of the term ‘descriptive complexity’ to refer to algorithmic complexity, this is in contradiction to their own claim that Turing machines are not an appropriate abstract model — because it is exactly the mechanistic nature of processes that a Turing machine would be able to emulate or simulate, a process which lies at the heart of AIT. This passage shows a total misunderstanding of AIT, Turing machines, and internal logical coherence (see also Fallacy 5). Our own research on Algorithmic Information Dynamics is concerned with causal model discovery and analysis based on the principles of AIT. Still, the authors present AIT as oblivious to causality and advance their oversimplified, weak, and suboptimal algorithm as able to capture the subtleties of the physical world.

Such an argument also fails because it distorts or omits, in a straw-man manner, the crucial part of our theoretical results that there are objects or events (e.g., a molecule) that satisfy their own statistical criteria for calculating statistical significance and distinguish pure stochastic randomness from constrained (or biased) assembling processes. In other words, there is a computable process that results from a fair-coin-toss stochastically random outcome that satisfies their own statistical criteria that one may employ to check whether or not the resulting molecule sample frequency is statistically significant; and that satisfies their own mathematical criteria for distinguishing random events (in their own words, those with a “copy number of 1”) from non-random events (in their own words, “those with the repeated formation of identical copies with sufficiently high MA”).

The authors indeed appear to suggest that their index has some “magical” properties and is the only measure that can capture and tell apart physical or chemical processes from living ones. For example, their rebuttal of our critique employs the argument that it can tell apart physical or chemical processes from living ones because it “handles” the problem of randomness differently from AIT. Moreover, when they say that “using compression alone, we cannot distinguish between complete randomness and high algorithmic information,” such a claim already contradicts, even in an at-first-glance reading, the fact that the assembly index can be employed as an oversimplified compression process. In any event, the difference between Assembly Theory and AIT certainly and trivially cannot lie in how randomness is handled because Assembly Theory does not handle randomness at all, as it is in sheer contradiction to what is mathematically defined as randomness.

The formal concept of randomness was only established in the literature of mathematics in the wake of a long series of inadequate definitions and open problems exercising the ingenuity of mathematicians, especially in the last century, decades-old problems which (inadvertently or not) already include the statistical and compression method of Assembly Theory as one of the proven cases in which randomness fails to be mathematically characterised. An object is random when it passes every formal-theoretic statistical test one may possibly devise to check whether or not it is somehow “biased” (that is, more formally, whether it has some distinctive property or pattern preserved at the measure-theoretic asymptotic limit). However, there are statistical tests (as shown in our paper) for which an event satisfies Assembly Theory’s criteria for “randomness” but does not pass these latter tests. In other words, Assembly Theory fails even in the case of the most intuitive notion of randomness, so that there is an object whose subsequent constituents are less predictable (or more random) according to Assembly Theory than they actually are.
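A simple way to see why counting identical copies cannot characterise randomness is a de Bruijn sequence. The sketch below is only an illustration in the spirit of the argument, not the construction used in our paper: it uses the standard recursive construction based on Lyndon words, and the function names are our own. Every block of length n occurs exactly once in the cyclic sequence, so a copy counter working at that block length finds no repeated copy, yet the whole sequence is produced by a short deterministic program.

```python
from collections import Counter


def de_bruijn(k, n):
    """Cyclic de Bruijn sequence over the alphabet 0..k-1 (k <= 10) for blocks of length n."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(str(d) for d in seq)


def cyclic_block_counts(s, n):
    """Count every length-n block of s, reading s as a cycle."""
    wrapped = s + s[:n - 1]
    return Counter(wrapped[i:i + n] for i in range(len(s)))


if __name__ == "__main__":
    s = de_bruijn(2, 4)
    counts = cyclic_block_counts(s, 4)
    print(s, len(counts), max(counts.values()))
```

No block of length n has a second copy, but the generating program is a few lines long, so the sequence is highly compressible in the algorithmic sense and fails any test that looks for this regularity.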

Contrary to their claims, randomness is synonymous with maximum algorithmic information. The real scientific debate that the authors fail to grasp — and is very unlike what they propose as research — has to do with complexity measures for complex systems with intertwinements of both computable and stochastic dynamics, as discussed under Fallacy 5, systems which the results in our critique already showed Assembly Theory failed to measure even as well as other well-known compression methods.

Fallacy 7: “Many groups around the world are using our [assembly] index”

An instance of the bandwagon fallacy. The authors claim that several groups around the world are working on Assembly Theory. If this is the case, it does not invalidate our criticism but makes it more relevant. The fact that the authors have published their ideas in high-impact journals also corroborates (if only anecdotally) the ongoing and urgent concern in scientific circles about current scientific practice: how biased the peer review process is in its tendency to value social and symbolic capital, how the kind of behaviour once considered inappropriate is now rewarded by social network dynamics impacting scientific dissemination, and how the fancy university titles of corresponding authors may play a role in dissemination, to the detriment of science.

In point of fact, today, there are definitely more groups working in information theory and AIT (which Cronin calls ‘a scam’, see his Twitter post below) than there are groups working on Assembly Theory. And as of today, despite the high media profile of Leroy Cronin, methods based on AIT (a field in which one of his collaborators, Sara Walker, has been productive), including our own, have way more citations and are used by many more groups than Assembly Theory.

That a small army of misled, exploited, and underpaid postdocs in an over-funded lab led by a celebrity scientist who is highly active in misleading people on social media has managed to make other researchers follow their lead is not very surprising and represents some of the weaknesses of the academic system. But it is our duty to inform less well-informed researchers that they may have been misled by severely unsubstantiated claims, naively or viciously. If their work had been ignored, we would not have invested so much time and effort debunking it.

Fallacy 8: “100 molecules only”

We turn now to the claim that we only used 100 molecules and reached conclusions that reveal a misunderstanding of what they did. In their supplemental information, they claim that the 100 molecules in Fig. 2 constitute the main test for MA to establish the chemical space, based upon which they can distinguish biological samples in attempting to detect life. These molecules are, therefore, the same ones they used and whose use they defended when confronted with reviewers who pointed out the weaknesses of their paper. The authors mislead the reader by claiming that the 100-molecule experiment has little or nothing to do with their main claims and that we have, therefore, misunderstood their methods and results. This is false; their own reviewers were concerned that their claims were entirely based on tuning their measure over these 100 molecules. We simply replicated their experiment with proper controls and found that other computable statistical measures (therefore not only AIT but classical information coding measures) were equivalent to or better than Assembly Theory.

It does not follow that our empirical findings, shown in Fig. 2 of the first version of our paper, do not disprove their Fig. 4 findings because, according to the authors, Fig. 2 was how they ‘calibrated’ their index and set the chemical space. However, we have updated our paper online to cover Fig. 4, too, showing how all other measures reproduce the same results or even outperform the assembly index, so that now each and every one of their results and figures has been covered.

Their paper was not reproducible, which is one of the reasons we hadn’t produced Fig. 4. So, we have taken as given the values of their molecular assembly in their own plot. Even assuming their results, they are inferior compared to those obtained in all control experiments. Every other measure tested produced similar or better results, with some even effortlessly outperforming their index.

Notice that they had claimed that the central hypothesis and framework for MA computation was built upon the data and validation, whose results are shown in Figs. 2 and 3, on the 100 molecules we initially used in our critique. If that had failed, their first hypothesis would have failed on its own terms. So, the reference to Fig. 4 was only to distract readers from the reported issues because the original 100-molecule experiment was the basis for their final results.

They used the 100 molecules to set the chemical space and validated it with the larger database of molecules. They claimed multiple times that complex real biological samples like E. coli extract or yeast extract are just a complex mixture of some of these molecules from which they built the chemical space (i.e., the 100 molecules).

MAIN EMPIRICAL RESULT

Finally, we provide the main figure showing that other simple and more sophisticated measures reproduce the molecular assembly results at separating biological molecules using the same data that their index uses while performing either comparably or actually outperforming Assembly Theory.

It reproduces their results in full using traditional measures, some of which the main author of Assembly Theory has called ‘a scam’, showing that their assembly index does not add anything new, their algorithm does not display ‘special properties’ to count ‘physical’ copies, and that, in fact, some measures even outperform it when taking their results at face value (as we were unable to reproduce them). It is thus fatal to the said theory.

The new version of our paper, available online, includes two figures (Figures 3 and 4) that incorporate all the molecules/compounds in the original Assembly Theory paper, where the index was first shown to classify a small set of molecules chosen by the AT authors and then tested on an even smaller set of fewer than 20 molecules (this figure). The authors of AT have falsely claimed on multiple occasions that we did not replicate all their results using decades-old algorithms (or outperform them using simple and more sophisticated ones). This figure shows that what the main author has called a scam (‘algorithmic complexity,’ see image below) and even simpler methods can reproduce the same results as their allegedly novel and ground-breaking Assembly Theory.
Table taken from our paper showing how other measures trivially outperform AT and MA. The authors of Assembly Theory have never done or offered any control experiments comparing their work or indexes to other measures. The above shows that other complexity measures perform comparably or better and that no claim to ‘physical’ processes is justified because with and without ‘structural data’ their results are reproducible with trivial algorithms that have been known for decades.

In summary

Readers should judge for themselves by reading the papers and the rebuttals whether or not their fallacious arguments can be worked around. Their response to our critiques has not addressed our concerns. It tries, hopefully unintentionally, to distract the reader from the results with statements about their work’s empirical assumptions or putative theoretical foundations. Contrary to their claim that “[our] theoretical and empirical claims do not undermine previously published work on Assembly Theory or its application to life detection,” our arguments seriously undermine Assembly Theory and indeed seem fatal to it.

The assembly index can be replaced by simple statistical coding schemes that are truly agnostic, not being designed with any particular purpose in mind, and do a similar or better job at capturing both the features that Assembly Theory is meant to capture and, in practice, classifying the data (the alleged biosignatures) while also offering better foundations (optimality at counting copies) even without having recourse to more advanced methods such as AIT (which also reproduce their results, e.g. 1D-BDM).
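
To make the idea of a simple statistical coding scheme concrete, here is a minimal sketch of Shannon entropy computed over fixed-size blocks of a string. It is only an illustration under stated assumptions: the function name `block_entropy` and the two toy strings are hypothetical and are not taken from our paper, and this is not the pipeline used to reproduce the AT results. It simply shows that a measure which only tracks how often identical blocks recur already separates a string made of copies from one without any.

```python
import math
from collections import Counter


def block_entropy(sequence: str, block_size: int = 1) -> float:
    """Shannon entropy, in bits per block, of non-overlapping blocks.

    Illustrative helper (hypothetical name); a trailing partial block is ignored.
    """
    blocks = [
        sequence[i:i + block_size]
        for i in range(0, len(sequence) - block_size + 1, block_size)
    ]
    counts = Counter(blocks)
    total = len(blocks)
    # Sum of p * log2(1/p) over the observed block frequencies.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy inputs (assumptions for illustration only).
repetitive = "ABABABABABABABAB"  # eight identical copies of the block "AB"
varied = "ABCDEFGHIJKLMNOP"      # eight distinct blocks, no copies

print(block_entropy(repetitive, block_size=2))
print(block_entropy(varied, block_size=2))
```

The string built from identical copies scores lower than the string with no repeated block, which is all that a copy-counting measure needs in order to rank the two.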

The animus of the senior author toward one of the core areas of computer science suggests a lack of understanding of some of the basics of computer science and complexity theory (some replies to his Tweet even told him that his own index was a weak version of AIT, to which the author never replied). Coming from a University Professor, dismissing algorithmic complexity measures as ‘a scam’ (see the image from Twitter below), with no reasons given even when requested to elaborate, is dismaying. Especially so since one of the finest works of one of the original paper’s main co-authors happens to include serious work in the area of algorithmic probability (Sara Walker paper). The authors fail to realise that for all foundational and practical purposes, Assembly Theory and its methods are a special weak case of AIT and would only work because of the principles of algorithmic complexity. AIT is the theory that underpins their Assembly Theory as an (unintended and suboptimal) upper bound of algorithmic complexity.

On behalf of the authors,

Dr. Hector Zenil
Machine Learning Group
Department of Chemical Engineering and Biotechnology
University of Cambridge, U.K.

The following are the honest answers to the authors’ FAQs that they have published on their website (http://molecular-assembly.com/Q%26A/):

Second Q&A from the original authors led by L. Cronin

Honest answer: It is a simplistic index of pseudo-statistical randomness that counts ‘copies’ of elements in an object. Surprisingly, the authors claim to capture key physical processes underlying living systems. Indeed, according to the authors, counting copies of an object unveils whether it is alive or not. The idea that such a simplistic ‘theory of life’ can characterise a complex phenomenon like life is too naive.

In our opinion, no definition of life can disregard the agent’s environment, as life is about interaction with the environment. A simplistic intrinsic measure like the assembly index would astonish any past and future scientist if it were actually capable of the feats that its authors believe it capable of (the senior author even claims to be able to detect extraterrestrial life). However, we have shown that it does not perform better than other simple coding measures introduced in the 1960s (and better defined) and that their measure is ill-defined and suboptimal at counting copies (see https://arxiv.org/abs/2210.00901).

We think that the key to characterising life lies in the way living systems interact with their environments and that any measure that does not consider this state-dependent variable, which is the basis of, for example, Darwinian evolution, will fail. We have published several papers in this area featuring content that Lee Cronin considers ‘a scam’ (see below), one of which was co-authored with one of the senior authors of the Assembly Theory paper with Cronin, Sara Walker (https://www.nature.com/articles/s41598-017-00810-8).

Third Q&A from the original authors led by L. Cronin

Honest answer: Despite the apparent fake rigour of the answer, the assembly index is a weak (and unattributed) version of a simplistic coding algorithm widely used in compression and introduced in the 1960s, mostly regarded today as a toy algorithm. As such, and given that all compression algorithms are measures of algorithmic complexity, it is an algorithmic complexity measure. To the authors’ misfortune, their assembly index is the most simplistic statistical compression algorithm known to computer scientists today based on Huffman’s dictionary-based coding leading to LZW compression.
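
As a rough illustration of how a general-purpose compressor acts as a computable stand-in for algorithmic complexity, the sketch below uses the length of zlib-compressed data as an upper-bound estimate. The helper name `compressed_length` and the two test inputs are assumptions made for this example; it is not the assembly index, nor the method of our paper.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (hypothetical helper).

    A shorter output means the compressor found more reusable structure.
    """
    return len(zlib.compress(data, 9))


# Toy inputs (assumptions for illustration only).
repetitive = b"AB" * 500                      # 1000 bytes made of copies
rng = random.Random(0)                        # fixed seed for repeatability
noisy = bytes(rng.randrange(256) for _ in range(1000))  # 1000 pseudo-random bytes

print(compressed_length(repetitive), compressed_length(noisy))
```

The input made of copies compresses to a small fraction of its size, while the pseudo-random input barely compresses at all.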

Their original paper is full of, perhaps unintended, fake rigour. The authors even included a proof of computability of their ‘counting-copies algorithm’, which nobody would doubt was trivially computable. Nobody has proven that algorithms like RLE, LZ or Huffman coding are computable because they are trivial in this regard. Even assuming their proof is right, it was unnecessary. These algorithms can be implemented in finite automata and are, therefore, trivially computable. What they have shown is that they seem to be in some sort of straw-man crusade against AIT in an apparently desperate attempt to show how much better they are compared to AIT (without even comparing themselves to other, trivial computable measures).
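
To show how little is needed for an algorithm of this kind to be computable, here is a minimal run-length encoding (RLE) sketch. The function names are hypothetical and chosen for this example. The encoder is a single left-to-right pass over a finite input, so it always halts.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original text from (symbol, run length) pairs."""
    return "".join(symbol * count for symbol, count in pairs)


encoded = rle_encode("AAABBC")
print(encoded)
print(rle_decode(encoded))
```

Decoding the encoded pairs returns the original string, so nothing is lost by counting consecutive copies instead of writing them out.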

Fourth Q&A from the original authors led by L. Cronin

Honest answer: Yes, their second sentence indicates that their algorithm is trying to optimise for the minimum number of steps after compression by looking at how many copies an object is made of, just as Lempel-Ziv (LZ) compression does (for an example, see https://www.bbc.co.uk/bitesize/guides/zd88jty/revision/10). But AT and MA do this with extra and unnecessary steps, not giving credit and actually trying to trash the very principles they use.
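
For readers who want to see what counting copies looks like in a dictionary-based scheme, the following is a minimal LZ78-style parsing sketch. It is a textbook-style illustration with a hypothetical function name, not the authors' assembly-index algorithm and not the implementation used in our paper. Each new phrase is the shortest chunk not yet in the dictionary, so previously seen blocks are reused rather than rebuilt.

```python
def lz78_phrases(text: str) -> list[str]:
    """Split text into LZ78-style phrases.

    Each phrase is the shortest prefix of the remaining text that has not
    been seen before, i.e. a known phrase extended by one symbol.
    """
    seen: set[str] = set()
    phrases: list[str] = []
    current = ""
    for symbol in text:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The text ended in the middle of an already known phrase.
        phrases.append(current)
    return phrases


print(lz78_phrases("ABABABAB"))
print(lz78_phrases("ABCDEFGH"))
```

The number of phrases plays the role of a step count: an input built from repeated blocks needs fewer phrases than an input of the same length with no repetition.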

To their surprise, perhaps, almost everything is related to compression. In AI, for example, the current most successful model based on what are called Transformers (ChatGPT) is being found to perform successfully because it can reduce/compress the high layer dimensionality from large training data sets. Science itself is about compressing observed data into succinct mechanistic explanatory models smaller than the data itself.

Fifth Q&A from the original authors led by L. Cronin

Honest answer: The authors say this has nothing to do with algorithmic complexity, but their answer is almost the definition of algorithmic complexity, and of algorithmic probability, which looks at the likelihood of a process producing other simple processes (including copies). Unfortunately, the assembly index is such a simplistic measure that complexity science left it behind in the 1960s or incorporated it as one of the most basic algorithms that literally any compression and complexity measure takes into account. These days, it is taken for granted and used in exercises for first-year computer science students.

Update (14 May 2023):

How to fix Assembly Theory?

One can anticipate and brace for the next move of the group behind AT, the next publicity stunt backed by a well-oiled marketing engine: the announcement that they have created artificial life in their lab, as measured by this simplistic assembly index (which we believe is their original motivation) that classifies even a crystal as an instance of life, and beer as an exemplary instance (according to their own experimental results).

AT is unfixable in some fundamental ways, but the authors have invested so much effort and their own credibility that they are unlikely to back down. In the spirit of constructiveness, and having discussed this with a group of colleagues (extending beyond the authors of our paper and blog post), we can get creative and see how to make AT more relevant. Here we suggest a way to somehow ‘patch up’ AT:

  1. Embrace an optimal method of what Assembly Theory originally wanted to do, which is to count nested copies. It should start with LZ, or any other coding measure approximating algorithmic complexity by means of compression. They can explore weaker versions by relaxing assumptions if they want to generate different interpretations (and duly credit the right people when exploring borrowed ideas, such as their idea of the ‘number of steps’ and ‘causally connected processes’ from Logical Depth). Things like counting the frequency of molecules would already be taken into account, as shown in our own papers and in others.
  2. Drop the claim that ‘physical’ bonds, reactions or processes can only be captured by AT; it makes no sense. ‘Physical’ or not, everything is physical, or nothing is. Whatever material they feed their assembly index with must be symbolic and computable, as their measure is an algorithm that takes a piece of data containing a representation of ‘physical’ copies, as in chemical nomenclature (e.g. InChi) or structural distance matrices.
  3. Drop the assumption that bonds or chemical reactions happen with equal probability. To begin with, this is not the case; depending on the environment, each reaction has a different probability of happening. In any case, algorithmic probability already indicates a simplicity bias, and is encoded in the universal distribution (see this paper based on and motivated by our own extensive work).
  4. Your measure needs to factor in the influence of the environment to change the probability distribution of how likely it is that specific physical or chemical steps can transpire. This is the state-dependency step that we defend above, which no measure can ignore, and is the chemical basis of biological evolution.
  5. Based on an ever-changing environment, any agent (physical/chemical process) would need to adapt, which is the precondition for particular chemical reactions occurring. It is the internal dynamics of this relationship that we know is the hallmark of life. After all, amino acids can readily be found on dead asteroids; not many people would call that, or indeed beer (as AT does), life.

For AT to work, therefore, the agreement among colleagues seems to be that these steps would need to be considered before anyone suggests any ‘validation’ using spectrometry data and putting a complexity number on a molecule to call it a good measure for detecting life. This is not Assembly Theory or what the authors of AT did but what others have been doing in the field, with both breakthroughs and incremental progress in the past and for a long time.

Cited screenshots

The original rebuttal to our paper (https://arxiv.org/abs/2210.00901) from the authors of Assembly Theory (for reference in case it changes or is taken down in the future)

Appendix

Let’s address the entirely unwarranted reservations that the authors of Assembly Theory (and many more uninformed researchers) seem to have against a classical model of computation. While researchers from Assembly Theory seem to disparage anything related to the foundations of computer science, such as the model of Turing machines and algorithmic complexity, other labs, such as the Sinclair Lab at Harvard, have recently reported surprising experimental epigenetic aging results deeply connected to information and computer sciences. Says Prof. David Sinclair: “We believe it’s a loss of information — a loss in the cell’s ability to read its original DNA so it forgets how to function — in much the same way an old computer may develop corrupted software. I call it the information theory of aging.” https://edition.cnn.com/2023/01/12/health/reversing-aging-scn-wellness/index.html

While we are not embracing a supremacist view of AIT or Turing machines, we have no reason to disparage the Turing machine model. There are very eminent scientists, such as Sydney Brenner (Nobel Prize in Physiology or Medicine), who believe not only that nature can be expressed and described by computational systems but that the Turing machine is a fundamental and powerful analogue for biological processes, as Brenner argues in his paper in Nature entitled “Life’s code script” (https://www.nature.com/articles/482461a), ideas that Cronin and his co-authors seem to mock. According to Brenner, DNA is a quintessential example of a Turing machine-like system in nature, at the core of terrestrial biology, underpinning all living systems.

As for the importance of AIT (algorithmic complexity and algorithmic probability) in science, in this video of a panel discussion led by Sir Paul Nurse (Nobel Prize in Physiology or Medicine, awarded by the Karolinska Institute) at the World Science Festival in New York, Marvin Minsky, considered a founding father of AI and one of the most brilliant scientists, expressed his belief that the area was probably the most important human scientific achievement, urging scientists to devote their lives to it. He did this sitting next to Gregory Chaitin, fellow panelist and one of the founders of AIT and algorithmic probability (and thesis advisor to Dr. Zenil, the senior author of the paper critiquing Assembly Theory).

Written by Dr. Hector Zenil

Associate Professor King’s College London. Former Senior Researcher & Faculty Member @Oxford U., Alan Turing Institute & Chemical Eng & Biotech @Cambridge U.
