Current Opinions of Joel Berendzen

Towards a Theory of Biology

Now is an exciting time to be a scientist with a quantitative bent working in biology, because a Theory of Biology is emerging in much the same way that the Theory of Physics emerged in the early 20th century. Physics back then was getting broken in interesting ways by data coming from a new technology, X-rays, that smashed matter into bits never encountered before. The theory that emerged from that breakage, quantum mechanics, eventually opened up astonishing applications such as atomic energy and transistors.
Biology today is driven by data arising from the confluence of sequencing, imaging, and computational technologies. Metagenomics, comprehensive RNA surveys, and tissue-atlas data, in particular, are breaking some long-held assumptions about how information flows around and through living things by finding surprising bits of sequence in places they shouldn’t belong. But data by itself is not a theory.

Work at Building Bridges

You may object, dear reader, that there already is a Theory of Biology, which is evolution and was written by Charles Darwin back in 1859. I will counter that what Darwin wrote was a model—a supremely important one—and that the fossil data on which it relied were sparse and insufficient to explore the richness of how Life is interrelated. A working Theory of Biology will certainly start by subsuming quantitative models where they exist, and evolutionary theory in its quantitative form of molecular phylogeny must be included to anchor the quantum end of the size scale. At the opposite, cosmic, end of the scale lies another part of biological theory, ecology, with its roots in island biogeography.

At both ends, the theory is quantitative and makes testable predictions with several significant figures of accuracy, for example in genome sequencing recapitulating and extending the extensive trait-based data in Bergey’s Manual of Systematic Bacteriology. Yet things are messy, with critical real-world effects still to be accounted for, such as RNA-world regulation, the emergence of neural processing, and host-biota interactions. Horizontal gene transfer is a particular sore point, because it turns out to be the dominant means of evolution and geochemistry on the planet. It’s going to take a sustained effort for a period that extends beyond my lifetime to build the bridge between the quantum and cosmic scales, making contact with physics, chemistry, and mathematics throughout the span.

Data to Signatures

Before there can be a theory, there has to be a model, and before there is a model, there have to be signatures of the phenomenon you’re studying. A good set of signatures can carry you a great distance into the project, in both data collection and understanding. Well-designed projects generally start by identifying a gradient in the phenomenon, such as the famous gradients in ecological diversity observed as a function of latitude. Then one collects data with replicates, maybe does some dimensionality reduction with PCA or a fancier, newer method, and out pops the signal-to-noise ratio for the putative signatures as a function of signal strength. You then sit scratching your head at what the signatures might imply, both when looked at bottom-up from the micro view (sequence) and top-down from the macro view (ecology). If everything aligns to tell a story, then you have your signatures.
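To give a flavor of that step, here is a minimal sketch in Python (mine, not part of the original essay) that runs PCA on a synthetic samples-by-features matrix with a latitude-like gradient injected into a subset of features; the array names, the number of components, and the crude signal-to-noise estimate are all illustrative assumptions.

```python
# Minimal sketch: PCA on a replicated data matrix to gauge how strongly
# candidate signatures track a known gradient. The data here are
# synthetic stand-ins, not real measurements.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 30 samples, 200 features, with a latitude-like gradient injected
# into the first 20 features.
gradient = np.linspace(0.0, 1.0, 30)
data = rng.normal(size=(30, 200))
data[:, :20] += gradient[:, None] * 2.0

pca = PCA(n_components=5)
scores = pca.fit_transform(data)            # per-sample coordinates
explained = pca.explained_variance_ratio_   # fraction of variance per component

# Crude signal-to-noise proxy: correlation of each component with the
# known gradient, alongside its explained variance.
for i in range(5):
    r = np.corrcoef(scores[:, i], gradient)[0, 1]
    print(f"PC{i+1}: explained={explained[i]:.2f}, corr with gradient={r:+.2f}")
```

A component that explains substantial variance and correlates strongly with the gradient is a candidate signature; components that do neither are probably noise.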

Phenomenological Models

Next you’re going to cast a broad net and make histograms of signature frequencies, often in various problem-specific transform spaces. If those histograms look Gaussian, they probably arise from a random process and are not particularly interesting. But usually some distribution that you calculate looks mostly like a straight line on a log-log plot. Voilà, you’ve just created a phenomenological model featuring a power-law distribution.
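As a rough sketch of that histogram step (my illustration, using assumed names and synthetic Pareto-distributed data, not anything from the original post):

```python
# Sketch: log-binned histogram of signature frequencies and a
# straight-line fit in log-log space. `counts` is a hypothetical
# placeholder, simulated here with a known power-law tail.
import numpy as np

rng = np.random.default_rng(1)
counts = rng.pareto(a=1.5, size=10_000) + 1.0   # synthetic stand-in, exponent 2.5

bins = np.logspace(0, np.log10(counts.max()), 30)
hist, edges = np.histogram(counts, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])       # geometric bin centers

nonzero = hist > 0
slope, intercept = np.polyfit(np.log10(centers[nonzero]),
                              np.log10(hist[nonzero]), deg=1)
print(f"log-log slope = {slope:.2f} (implied power-law exponent = {-slope:.2f})")
```

A slope near -2.5 on data drawn with exponent 2.5 is the sanity check here; on real signature frequencies, the straight-line fit is only the starting point for the caveats below.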

I’ve spent a good part of my scientific career working on power-law distributions, and they’re both unavoidable and a bit painful. You need either a lot of orders of magnitude or high accuracy on one or both axes to distinguish them from other distributions such as, say, a bi-exponential. Worse still, they are hard to connect to the underlying structure of a problem. Except for a very few special cases (such as a direct connection to electrostatics), there aren’t mechanisms that would lead one to expect a power-law distribution, much less predict the exponent. Power laws frequently have a circular justification, with experimentalists fitting their data to them because they think that theory justifies it, and theorists trying to derive a power law from their structural models because that’s what experiment shows. Still, with a phenomenological model like a power-law distribution, you can feel that you have at least swept your lack of knowledge into a somewhat-tidy corner along with a lot of other problems, and your model has some predictive power, though not usually enough to rule in or rule out most mechanisms.
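One concrete way to act on that caution, sketched here with assumed variable names and synthetic data rather than as any standard recipe, is to fit both a continuous power law and a shifted exponential by maximum likelihood above some cutoff xmin and compare their log-likelihoods:

```python
# Sketch: maximum-likelihood fits of a power law and an exponential to
# the same tail data, compared by log-likelihood. The data and the
# cutoff xmin are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
xmin = 1.0
samples = (rng.pareto(a=1.5, size=5_000) + 1.0) * xmin   # synthetic tail
logs = np.log(samples / xmin)
n = len(samples)

# Power law: p(x) = (alpha - 1)/xmin * (x/xmin)**(-alpha), x >= xmin
alpha = 1.0 + n / logs.sum()                              # MLE exponent
ll_power = n * np.log(alpha - 1.0) - n * np.log(xmin) - alpha * logs.sum()

# Shifted exponential: p(x) = lam * exp(-lam * (x - xmin)), x >= xmin
lam = 1.0 / (samples - xmin).mean()                       # MLE rate
ll_exp = n * np.log(lam) - lam * (samples - xmin).sum()

print(f"alpha = {alpha:.2f}, log-likelihood difference (power - exp) = {ll_power - ll_exp:.1f}")
```

Unless the log-likelihood difference is large and stable under resampling, the honest conclusion is usually “heavy-tailed, mechanism unknown,” which is exactly the somewhat-tidy corner described above.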

Mechanistic Models

When one makes a bit more progress into a data stream, a key step is to build a model that puts in a bit of the 3-D structure, physics, and chemistry of the system to produce a mechanistic model of the data. The model will necessarily be quite crude at first, but even crude models can have a predictive value and a utility that exceed those of a phenomenological model. For instance, the model of neurons used in current neural-network machine-learning systems reproduces few of the features of spiking-neuron data, yet it works well enough to enable feature classification by deep learning. To cover all the bases, most mechanistic models will need to be extended to multiple scales, with some understanding from the highest resolutions patched into the big picture at the largest scales, hopefully limiting the range of acceptable parameters in the model.
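To see just how crude such a model can be and still earn its keep, here is a sketch (my illustration, not taken from the post) of the kind of unit deep-learning systems actually use: a weighted sum pushed through a fixed nonlinearity, with no membrane dynamics, refractory period, or spike timing at all.

```python
# Sketch of the drastically simplified "neuron" used in deep learning:
# a weighted sum of inputs passed through a rectifier. Nothing about
# membrane potential, spike timing, or adaptation is modeled.
import numpy as np

def relu_neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Rate-coded artificial neuron: max(0, w . x + b)."""
    return max(0.0, float(np.dot(weights, inputs) + bias))

# Tiny usage example with made-up numbers.
x = np.array([0.2, 0.8, 0.5])
w = np.array([1.0, -0.5, 0.3])
print(relu_neuron(x, w, bias=0.1))   # a single firing-rate-like output
```

Nearly everything a biologist would recognize about a neuron has been thrown away, yet stacks of these units classify features well enough to be useful, which is the point about crude mechanistic models.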

You’ll have to revisit the mechanistic models every time new data appears, especially if those data are from independent techniques on a comparable model system. Some really surprising new results will send you back to look at the signatures, maybe all the way back to designing a new experiment. The process of model refinement is never finished, only abandoned.

We Live in a Biofield

While I have faith that a Theory of Biology will help with medical diagnostics and therapeutics, I can’t presume to envision the really revolutionary applications that are years out. But I do see a few structures sticking above the surface right now. For one, it seems that we have vastly underestimated the extent to which humans interact with other biota, and how that interaction helps determine what it means to be human.

My view is that life exists in a biofield of sequences consisting not just of genomes from plants, animals, and microbes, but also of an immense number of virus-like particles (VLPs). These VLPs are less agents of infection than a medium for exchanging messages useful in adapting to local conditions, and are more about how to cooperate to solve problems together than about biological warfare. Recent results in marine viral metagenomics have shown that VLPs are likely the dominant source of carbon in the ocean and, by extension, the main engine of evolution on earth. The biofield generally supports and sustains us, pointing the way to sustaining increased amounts of Life in the local environments that organisms mutually create.

The biofield is most dynamic at interfaces. For example, at the luminal surface of the intestine, the partial pressure of molecular oxygen drops from half an atmosphere to nearly zero over a distance of 100 microns, creating a gradient in which microaerobes compete for access while our gut-brain connection determines which regions get rewarded with increased blood flow and which get punished with toxic sulfurous chemical-warfare agents. The interface appears to extend beyond the gut wall through selective induction of bacteria into macrophages, which circulate live-but-passive microbes throughout the body. This process was first observed by the Russian scientist Ilya Metchnikov, who discovered macrophages in work for which he won the 1908 Nobel Prize. Why does the immune system find it advantageous to do this? Nobody knows, but my guess—which is close to Metchnikov’s—is that this bit of the biofield gets circulated to carry out chemistry at specific sites where the host can’t efficiently accomplish it by itself, such as remodeling dead tissue at sites of inflammation or synthesizing neurochemicals that signal “eat more” or “you’re happy, do that again soon”. Lots of people have remarked that happiness is contagious in communities of people who are doing good things together. My guess is that the Theory of Biology will one day teach us how to measure and promote that happiness.