Biology of information lectures at the Collège de France
Lectures by Walter Fontana
- Life and the Computer: The challenge of a Science of Organization
- The Biology of Information – A Dialogue between Computer Science and Biology
- The 2019/20 rotation of the chair in “Informatics and Computational Sciences” at the Collège de France is meant to emphasize computational biology. The term usually refers to “bioinformatics”—a theory and practice of computation in the service of organizing, searching, and analyzing large data sets. I believe this area already receives sufficient general attention that another course seems unwarranted. Rather, I would like to approach computation from two angles, each distinct from the traditional meaning of computational biology. On the one hand, I want to consider programming language theory as a formal framework (much like good old infinitesimal calculus) in support of modeling complex dynamic systems of biological relevance. On the other, I want to consider computation as a natural, physical phenomenon that biological systems themselves exhibit. (If the latter were true, the former would be necessary.)
- In the case of modeling, I emphasize the search for principles beyond mere description. However, I will also argue that the complexity of biological systems requires a new technology and practice that is predicated on fusing modeling with knowledge representation. The reason is roughly as follows. In situations that are not vitiated by complexity, an assumed understanding of a key aspect of a system typically __precedes__ modeling. Yet, in the case of complex and heterogeneous interaction networks, an initial systems-level understanding may not be available even with reasonable knowledge of local interactions. This leads to an inversion—from __understanding precedes modeling__ to __modeling precedes understanding__. This inversion significantly alters the character of models and the practice of modeling.
- In the case where computation is hypothesized as a natural activity of biological systems, I wish to keep the concept sufficiently vague so as to stimulate thoughts about the nature of computation “in the wild”. The historical definition in terms of partial recursive functions on the integers is not terribly useful for characterizing the mode and purpose of computation as stressed by, for example, concurrency. I hold a dynamic view in which computation refers to __organization__, i.e. the change of structured agents (where change affects both structure and abundance) and the structure of that change, which is to say the causal architecture of a system. That architecture appears itself to be dynamic. The physics perspective adds the requirement that information be embodied to become __accessible__ and __causally effective__, thus emphasizing the processing of __physical__ representations of information and associated costs in sensing, classifying, memorizing, learning, deciding, and adapting. To the extent that creating causal structure is akin to programming, biological development and evolution are two styles of automatic programming.
- It is a tall order to coherently articulate the viewpoint of computation-as-organization in a manner that impacts bench biology. While the proposed course is motivated by this challenge, it is at best a modest (and idiosyncratic) attempt at stimulating discussion of how the concepts of computation and information might shape biological theory in a way that could be relevant to empirical projects.
- At this point in time, the following table of contents can only serve as a rough guide. I’m (vastly) overshooting in that each lecture contains much more material than is possible or wise to present in 1.5 hours. I therefore expect considerable tweaking as I prepare the course in detail.
- The representation of biological information : Statistical aspects of sequence to structure mappings : The case of RNA
- The classical framework of evolutionary change is based on two distinct representations of information—genotype and phenotype—related by a mapping known as development. The notion of phenotype refers to the physical, organizational and behavioral expression of an organism. It emphasizes the systemic nature of biological organization. While we lack a theory of organization (or development), many basic concepts and challenges of evolutionary theory can be illustrated using a simple model microcosm: the mapping from RNA sequences into structures through folding. I will discuss concepts, methods, and insights related to the study of statistical properties of this particular mapping.
- RNA secondary structure and minimum free energy folding
- Shape space covering: all frequent structures occur within a small neighborhood of a randomly chosen sequence; typicality
- Neutral networks: the sequences folding into the same minimum free energy structure form a mutationally connected network in sequence space
- Topology: phenotype __space__ is induced by development; the problem of evolutionary innovation
- Plasticity mirrors variability: the set of minimum free energy structures realizable in the mutational neighborhood of a given sequence correlates strongly with the energetically suboptimal structures of that sequence
- Evolution: the Baldwin effect, Waddington’s canalization, evolvability and Kirschner’s concept of facilitated variation
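- As a toy illustration of a sequence-to-structure mapping, the sketch below uses the Nussinov base-pair-maximization algorithm, a deliberately crude stand-in for the minimum free energy folding discussed above (real MFE folding requires a full thermodynamic energy model, as in the ViennaRNA package). The function and its parameters are illustrative, not part of the course materials.

```python
# Nussinov base-pair maximization: a toy stand-in for MFE folding.
# dp[i][j] holds the maximal number of base pairs in subsequence seq[i..j].
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximal number of base pairs, with hairpin loops >= min_loop bases."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # j stays unpaired
            for k in range(i, j - min_loop):     # j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

# A small hairpin: GGG...CCU closes three pairs around an AAA loop.
print(nussinov("GGGAAAUCC"))  # -> 3
```

Mapping many mutant sequences through such a function is the cheapest way to get a feel for neutral networks: most point mutations leave the optimal structure count unchanged.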
- The propagation of biological information: transmission limits (11/12)
- In this unit I discuss different mechanisms that transmit biological information across generations. Computational models are used to determine the limits of such transmission.
- Lancet’s and Segrè’s inheritance of pre-Darwinian “compositional genomes”
- Inheritance of sequence information: the molecular quasi-species and the error threshold in sequence space; neutrality and the error threshold in phenotype space
- Reading, writing, and erasing: the transmission of epigenetic information
- The challenge of niche construction
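- The error threshold admits a one-line back-of-the-envelope calculation. In Eigen’s quasi-species model, a master sequence with selective superiority sigma is maintained only while sigma · q^L > 1, where q = 1 − u is the per-base copying fidelity; solving for L bounds the genome length that a given fidelity can sustain. The sketch below is illustrative only:

```python
import math

def max_genome_length(sigma, u):
    """Eigen error threshold: the master sequence is maintained only while
    sigma * (1 - u)**L > 1, where u is the per-base error rate.  Solving
    for L gives ln(sigma) / -ln(1 - u), roughly ln(sigma) / u for small u."""
    return math.log(sigma) / -math.log(1.0 - u)

# sigma = 10, u = 1e-3 per base: maximal genome length ~ 2300 bases.
print(max_genome_length(10, 1e-3))
```

The weak (logarithmic) dependence on sigma and the inverse dependence on u is the point: higher-fidelity replication machinery, not stronger selection, is what permits longer genomes.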
- Modeling biological information processing
- I * Tools for reasoning about molecular interaction networks: Introduction to rule-based approaches in chemistry and protein-protein networks
- Transient binding is an important mode of bio-molecular interaction. It is a prelude to many enzymatic reactions and a basic step in the assembly of molecular complexes. Combinatorial aspects of assembly yield counter-intuitive behavior in cellular signaling processes that proceed through transient formation of large complexes (signalosomes).
- Equilibrium statistical mechanics of assembly: the generating function of the partition function (with a nod to Joyal’s and Bergeron’s combinatorial species)
- The assembly behavior of multivalent scaffolds, rings, homopolymers and heteropolymers; the potential use of these constructs in cellular signaling
- Optimizing non-equilibrium assembly under constant chemical potential
- Small interaction motifs, their non-equilibrium behavior, and the significance of mechanistic detail
- II * Tools for reasoning about molecular interaction networks: Small and large use cases in Kappa
- The abundance of mRNA transcripts, the localization of proteins, and their post-translational modifications are taken to reflect the state of a biological system. The overwhelming effort at analysis to date has been directed at data originating from such sweeping surveys of system state. Meanwhile, however, detailed mechanistic studies are elucidating the structural and post-translational requirements on protein regions, domains, and residues that enable specific interactions. These data do not directly pertain to system state, but to the processes that __generate__ system state. The many interactions inferred from biochemical, biophysical, and structural analyses are often combined into static networks. Surveying the properties of such networks, while useful, offers only limited insight, since the significance of any specific interaction is determined by the dynamic behavior of all others that co-occur in a given situation.
- As argued in the previous lecture, mechanistic models are needed for understanding systems dynamics and making interventions into cellular processes more deliberate. Such models need to be scalable, easy to update, amenable to variation, transparently tied to knowledge representation, and based on a formal foundation conducive to computer-aided reasoning. In other words: A useful model should be a data structure that constitutes a transparent, editable, formal, and executable representation of the facts it rests upon. This is a prescription for replacing a world we don’t understand with a model we don’t understand but that is easier to analyze and experiment with. The challenge is to develop mathematical techniques and a sound software infrastructure for analyzing, visualizing, manipulating, simplifying—in short, reasoning with—models that are like empirical objects. I will lay out an implementation in support of this vision. The platform that I introduce will be deployed hands-on in several parts of the remainder of the course.
- Rule-based modeling: graph-rewrite systems, chemistry, and the Kappa language
- Simulation semantics: Continuous-time Monte Carlo
- Introduction to the Kappa software suite: Simulator (KaSim), static analyzer (KaSa), the Trace Query Language (TQL), the Kappa app (Kappapp.app), visualization
- Overview of advanced topics: deterministic coarse-graining and thermodynamic consistency
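- The continuous-time Monte Carlo semantics can be illustrated on the smallest possible example: a single rule A + B -> AB under stochastic mass action. This is a bare Gillespie-style sketch of the simulation core only; KaSim itself operates on rules over site graphs rather than on a fixed reaction, so the resemblance is in the waiting-time logic, not the data structures.

```python
import random

def gillespie_bind(nA, nB, k, t_end, seed=0):
    """Continuous-time Monte Carlo for the single rule A + B -> AB.
    Each event fires after an exponential waiting time whose rate is
    the rule's propensity k * nA * nB; returns the final AB count."""
    rng = random.Random(seed)
    t, nAB = 0.0, 0
    while True:
        a = k * nA * nB                  # propensity of the only rule
        if a == 0.0:                     # nothing left to bind
            return nAB
        t += rng.expovariate(a)          # sample the next event time
        if t > t_end:                    # horizon reached first
            return nAB
        nA, nB, nAB = nA - 1, nB - 1, nAB + 1

print(gillespie_bind(100, 100, 1.0, 1e9))  # all pairs eventually bind
```

With several rules, one samples which rule fires in proportion to its propensity; Kappa adds to this the matching of rule patterns against a mixture of site graphs.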
- III * Tools for reasoning about molecular interaction networks: The quest for insight; statistical analysis and causality
- Static depictions of bio-molecular pathways are informal narratives for organizing data. Yet, pathways do not exist as physical circuits the way road networks do. Rather, at any given moment, pathways __emerge__ from and are maintained by the many concurrent and changing interactions between the molecular agents that constitute a system. In this lecture I detail the current state of causal analysis in rule-based models.
- I will focus on __actual causality__, which is contrasted with __type causality__. Type causality is about generic statements, such as “Smoking causes cancer” or “Printing money causes inflation”; it is forward-looking and predictive. Actual causality is about recovering the path leading to an event that has already occurred. It is therefore retrospective and specific to an actual history. Whether the faulty brakes of a particular car caused a specific accident is a matter of actual causality (often invoked in court). Actual causality and type causality are intertwined, since statistical statements about actual causes might warrant type-level statements of a predictive kind. Causality in Kappa models will be discussed from two complementary viewpoints: via concurrency (as non-independence) and via counterfactual reasoning.
- Basic tools from category theory for reasoning about influences between Kappa rules
- Potential causality: static rule influence and dynamic flows of influence
- The formal concept of __story__: Actual causality as non-independence; reconstructing and compressing the causal past of an event of interest; acquiring story statistics
- Lewis-Halpern-Pearl: the counterfactual approach to causality and its implementation in Kappa
- Towards a formal notion of __explanation__
- IV * Combinatorial assembly systems and molecular signaling: combinatorial scaffolding
- V * Combinatorial assembly systems and molecular signaling: statistical mechanics
- Thermodynamic consistency imposes constraints on the choice of rate constants, which must guarantee the existence of a thermodynamic equilibrium state, i.e. a state in which all reactions have zero net flux (detailed balance). This implies that rate constants must be consistent with an underlying free energy landscape that assigns an energy content to every molecular species in the model. Constraints imposed by thermodynamic consistency are mostly absent in models of larger signaling systems, which operate far from equilibrium. This off-equilibrium operation is modeled by assuming many reactions to be irreversible and by absorbing the fixed concentrations of energy stores, such as ATP, into pseudo rate constants. This approach is justified if the focus is on dynamical phenomena without a quantitative concern for energy transduction. However, if we wish to understand the energetic costs and trade-offs associated with improving the accuracy with which a system can discriminate a “true” signal from a “false” one, thermodynamic considerations become paramount. In this unit, I will sketch the basics of modern non-equilibrium thermodynamics as far as needed to review its application to biological information processing.
- Thermodynamics of computation; logical and thermodynamic irreversibility; Landauer principle and Szilard’s demon
- Basic notions of non-equilibrium thermodynamics illustrated with a Kappa model of a membrane transporter: entropy production, free-energy storage, and free-energy transduction; the concept of coupling
- Hopfield’s kinetic proofreading scheme and the trade-offs between free-energy consumption, error, and time; energetic and kinetic regimes of error correction
- Proofreading networks
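- A back-of-the-envelope version of the proofreading trade-off, in Hopfield’s idealized limit: at equilibrium, discrimination between two substrates whose binding free energies differ by ddG is bounded by the Boltzmann factor e^(−ddG/kT), and each strongly driven proofreading step (paid for by, e.g., ATP hydrolysis) can contribute the same factor again. The sketch below is illustrative, not a kinetic model:

```python
import math

def error_rate(ddG_kT, proofreading_steps=0):
    """Minimal error fraction for discriminating substrates whose binding
    free energies differ by ddG_kT (in units of kT).  In the idealized
    Hopfield limit each strongly driven proofreading step multiplies in
    the same Boltzmann discrimination factor."""
    f = math.exp(-ddG_kT)            # one-pass error, Boltzmann bound
    return f ** (1 + proofreading_steps)

# ddG = 4 kT: equilibrium error ~ e^-4 ~ 1.8e-2;
# one proofreading step pushes it to ~ e^-8 ~ 3.4e-4.
print(error_rate(4.0), error_rate(4.0, 1))
```

The real trade-off, covered in the lecture, is that approaching this bound costs free energy and time; operating a proofreading cycle near equilibrium buys no extra accuracy at all.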
- Acquiring biological information: Learning in molecular systems?
- Probabilistic inference provides a language for describing how organisms may learn from and adapt to their environment. The computations needed to implement probabilistic inference often require specific representations, akin to having suitable data structures in computer programming. Yet it is unclear how such representations can be instantiated in the stochastic and concurrent biochemical machinery found in cells.
- I will survey attempts at constructing “embodied” information-processing systems, i.e. molecular networks that follow stochastic mass action and implement an abstract computation. As in the subsymbolic approach to cognition, the mechanistic level of implementation and the abstract descriptions mutually restrict each other and have to be studied together. This is a more speculative unit that raises the question of the extent to which the molecular machinery of a cell is a substrate that “learns” (both ontogenetically and phylogenetically) like a “deep learning” reinforcement network. If this were the case, systems biology would be profoundly affected. Could there still be a project aimed at providing a molecular explanation of what a cell does?
- Deep learning
- “Genetic programming” to play the Atari game suite
- Katz’s model of molecular real-time probabilistic inference in a changing environment
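- For orientation, here is the abstract computation that real-time inference in a changing environment amounts to, stripped of any molecular implementation (this generic two-state recursive Bayes filter is not a reproduction of Katz’s model; the parameter names are illustrative):

```python
def bayes_filter(obs, p_flip, p_correct, prior=0.5):
    """Recursive Bayesian estimate that a two-state environment is in
    state 1, given noisy binary observations.  Between observations the
    environment flips with probability p_flip; each observation reports
    the true state with probability p_correct."""
    p = prior
    for o in obs:
        p = p * (1 - p_flip) + (1 - p) * p_flip          # predict: may flip
        like1 = p_correct if o == 1 else 1 - p_correct   # likelihoods
        like0 = 1 - like1
        p = p * like1 / (p * like1 + (1 - p) * like0)    # Bayes update
    return p

# A run of 1-observations drives the posterior toward state 1.
print(bayes_filter([1] * 20, p_flip=0.05, p_correct=0.8))
```

The open question stressed in the unit is not the filter itself but what chemical species and reactions could carry the posterior p and the predict/update cycle.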
- The evolution of cellular individuality by Eric Deeds in 2019
- Graph rewriting and Chemistry by Daniel Merkle in 2019
- From molecules to systems: the problem of knowledge representation in molecular biology by Jean Krivine in 2019
- Easy and hard in the origin of life by Eric Smith in 2019
- Thermodynamics of Open Chemical Reaction Networks Theory and Applications by Massimiliano Esposito in 2020
- Cells as cognitive creatures by Yarden Katz in 2019
- Prediction in immune repertoires by Aleksandra Walczak in 2020
- Imaging subcellular dynamics from molecules to multicellular organisms by Tommy Kirchhausen in 2020