Dynamics and Complexity of Computrons

We investigate the chaoticity and complexity of binary general network automata of finite size with external input, which we call computrons. As a generalization of cellular automata, computrons can have non-uniform cell rules, non-regular cell connectivity, and an external input. We show that any finite-state machine can be represented as a computron and develop two novel set-theoretic concepts: (i) diversity space as a metric space that captures similarity of configurations on a given graph, and (ii) basin complexity as a measure of complexity of partitions of the diversity space. We use these concepts to quantify the chaoticity of computrons' dynamics and the complexity of their basins of attraction. The theory is then extended to probabilistic machines, for which we define fuzzy basin partitioning of recurrent classes and introduce the concept of ergodic decomposition. A case study on a 1D cyclic computron is provided in both deterministic and probabilistic versions.


Introduction
The complexity of dynamical systems and that of computational machines are mostly disconnected areas of research. Dynamical systems typically live in metric state spaces and evolve in time by iterating the rules of dynamics. A system is said to be complex if the information required to describe its evolving paths is high, a notion formally defined as Kolmogorov-Sinai entropy. On the other side of the world, where computational machines live, the aim is to map a set of inputs to a set of outputs. We define the computron as a computational machine composed of a finite number of cells placed on the vertices of a directed graph. The cells have binary states which are updated according to local transition rules as a function of connected cells' states. The computron is a generalization of cellular automata in which local rules are not necessarily uniform and the graph is not required to be a regular grid. Furthermore, a computron can have an external input, in which case it is called driven; otherwise it is autonomous. The computron's global state is the configuration of its cells' local states on its graph. It performs computation by mapping an initial configuration to an attractor, be it fixed or periodic, by iterative dynamics. Such a mapping partitions its configuration space.
We attempt to bridge the disparate worlds of dynamical systems and computation by embedding computrons in a suitably defined metric space that captures the similarity of their configurations. We develop notions of dynamical chaoticity using a Lyapunov characterization of the machine's evolution, and of computational complexity by analyzing the basin partitioning of this proposed metric space. Our work contributes to the literature in multiple ways:
1. The computron is introduced as a computational machine which is equivalent to a finite-state machine, but has the novelty that the flow of information and the processing of information are separated.
2. A new distance measure is introduced for the configuration space on automata graphs. It differs from the Hamming distance used in the literature in that it captures the pattern similarity of configurations on the underlying graph, as opposed to solely counting mismatches between them.
3. A new complexity measure is proposed which captures the intrinsic structural complexity of a partitioning of the configuration space, not its randomness.
4. The above concepts are applied to quantify the dynamics of generalized graph automata, whereas the literature is mainly focused on the dynamics of cellular automata.
5. The framework is extended to probabilistic automata using fuzzy basins and recurrent classes, which enables us to quantify the chaoticity and complexity of probabilistic machines, as well as their susceptibility to noise.
Section 2 provides background and a literature review of Lyapunov characterization and entropy of dynamical systems, including cellular automata. Section 3 defines the computron model. Section 4 introduces new set-theoretic concepts such as the diversity measure and entanglement. Section 5 characterizes the dynamics of the computron by embedding the machine into a metric space that captures the similarity of configurations. Section 6 extends the framework to probabilistic computation, where each cell performs one of many rules probabilistically, driven by a random external input. This is a major step which pulls the formal domain into Markov chains and recurrent classes. Probabilistic computation lends itself to analysis of the susceptibility of computrons, including the effects of noise. Proofs of the Lemmas introduced in various sections are provided in Appendix A.

Background
In his 'A Philosophical Essay on Probabilities' (1814), Laplace states that knowing the present and the laws of evolution gives full knowledge of the future. Yet, even if the laws are truly known and truly deterministic, chaos can prevail, as described by Lorenz (1972): 'Chaos is when the present determines the future, but the approximate present does not approximately determine the future'. The sensitivity of the future to the present is formalized by Lyapunov characterization. The richness of what the future may bring is quantified by Kolmogorov-Sinai entropy.

Lyapunov Characterization
Lyapunov exponents measure the typical rate of exponential divergence of nearby trajectories in a dynamical system. Consider the following continuous-time system:

dx/dt = F(x),

where x ∈ R^d specifies the system state, and F, a differentiable function, defines the evolution law. Let x(t) and x′(t) be two trajectories that start from close-by initial states, x′(0) = x(0) + δx(0). The evolution of the difference in trajectories, z(t) = x′(t) − x(t), can be approximated as a vector in tangent space:

dz/dt ≈ J(x(t)) z(t), where J = ∂F/∂x is the Jacobian of F,

and there exists an orthonormal basis {e_i} such that for large t, the difference in trajectories can be represented as exponential growth or decay:

z(t) ≈ ∑_i c_i e^{λ_i t} e_i.

The Lyapunov exponents {λ_i} are independent of the initial state x(0) if the system is ergodic. Systems with a positive Lyapunov exponent are typically chaotic, and the error of prediction grows exponentially if the initial state is slightly off.

Kolmogorov-Sinai Entropy
If we were to discretize time and coarsen the state space into d-dimensional cubes of side ε (indexing each cube with a symbol), then a trajectory of evolution could be approximated with a string of symbols based on the index of the cube visited at each time step. For a simple system with predictable trajectories, the information needed to represent the ensemble of such strings is less than that of a system where trajectories disperse rapidly. Let K(ε, t) be the total number of different strings generated by the system in t steps in the coarse-grained state space. The topological entropy of the system is defined as:

h_top = lim_{ε→0} lim_{t→∞} (1/t) ln K(ε, t).

Note that in this definition, each different string generated by the system is counted as one. If we take into account the frequency of observing a given finite string of length T (a word w_T), and apply the Shannon entropy formulation to this stream of words, we get the information content of T-symbol paths generated by the system as:

H_T = −∑_{w_T} P(w_T) ln P(w_T).

Kolmogorov-Sinai entropy is then defined as the average information content per time step of the path, chosen as the highest over all possible coarse-graining partitions {G} of the state space:

h_KS = sup_{G} lim_{T→∞} H_T / T.

Basin Entropy
Dynamical systems theory has traditionally been path-centric. Chaoticity is described as the divergence of close-by paths, and entropy is conceptualized as the richness of paths emanating from a given present. Where paths end up in the long term is called the attractor of a dynamical system. In systems with fixed-point or periodic attractors, the future is rather dull. In chaotic systems, however, paths are entangled in such a way that bundles which are close-by soon end up far apart, and the attractor is called strange. Some dynamical systems, and most computational automata, have multiple simple attractors. In these systems, while the future is dull, the present can be exciting if knowing the now gives little indication of which final simple destiny one ends up in. This is the basin-centric way of looking at the dynamics.
Consider the discrete-time dynamics x → x² − y² + a and y → 2xy + b with system parameters (a, b). If one marks all possible initial states (x(0), y(0)) according to two ultimate-destiny scenarios (escape to infinity or not), we get the fractal Julia sets given in Figure 1, and quite interesting ones for different system parameters (a, b). A basin of attraction is the set of initial states that end up in a given attractor. The basin entropy measures the entanglement of these basins.
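The ultimate-destiny labeling behind Figure 1 can be sketched in a few lines of Python. This is our own illustrative code (function names and the escape radius of 2 are conventional choices, not from the paper):

```python
def julia_escapes(x, y, a, b, max_iter=100, radius=2.0):
    """Return True if the orbit of (x, y) under the quadratic map
    x -> x^2 - y^2 + a, y -> 2xy + b escapes the given radius."""
    for _ in range(max_iter):
        if x * x + y * y > radius * radius:
            return True
        x, y = x * x - y * y + a, 2 * x * y + b
    return False

def basin_labels(a, b, n=64, span=1.6):
    """Label an n-by-n grid of initial states by ultimate destiny:
    1 = escapes to infinity, 0 = stays bounded (the filled Julia set)."""
    grid = []
    for i in range(n):
        row = []
        for j in range(n):
            x = -span + 2 * span * j / (n - 1)
            y = -span + 2 * span * i / (n - 1)
            row.append(1 if julia_escapes(x, y, a, b) else 0)
        grid.append(row)
    return grid
```

Coloring the label grid for various (a, b) reproduces the familiar fractal basin boundaries.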

Entropy in Discrete State Space
A cellular automaton ('CA') is a dynamical system that consists of a grid network of cells, where each cell runs the same look-up-table-based evolution rule telling it how to change its current state (drawn from a finite set of elements) into the next, given its own and its neighbors' current states. The global state of a CA is the configuration of the local states of each cell on the grid. The local evolution rule and state-information exchange between neighbors synchronously update all local states in each time step, changing the configuration of the CA to its next one, hence generating dynamics on a discrete configuration state space. A CA is called elementary if the grid is 1D, each cell takes binary state values, and the neighborhood is just the left and right nearest cells.
Entropy for a CA can be defined in the same spirit as the Kolmogorov-Sinai entropy given above. Assuming a 1D grid, consider a window of L cells and ignore the rest. This is effectively a coarse graining of the configuration space, since all configurations that have the same pattern within the window are grouped together. Now, let us look into strings of window patterns generated by the system in T time steps, w_T^L, with observation frequency P(w_T^L). The entropy of this block (i.e., the entropy of the system at this coarse graining) is:

H(T, L) = −∑_{w_T^L} P(w_T^L) ln P(w_T^L).

The equivalent Kolmogorov-Sinai entropy of the CA is then given as the average information content per time step of the evolution path with no coarse graining, i.e., when the cell-space window is extended to the full grid size:

h = lim_{L→∞} lim_{T→∞} H(T, L) / T.

Dynamics of Cellular Automata
As per Equation (2), Lyapunov exponents rely on differentiability in state space. Being a Cantor set, the configuration space of a CA does not lend itself to differentiability. Shereshevsky [1] in 1992 proposed the average propagation speed along the CA grid of a single-cell disturbance as a proxy for the CA's Lyapunov characterization. Shereshevsky showed that for an ergodic 1D CA almost all configurations have unique left and right propagation velocities, λ⁻, λ⁺, and further established the inequality linking the entropy of the CA to the entropy of the shift operator σ and the propagation speeds: h(F) ≤ h(σ)(λ⁻ + λ⁺). Later, in 2000, Tisseur [2] expanded Shereshevsky's Lyapunov characterization by introducing average propagation velocities. The entropy given in Equation (8) is finite and calculable for trivial CAs such as the left-shifting CA. However, Hurd, Kari and Culik [3] (1992) proved that it is uncomputable for an arbitrary CA. The undecidability result does not preclude the existence of large classes of examples, such as linear CA, for which explicit calculations are possible. A CA is called linear if the local state set S is endowed with sum and product operations that make it a commutative ring, and the local evolution function f is given as a linear combination of the neighboring cells' local states, f(x_{−r}, . . . , x_{+r}) = ∑_i a_i x_i with {a_i} ∈ S. With sum and product operations defined on S, it is possible to describe a given configuration c with a characteristic power series P_c(z) = ∑_{i=−∞}^{∞} c_i z^i and the local function with a kernel A_f(z) = ∑_{i=−r}^{r} a_i z^i. For linear CA, the dynamics reduces to P_{F(c)}(z) = P_c(z) A_f(z). Manzini et al. published a series of articles [4][5][6][7] on the topological dynamics of linear 1D CA and explicitly calculated the entropy.
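The power-series representation of a linear CA can be checked numerically. Here is a small sketch over GF(2) for rule 90 (next state = left XOR right), where multiplying the configuration polynomial by the kernel z + z⁻¹ on a cyclic grid reproduces one time step; the code and its names are ours, written for illustration:

```python
def rule90_step(cells):
    """One update of the linear rule 90 on a cyclic 1D grid over GF(2)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

def poly_mul_gf2_ring(c, kernel_shifts, n):
    """Multiply the configuration polynomial P_c(z) by the kernel
    sum_s z^s (s in kernel_shifts) over GF(2), modulo z^n - 1
    (i.e., on a cyclic grid of n cells)."""
    out = [0] * n
    for i, bit in enumerate(c):
        if bit:
            for s in kernel_shifts:
                out[(i + s) % n] ^= 1
    return out
```

For rule 90 the kernel shifts are {+1, −1}, and both routines agree on every configuration.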
Topological entropy characterizes the richness of the time streams of configuration patterns generated by the CA. This richness is closely linked to the compressibility of these pattern streams, and excellent progress has been made on practical compression algorithms for patterns. Lempel-Ziv coding [8], introduced in 1976, is a universal lossless data compression approach at the core of the GIF image format used today. Zenil (2010) [9] proposed calculating the entropy of a CA from the compressed length of its output patterns under Lempel-Ziv-like compression.

Complexity Classification of Cellular Automata
In the early 1980s, Wolfram ignited research interest in CA with a series of articles [10][11][12] where he classified the apparent spatio-temporal configuration complexity as: (1) tends to a spatially homogeneous state; (2) yields a sequence of simple stable or periodic structures; (3) exhibits chaotic aperiodic behavior; and (4) yields complicated localized structures, some propagating, i.e., complex patterns. Wolfram's classification drew a distinction between chaotic and complex configuration evolution. This was important because the entropy-based definitions used in dynamical systems theory did not differentiate chaotic behavior from complex behavior. He conjectured that complex CA perform universal computation. Culik and Yu (1988) [13] proposed a formal framework for Wolfram's complexity classes; the key demarcation was whether an algorithm exists to decide, for any two finite configurations, whether one ever evolves into the other. Since then there has been a wide range of classifications of elementary CA aiming to capture the intuition that complex is something different from chaotic.
Li and Packard (1990) [14] looked into the structure of 'rule space' and investigated bifurcations in the behavior of 1D CA by varying a classifier parameter λ, defined as the ratio of non-zero entries in the local rule table of the CA. The rule space appeared to split into ordered (fixed or periodic patterns) and chaotic regions with a sharp phase boundary. However, in certain parts of the rule space, between the ordered and chaotic domains, the phase boundary was a thick wall housing critical rules that showed complex patterns. Kurka (1997) [15] provided three formal classification schemes for 1D CA: (i) based on the complexity of the languages generated by partitions of the state space; (ii) based on Lyapunov stability; and (iii) based on attractors of the configuration space. Kurka demonstrated a correspondence between the classes of these three schemes and Wolfram's heuristic classification. Chua et al. (2002) [16] proposed a complexity index to categorize the rules of elementary CA: consider a cube graph where each vertex corresponds to one of the 2 × 2 × 2 = 8 states of a cell's input triplet. Color each vertex red or blue depending on what the output state ought to be according to the local rule table. Now, imagine partitioning the cube by planes such that same-colored vertices remain in the same partition. Chua's complexity index is the number of planes required to make such a partition. He showed that Wolfram's Class IV elementary CA have index 2 or 3. To overcome certain mis-categorizations of the Chua index, Ewert (2019) [17] recently proposed a novel method based on residual circuit complexity. It takes into account not only the minimal-form representation of the rule table, but also the reduction of input entropy as the system evolves, which further reduces the needed minimal form. Applied to elementary CA, Ewert showed that the proposed method matches Wolfram's heuristic classification well.
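Li and Packard's classifier parameter is straightforward to compute from an elementary CA rule number, since the rule's 8-entry look-up table is encoded in the bits of the rule number. A minimal sketch (illustrative code, not from the paper):

```python
def lambda_parameter(rule, n_inputs=8):
    """Li-Packard-style classifier lambda: the fraction of non-zero entries
    in the local rule table of an elementary CA, read off the rule number's
    binary expansion (one bit per neighborhood triplet)."""
    return sum((rule >> i) & 1 for i in range(n_inputs)) / n_inputs
```

Rule 0 gives λ = 0 (all-quiescent), rule 255 gives λ = 1, and the chaotic rule 30 sits at λ = 0.5.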

Venturing into General Network Automata
Few have attempted to go beyond classical cellular automata, which have two built-in regularities: (i) cells are arranged on a regular grid-like cell space, and (ii) each cell executes the same local rule, i.e., the rules are homogeneous. These properties arguably give classical CA a physics-like feel, with uniform laws and regular space, and make dynamical analysis tractable. It is, however, tempting to go beyond the classical setup and consider generalized graph automata where cells are placed at the vertices of a general directed graph and each executes a different local rule. Many real-life systems of interacting agents are organized into networks such as Watts-Strogatz-type small-world graphs or Barabasi-Albert-type scale-free graphs, with non-uniform local rules. Marr and Hutt (2005) [18] investigated the pattern-formation capacity of binary graph automata where each cell executed a local law based on the average state value of its neighbors passed through a threshold function parametrized by a value κ. They investigated both small-world and scale-free graphs, varying the graph incrementally to see the impact on pattern formation. They measured the Shannon entropy of the time strings of a single cell, averaged over all cells. Since the graphs were not regular grids, the authors did not define any spatial pattern concept and investigated time patterns of single cells only. Tomassini (2006) [19] analyzed the performance of binary graph automata with homogeneous cell rules over small-world and scale-free graphs on two tasks. In the density classification task, the cells, starting from random configurations, are expected to converge to all 1s if their initial density of 1s is greater than 0.5, and vice versa. In the synchronization task, the cells, starting from random initial configurations, are expected to synchronize into a global flip-flop cycle. Cattaneo et al. (2009) [20] introduced two types of non-homogeneous 1D CA: (i) local rules are variable only in a region outside of which the rule is uniform, and (ii) the rule is non-uniform everywhere but the neighborhood always has the same radius. The authors showed that most topological properties are destroyed when the CA becomes non-homogeneous.

Back to Basins
Basin entropy, defined in Section 2.3, is a relatively new concept and plays a central role in this paper. Daza et al. (2016) [21] calculated the basin entropy of two well-known physical dynamical systems with path chaoticity: (i) the Duffing oscillator, a periodically driven rod with nonlinear elasticity, and (ii) the Hénon-Heiles system, which describes the motion of a star around a galactic center. They defined basin entropy by coarse-graining the basins into boxes, marking the probability distribution of ultimate destinies in each box, and applying an entropy measure to it.
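The procedure of Daza et al. can be sketched directly: tile a grid of attractor labels into boxes, compute the Shannon entropy of the label distribution inside each box, and average over boxes. A minimal version (our own illustrative code, using natural logarithms):

```python
import math

def basin_entropy(labels, box=4):
    """Daza-style basin entropy: average, over box-by-box tiles of a grid of
    attractor labels, of the Shannon entropy of the labels in each tile.
    Smooth basins give 0; finely mixed basins give high values."""
    n = len(labels)
    entropies = []
    for bi in range(0, n, box):
        for bj in range(0, n, box):
            counts = {}
            for i in range(bi, min(bi + box, n)):
                for j in range(bj, min(bj + box, n)):
                    counts[labels[i][j]] = counts.get(labels[i][j], 0) + 1
            total = sum(counts.values())
            entropies.append(
                -sum((c / total) * math.log(c / total) for c in counts.values())
            )
    return sum(entropies) / len(entropies)
```

A uniform label grid gives zero basin entropy, while a checkerboard of two attractors gives the maximal two-label value ln 2 in every box.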
Finite-size discrete systems have been mostly dismissed in the entropy world, since their paths are of finite length and ultimately fixed or periodic. However, as computational machines, these systems are very rich in the structure of their basins of attraction. In fact, mapping initial states to ultimate destinies is what computation means for these machines. As we shall now formally see, our definitions of chaoticity and, more importantly, complexity for such finite-state machines are all about quantifying the structure of the basin partitioning, and as such fundamentally differ from previous work on complexity and entropy for automata.

Computron
"Things do not exist other than in arrangements". Tractatus-Ludwig Wittgenstein

Construction
A computron is an arrangement of N cells with binary states ω_i, formally constructed on a directed generating graph G = (V, E) whose vertices host the cells, together with a set of local look-up-table rules (connectors) C. Each cell is assumed to be also its own neighbor, i.e., e_ii ∈ E, ∀i.
An autonomous computron performs computation by mapping an initial configuration to an attractor (fixed or periodic). Such mapping partitions the configuration space. A driven computron generates an output string for a given input string based on an initial starting configuration.

Computron Generating Graph
The computron is constructed on a generating graph. Cells are placed at the vertices of the graph. Each cell has a binary state ω_i and is endowed with a local look-up-table rule, called a connector. The global state of the computron is the configuration of local states on the generating graph. v_IN is the special cell that interfaces with the external input, and v_OUT is the cell nominated for reading the output.

Computron vs. CA and FSA
By construction, any finite-size cellular automaton is a computron whose generating graph is a regular grid, whose connectors are homogeneous, and which has no external input. A finite-state automaton ('FSA') is a computational machine that maps an input string to an output string with no auxiliary memory. Formally, the machine M = [S, s_0, Σ, Γ, T, U] is specified by a set of states S, an initial state s_0, an input alphabet Σ, an output alphabet Γ, the transition map T : S × Σ → S, and the output map U : S × Σ → Γ. The FSA generates an output string y = γ_0 γ_1 γ_2 ... ∈ Γ* driven by the input string x = σ_0 σ_1 σ_2 ... ∈ Σ*. Without loss of generality, we assume that the input and output alphabets are binary.
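Driving a machine of this Mealy form is a one-loop computation. The sketch below (illustrative code; the running-parity machine is our own example, not from the paper) shows the transition map T and output map U in action on a binary alphabet:

```python
def run_fsa(transition, output, s0, input_str):
    """Drive an FSA M = [S, s0, Sigma, Gamma, T, U]:
    transition[(state, symbol)] -> next state,
    output[(state, symbol)] -> output letter."""
    s, out = s0, []
    for sym in input_str:
        out.append(output[(s, sym)])
        s = transition[(s, sym)]
    return ''.join(out)

# Example: a machine that outputs the running parity of 1s seen so far.
# States: 'e' (even count so far), 'o' (odd count so far).
T = {('e', '0'): 'e', ('e', '1'): 'o', ('o', '0'): 'o', ('o', '1'): 'e'}
U = {('e', '0'): '0', ('e', '1'): '1', ('o', '0'): '1', ('o', '1'): '0'}
```

Feeding the input string "1101" from state 'e' yields the parity stream "1001".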

Purpose
Dynamical systems live on metric spaces, whereas computation does not. The computron as a model of computation is constructed with global states defined as configurations of local states on its generating graph. In this section, we formally define a set measure called the diversity measure, which gives the cardinality of a set adjusted for the similarity of its elements. The diversity measure induces a similarity distance between two sets as the measure of their symmetric difference. This similarity metric generates a metric space for the states of the computron, called the diversity space. We later perform all our dynamical investigations of the computron on this diversity space.
Various distance functions between sets with elements embedded in a metric space have been proposed in the literature. The Hamming distance is arguably the simplest: it counts the number of differing elements in two sets, but lacks any concept of similarity among the elements. The Hausdorff distance between two sets is the furthest separation of one set's elements from the closest of the other's; the final result reduces to the distance between the two most significant elements, disregarding the rest. The sum-of-minimum-distances incorporates all elements, but it is an aggregation over closest peers, so the presence of non-closest peers is shadowed; it thus fails to capture ensemble closeness.
Other distance measures between sets have been proposed to capture the overall similarity of the elements of two sets given the underlying metric space, such as the surjection distance and the link distance [22]. Bennett, Vitányi and Zurek [23] introduced an absolute information distance metric between two objects. Consistent with the definition of the absolute information content of an object x as the length K(x) of the shortest binary program, running on a universal machine, that generates x, they described the distance between two objects x and y as the length of the shortest binary program that generates one given the other: d*(x, y) = max{K(x|y), K(y|x)}. They proved that the absolute information distance is a universal pattern-similarity metric. This theoretical metric is similar to algorithmic complexity in that, while universal, it is not computable in its pure form.

Diversity Measure
We introduce the diversity measure as the size of a set adjusted for the similarity of its elements. The intuition behind the set-theoretic formulation below is as follows: assume we are asked to form a diverse team from a pool of students. Each student has a profile which allows us to define how similar two students are. We build the team by adding one student at a time. The first member contributes an individuality of 1. The second member's individuality is reduced by her similarity to the first member. The third member's is reduced by his similarity to the first two. The diversity of the team is the sum of the individualities.
Let (Q, f) be a metric space where Q is a finite set and f is a metric on Q, i.e., a function f : Q × Q → R_{0+}. Let ρ : R_{0+} → [0, 1] be a concave monotonic function with ρ(0) = 0 (the similarity kernel), and define ρ(x, y) = ρ(f(x, y)); then (Q, ρ) is also a metric space.
The individuality of an element x given a set of constraining elements A is defined through the kernel distances to the elements of A: if ρ(x, y) = 0 for some y ∈ A, i.e., there exists an element of A that is exactly the same as x, then d(x|A) = 0, meaning x has no individuality given the constraining set A; an unconstrained element has individuality 1. The diversity of an ordered set A = [x_0, x_1, x_2, . . . ] given a set of constraining elements B is the sum of individualities, each element constrained by B together with the elements placed before it. Clearly, the ordering of the tuple A matters for the calculation of diversity; we therefore define the diversity measure of a set A constrained by a set B as the supremum over all possible orderings of the tuple A. For numerical experiments, we devised a routine whose computational load grows quadratically rather than combinatorially, which made the calculation of the diversity measure feasible.
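Assuming individuality is the minimum kernel distance to the constraining elements (with d(x|∅) = 1, matching the team-building intuition above; this specific form is our reading of the definitions, not quoted from the paper), the diversity measure admits a simple greedy, quadratic-cost approximation of the supremum over orderings, in the spirit of the routine mentioned:

```python
def individuality(x, A, rho):
    """d(x|A): assumed here as the minimum kernel distance from x to the
    constraining set A, with d(x|{}) = 1 for an unconstrained element."""
    if not A:
        return 1.0
    return min(rho(x, y) for y in A)

def diversity_measure_greedy(A, B, rho):
    """Greedy quadratic-cost stand-in for the supremum over orderings:
    repeatedly place the element with the highest remaining individuality,
    accumulating individualities constrained by B plus elements placed so far."""
    remaining, placed, total = list(A), list(B), 0.0
    while remaining:
        best = max(remaining, key=lambda x: individuality(x, placed, rho))
        total += individuality(best, placed, rho)
        placed.append(best)
        remaining.remove(best)
    return total
```

With a kernel clipped to [0, 1], two identical elements contribute a diversity of 1, while two maximally dissimilar elements contribute 2.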
Section 6 of this paper introduces the probabilistic computron, which generates probability distributions over states. We therefore extend the diversity measure to handle weighted sets. Let w : S → R_{0+} be a weight function; the individuality of a weighted element x given a set of constraining weighted elements A is defined analogously, and diversity and the diversity measure for weighted sets follow the same definitions as in Equations (13) and (14) using the individuality for weighted sets, together with the corresponding extensions of union ∪, symmetric difference ∆̃, and subset ⊆. Lemma 2 (Diversity measure). Diversity is a sub-additive measure on Q.
The diversity-induced distance (DiD) is an induced metric on Q* that captures the dissimilarity between two sets A, B ∈ Q* through the diversity of their symmetric difference, normalized by the size of the space, where ϕ : Q* × Q* → [0, 1], ∆̃ is the symmetric difference operator as defined in Equation (17), and |Q| = ∑_{x∈Q} w(x) is the size of the weighted space.

Entanglement
We introduce the entanglement concept, which will later be used in defining the partitioning complexity of a computron. The entanglement of a set of sets A = {A_0, A_1, A_2, ...} is defined as the total diversity of the collection, where each set's diversity is constrained by the presence of the other sets. The closer the sets come within each other's proximity, the higher the entanglement; on the other hand, as a given set's elements get separated from each other, that set's diversity decreases. Entanglement captures the balancing of these two factors in calculating the total diversity of the set of sets. This is illustrated in Figure 4.

Embedding in Metric Space
Given a computron M = [G, C] with generating graph G and connector set C, we embed it in a metric space that captures the similarity of its configuration states as follows: the distance between two cells is set as the shortest-path length on the generating graph between the two vertices occupied by those cells, mapped to the unit interval. The intuition behind this distance definition is that a state change in one cell can possibly impact the state of the other only after a number of compute steps no less than the shortest-path length.
The similarity kernel ρ : R_{0+} → [0, 1] is concave and monotonic with ρ(0) = 0. We then define the configuration state of the computron as the set of cells that have local state 1.
This leads to the diversity-induced distance as a metric for the dissimilarity of computron states. By Lemma 3, (S, ϕ) is a metric space with each configuration of the computron an element of the space. This is the diversity space in which we embed the computron, where the distance between two global states of the machine is based on the similarity of the corresponding configurations. Figure 5 illustrates the concept on a computron whose generating graph is a regular 2D grid. Each configuration is the set of pixels that have state 1 (a binary image). The diversity-induced distance between configurations s_0 and s_1 is less than that between s_0 and s_2, capturing their similarity, although the Hamming distances are the same.
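The Figure 5 idea can be sketched under explicit assumptions (Manhattan shortest-path distance on the grid, kernel d/(d+1), and a greedy ordering for the diversity of the symmetric difference; all three are our illustrative choices): two configurations differing by a nearby pixel come out closer than two differing by a distant one, even when the Hamming distances agree.

```python
def cell_distance(u, v):
    """Kernel distance between two cells of a regular 2D grid graph:
    shortest-path (Manhattan) length d mapped into [0, 1) by d / (d + 1)."""
    d = abs(u[0] - v[0]) + abs(u[1] - v[1])
    return d / (d + 1)

def did(A, B, n_cells, rho=cell_distance):
    """Diversity-induced distance between two configurations (sets of 'on'
    cells): greedy diversity of the symmetric difference, normalized by the
    number of cells in the grid."""
    diff = list(A ^ B)
    placed, total = [], 0.0
    while diff:
        score = lambda x: 1.0 if not placed else min(rho(x, y) for y in placed)
        best = max(diff, key=score)
        total += score(best)
        placed.append(best)
        diff.remove(best)
    return total / n_cells
```

For a 6-by-6 grid, the single-pixel configuration {(0,0)} is DiD-closer to {(0,1)} than to {(5,5)}, although both pairs have Hamming distance 2.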

Computron as a Dynamical System
An autonomous computron has no external input. Starting from an initial configuration, it moves to the next based on the transition map given in Equation (9). The dynamics is governed by discrete-time iteration on the configuration metric space (S, ϕ):

s_{n+1} = T(s_n), s_n ∈ S.

Chaotic Computation
As defined in Equation (2), Lyapunov exponents characterize the rate of separation (or convergence) of infinitesimally close trajectories in a dynamical system. In discrete-time systems (maps), x_{n+1} = f(x_n), n ∈ N, for an orbit starting at the initial point x_0, the Lyapunov exponent becomes:

λ(x_0) = lim_{n→∞} (1/n) ∑_{i=0}^{n−1} ln | f′(x_i) |.

For a discrete-time, discrete-state-space dynamical system (i.e., the dynamics of an autonomous computron) on a metric space (S, ϕ), we define the Lyapunov number for an initial configuration s_0 ∈ S as follows. Let B_ε(s) = {q ∈ S : ϕ(s, q) < ε} denote the epsilon-ball around state s, where ε is set to the lowest value for which the epsilon-ball is non-empty. Let A_M = {A_0, A_1, . . . } be the attractors of the computron and P_M = {B_0, B_1, . . . } the corresponding basins of attraction. We calculate the Lyapunov number of a state as the average separation of its neighboring states as iteration time goes to infinity. Since each state converges to its attractor, we measure the dissimilarity of attractors using an induced DiD metric.
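As a sanity check of the discrete-map formula above, the logistic map x_{n+1} = r x_n(1 − x_n) has the well-known exponent λ = ln 2 at r = 4. A short numerical estimate (our own illustrative code):

```python
import math

def lyapunov_logistic(r, x0=0.1, n=10000, burn=100):
    """Estimate the Lyapunov exponent of the logistic map as the time
    average of ln |f'(x_n)| = ln |r (1 - 2 x_n)| along an orbit, after a
    burn-in transient."""
    x = x0
    for _ in range(burn):
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        # clip the derivative away from zero to keep the log finite
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))
        x = r * x * (1 - x)
    return total / n
```

At r = 4 the estimate converges to ln 2 ≈ 0.693 (chaotic), while at r = 3.2 the orbit settles on a period-2 cycle and the exponent is negative.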
The Lyapunov number of the computron is the average over the configuration space, λ_M := ⟨λ(s)⟩_S.

Complex Computation
While chaos is a well-defined concept in dynamical systems theory, complexity has been more elusive, with a wide range of definitions, as reviewed by Grassberger [24]. The observer's reference base of existing information is relevant for defining complexity, yet it leads to observer subjectivity. This can be avoided either by dealing with an equivalence class of the object as the reference base, or by treating the object on a self-referential basis using a generative description. In the former case, we are in the domain of information-theoretic definitions of complexity, whereas the latter is in the domain of algorithmic complexity. Either way, complexity becomes a measure of the difficulty of describing the object in question with regard to a reference base.
We introduce the diversity-based complexity of a computron as the disentanglement of its basin partition. This is a basin-centric approach, not a path-centric one. The computron performs its computation by the act of mapping input configurations to attractors, generating a partitioning of the configuration space into basins of attraction. We consider the complexity of this partitioning as the defining characteristic of the computron as a computational machine. This differs from Wolfram's complexity definition, which is path-centric.
with Ψ_M ∈ [0, 1] since P_M ≤ P_M^∞ by Lemma 4. This definition means that a complex computron has a basin partition where each basin consists of a maximally diverse set of configurations with respect to the other configurations in that basin, and furthermore the basins themselves are maximally diverse with respect to each other. Therefore, a high-complexity partitioning of the configuration space requires maximal information to describe. As depicted in Figure 7, the complexity of a partitioning initially increases with the number of partitions, then starts to decrease, such that no partitioning and full partitioning both have zero complexity.
An absorbing computron has a single fixed-point attractor into which all configurations iterate, denoted M_0: ∀s ∈ S, ∃k ∈ N : T^k(s) = s*, where T is the transfer map of the computron and s* is the absorbing state. An idle computron does nothing, i.e., each state is mapped to itself; it is denoted M_∞: ∀s ∈ S, T(s) = s.

Case Study-1D Cellular Automata
We analyzed the complexity and chaoticity of autonomous computrons with a linear cyclic generating graph and homogeneous connectors, i.e., elementary cellular automata. As expected, absorbing and idle computrons have minimal complexity and chaoticity. With the same generating graph (linear cyclic, 5 cells), we analyzed different connector types. The chaoticity and complexity of each computron are plotted in Figure 8. Several elementary CA have very low complexity and are not chaotic (chaoticity less than 1). However, there is a group of machines that demonstrate high complexity while not being chaotic.
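At 5 cells there are only 2^5 = 32 configurations, so the attractors and basins behind Figure 8 can be enumerated exhaustively. A sketch of the bookkeeping (illustrative code, ours; rule numbering follows the elementary CA convention):

```python
def eca_step_int(state, rule, n):
    """One update of an n-cell cyclic elementary CA on an integer-coded
    configuration (bit i of 'state' is cell i's local state)."""
    bits = [(state >> i) & 1 for i in range(n)]
    nxt = 0
    for i in range(n):
        idx = bits[(i - 1) % n] * 4 + bits[i] * 2 + bits[(i + 1) % n]
        nxt |= ((rule >> idx) & 1) << i
    return nxt

def basins(rule, n=5):
    """Iterate every configuration to its attractor (a frozenset of the
    cycle's states) and return {attractor: set of basin configurations}."""
    result = {}
    for s0 in range(2 ** n):
        seen, s = {}, s0
        while s not in seen:
            seen[s] = len(seen)
            s = eca_step_int(s, rule, n)
        cycle_start = seen[s]
        cycle = frozenset(q for q, t in seen.items() if t >= cycle_start)
        result.setdefault(cycle, set()).add(s0)
    return result
```

Rule 0 is an absorbing computron (one attractor, basin of all 32 states) and rule 204 is idle (32 singleton basins), matching the minimal-complexity endpoints noted above.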

Computron as a Markov Chain
A driven computron, unlike an autonomous one, has a transition map that is a function of its current configuration and an external input drawn from a binary alphabet. The dynamics are governed by the discrete-time iteration s_{n+1} = T(s_n, σ_n), where σ_n is the external input at time n. We assume no a priori information about the input string other than the observed frequency of alphabet letters, so we represent the input string as a sequence of independent, identically distributed random variables on the binary alphabet Σ with p = Pr{σ = 0}. A noisy computron is a computron exposed to thermal fluctuations in the binary states of its cells. Its dynamics are governed by a probabilistic discrete-time iteration over B_ε(s_n), where B_ε(s_n) is the ε-ball neighborhood of configuration s_n and ε is the probability of malfunction due to thermal noise. We now enter the domain of Markov chains to describe the probabilistic dynamics of driven and noisy computrons. A discrete Markov chain is a sequence of random variables {X_n, n ∈ ℕ} with domain S, where each random variable X_n is independent of all earlier random variables in the sequence given its immediate predecessor.
Probabilistic computrons are equivalent to discrete Markov chains. They can be represented as directed labeled graphs in which each vertex is assigned a configuration s_i ∈ S, and there is an edge from configuration s_i to s_j whenever the transition probability P_ij ≠ 0. A computron can also be represented by an M × M transition probability matrix P, where M = |S| is the number of configurations and [P]_ij = P_ij. From the matrix representation, it follows that the probability of reaching k from i in exactly n steps is [P^n]_ik.
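A minimal sketch of this construction, assuming the driven computron switches between two elementary CA rules selected by the input bit; the rule pair 30/110 and the mix p = 0.5 are hypothetical choices for illustration:

```python
import numpy as np

N = 5  # cells; M = 2**N configurations, each encoded as an integer bit-field

def step(x, rule):
    """Apply an elementary CA rule to configuration x on a cyclic graph."""
    out = 0
    for i in range(N):
        idx = (((x >> ((i - 1) % N)) & 1) * 4
               + ((x >> i) & 1) * 2
               + ((x >> ((i + 1) % N)) & 1))
        out |= ((rule >> idx) & 1) << i
    return out

def transition_matrix(rule0, rule1, p):
    """P_ij = Pr{s_i -> s_j}: input sigma=0 selects rule0 w.p. p,
    sigma=1 selects rule1 w.p. 1-p."""
    M = 2 ** N
    P = np.zeros((M, M))
    for i in range(M):
        P[i, step(i, rule0)] += p
        P[i, step(i, rule1)] += 1 - p
    return P

P = transition_matrix(30, 110, 0.5)
Pn = np.linalg.matrix_power(P, 5)  # [P^n]_ik = Pr{reach k from i in n steps}
assert np.allclose(P.sum(axis=1), 1.0) and np.allclose(Pn.sum(axis=1), 1.0)
```

At p = 0 or p = 1 the matrix degenerates to a deterministic 0/1 matrix, recovering the autonomous case.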

Terminology
• An n-step walk is an ordered set of states (i_0, i_1, . . . , i_n) in which there is a directed edge from i_{m−1} to i_m and no state is repeated.
• State j is accessible from state i if there exists a walk from i to j, written i → j.
• Two states communicate if i → j and j → i.
• A class is a set of states in which each pair of states communicates, and no state communicates with a state outside the class.
• State i is called recurrent if i → j ⇒ j → i.
• A transient state is a state that is not recurrent.
• The period of a state i is the greatest common divisor of those values of n for which P^n_ii > 0. If the period is 1, the state is called aperiodic.
• It follows from the definitions that for a given class, either all states are recurrent or all are transient. It can also be shown that all states in the same class have the same period.
• A class is called ergodic if it is recurrent and aperiodic.
• A Markov chain consisting of a single ergodic class is called an ergodic chain.
• A unichain contains a single recurrent class plus, possibly, some transient states. An ergodic unichain is a unichain whose recurrent class is ergodic.
• A steady-state distribution for an M-state Markov chain with transition matrix P is a row vector π that satisfies π = πP with π_i ≥ 0 and ∑_i π_i = 1 (Equation (35)). For a given Markov chain, Equation (35) always has a solution, and the solution is unique if and only if the Markov chain is a unichain. If there are c recurrent classes, then there are c linearly independent solutions, each with non-zero entries only for the states of the corresponding recurrent class. P^n converges if and only if the recurrent classes are aperiodic, with each row of P^n converging to the π of one class.
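The steady-state equation can be solved numerically by replacing one balance equation with the normalization constraint. A sketch, using an illustrative two-state chain rather than anything from the paper:

```python
import numpy as np

def steady_state(P):
    """Solve pi P = pi with sum(pi) = 1 for a unichain: stack the balance
    equations (P^T - I) pi = 0 with the normalization row 1^T pi = 1."""
    M = P.shape[0]
    A = np.vstack([P.T - np.eye(M), np.ones(M)])
    b = np.zeros(M + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Two-state chain: stay w.p. 0.9 / 0.8; balance gives pi = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = steady_state(P)
assert np.allclose(pi @ P, pi) and np.isclose(pi.sum(), 1.0)
```

For a chain with c recurrent classes the least-squares solve above returns just one distribution; recovering all c independent solutions requires restricting to each class separately, as the text describes.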
We now compare dynamical-systems concepts for the autonomous computron to those of Markov chains. An attractor is analogous to a recurrent class. An ergodic dynamical system has a single attractor and, similarly, an ergodic chain has a single recurrent aperiodic class. In an autonomous computron a transient state feeds into a single attractor, so the concept of basin of attraction is well defined. In a driven computron, however, a transient state can feed into multiple recurrent classes, yielding overlapping basins.
Let P_ij be the probability of transition from state s_i to s_j as described in Equation (33), and P the corresponding transition probability matrix. Let p_n(s_i) be the probability of the computron being in configuration s_i at time n, and p_n the corresponding probability distribution vector over the configuration space S. Its evolution starting from an initial distribution p_0 is governed by the master equation p_{n+1} = p_n P (36). The impact set R_s of a given configuration is a set weighted by the impulse response distribution π_s, the steady-state distribution reached from the Dirac delta distribution δ_s concentrated on s. Since the master equation (36) is linear, the steady-state distribution of a driven computron can be written as a linear combination of the impulse responses of all states, weighted by the initial probability distribution. Let p_0 be an arbitrary initial distribution stated as a linear combination of Dirac delta distributions over all configurations: p_0 = ∑_{s∈S} p_0(s) δ_s. Then the steady-state distribution π becomes: π = ∑_{s∈S} p_0(s) π_s (40)
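A sketch of the impulse-response construction and the superposition in Equation (40); the 3-state chain and its matrix values are assumptions chosen purely for demonstration:

```python
import numpy as np

# Illustrative 3-state aperiodic chain (values are assumptions).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

def impulse_response(P, s, n_steps=500):
    """pi_s: the distribution reached from the Dirac delta at state s
    by iterating the master equation p_{n+1} = p_n P."""
    p = np.zeros(P.shape[0])
    p[s] = 1.0
    for _ in range(n_steps):
        p = p @ P
    return p

p0 = np.array([0.2, 0.3, 0.5])  # arbitrary initial distribution
pi_direct = p0.copy()
for _ in range(500):
    pi_direct = pi_direct @ P
# Linearity of the master equation gives the superposition of Equation (40):
pi_super = sum(p0[s] * impulse_response(P, s) for s in range(3))
assert np.allclose(pi_direct, pi_super)
```

Since this toy chain happens to be a unichain, all three impulse responses coincide; for a multi-class chain they differ, which is exactly what the ergodic decomposition below exploits.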

Ergodic Decomposition
A computron is said to be ergodic if its Markov chain representation is ergodic (i.e., it has a single recurrent aperiodic class). Typically, computrons are non-ergodic, i.e., they have more than one recurrent class, and they can be periodic. A periodic computron does not have a stationary steady-state distribution, since P^n does not converge. Stationarity can be introduced by imposing 'asynchronicity': we modify each state's transition map so that, with non-zero probability λ, the state stays idle.
With this modification every state has a transition to itself with probability λ, so P_ii > 0 and, per the definition of periodicity (see Terminology), the greatest common divisor of the n for which P^n_ii > 0 becomes 1 for all states; the machine therefore becomes aperiodic. For an aperiodic computron, P^n converges, with each row becoming the π of one recurrent class.
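The asynchronous modification can be written as P' = λI + (1 − λ)P. A sketch showing how it restores convergence for a periodic chain; the deterministic 2-cycle and λ = 0.1 are illustrative choices:

```python
import numpy as np

def asynchronous(P, lam=0.1):
    """Asynchronous ('lazy') modification: stay idle with probability lam.
    The self-loops make every state aperiodic, so P'^n converges."""
    M = P.shape[0]
    return lam * np.eye(M) + (1 - lam) * P

# A deterministic 2-cycle is periodic: P^n alternates and never converges...
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
# ...but the asynchronous version converges to the uniform distribution.
Pn = np.linalg.matrix_power(asynchronous(P), 100)
assert np.allclose(Pn, 0.5)
```

Note that the self-loops change only the waiting times, not which states communicate, so the recurrent classes themselves are preserved.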
The Coloring Lemma provides a technique for identifying the recurrent classes C_M = {C_1, C_2, . . . , C_K} of a computron. Let p_a, p_b be two random initial distributions with non-zero entries for each state, and let π_a and π_b be the corresponding steady-state distributions of the asynchronous computron. Define the 'gray color' c(s) of a state as c(s) = π_b(s)/π_a(s) if π_a(s) > 0, and c(s) = 0 otherwise. Lemma 7 (Coloring). Two states will, almost always, have the same shade of gray if and only if they are elements of the same recurrent class, provided all classes of the computron are 'distinguishable' (as defined in the proof); i.e., for two random non-zero initial distributions, c(s) = c(q) > 0 ⇔ s, q ∈ C_i, almost always.
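A numerical sketch of the coloring procedure on a toy chain; the 5-state matrix and the two fixed full-support initial distributions are illustrative stand-ins for the random draws in the lemma:

```python
import numpy as np

def steady_from(P, p0, n=500):
    """Iterate the master equation p_{n+1} = p_n P to (near) stationarity."""
    for _ in range(n):
        p0 = p0 @ P
    return p0

# Toy asynchronous chain: recurrent classes {0,1} and {2,3}, transient state 4.
P = np.array([[0.5, 0.5, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5, 0.0],
              [0.3, 0.0, 0.4, 0.0, 0.3]])
# Two full-support initial distributions (stand-ins for random draws).
pa = np.array([0.1, 0.1, 0.2, 0.2, 0.4])
pb = np.array([0.3, 0.1, 0.1, 0.1, 0.4])
pia, pib = steady_from(P, pa), steady_from(P, pb)
c = np.where(pia > 1e-12, pib / np.maximum(pia, 1e-12), 0.0)  # gray color c(s)
assert np.isclose(c[0], c[1]) and np.isclose(c[2], c[3])  # same class, same shade
assert abs(c[0] - c[2]) > 0.1  # distinguishable classes get different shades
assert c[4] == 0.0             # transient states carry no steady-state mass
```

States are then grouped into recurrent classes by clustering the values of c(s), with transient states identified by c(s) = 0.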
Having identified the recurrent classes of a computron using the Coloring Lemma, we now use them as a basis for ergodic decomposition. Lemma 8 (Ergodic bases). Recurrent classes form a non-overlapping spanning set (basis) for the impact sets of all states. Let C_M = {C_1, C_2, . . . , C_K} be the set of recurrent classes of a computron, π_k the steady-state distribution supported on class C_k, and χ_s = {χ_1(s), χ_2(s), . . . , χ_K(s)} the ergodic decomposition of the impulse response π_s; then π_s = ∑_k χ_k(s) π_k. We can now formally define the basin of attraction of a recurrent class as a weighted set B̃(C_k).

Chaoticity and Complexity of Probabilistic Automata
A computron acting on a configuration space transforms each configuration into its impact set. Configurations that are similar to each other (based on the diversity-induced distance, DiD) can be transformed into similar or different impact sets. We define the 'bending ratio' τ(s, q) of two configurations as the ratio of the DiD distance between their impact sets to the distance between the configurations themselves. The computron's action on the configuration space can thus be conceptualized as transforming the distance between each pair of configurations by the bending ratio.
The Lyapunov number of a state is the average bending ratio over the ε-ball around that state. The chaoticity of the machine is measured by the average Lyapunov number over the configuration space.
As an extension of the complexity definition used for autonomous computrons, Equation (30), the complexity of a probabilistic computron is the disentanglement of its fuzzy basin partition.
As a case study, we analyzed the complexity of hybrid elementary CA that execute one of two alternative rules based on the input signal. Figure 9 shows the variation in the complexity of hybrid CAs as the driver mix p(σ = 1) ranges over [0, 1]. For p = 0 and p = 1 the machine is a regular elementary CA. We plotted the deviation of complexity from a linear mix, the so-called synergy, as p is varied. Creating hybrid CAs from high- and low-complexity constituents results in a drop in overall complexity (i.e., a non-synergistic mix). The drop occurs even at small mixing around the higher-complexity CA. This contrasts with the synergistic case in which both constituents are of high complexity.

Susceptibility
Let M_A, M_B be two computrons operating on the same state space. We provide a measure of topological similarity between them by comparing their bendings of the configuration space. The bending ratio τ(s, q) was defined in Equation (46), and the similarity of two computrons is defined in terms of these bending ratios. Equipped with a measure for the similarity of two computrons, we can analyze the distortions in a machine upon small structural changes. As an application, we investigate the distortion of a computron due to noise (Equation (50)). Figure 10 shows noise distortion curves on a log-log scale for a computron executing Rule 29.175 with varying levels of driver intensity. The machine exhibits linear noise distortion curves, with saturation at high noise levels, when it is driven in the region from Rule 29 towards Rule 175. However, as it gets closer to Rule 175, the noise distortion becomes exponential with power γ ≈ 1.45. This characterization is consistent with the sudden change in complexity around Rule 175 shown in Figure 9.

Discussion
This paper is an investigation into the dynamics and complexity of a computational machine, the computron, which is equivalent to a finite-state machine and a generalization of cellular automata. We embedded the computron into a metric space that captures the similarity of configurations as patterns on the computron's generating graph, and we analyzed its chaoticity and complexity in this metric space. Our approach to complexity was basin-centric rather than path-centric. We also expanded the framework to non-autonomous machines with external drivers, which led to a Markov chain formalism. We applied the theory to hybrid elementary cellular automata with noise. The case studies showed that machines with high complexity are more susceptible to externalities, such as mixing and noise.
Our framework of conceptualizing automata as computrons in diversity space opens up a series of avenues for future research, including the study of the robustness and complexity of various generating graphs (e.g., hierarchical) or connector families (e.g., reversible, linear). We can also investigate the evolutionary dynamics of a computational machine by 'growing' (attaching cells to an existing machine) and 'mutating' (altering the generating graph or the connectors) computrons.

Proposition A1. For any computron Ω with an arbitrary generating graph G there exists an equivalent computron Ω* with a fully connected generating graph G* with the same number of vertices.
We pair up the cells of Ω and Ω* as v_i ↔ v*_i and use base indexing to enumerate the configurations s_Ω and s_Ω*. This establishes a one-to-one mapping µ : S_Ω ↔ S_Ω* between the two configuration spaces. We then construct the connectors [C*] of Ω* as follows: let δ_i = [v_{i_0} . . . v_{i_{n(i)}}] be the neighbors of cell v_i, and let δ̄_i be the remaining cells (i.e., those not connected to v_i); we set the look-up tables of the connectors C*_i of each cell so that the output depends only on the states of δ_i, ignoring δ̄_i. With the output maps also set identically, U*(s, σ) = U(s, σ), the implied transition maps T_Ω(s) and T_Ω*(s) satisfy Equation (A1). Therefore, Ω ≡ Ω*. Proposition A2. There is a one-to-one correspondence between finite-state automata M with 2^N states and binary input and fully connected computrons Ω* with N + 1 cells, including the input cell.
Let U_Ω* denote the set of all fully connected computrons with N + 1 cells (including the input cell).
Each cell of Ω* (other than the special input cell) has N + 1 inputs, from the other cells and itself (fully connected), giving 2^{N+1} input configurations. With each input configuration having a binary output set by the connector, the number of different connector types is 2^{2^{N+1}}. Each cell other than the input cell can have any of these connector types (the input cell has a unique connector that solely acts as the interface to the external input). Therefore, the total number of different computrons with a fully connected generating graph is |U_Ω*| = (2^{2^{N+1}})^N. Let M be a finite-state automaton with 2^N states, and let U_M denote the set of all such machines. Since each state can go to any state based on the binary external input, we have |U_M| = (2^N)^{2^N} × (2^N)^{2^N} = (2^{2^{N+1}})^N = |U_Ω*|. Base indexing provides the one-to-one map between the configurations of Ω* (excluding the input cell) and the states of M. From Proposition A1, for each computron there is an equivalent fully connected computron. From Proposition A2, there is a one-to-one mapping between finite-state automata with 2^N states and binary input and fully connected computrons with N + 1 cells, including the input cell. Therefore, for any finite-state automaton there exists an equivalent computron.
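The counting identity behind this correspondence can be verified mechanically for small N; the following is a sanity-check sketch of the arithmetic, not part of the proof:

```python
# |U_Omega*| = (2^(2^(N+1)))^N : connector choices per non-input cell, N cells.
# |U_M|      = ((2^N)^(2^N))^2 : a next state for each (state, input bit) pair.
# Both equal 2^(N * 2^(N+1)), so the two sets have the same cardinality.
for N in range(1, 5):
    computrons = (2 ** (2 ** (N + 1))) ** N
    machines = ((2 ** N) ** (2 ** N)) ** 2
    assert computrons == machines
```

Already at N = 2 both counts equal 2^16 = 65536, and they grow doubly exponentially thereafter.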
Lemma A2 (Diversity measure). Diversity is a sub-additive measure on Q. Monotonicity: let A = {a, b, c} and B = {a, b, c, x}, respectively, satisfying the subset definition given in Equation (17). Let A* = [bac] be the optimal ordering of the A-tuple that maximizes sup_A A_w, and construct B* = [bacx].
Sub-additivity: let A = {a, x} and B = {b, x}, respectively. Let C* = [bxa] be the optimal ordering of the C-tuple that maximizes sup_C C_w. Construct A* = [xa] and B* = [bx] as liftings from C*, with the same ordering within. Then the sub-additivity inequality follows. The next proof is based on a combinatorial argument. The entanglement of P_0 is calculated by shuffling all the elements of the universe Q within a single set, whereas the entanglement of P_∞ is calculated by shuffling the sets of all single elements, to find the supremum. Therefore, P_0 = P_∞. For any other partition P, the entanglement is calculated by shuffling the set blocks of the partition together with shuffling the elements within each set block; its supremum is therefore less than or equal to that of P_0, which seeks the supremum over all possible orderings. As a simple case, consider Q = {a, b, c}. Therefore, P_0 = P_∞ ≥ P.
Lemma A5 (Basin partition). The set of all basins of an autonomous computron is a partition of its configuration space, denoted P_M = {B_0, B_1, . . . }, with the corresponding attractors A_M = {A_0, A_1, . . . }.
A partition of S is P = {B_0, B_1, . . . }, where B_i ⊂ S, ∪_i B_i = S, and B_i ∩ B_j = ∅ for i ≠ j. So we need to show that each state belongs to some basin and that no state belongs to more than one basin. Due to the finite-state nature of the computron this is straightforward. Let P_n(s_0) = {s_0, s_1, . . . , s_n} be the n-step forward path of state s_0. Since |S| < ∞, ∃k ≤ |S| such that s_{k+1} ∈ P_k(s_0) and s_{i+1} ∉ P_i(s_0) for all i < k. Denote s_{k+1} = s_q (q ≤ k). Then A = [s_q, s_{q+1}, . . . , s_k] is an attractor of the computron, per the definition, and s_0 ∈ B(A). Therefore, each state is an element of some basin. Next we show that no state belongs to more than one basin. Assume s ∈ B(A_a) and s ∈ B(A_b) with A_a = [s_{q_a} . . . s_{k_a}] and A_b = [s_{q_b} . . . s_{k_b}]. If k_b < k_a, then taking i = k_b gives s_{i+1} ∈ P_i(s), which contradicts the construction of A_a; similarly, if k_a < k_b, then A_b yields a contradiction. Therefore k_a = k_b and A_a = A_b.
Lemma A6 (Minimal complexity). Absorbing and idle computrons have the least complexity (nil) among all computrons.
The absorbing computron M_0 has a single fixed-point attractor into which all states iterate; therefore its basin partitioning is P_{M_0} = {S} = P_0. In the idle computron M_∞ each state is mapped to itself; therefore its basin partitioning is P_{M_∞} = {{s_0}, {s_1}, . . . } = P_∞. The result then follows from the definition of complexity. Lemma A7 (Coloring). Let p_a, p_b be two random initial distributions with non-zero entries for each state, and let π_a and π_b be the corresponding steady-state distributions of the asynchronous computron. Define the 'gray color' c(s) of a state as: c(s) = π_b(s)/π_a(s) if π_a(s) > 0, and c(s) = 0 otherwise (A12). Two states will, almost always, have the same shade of gray if and only if they are elements of the same recurrent class, provided all classes of the computron are 'distinguishable'; i.e., for two random non-zero initial distributions, c(s) = c(q) > 0 ⇐⇒ s, q ∈ C_i, almost always.
Let C_M = {C_1, C_2, . . . , C_K} be the recurrent classes of a computron, and π_M = {π_1, π_2, . . . , π_K} the corresponding steady-state distributions of the classes, so that π_a = ∑_k χ_k(p_a) π_k and π_b = ∑_k χ_k(p_b) π_k. First we show that s, q ∈ C_m ⟹ c(s) = c(q). By definition (see Terminology), recurrent classes are non-overlapping. If s, q ∈ C_m, then: π_a(s) = χ_m(p_a)π_m(s), π_a(q) = χ_m(p_a)π_m(q), π_b(s) = χ_m(p_b)π_m(s), π_b(q) = χ_m(p_b)π_m(q) (A14). Therefore, c(s) = π_b(s)/π_a(s) = χ_m(p_b)/χ_m(p_a) = π_b(q)/π_a(q) = c(q). We define 'distinguishable' as follows. Let W_{x→y} = x_0 x_1 . . . x_k be a walk from state s_x to state s_y. We define the walk probability as p(W_{x→y}) = ∏_{i=0}^{k−1} P_{x_i x_{i+1}}. Let W̄_{x→y} denote the set of all different walks from state s_x to state s_y. We define the reach probability from s_x to s_y as p_{x→y} = ∑_{W∈W̄_{x→y}} p(W). We define the reach probability from a state s_x to an attractor A as p_{x→A} = ∑_{s_y∈A} p_{x→y}, where the walks to s_y ∈ A do not contain any other element of A. Two recurrent classes A and B are said to be 'equidistant' with respect to state s_x if p_{x→A} = p_{x→B}. Two recurrent classes A and B are said to be not 'distinguishable' if they are equidistant with respect to all states in S. Define the |S| × |C| matrix P_C as follows: