Learning the undecidable from networked systems
Felipe S. Abrahão, Ítala M. Loffredo D'Ottaviano, Klaus Wehmuth, Francisco Antônio Dória, Artur Ziviani

TL;DR
This paper explores how networked systems can collectively solve undecidable problems like the halting problem through emergent behavior, communication, and selection mechanisms, surpassing traditional computational limits.
Contribution
It introduces a mathematical model of networked computable systems, demonstrating conditions under which nodes can solve undecidable problems and defining a new measure of informational synergy.
Findings
Nodes can solve the halting problem within logarithmic communication rounds.
A central node can emergently solve the halting problem efficiently.
The network can generate arbitrarily large local algorithmic synergy.
Learning the undecidable from networked systems
Felipe S. Abrahão
National Laboratory for Scientific Computing (LNCC)
25651-075 – Petropolis, RJ – Brazil
Ítala M. Loffredo D’Ottaviano
Centre for Logic, Epistemology and the History of Science, University of Campinas (UNICAMP) – Brazil
Klaus Wehmuth
Francisco Antônio Dória
Advanced Studies Research Group and Fuzzy Sets Laboratory, PIT, Production Engineering Program, COPPE, UFRJ
P.O. Box 68507, 21945-972 – Rio de Janeiro, RJ – Brazil
Artur Ziviani
Abstract.
This article presents a theoretical investigation of computation beyond the Turing barrier from emergent behavior in distributed systems. In particular, we present an algorithmic network that is a mathematical model of a networked population of randomly generated computable systems with a fixed communication protocol. Then, in order to solve an undecidable problem, we study how nodes (i.e., Turing machines or computable systems) can harness the power of metabiological selection and the power of information sharing (i.e., communication) through the network. Formally, we show that there is a pervasive network topological condition, in particular, the small-diameter phenomenon, that ensures that every node becomes capable of solving the halting problem for every program with a length upper bounded by a logarithmic order of the population size. In addition, we show that this result implies the existence of a central node capable of emergently solving the halting problem in the minimum number of communication rounds. Furthermore, we introduce an algorithmic-informational measure of synergy for networked computable systems, which we call local algorithmic synergy. Then, we show that such an algorithmic network can produce an arbitrarily large value of expected local algorithmic synergy.
Key words and phrases:
Extended keywords: distributed systems; halting problem; busy beaver function; hypercomputation; metabiology; synergy
2010 Mathematics Subject Classification:
68Q30; 03D32; 68R10; 05C30; 05C78; 05C75; 05C60; 05C80; 05C82; 94A15; 68Q01; 03D10; 03D32; 03D35; 03D80
The authors acknowledge the partial support from CNPq through their individual grants: F. S. Abrahão (313.043/2016-7), K. Wehmuth (312599/2016-1), and A. Ziviani (308.729/2015-3). The authors also acknowledge the INCT in Data Science – INCT-CiD (CNPq 465.560/2014-8) and the partial support from CAPES, FAPESP, and FAPERJ.
1. Introduction
We study the general problem of how computable systems may take advantage of an uncomputable environment to solve undecidable problems, in particular, the halting problem. Indeed, although the theory of Turing degrees is a well-established field in computability theory [Syropoulos2008, Crnkovic2012a, Calude2002] and mathematical logic [Rogers1987], the possibility of computing beyond the Turing limit, that is, solving problems of Turing degree $\mathbf{0}'$ or above, is one of the major conundrums at the interface of theoretical computer science, mathematics, physics and biology. This subject has its roots in the incompleteness results in foundational mathematics, mathematical logic, or recursion theory. It raises questions on the computability of the Universe [DaCosta2006, DaCosta2009a, Cooper2009, Longo2012, Prokopenko2019, Copeland2002, Crnkovic2012a, Calude2002], living systems [Longo2012, Prokopenko2019, Copeland2002, Abrahao2015, Shimansky2018], or the human mind [Siegelmann1995a, Zenil2006a, Copeland2002, Copeland1998].
On the other hand, besides artificial systems, the computable nature of Life has been supported by current findings and scientific views in computer simulation and evolutionary computation [Bedau1998, Standish2003, Fouks1999, Dingle2018] and algorithmic-informational theoretical biology [Chaitin2012, Chaitin2014, Hernandez-Orozco2018, Hernandez-Orozco2018a, Abrahao2015, Abrahao2016, Chaitin2013, Chaitin2018]. In this way, as a purely mathematical work inspired by important concepts from complex systems science, the present article has the underlying objective of reconciling a hypothetical computable nature of Life (or artificial systems) with a hypothetical uncomputable Nature (or environment).
We study models of computation for synergistically solving an uncomputable problem in a network of computable systems. We show an abstract model for networked computable systems that can harness the power of random generation of individuals and the power of selection made by an irreducibly more powerful environment. In particular, we investigate the problem of how networked populations of randomly generated programs under metabiological selection of the fit in pervasive topological conditions can solve more and more instances of the halting problem by a fixed global communication protocol. For this purpose, we investigate a class of algorithmic networks based on [Abrahao2017published, Abrahao2018publishedAMS] that can solve the halting problem by a synergistic communication protocol of imitation of the fittest neighbor. Thus, our theoretical results show that, within a hypercomputable environment as our assumption, whole populations of computable systems can be hypercomputers. To this end, we present definitions, theoretical models, and theorems based on computability theory, algorithmic information theory, distributed computing, and complex networks theory.
In this sense, this article shows how metabiology’s early findings in [Chaitin2012, Chaitin2013] have “opened the gates” not only to unify main concepts of theoretical evolutionary biology, theory of computation, and algorithmic information theory, proving the open-ended Darwin-like evolution [Hernandez-Orozco2018, Hernandez-Orozco2018a, Abrahao2015, Chaitin2018], but also to initiate in [Abrahao2017published, Abrahao2018publishedAMS] a unifying mathematical framework for complex systems from a perspective that is both computational and informational, which combines metabiology with statistical information theory, game theory, multi-agent systems, and network science. In this way, we will briefly discuss in Section 2 how metabiology and algorithmic networks may help bridge abstract formalizations of systemics, from the mathematical-logical approach in [Bresciani2018], to more realistic models of evolutionary [Hernandez-Orozco2018a] or distributed systems [Prokopenko2009, Fernandez2013, Griffith2014a].
In the present article, by adding one more important systemic property from the complex systems’ “zoo” to this theoretical framework, we also introduce an algorithmic-informational measure of synergy and we apply it to our theoretical models, proving that the local algorithmic synergy in solving the halting problem can be as asymptotically large as one may want.
2. A conceptual background: From systemics, information, and metabiology to algorithmic networks
We start by introducing in this section some of the conceptual background of our results. For this purpose, we will briefly cross over more abstract approaches to complex systems theory, intertwining foundational subjects and properties related to mathematical logic [Bresciani2018], information theory [Lizier2018, Griffith2014a] and metabiology [Chaitin2012, Chaitin2018, Hernandez-Orozco2018a, Abrahao2015], in order to go toward the theory of algorithmic networks [Abrahao2016b, Abrahao2017published, Abrahao2018publishedAMS].
A system may be initially defined as a unitary entity of a complex and organized nature, made up of a set of active elements [Bresciani2018]. It is characterized as a partial structure with functionality. The general notion of relation used in the definition of system, that of partial relation, is an extension of the usual logical-mathematical concept of relation. It was presented by [Mikenberg1986] as basic to the introduction of the mathematical concept of pragmatic truth, later called quasi-truth, and has recently received various applications in logic and philosophy of science (see [Agazzi2010]). In this way, it enables one to formally accommodate the incompleteness of information relative to a certain domain of investigation.
Definition 2.1.
Let $D$ be a non-empty set. An $n$-ary partial relation $R$ on $D$ is an ordered triple $\langle R_1, R_2, R_3 \rangle$, where $R_1$, $R_2$, and $R_3$ are sets with no two of them having common elements, and whose union is $D^n$, such that:
(1) $R_1$ is the set of the $n$-tuples that we know belong to $R$;
(2) $R_2$ is the set of the $n$-tuples that we know do not belong to $R$;
(3) $R_3$ is the set of the $n$-tuples for which we do not know if they belong to $R$ or not.
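As a minimal sketch of the definition above, a partial relation over a finite domain can be stored as three pairwise disjoint sets of tuples that together cover every $n$-tuple. The class name `PartialRelation`, its attribute names, and the toy "less than" example are assumptions made for illustration only; they are not part of the original formalism.

```python
from itertools import product


class PartialRelation:
    """Toy n-ary partial relation over a finite domain (illustrative only)."""

    def __init__(self, domain, arity, known_in, known_out):
        self.domain = frozenset(domain)
        self.arity = arity
        all_tuples = set(product(self.domain, repeat=arity))
        self.known_in = set(known_in)      # tuples known to belong
        self.known_out = set(known_out)    # tuples known not to belong
        if self.known_in & self.known_out:
            raise ValueError("the components must be pairwise disjoint")
        if not (self.known_in | self.known_out) <= all_tuples:
            raise ValueError("tuples must be n-tuples over the domain")
        # Everything else is undetermined, so the union is all n-tuples.
        self.unknown = all_tuples - self.known_in - self.known_out

    def is_total(self):
        """True when nothing is undetermined (a usual n-ary relation)."""
        return not self.unknown


# Hypothetical example: a partially known "less than" on {0, 1, 2}.
less_than = PartialRelation(
    domain={0, 1, 2},
    arity=2,
    known_in={(0, 1), (1, 2)},
    known_out={(1, 0), (2, 1), (0, 0), (1, 1), (2, 2)},
)
```

Here the pairs `(0, 2)` and `(2, 0)` are left undetermined, which is exactly the incompleteness of information that the third component is meant to record.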
Definition 2.2.
A partial structure is a pair $\langle D, R_i \rangle_{i \in I}$ with $D$ a non-empty set and each $R_i$, where $i \in I$, a partial relation on $D$.
Definition 2.3.
A system is a partial structure with functionality, that can be denoted by:
[TABLE]
being the universe of the partial structure, each a partial relation on , the functionality, with being the respective variation indexes.
Note that a system without the possibility of structural or functional alterations has its universe and functionality constant and can be denoted by . For instance, this is the case for computable systems in which every transition of internal states with external inputs is a partial recursive function and its functionality is the very computation of this function.
We observe that if the third component $R_3$ of a partial relation $R$ (the set of tuples whose membership is unknown) does not have any elements ($R_3 = \emptyset$), then $R$ is a usual $n$-ary relation, so that $R = R_1$, which brings this general definition of a system back to the one in [Abrahao2015]. Hence, as in [Abrahao2015], the immediate definition of a computable system, in which the relation is computable (or recursive) and represents a function, becomes well-defined. See also [Hernandez-Orozco2018] for a formalization of this idea in the context of evolutionary systems. In particular, the reader is invited to note that the oracle-sensitiveness [Abrahao2017published] property of the population directly assures that the respective algorithmic networks always behave exactly like a total function. The same also holds for early metabiological models in [Chaitin2012] and for sub-computable or hyper-computable versions in [Abrahao2015].
Functionality is a teleological notion, characterized as a certain informational directioning. It may be related to the system’s goals, targets, or ends, and the potential autonomy of the system’s components may lead to processes which are not individually, but instead globally, self-organized. The characteristics of the system may be considered emergences, with systemic synergy, globality and the possibility of novelty being considered among the first properties which appear in the constitution of the very existence of the system. In addition, a system is not completely isolated from its environment, because everything (matter, energy, information) that goes into or out of the system comes from, passes through, or goes out to the environment. For example, in this regard, applications of statistical information theory to stochastic dynamical systems have already shown foundational results in defining and measuring such systemic properties [Lizier2018, Prokopenko2009, Prokopenko2014, Fernandez2013, Oizumi2014]. However, apart from evolution as in [Chaitin2012, Chaitin2018, Abrahao2015, Hernandez-Orozco2018a], such a study of systemic properties has not been universally applied to deterministic systems. This led us to algorithmic networks, as we will explain below.
Creation may be the result of transformations conducted by spontaneous and autonomous activities, or from transformations conducted by constitutive and predetermined activities of elements of the system (and eventually boundary elements). It may be a new product, or be the result of a process of organizational transformation characterized by the formation of new structures or new functionings. In both cases, creation may be thought of as the emergence of a system.
The process of evolution is characterized as the sequence of states of equilibrium and disequilibrium, manifested in the succession of distinct organizations which arise through the course of transformation of a system. If every organization that arises is considered a novelty, then one can affirm that evolution is a sequence of organizational innovations that may be rightly referred to as creative evolution. In this way, metabiology [Chaitin2012, Chaitin2018, Abrahao2015, Hernandez-Orozco2018a] gave a way to formalize such abstract notions of system, functionality, emergence, and creation in the context of evolutionary systems, whether sub-computable, computable or hyper-computable ones. In particular, such models show how sole sub-computable, computable, or hypercomputable systems can become more emergently creative (associated to an irreducible increase in the algorithmic information of the system’s behavior) under successive random algorithmic mutations and selection of the fittest, which in turn make the environment define the teleological non-intrinsic functionality of the systems as being the increase of the fitness.
Following this same teleology, we showed that one can indeed formally capture the above notions for non-evolutionary systems, giving rise to a formal theory of algorithmic networks [Abrahao2017published]. For example: the model’s (non-intrinsic) functionality in [Abrahao2017published] is to increase the average fitnesses of the nodes through the Busy Beaver imitation game (BBIG) at the expense of communication rounds; and the model’s (non-intrinsic) functionality in [Abrahao2018publishedAMS] is to increase the average fitnesses of the nodes through the Busy Beaver imitation game (BBIG) under a Susceptible-Infected-Susceptible contagion scheme at the expense of communication rounds. Moreover, in this direction, whereas the models in [Abrahao2017published, Abrahao2018publishedAMS] do not formally assign any particular intrinsic common goal to the entire algorithmic network, the present article shows how one can bring the notion of synergistic functionality (translated from statistical-informational measures of synergy in stochastic dynamical systems to networked deterministic systems) to variations of such models.
3. Preliminary definitions and notation
We now restate the main definitions and notation on which the results of this article are based. For a complete introduction to these concepts, see [Wehmuth2016b, Abrahao2017published].
3.1. Graphs and networks
MultiAspect Graphs (MAGs) are generalized representations for different types of graphs [Wehmuth2016b, Wehmuth2017]. In particular, a MAG represents dyadic relations between arbitrary $n$-tuples. Since we aim at a wider range of different network configurations, MAGs allow one to mathematically represent abstract aspects that may appear in complex high-order networks [Wehmuth2018b]. For example, these may be dynamic (or time-varying) networks, multicolored nodes or edges, multilayer networks, among others. Moreover, this representation facilitates network analysis by showing that their aspects can be isomorphically mapped into a classical directed graph [Wehmuth2016b]. Thus, the MAG abstraction has proved to be crucial in [Abrahao2017published] to establish connections between the characteristics of the network and the properties of the population composed of theoretical machines. Formally,
Definition 3.1.
Let $\mathscr{G} = (\mathscr{A}, \mathscr{E})$ be a MultiAspect Graph (MAG), where $\mathscr{E}$ is the set of existing composite edges of the MAG and $\mathscr{A}$ is a class (or list) of sets, each of which is an aspect. Each aspect $\sigma \in \mathscr{A}$ is a finite set and the number of aspects $p = |\mathscr{A}|$ is called the order of $\mathscr{G}$. By an immediate convention, we call a MAG with only one aspect a first order MAG, a MAG with two aspects a second order MAG, and so on. Each composite edge (or arrow) $e \in \mathscr{E}$ may be denoted by an ordered $2p$-tuple $(a_1, \dots, a_p, b_1, \dots, b_p)$, where $a_i, b_i$ are elements of the $i$-th aspect with $1 \leq i \leq p = |\mathscr{A}|$.
$\mathscr{A}(\mathscr{G})$ denotes the set (or list) of aspects of $\mathscr{G}$ and $\mathscr{E}(\mathscr{G})$ denotes the composite edge set of $\mathscr{G}$. We denote the $i$-th aspect of $\mathscr{G}$ as $\mathscr{A}(\mathscr{G})[i]$. So, $|\mathscr{A}(\mathscr{G})[i]|$ denotes the number of elements in $\mathscr{A}(\mathscr{G})[i]$. In order to match the classical graph case, we adopt the convention of calling the elements of the first aspect of a MAG vertices. Therefore, we denote the set of elements of the first aspect of a MAG as $\mathrm{V}(\mathscr{G})$. Thus, a vertex should not be confused with a composite vertex.
Note that the terms vertex and node may be employed interchangeably in this article. However, we choose to use the term node preferentially in the context of networks, where nodes may realize operations or computations, or would have some kind of agency, as in real networks or algorithmic networks. Accordingly, we choose to use the term vertex preferentially in the mathematical context of graph theory.
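Dynamic networks represented by $G_t = (\mathrm{V}, \mathscr{E}, \mathrm{T})$ are time-varying graphs (TVGs) as defined in [Costa2015a, Wehmuth2016b]. These are a special case of second order MAGs which have only one additional aspect, relative to variation over time, with respect to the set of nodes/vertices. Therefore, $\mathrm{V}$ is the set of nodes, $\mathrm{T}$ is the set of time instants, and $\mathscr{E} \subseteq \mathrm{V} \times \mathrm{T} \times \mathrm{V} \times \mathrm{T}$ is the set of edges. Formally:
Definition 3.2.
Let $G_t = (\mathrm{V}, \mathscr{E}, \mathrm{T})$ be a time-varying graph (TVG), where $\mathrm{V}$ is the set of vertices (or nodes), $\mathrm{T}$ is the set of time instants, and $\mathscr{E} \subseteq \mathrm{V} \times \mathrm{T} \times \mathrm{V} \times \mathrm{T}$ is the set of edges (that is, the set of existent second order composite edges).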
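For the sake of simplifying our notation in the theorems below, one can take a natural ordering for the set of time instants $\mathrm{T}(G_t)$ such that
\[ \mathrm{T}(G_t) = \{ t_0, t_1, \dots, t_{|\mathrm{T}(G_t)| - 1} \} \]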
Definition 3.3.
Let be the minimum number of time intervals (non-spatial steps or, specially in the present article, node cycles) for a diffusion starting on vertex at time instant to reach a fraction of vertices in the TVG .
In the case the TVG is connected:
Definition 3.4.
Let denote the temporal diffusion diameter of the TVG taking time instant as the starting time instant of the diffusion process. That is,
[TABLE]
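The two diffusion notions above can be made concrete with a small flooding simulation. The sketch below rests on several assumptions of ours: the TVG is simplified to a list of per-instant directed edge lists, information crosses one edge per time interval, and the temporal diffusion diameter is read as the worst case, over starting vertices, of the time needed to reach every vertex. The function names and the ring example are hypothetical.

```python
def diffusion_time(edges_by_time, n_nodes, source, start, fraction):
    """Steps needed for a flood from `source` at time index `start` to reach
    at least `fraction` of the nodes, or None if it never does."""
    informed = {source}
    target = fraction * n_nodes
    if len(informed) >= target:
        return 0
    for steps, t in enumerate(range(start, len(edges_by_time)), start=1):
        # A node becomes informed if an already informed node points to it.
        newly = {v for (u, v) in edges_by_time[t] if u in informed}
        informed |= newly
        if len(informed) >= target:
            return steps
    return None


def temporal_diffusion_diameter(edges_by_time, n_nodes, start):
    """Worst case over all sources of the time to reach every node."""
    times = [diffusion_time(edges_by_time, n_nodes, s, start, 1.0)
             for s in range(n_nodes)]
    return None if None in times else max(times)


# Hypothetical 4-node directed ring, repeated over 4 time instants.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
snapshots = [ring] * 4
```

On this ring a flood needs three intervals to cover all four nodes from any starting vertex, so the diameter computed at the first time instant is 3.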
Notation 3.1.1.
Let $\lg(x)$ denote the binary logarithm $\log_2(x)$.
Definition 3.5.
Let
[TABLE]
where
[TABLE]
be a family of uniquely sized time-varying graphs that share $f(i, t, 1) = D(G_t, t) = \mathbf{O}\big(\lg(i)\big)$, where $i$ is the number of nodes, as a common property.
3.2. Formal languages, machines, and algorithmic information theory
Notation 3.1.
Let denote the binary representation of the number . In addition, let denote the representation of the number in language . Analogously, let denote the decimal representation of the string , where .
Notation 3.2.
Wherever number appears in the domain or in the codomain of a partial (or total) function
[TABLE]
where is a Turing machine, or an oracle Turing machine, running on language , it actually denotes
[TABLE]
Definition 3.6.
Let
\[ T \colon \mathbf{L_U} \times \mathbf{L_U} \to \mathbb{N}, \qquad (\mathrm{M}, p) \mapsto T(\mathrm{M}, p) = n \]
be the partial recursive function that returns the computation time that machine $\mathrm{M}$ takes to halt on input $p$.
As in [Abrahao2016, Chaitin2012],
Definition 3.7.
Let
[TABLE]
be the total function (without loss of generality, one can choose a universal self-delimiting programming language in which there is such that and , where ) that returns the largest integer that a program with length can output running on machine . More formally:
[TABLE]
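Since the Busy Beaver function is uncomputable, it can only be illustrated on a finite toy machine. The sketch below assumes a hypothetical lookup table from prefix-free bit-string programs to their outputs, with `None` standing for a non-halting program, and it assumes the length condition is "at most $N$ bits"; neither the table nor the function name comes from the original.

```python
# Hypothetical toy "machine": maps a few bit-string programs to the integer
# they output, or to None when the program does not halt.
TOY_MACHINE = {
    "0": 1,
    "10": 4,
    "110": None,   # stands for a non-halting program
    "1110": 9,
    "11110": 2,
}


def toy_busy_beaver(n, machine=TOY_MACHINE):
    """Largest output among halting programs of length at most n (0 if none)."""
    outputs = [out for prog, out in machine.items()
               if len(prog) <= n and out is not None]
    return max(outputs, default=0)
```

By construction the toy function is non-decreasing in `n`, which mirrors why the largest output serves as a monotone complexity scale in the text; the real function grows faster than any computable function, which no finite table can reproduce.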
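Notation 3.2.1.
Let $\langle \cdot , \cdot \rangle$ denote an arbitrary recursive bijective pairing function. This notation can be recursively extended to $\langle \cdot , \langle \cdot , \cdot \rangle \rangle$ and, then, to an ordered tuple $\langle \cdot , \cdot , \cdot \rangle$. This iteration can be recursively applied with the purpose of defining finite ordered tuples $\langle \cdot , \dots , \cdot \rangle$.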
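One concrete instance of a recursive bijective pairing function is the Cantor pairing function. The sketch below uses it purely as an example (the text allows an arbitrary such function), and the right-nested extension to tuples in `pair_tuple` is our assumption about the nesting order.

```python
from math import isqrt


def pair(x, y):
    """Cantor pairing: a computable bijection from N x N to N."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z):
    """Inverse of `pair`."""
    w = (isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y


def pair_tuple(values):
    """Extend pairing to a non-empty tuple by nesting: <x1, <x2, <...>>>."""
    *init, last = values
    code = last
    for v in reversed(init):
        code = pair(v, code)
    return code
```

For instance, `pair(2, 3)` is 18 and `unpair(18)` recovers `(2, 3)`; a triple is encoded as `pair(x1, pair(x2, x3))`.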
Notation 3.3.
Let $\mathbf{L_U}$ denote a recursive binary self-delimiting (or prefix-free) universal programming language for a universal Turing machine $\mathbf{U}$ such that there is a concatenation of strings in the language $\mathbf{L_U}$ that preserves the self-delimiting (or prefix-free) property of the resulting string (for example, by adding a prefix to the entire concatenated string that encodes the number of concatenations, noting that each string was already self-delimiting; see also [Abrahao2016, Abrahao2017published] for more discussions and properties of the notation “$\circ$”), denoted by
[TABLE]
In addition, is a complete binary code with
[TABLE]
The reader may also note that this self-delimiting-preserving concatenation “$\circ$” is just one example of a recursive bijective pairing function. In addition, choosing between two distinct recursive bijective pairing functions can only affect the algorithmic complexity (see Definition 3.10) by
[TABLE]
Therefore, the reader may equivalently replace (along with the appropriate reinterpretation of what counts as a prefix or a suffix in the language)
[TABLE]
with
[TABLE]
in the present article without affecting the final result.
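To illustrate how a concatenation can preserve the self-delimiting property, the sketch below follows the idea mentioned in the text of prefixing the concatenated string with an encoding of the number of concatenations. The particular code used (unary length, a separator bit, then the payload) and all function names are assumptions chosen for simplicity; this toy code is not the universal language of the article.

```python
def encode(bits):
    """Self-delimiting code for a bit string: unary length, a 0, then the bits."""
    return "1" * len(bits) + "0" + bits


def decode_one(stream, pos=0):
    """Read one codeword starting at `pos`; return (bits, next position)."""
    n = 0
    while stream[pos + n] == "1":
        n += 1
    start = pos + n + 1
    return stream[start:start + n], start + n


def concat(codewords):
    """Join self-delimiting codewords behind a self-delimiting count prefix."""
    count = encode(format(len(codewords), "b"))
    return count + "".join(codewords)


def split(stream):
    """Recover the original bit strings from a stream built by `concat`."""
    count_bits, pos = decode_one(stream)
    parts = []
    for _ in range(int(count_bits, 2)):
        bits, pos = decode_one(stream, pos)
        parts.append(bits)
    return parts
```

Because the count prefix and every component announce their own length, a decoder reading `concat(...)` left to right always knows where the whole string ends, so the result can itself be concatenated again.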
Definition 3.8.
Let be any program of that computes a partial recursive function such that
[TABLE]
where function holds as in Definition 3.6.
Definition 3.9.
Let be any program of that computes a partial recursive function such that
[TABLE]
Definition 3.10.
The (unconditional) prefix algorithmic complexity (also known as self-delimiting program-size complexity or Solomonoff-Kolmogorov-Chaitin complexity) of a finite binary string , denoted by , is the length of the shortest program such that . (Here, denotes the lexicographically first such that is minimum and .) The conditional prefix algorithmic complexity of a binary finite string given a binary finite string , denoted by , is the length of the shortest program such that . Note that , where is the empty string. Similarly, we have the joint prefix algorithmic complexity of strings and defined by or , the prefix algorithmic complexity of information in about denoted by , and the mutual algorithmic information of the two strings and denoted by .
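Prefix algorithmic complexity is uncomputable, so any program can only offer a crude stand-in. The sketch below, which is our own illustration and not a method from the article, uses the length of a zlib-compressed string as a rough computable proxy for the description length of a string, and the extra compressed length as a rough proxy for conditional complexity. The function names and test strings are hypothetical.

```python
import os
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data (a crude proxy, not the
    prefix algorithmic complexity itself)."""
    return 8 * len(zlib.compress(data, 9))


def conditional_proxy(x: bytes, y: bytes) -> int:
    """Rough proxy for the complexity of x given y: extra bits to add x after y."""
    return max(0, compressed_bits(y + x) - compressed_bits(y))


regular = b"01" * 2000          # highly regular 4000-byte string
noisy = os.urandom(4000)        # typically incompressible 4000-byte string
```

The regular string compresses to a small fraction of its length while the random one does not, and appending a copy of a string to itself adds little, which is the qualitative behavior the unconditional and conditional definitions above capture exactly.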
Now, since we will be dealing with a population of randomly generated arbitrary programs (i.e., Turing machines) in Section 5, we need to define a theoretical machine that can return fitness values for both halting and non-halting programs:
Definition 3.11.
Let be the recursive binary self-delimiting universal programming language (as in Notation 3.3) for a universal Turing machine , where there is a constant , with , and a constant such that, for every ,
[TABLE]
We define an oracle Turing machine (or any hypercomputer with a respective Turing degree higher than or equal to $\mathbf{0}'$) such that, for every ,
[TABLE]
Note that, since is self-delimiting and , we have that the operator actually means the successor operator in an arbitrary recursive enumeration of language . In the same manner, we have that actually means .
Thus, the oracle Turing machine in Definition 3.11 is basically (except for a trivial bijection) the same as the chosen universal Turing machine. The oracle is only triggered to determine whether the program halts or not in the first place.
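The behavior described above (the successor of the output for halting programs, the lowest value for non-halting ones) can be sketched on a toy machine. Because a halting oracle cannot be implemented, the sketch assumes a hypothetical lookup table in which `None` marks a non-halting program; the table, the function names, and the exact form "output plus one, else zero" are our reading and are labeled as assumptions.

```python
# Hypothetical toy machine: program -> output, with None marking non-halting
# programs. The table itself plays the role of the halting oracle, which no
# real program can implement.
TOY_OUTPUTS = {"0": 0, "10": 7, "110": None}


def oracle_halts(program):
    """Stand-in for the oracle: answers whether `program` halts."""
    return TOY_OUTPUTS[program] is not None


def fitness_machine(program):
    """Successor of the output if the program halts, 0 otherwise."""
    if not oracle_halts(program):
        return 0
    return TOY_OUTPUTS[program] + 1
```

Shifting halting outputs by one keeps the value 0 reserved for non-halting programs, so a halting program that outputs 0 is still distinguishable from one that never halts.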
4. Previous work on algorithmic networks
In this section, we recall the previous work [Abrahao2017published, Abrahao2018publishedAMS, Abrahao2018c, Abrahao2016b] on which this article is based.
4.1. Algorithmic networks
We recall here the general definition of algorithmic networks in [Abrahao2017published]. It is a triple defined upon a population of theoretical machines , a generalized graph , and a function that makes aspects of correspond to properties of , so that a node in is mapped one-to-one to an element of . Formally:
Definition 4.1.
We define an algorithmic network upon a population of theoretical machines , a MultiAspect Graph and a function that causes aspects of to be mapped (see Definition 4.1.1) into properties of , so that a vertex in corresponds one-to-one to a theoretical machine in and the communication channels through which nodes can send or receive information from their neighbors are defined precisely by composite edges (or, for directed graphs, composite arrows) in .
We define a population as an ordered sequence (in which repetitions are allowed) , where is the support set of the population and is a labeling surjective function
[TABLE]
where is the language on which the chosen theoretical machines are running. Each member of this population may receive inputs and return outputs through communication channels. A communication channel between a pair of elements from is defined in by a composite edge (whether directed or undirected) linking this pair of nodes/programs.
Third, we define function as
Definition 4.1.1.
Let
[TABLE]
be a function that maps a subspace of aspects in into a subspace of properties in the set of properties of the respective population such that there is a bijective function such that, for every with ,
[TABLE]
where is a vertex (or node) and is an element of the sequence/population .
We say an element is networked iff there is such that is running as a node of , where is non-empty (that is, there must be at least one composite edge connecting two elements of the algorithmic network). We say is isolated otherwise. That is, it is only functioning as an element of and not as a node of . We say that an input is a network input iff it is the only external source of information every node/program receives and it is given to every node/program before the algorithmic network begins any computation. Note that letter may also appear across the text as denoting an arbitrary element of a language. It will be specified in the assumptions before it appears or in the statement of the respective definition, lemma, theorem or corollary.
Definition 4.2.
A node cycle in a population is defined as a node/program returning an output (which, depending on the language and the theoretical machine the nodes are running on, is equivalent to a node completing a halting computation).
(1) If this node cycle is not the last node cycle, then its respective output is called a partial output, and this partial output is shared (or not, depending on whether the population is networked or isolated) with the node’s neighbors, according to a specific information-sharing protocol (if any);
(2) If this node cycle is the last one, then this output is called a final output, such that no more information is shared through the network;
(3) If every node/program in the population has completed its last node cycle, returning its final output, and the population is running networked by an algorithmic network, then we say the algorithmic network as a whole has completed an algorithmic network cycle.
In addition, let
[TABLE]
be the set of the maximum number of node cycles that any node/program in the population would be able to perform in order to return a final output, where is the set of all node cycles that node/program can perform.
In the particular case a population is defined on the language and machine we assume the notation:
Definition 4.3.
Let be a program such that computes on machine cycle-by-cycle what a node/program does on machine until cycle when networked by . Let be a program such that computes on machine cycle-by-cycle what a node/program does on machine until cycle when isolated. Let be the partial output sent by node/program at the end of cycle . Also, denotes the final output of the node/program .
Definition 4.4.
Let be the set of incoming neighbors of node that have sent partial outputs to it at the end of the cycle when running on algorithmic network . Let be the set of partial outputs relative to .
4.2. Busy Beaver imitation game
In [Abrahao2017published], we have narrowed our theoretical approach by defining a class of algorithmic networks —also denoted by a triplet as —in which their populations and TVGs have determined properties.
As defined in Section 4.1, each element of the population corresponds one-to-one to a node/vertex in and each time instant in is mapped to a cycle (or communication round). These mappings are defined by the function .
The population is composed of randomly generated Turing machines (or randomly generated self-delimiting programs) which are represented in a self-delimiting universal programming language $\mathbf{L_U}$. This population is synchronous with respect to halting cycles, that is, at the end of a cycle (or communication round, as in distributed computing) every node returns its outputs at the same time. Nodes that do not halt in any cycle always return as final output the lowest fitness, that is, the integer value $0$. Here, a straightforward interpretation is that nodes that eventually do not halt in a cycle are “killed”, so that their final output has the “worst” fitness. Thus, these nodes are programs that ultimately run on an oracle Turing machine (or a hypercomputable system); this requirement is also analogous to the one in [Abrahao2016, Chaitin2014, Hernandez-Orozco2018], which deal with a single program at a time and not with a population of them. However, the oracle is only necessary to deal with the non-halting computations. That is, this machine behaves like a universal Turing machine except that it returns zero whenever a non-halting computation occurs.
In addition, the networked population follows an imitation-of-the-fittest protocol (IFP), diffusing the information of the fittest randomly generated node (i.e., the node that partially outputs the largest integer in cycle ). As in [Abrahao2016b, Chaitin2014, Chaitin2018, Hernandez-Orozco2018], note that we still use the Busy Beaver function as a complexity measure for fitness; therefore, the largest integer directly represents the fittest final output of a node. Thus, every node in the population obeys the IFP, in which after the first cycle (i.e., after the first round of partial outputs) every node only imitates the neighbor that has partially output the largest integer, repeating this value as its own partial output in the next cycle. The main idea defining the IFP is thus a procedure in which each node compares its neighbors’ partial outputs (that is, the integers they have calculated in the respective cycle) and runs the program of the neighbor that has output the largest integer if, and only if, this integer is larger than the one that the node itself has output. Since the algorithmic network is playing the Busy Beaver game [Abrahao2017published] on a network while limited to simple imitation performed by a randomly generated population of programs, we say it is playing a Busy Beaver Imitation Game (BBIG). A (network) Busy Beaver game [Abrahao2017published] is a game in which each player is trying to calculate the largest integer it can, which is our established measure of fitness or payoff (see [Abrahao2017published, Abrahao2016, Hernandez-Orozco2018] for more discussions), using the information shared by its neighbors. Thus, the BBIG is a special case of the Busy Beaver game.
Thus, is a synchronous algorithmic network populated by randomly generated nodes such that, after the first cycle (or arbitrary cycles), it starts a diffusion process of the biggest partial output (given at the end of the first cycle) determined by network : at the first time instant each node may receive a network input , which is given to every node in the network, and runs separately (i.e., not networked), returning its respective first partial output; then, the plain diffusion of large integers starts as determined by the IFP through the respective dynamical network . At the last time instant contagion stops and one cycle (or more) is spent in order to make each node return a final output. Formally,
Definition 4.5**.**
Let
[TABLE]
be an algorithmic network, where is an arbitrary well-defined function such that
[TABLE]
and , , , and there are arbitrarily chosen (Footnote 13: Since they are arbitrarily chosen, one may take them as small as possible in order to minimize the number of cycles, for example. That is, and , for example.) with such that is an injective function, where
[TABLE]
Since the way time instants are mapped into cycles is fixed given values of and , we may equivalently denote function as
[TABLE]
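The diffusion process described above can be illustrated with a short Python sketch. It is only a toy under stated assumptions and not the formal model of Definition 4.5: nodes are integer labels, each time interval of the dynamical network is given as a dictionary of readable neighbors, first partial outputs are plain integers (with 0 standing for a node that did not halt), and the names `Snapshot`, `ifp_round`, and `run_ifp` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical representation: one snapshot per time interval,
# mapping each node to the set of neighbors it can read from.
Snapshot = Dict[int, Set[int]]


def ifp_round(values: Dict[int, int], in_neighbors: Snapshot) -> Dict[int, int]:
    """One synchronous round: each node keeps or imitates the largest integer."""
    new_values = {}
    for node, own in values.items():
        seen = [values[n] for n in in_neighbors.get(node, set())]
        # A node imitates a neighbor only if that neighbor's integer is larger.
        new_values[node] = max([own] + seen)
    return new_values


def run_ifp(first_outputs: Dict[int, int], snapshots: List[Snapshot]) -> Dict[int, int]:
    """Diffuse the first partial outputs through the given sequence of snapshots."""
    values = dict(first_outputs)
    for snapshot in snapshots:
        values = ifp_round(values, snapshot)
    return values


# Toy example: an undirected path 0 - 1 - 2 - 3 that does not change over time.
path: Snapshot = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
first = {0: 3, 1: 0, 2: 7, 3: 1}  # node 1 "did not halt" and was assigned 0

after_one = run_ifp(first, [path])
after_two = run_ifp(first, [path, path])
```

In this toy run the largest first partial output reaches every node after two rounds, because the node that produced it is within two hops of every other node.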
4.3. Background results
Following an algorithmic approach to evolutionary open-endedness (EvoOE), we have found in [Abrahao2017published] that open-endedness may also emerge as a phenomenon akin to, but different from, EvoOE: Instead of achieving an unbounded quantity of algorithmic complexity over time (e.g., after successive mutations), an unbounded quantity of emergent algorithmic complexity is achieved as the population/network size increases indefinitely. The algorithmic complexity of a node/program’s final output when networked minus the algorithmic complexity of a node/program’s final output when isolated formally defines an irreducible quantity of information that emerges with respect to a node/program that belongs to an algorithmic network. We call it the emergent algorithmic complexity (EAC) of a node/program. The reader may also find more discussions on emergence and open-endedness in [Abrahao2017published, Abrahao2018publishedAMS].
Formally, we have defined average emergent open-endedness in the context of general algorithmic networks as
Definition 4.6**.**
We say an algorithmic network with a population of nodes has the property of average (local) emergent open-endedness (AEOE) for a given network input in cycles iff
[TABLE]
In the case of an algorithmic network with randomly generated nodes, we call this property expected (local) emergent open-endedness. We have that
[TABLE]
denotes the average emergent algorithmic complexity of a node/program (AEAC) in an algorithmic network with network input . In addition:
Definition 4.7**.**
The emergent algorithmic complexity (EAC) of a node/program in cycles is given in an algorithmic network that always produces partial and final outputs by
[TABLE]
where:
- (1) ;
- (2) represents the program that returns the final output of when networked assuming the position , where , in the MAG in the specified number of node cycles with network input ;
- (3) represents the program that returns the final output of when isolated in the specified number of node cycles with network input .
Note that the network input was omitted in and , as presented in [Abrahao2017published, Abrahao2018publishedAMS]. This is because in the models in [Abrahao2017published, Abrahao2018publishedAMS] we were focusing on lower bounds on the expected emergent algorithmic complexity of a node, and this was achieved by estimating the occurrence of the fittest nodes/programs that ignore their inputs. However, for the following results in this article, we may equivalently denote as and as .
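Since algorithmic complexity is uncomputable, the EAC cannot be evaluated exactly, but the "networked minus isolated" difference can be illustrated with a computable stand-in. The sketch below is an assumption made purely for illustration and is not a method used in this article: it replaces algorithmic complexity with the compressed length given by `zlib`, which is only a crude upper-bound proxy, and all function names are hypothetical.

```python
import random
import zlib
from typing import List, Tuple


def complexity_proxy(output: bytes) -> int:
    """Compressed length in bits: a crude, computable stand-in for algorithmic complexity."""
    return 8 * len(zlib.compress(output, 9))


def emergent_complexity_proxy(networked: bytes, isolated: bytes) -> int:
    """Proxy for EAC: complexity of the networked output minus that of the isolated output."""
    return complexity_proxy(networked) - complexity_proxy(isolated)


def average_emergent_complexity_proxy(pairs: List[Tuple[bytes, bytes]]) -> float:
    """Average of the proxy over (networked, isolated) output pairs, one pair per node."""
    return sum(emergent_complexity_proxy(n, i) for n, i in pairs) / len(pairs)


# Toy outputs: a highly regular isolated output and an incompressible-looking networked one.
rng = random.Random(0)
isolated_output = b"0" * 1000
networked_output = bytes(rng.getrandbits(8) for _ in range(1000))

gain = emergent_complexity_proxy(networked_output, isolated_output)
no_gain = emergent_complexity_proxy(isolated_output, isolated_output)
```

A positive `gain` only indicates that the networked output is harder to compress than the isolated one. It says nothing about the constants that appear in the formal definitions.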
In [Abrahao2017published], we showed that there is a lower bound for the expected emergent algorithmic complexity in algorithmic networks that depends on how much larger the average diffusion density (in a given time interval) is compared to the cycle-bounded conditional halting probability . Formally:
Theorem 4.1**.**
Let be a network input. Let . Let be well-defined. Let . Let
[TABLE]
be a total computable function where . Then, we will have that:
[TABLE]
This lower bound also depends on the parameter for which one is calculating the number of node cycles. In fact, we have proved that our results hold even in the case of spending a computably larger number of node cycles compared to . Furthermore, we have proved that the small-diameter phenomenon is a condition that ensures that there is a central time to trigger expected emergent open-endedness. Formally:
Theorem 4.2**.**
Let be a network input. Let . If there exist and such that
[TABLE]
where
[TABLE]
Then, for every non-decreasing total computable function $c\colon \mathbb{N} \to \mathfrak{C_{BB}}$, $x \mapsto c(x) = y$, where and and is well-defined, we will have that there is such that .
Our proofs follow mainly from information theory, computability theory, and graph theory. Therefore, we have shown that there are topological conditions (e.g., the small-diameter phenomenon) that trigger a phase transition in which eventually the algorithmic network begins to produce an unlimited amount of bits of average local emergent algorithmic complexity/information. These conditions come from a positive trade-off between the average diffusion density and the number of cycles (i.e., communication rounds). Thus, the diffusion power of a dynamic (or static) network has proved to be paramount with the purpose of optimizing the average fitness/payoff of an algorithmic network that plays the Busy Beaver imitation game in a randomly generated population of Turing machines.
5. A model of algorithmic network for synergistically solving a common problem
In this section, we present the model of algorithmic networks on which we will prove lemmas and theorems. In this way, we focus on the description and the definition of the new model.
The main idea that defines the algorithmic networks is to formalize a distinct information-sharing (or communication) protocol that is based on the BBIG, so that nodes can use the largest integer the other nodes are calculating to nourish a global procedure in order to compute a function. So, this is a modification of the algorithmic networks in [Abrahao2017published]. The latter only follow the IFP with a plain spreading of the largest integer, as described in Section 4.2. In particular, with respect to in [Abrahao2017published], we will modify the functioning of the IFP in the last node cycle, just in the moment each node is about to return its final output. First, we will describe the properties of that are common to . Then, we will describe the functioning of the synergistic imitation-of-the-fittest protocol (SIFP) that is different from the IFP in .
As in [Abrahao2017published], we pursue overarching mathematical theorems, so we choose to deal with time-varying directed graphs [Pan2011, Wehmuth2015a]. Note that the static case is covered by a particular case of dynamical networks in which the topology does not change over time. And the undirected case can be seen as a graph in which each undirected edge (or line) represents two opposing arrows. As defined in Section 3.1, are time-varying graphs (TVGs) as in [Costa2015a, Wehmuth2016b]. These are special cases of MAGs which have only one additional aspect relative to variation over time, besides the set of vertices.
As , the algorithmic networks , which we will define in Definition 5.4, get their graph topologies from a family of dynamical networks that has a certain diffusion measure as a common feature, in particular, a small diameter compared to the network size (see Definition 3.4). In Definition 3.5, we define as a family of unique sized time-varying graphs which shares
[TABLE]
as a common property, where is the number of nodes and is the temporal diffusion diameter.
Moreover, as , the populations of nodes/programs in Definition 5.3 of are composed of randomly generated prefix Turing machines (or randomly generated self-delimiting programs) that are represented in a self-delimiting universal programming language . These populations are also synchronous with respect to halting cycles, that is, at the end of a cycle (or communication round, as in distributed computing) every node returns its partial and final outputs at the same time. Nodes that do not halt in any cycle always return as final output the lowest fitness/payoff, that is, the integer value 0. Here, a straightforward interpretation is that nodes that eventually do not halt in a cycle are “killed”, so that their final output has the “worst” fitness/payoff. (Footnote 14: See also [Abrahao2017published, Chaitin2012, Chaitin2013] for a complete evolutionary formalization of this property. Note that now there is a population of software, while in [Chaitin2012, Chaitin2013] there is only one single organism at a time.)
Now, unlike the networked population in [Abrahao2017published], described in Section 4.2, the networked population follows a modified version of the IFP . Instead of just returning as final output the largest integer shared by the neighbors, the SIFP ensures that, at the last cycle, every node employs the network input together with the largest integer shared by its neighbors to calculate a partial computable function in such a way that every node returns as final output the value of
[TABLE]
where was the latest largest integer shared through the network. Note that this procedure is a global information-sharing (or communication) protocol. In summary, the SIFP makes every node obey a protocol such that, after the first node cycles (i.e., after the first rounds of isolated partial outputs), every node is obliged to always imitate the neighbor (or itself, if the node itself has partially output the largest integer in comparison with its neighbors’ partial outputs) that has partially output the largest integer , repeating this last value as its own partial output in the next node cycle. Since we are dealing only with synchronous algorithmic networks, these global communication protocols apply at the end of each node cycle (or communication round). Finally, the last node cycle is spent in order to cause each node to only return a final output in the form . Thus, the SIFP is formally defined as:
Definition 5.1**.**
Let . We say a population follows a (global) synergistic imitation-of-the-fittest protocol (SIFP) for program iff every networked node/program always obeys the procedure:
- (I) for every and ,
  - (a) if , then
    [TABLE]
  - (b) if and , then
    [TABLE]
  - (c) if and , then
    [TABLE]
    where
    [TABLE]
  - (d) if and , then
    [TABLE]
It is important to remark that Definition 5.1 only applies to the networked case. If the population is isolated, no node can communicate with others. Therefore, as we will formalize in Definition 5.2, no protocol applies in the isolated case. Formally:
Definition 5.2**.**
Let be a language of programs in the form where . The prefix is any program that always ensures that, if the node/program is networked and running on , then obeys the synergistic imitation-of-the-fittest protocol as in Definition 5.1 for some arbitrary program . Otherwise, if the node/program is isolated and running on , then, for every , and every subsequent node cycle works like a reiteration of partial outputs as immediate respective next inputs for the same program .
Note that the isolated case may be equivalently represented by an algorithmic network built on a population of that does not follow any information-sharing protocol and whose MultiAspect Graph (MAG) topology is composed only of one-step self-loops on each node.
From Definition 5.2, we can now formalize the population :
Definition 5.3**.**
Let be the same population in [Abrahao2017published], except for using the language as the support set instead of .
Thus, we can now formally define the studied model of algorithmic networks as a modification of the algorithmic networks in [Abrahao2017published]. In summary, is a synchronous algorithmic network populated by randomly generated nodes (i.e., programs) such that, after the first cycle (or arbitrary cycles), it starts a diffusion process of the biggest partial output (given at the end of the first cycle) determined by the network topology of the TVG . More specifically: before the first cycle each node receives a network input , which is given to every node in the network; then, before the first network time interval, one or cycles are spent in which each node runs separately, repeating its respective first partial output that will be shared; from then on, as determined by the SIFP in Definition 5.1, the plain diffusion of larger integers starts through the respective dynamical network , so that, at each time interval, the SIFP ensures that a fitter node always “infects” its immediate less fit neighbors; finally, at the last time instant, contagion stops and one cycle (or more) is spent in order to make each node return
[TABLE]
where was the latest largest integer shared by the neighbors at the previous node cycle, as final output. This way, we formally define:
Definition 5.4**.**
Let denote exactly the same algorithmic network in [Abrahao2017published] (see Definition 4.5), except for replacing population with .
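The functioning of the SIFP summarized in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formal protocol of Definition 5.1: nodes are integer labels, each time interval is a dictionary of readable neighbors, the isolated first cycles are assumed to have already produced the integers in `first_outputs`, and `run_sifp` and `toy_p` are hypothetical names. The function `toy_p` is a placeholder for the partial computable function applied in the last node cycle.

```python
from typing import Callable, Dict, List, Set, Tuple

Snapshot = Dict[int, Set[int]]  # node -> neighbors it can read from in one time interval


def run_sifp(
    first_outputs: Dict[int, int],
    snapshots: List[Snapshot],
    network_input: int,
    p: Callable[[int, int], object],
) -> Dict[int, object]:
    """Diffuse the largest integer, then apply p(network_input, x) in the last cycle."""
    values = dict(first_outputs)
    for snapshot in snapshots:
        # Diffusion cycle: every node repeats the largest integer among itself
        # and its neighbors (the right-hand side still reads the old `values`).
        values = {
            node: max([own] + [values[n] for n in snapshot.get(node, set())])
            for node, own in values.items()
        }
    # Last node cycle: contagion has stopped; each node returns p(w, x),
    # where x is the latest largest integer it has received.
    return {node: p(network_input, x) for node, x in values.items()}


def toy_p(w: int, x: int) -> Tuple[int, int]:
    """Placeholder: exposes the pair (network input, shared integer) seen by a node."""
    return (w, x)


# Toy example: a directed ring in which node i reads from node i - 1 (mod 5).
ring: Snapshot = {i: {(i - 1) % 5} for i in range(5)}
first = {0: 2, 1: 9, 2: 0, 3: 4, 4: 1}

enough_rounds = run_sifp(first, [ring] * 4, network_input=5, p=toy_p)
too_few_rounds = run_sifp(first, [ring] * 3, network_input=5, p=toy_p)
```

With four diffusion rounds every node in this ring evaluates `toy_p` on the largest first partial output. With only three rounds, node 0 has not yet received it, which is the toy counterpart of requiring the number of node cycles to cover the temporal diffusion diameter.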
6. Solving the halting problem through the Busy Beaver imitation game
In this section we will prove lemmas, theorems, and corollaries with the purpose of showing that are algorithmic networks capable of asymptotically solving the halting problem for every network input with , where is the network size (i.e., the number of nodes) and is a fixed constant that depends only on the chosen universal programming language. Moreover, since are dynamic networks whose functioning is based on the diffusion of the fittest node, we will also show that there is a central node for emergently solving the above halting problem so that the number of necessary communication rounds is minimized. In particular, this node is associated with the highest time-reachability centrality among the nodes.
From [Abrahao2017published, Abrahao2018publishedAMS], it is important to remember that a network Busy Beaver game is a game in which each player is trying to calculate the largest integer it can using the information shared by its neighbors. For the present purposes, as in [Abrahao2017published], the population in the studied algorithmic networks , which is playing a particular type of network Busy Beaver game, is in fact limited to simple imitation performed by a randomly generated population of programs. This Busy Beaver imitation game (BBIG) [Abrahao2017published] is a particular case of the Busy Beaver game in which every node can only propagate the largest integer. It configures a simple imitation-of-the-fittest procedure. However, unlike [Abrahao2017published, Abrahao2018publishedAMS], the last node cycle is devoted to employing this diffusion of the fittest to solve a problem that can be partially computed by program . Although partial or final outputs are always defined in algorithmic networks due to the oracle-sensitiveness property [Abrahao2017published] of the population , these outputs may not match every function value for every input in some cases. For example, our central result in Theorem 6.1 demands a restriction on the domain of possible network inputs so that every input in this domain generates the correct function value. Nevertheless, in the limit when the population grows indefinitely, one can say that an infinite family of algorithmic networks makes every node asymptotically compute a total function (in this case, the very characteristic function of the halting problem).
While algorithmic networks in [Abrahao2017published] can be seen as playing an optimization procedure where the whole pursues the increase of the average fitness/payoff through diffusing on the network the best randomly generated solution, these algorithmic networks can be seen as playing an optimization procedure where the whole pursues the increase of each node’s capability of solving a common problem through diffusing on the network the best randomly generated solution in the smallest possible number of communication rounds [Abrahao2017published]. Thus, the nodes in algorithmic networks may be seen as competing with each other, as in multi-agent systems from a game-theoretical approach [Abrahao2017published, Abrahao2018publishedAMS, Abrahao2016b]; on the other hand, in the present model the nodes/programs may be seen as using the network’s shared information to achieve a common purpose, as in the classical approach to distributed computing. In this sense, this additional perspective on such models of algorithmic networks bridges a competition-centered or individualistic view of emergence and a synergy-centered view of emergence. In Section 7 we will define and explore such synergy in algorithmic networks.
First, we define a total computable function that is capable of deciding whether a program halts or not if the respective large enough computation time is informed as input.
Definition 6.1**.**
Let be a program of that computes a total recursive function such that
[TABLE]
Note that in Definition 6.1 always computes a total function for every and every because can either halt or not halt on any program in computation time. As a consequence:
Lemma 6.1**.**
Let , where . Then, for every ,
[TABLE]
Proof.
The proof follows directly from Definition 6.1. Since can only halt or, exclusively, not halt on , we divide the proof into two cases:
- (1) If the universal machine halts on , then the value will be well defined. Hence, the value will also be well defined. Therefore, from Definition 3.7, we will have that . Then,
  [TABLE]
- (2) If the universal machine does not halt on , then the value will not be well defined. Therefore, for every , we will have . Then, .
∎
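The idea behind Definition 6.1 and Lemma 6.1, namely that halting becomes decidable once a large enough computation time is supplied as input, can be illustrated with a toy Python sketch. The modeling choices here are assumptions made for illustration only: a "program" is a Python generator function, each `yield` counts as one computation step, and `halts_within`, `countdown`, and `loop_forever` are hypothetical names.

```python
from typing import Callable, Iterator


def halts_within(program: Callable[[], Iterator[None]], time_bound: int) -> int:
    """Return 1 if `program` halts within `time_bound` steps and 0 otherwise.

    The simulation is cut off after `time_bound` steps, so this function is total.
    """
    run = program()
    for _ in range(time_bound + 1):
        try:
            next(run)  # execute one more step of the simulated program
        except StopIteration:
            return 1  # the program halted within the allowed number of steps
    return 0  # the time bound was exhausted before the program halted


def countdown() -> Iterator[None]:
    """Toy halting program: takes exactly 5 steps."""
    n = 5
    while n > 0:
        n -= 1
        yield


def loop_forever() -> Iterator[None]:
    """Toy non-halting program."""
    while True:
        yield


large_enough = halts_within(countdown, 5)
too_small = halts_within(countdown, 4)
never = halts_within(loop_forever, 10_000)
```

For a halting program the answer is correct for every time bound at or above its running time, while for a non-halting program the answer 0 is correct for every time bound. This is why the diffusion of a sufficiently large integer suffices in the networked setting.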
Now, from [Abrahao2017published], we translate its first lemma to the new algorithmic network model , showing how to harness the implications of the law of large numbers in a program-size probability distribution [Abrahao2017published]:
Lemma 6.2**.**
Let be an algorithmic network as in Definition 5.4. Then, with probability arbitrarily close to 1 as increases toward infinity, we will have that there are constants and such that
[TABLE]
where
[TABLE]
, and is the network input. In addition, for every with , , and , we will have
[TABLE]
Proof.
This proof of is totally analogous to the proof of Lemma 5.1 in [Abrahao2017published]. Just note that, from Definition 5.4 and [Abrahao2017published], we also have that denotes the same algorithmic network in [Abrahao2017published], except for replacing family with family . In addition, from Definition 5.1 and the definition of in [Abrahao2017published], we have that , which follows from the fact that the IFP in [Abrahao2017published] differs from the SIFP only in ensuring that nodes return instead of in the last cycle as the final output. To show the second part of Lemma 6.2, that , it suffices to note that, from Definition 5.1, every networked node only imitates the fittest neighbor after the first node cycle. Thus, since , we have from Definition 3.5 that the number of node cycles will cover the temporal diffusion diameter and it will be enough to make any fittest first partial output, which is at least as fit as , propagate to every other node. ∎
Thus, we can now combine the previous results to build a new theorem. In essence, Theorem 6.1 asymptotically ensures that, if enough communication rounds are expended, matching the network temporal diffusion diameter (which is small compared to the population/network size), then every node is expected to solve the halting problem for any program with length dominated by a logarithmic order of the population/network size. Therefore, increasing the population/network size can make such algorithmic networks emergently solve an increasing number of instances of the halting problem such that, in the limit when the population grows indefinitely, all instances of the halting problem will be statistically covered.
Theorem 6.1**.**
Let be an algorithmic network as in Definition 5.4 such that is well defined. Let
[TABLE]
be a non-decreasing total computable function such that
[TABLE]
where . Then, there is a constant such that, for large enough and for every network input with , we will have that every node in decides whether halts or not on in node cycles with probability arbitrarily close to .
Proof.
This proof follows from combining Lemma 6.2 with Lemma 6.1. We have from Clause 5.1(I)d in Definition 5.1 that, for every ,
[TABLE]
where . In addition, we have from Lemma 6.2 that holds with probability arbitrarily close to as the population size tends to infinity. Let be a constant such that
[TABLE]
Therefore, for every network input with , we will have that and, hence, from Lemma 6.1, that
[TABLE]
∎
Theorem 6.1 looks at all nodes in the algorithmic network. However, one may extract from this result the presence of privileged nodes in solving the respective halting problem. To this end, we first define a general node centrality for distributed processing:
Definition 6.2**.**
Let be a well-defined algorithmic network in the language and machine . We define a central node that can compute a function
[TABLE]
in the minimum number of node cycles when networked and that does not compute this function when isolated in node cycles iff for every , the final output of the networked node holds as
[TABLE]
such that, for every ,
[TABLE]
and, for every , if
[TABLE]
and
[TABLE]
then .
Now, we come back to node centralities in network science in order to rank the node that can be quickly accessible by an arbitrary diffusion from an arbitrary fraction of the nodes. From [Costa2015a]:
Definition 6.3**.**
Let be the minimum number of time intervals (non-spatial steps or, especially in the present article, node cycles) for a diffusion starting on any vertex of a fraction of vertices in the TVG at time instant to reach vertex , where is arbitrary.
In this sense, if one considers any possible fraction of nodes at the same time, we can define a node centrality based on the temporal diffusion diameter [Abrahao2017published]:
Definition 6.4**.**
Let be a TVG with and . We define the time-reachability centrality of a vertex in the TVG from time instant as
[TABLE]
In addition:
Definition 6.4.1**.**
We define the set of the vertices with time-reachability centrality in a TVG from time instant as
[TABLE]
Note that the condition immediately assures that Definitions 6.4 and 6.4.1 are well defined.
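A diffusion-based ranking of vertices in a time-varying graph can be sketched in Python as follows. Because the formulas of Definitions 6.3 and 6.4 are not reproduced in this sketch, it adopts one plausible reading as an explicit assumption: a vertex is scored by the number of time intervals needed until a diffusion started at the initial time instant from every vertex has reached it, and the most central vertices are those with the smallest score. All names (`arrival_times`, `rounds_to_be_reached`, `most_reachable`) are hypothetical.

```python
from typing import Dict, List, Optional, Set

Snapshot = Dict[int, Set[int]]  # node -> out-neighbors during one time interval


def arrival_times(source: int, snapshots: List[Snapshot]) -> Dict[int, int]:
    """Number of time intervals after which a diffusion from `source` first reaches each node."""
    reached = {source: 0}
    for step, out_neighbors in enumerate(snapshots, start=1):
        newly = {
            m
            for n in reached
            for m in out_neighbors.get(n, set())
            if m not in reached
        }
        for m in newly:
            reached[m] = step
    return reached


def rounds_to_be_reached(target: int, nodes: List[int], snapshots: List[Snapshot]) -> Optional[int]:
    """Intervals needed for diffusions from all nodes to reach `target` (None if some never do)."""
    worst = 0
    for source in nodes:
        t = arrival_times(source, snapshots).get(target)
        if t is None:
            return None
        worst = max(worst, t)
    return worst


def most_reachable(nodes: List[int], snapshots: List[Snapshot]) -> Set[int]:
    """Vertices reached by every diffusion in the fewest time intervals (assumed centrality)."""
    scores = {v: rounds_to_be_reached(v, nodes, snapshots) for v in nodes}
    finite = {v: r for v, r in scores.items() if r is not None}
    if not finite:
        return set()
    best = min(finite.values())
    return {v for v, r in finite.items() if r == best}


# Toy example: an undirected path 0 - 1 - 2 - 3 - 4, static over four time intervals.
path: Snapshot = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
nodes = [0, 1, 2, 3, 4]
snapshots = [path] * 4

end_score = rounds_to_be_reached(0, nodes, snapshots)
middle_score = rounds_to_be_reached(2, nodes, snapshots)
central = most_reachable(nodes, snapshots)
```

In this toy path the middle vertex is reached by every diffusion in half the number of time intervals needed by an end vertex, so under the assumed reading it is the vertex that would need the fewest communication rounds.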
Finally, Theorem 6.1 implies that the node centrality for distributed processing and the node centrality for complex network’s dynamics can be combined to find a node that can only solve the halting problem when networked in the least amount of communication rounds (i.e., node cycles):
Corollary 6.1.1**.**
Let be an algorithmic network as in Definition 5.4 such that is well defined. Let $c\colon \mathbb{N} \to \mathfrak{C_{BB}}$, $x \mapsto c(x) = y$, be a non-decreasing total computable function with and . Then, there is a constant such that, for large enough and for every network input with , there is at least one central node (as in Definition 6.2) with the respective highest time-reachability centrality (as in Definition 6.4) in the algorithmic network that decides whether halts or not on in node cycles with probability arbitrarily close to .
Proof.
This proof follows from Theorem 6.1. Since , we will have that Definition 6.4 is well defined for every vertex in the TVG . Now, we take a vertex with the highest time-reachability centrality as in Definition 6.4 such that, for every ,
[TABLE]
where and . Thus, we trim the necessary latest time instants in the set of time instants so that one can define another TVG such that
[TABLE]
and, for every , one has . Then, we replace in with . Note that, from Definition 3.5, we have that
[TABLE]
Now, we take any non-decreasing total computable function $c\colon \mathbb{N} \to \mathfrak{C_{BB}}$, $x \mapsto c(x) = y$, such that and
[TABLE]
Therefore, from Definition 5.4, there is a corresponding node that assumes the position of the vertex such that, from Theorem 6.1,
[TABLE]
∎
7. Algorithmic synergy
In the same spirit as the emergent algorithmic complexity of a node introduced in [Abrahao2017published, Abrahao2018publishedAMS], another interesting topic is whether or not algorithmic networks and algorithmic information theory are sufficient to deal with the problem of measuring synergistic information [Lizier2018, Griffith2014a]. This problem is usually stated within the context of multivariate information theory for stochastic dynamical systems. Generally speaking, it concerns measuring the amount of information in an arbitrary collection of random variables that predicts another random variable , but that is not contained in (or does not derive from) any individual random variable , where , or from combinations of proper subsets of the set , which is given by the partial information diagrams (i.e., PI-diagrams) [Griffith2014a].
On the other hand, from [Abrahao2017published, Abrahao2018publishedAMS], note that emergent algorithmic complexity directly gives a formal measure of irreducible information [Li1997, Calude2002, Chaitin2004, Calude2009, Grunwald2008] that emerges when comparing the networked case with the isolated case. Thus, if one assumes the definition of synergy as the general phenomenon in which the whole system is irreducibly better in solving a common problem than the “sum” (or the “union”) of its parts taken separately, as the problem described in the previous paragraph, then there should be an immediate extension of emergent algorithmic complexity to algorithmic synergistic information.
To tackle this problem, we introduce in this section a formalization of one type of algorithmic synergistic information in the context of algorithmic networks. Thus, instead of studying synergy in stochastic processes, we will be studying synergy in deterministic systems, in particular, in networks of computable systems. That is, we are focusing on the general problem of measuring the amount of algorithmic information in an arbitrary collection of nodes necessary to calculate a function, but that could not be performed by any individual isolated node or by any combination of proper subnetworks of the entire network. In particular, we start by formalizing a measure of average algorithmic synergistic information for individual nodes when comparing the fully networked case with the isolated case. Thus, we leave the joint cases and subnetwork cases for future research.
Definition 7.1**.**
Let be a well-defined algorithmic network. We define the local algorithmic synergy of a node toward function
[TABLE]
in node cycles with network input as
[TABLE]
where:
- (1) ;
- (2) represents the program that returns the final output of when networked assuming the position , where , in the MAG in the specified number of node cycles with network input ;
- (3) represents the program that returns the final output of when isolated in the specified number of node cycles with network input .
The reader may find it tempting to employ instead of in Definition 7.1 due to the fact that is invertible and is not (see Definition 3.10 and [Li1997, Downey2010, Chaitin2004]). In this sense, note that, since , we will have that
[TABLE]
However, besides the fact that the outputs processed by the algorithmic network are in the form and not , the non-invertibility of actually captures the notion of towardness in computing the function . For example, the algorithmic network may be emergently generating the necessary information to compute function at the same time that function does not give the necessary information to determine the emergent behavior of the algorithmic network. This way, the non-invertibility would be a sound property. In fact, investigating the cases in which the measure in Definition 7.1 is invertible is an interesting topic for future research.
Moreover, a constant represented by in Definition 7.1 is employed in order to deal with some counterintuitive cases that may appear depending on the chosen universal programming language. For example, when
[TABLE]
with or when
[TABLE]
with . Thus, there may be “blurred intervals” with respect to algorithmic synergy, so that one cannot decide whether there is a positive value of local algorithmic synergy or not. In fact, as odd as it may seem, it is in consonance with the equalities and inequalities in algorithmic information theory that hold, except for a constant that only depends on the chosen universal programming language. Note that these complexity/information oscillations are expected to happen in algorithmic information theory.
In order to specify the type of algorithmic network from which one is calculating the local algorithmic synergy, we may also denote by
[TABLE]
or
[TABLE]
Thus, for the current studied model:
Definition 7.1.1**.**
We denote the local algorithmic synergy of a node in an algorithmic network toward function
[TABLE]
in node cycles with network input as
[TABLE]
where:
- (1) ;
- (2) represents the program that returns the final output of when networked assuming the position , where , in the TVG in node cycles with network input ;
- (3) represents the program that returns the final output of when isolated in node cycles with network input .
Furthermore, for a fixed function , one can define the average value of local algorithmic synergy:
Definition 7.2**.**
Let be a well-defined algorithmic network. We define the average local algorithmic synergy of a node toward function
[TABLE]
in node cycles with network input as
[TABLE]
Then, since population is randomly generated, one can define the expected local value of the algorithmic synergy of a node for a fixed function in the current studied case:
Definition 7.2.1**.**
We define the expected local algorithmic synergy of a node toward function
[TABLE]
in node cycles (or communication rounds) with network input as
[TABLE]
Now, we can combine the results from Section 6 with the definition of expected local algorithmic synergy in order to make it as large as one may want:
Theorem 7.1**.**
Let
[TABLE]
be a function defined on arbitrary . Let be an algorithmic network as in Definition 5.4. Let
[TABLE]
be a non-decreasing total computable function such that
[TABLE]
*where . Let be an arbitrary number. Then, there are constants and such that, for large enough and for every network input with , we will have that the expected local algorithmic synergy of a node in algorithmic network toward solving the halting problem (Footnote 15: that is, toward $f_h\colon \{x \mid \lg(N) - C_7 \geq |x|\} \subset \mathbf{L_U} \to \{h, \overline{h}\} \subset \mathbf{L_U}$ with $f_h(x) = \overline{h}$ if $\mathbf{U}$ does not halt on $x$ and $f_h(x) = h$ if $\mathbf{U}$ halts on $x$) with domain*
[TABLE]
in node cycles is larger than , i.e.,
[TABLE]
with probability arbitrarily close to .
Proof.
This proof follows from a combination of Theorem 6.1 with Definition 7.2.1 for a sufficiently complex . First, we know from Definition 7.2.1 and Equation (1) that
[TABLE]
Let denote the element of language in which its length is the minimum length larger than zero. From Definition 5.3, we know every node belongs to . Therefore, since there always are randomly generated nodes that ignore any input and keep returning as output in any node cycle when running isolated, then, from Definitions 3.3, 3.10, and 5.3 and the law of large numbers, there is such that
[TABLE]
Now, choose with and as in Definition 6.1 (Footnote 16: The number is not really necessary here. We chose to employ it in order to avoid minor ambiguities in the asymptotic dominance.), where
[TABLE]
Furthermore, from Theorem 6.1, there is a constant such that, for large enough and for every network input with , we have that, for every node ,
[TABLE]
in node cycles with probability arbitrarily close to . Therefore, from Equations (2), (3) and (4), we will have that there is a constant such that, for every network input with ,
[TABLE]
holds with probability arbitrarily close to . ∎
Indeed, for some universal programming languages and classical labelings of halting and non-halting computations, e.g., and [math], respectively, the expected local algorithmic synergy of a node may not be positive. What Theorem 7.1 ensures is that, for any chosen universal self-delimited programming language and any arbitrarily chosen , there are and that can univocally represent the halting case and the non-halting case, respectively, such that the expected local algorithmic synergy of a node becomes larger than .
8. Conclusion and future work
We have studied a particular model of algorithmic networks . These are composed of randomly generated self-delimiting programs as nodes, which share information according to the synergistic imitation-of-the-fittest protocol (SIFP). From this model, we studied how to make the nodes asymptotically solve the halting problem as the population grows indefinitely. In this way, we have shown how a fixed global information-sharing (or communication) protocol can exploit the power of random generation of individuals and the power of selection made by an irreducibly more powerful environment in order to solve an uncomputable problem.
To this end, we have modified the model introduced in [Abrahao2017published] to enable each node to calculate a partial recursive function for the network input and the latest largest integer shared by the neighbors. Specifically, this modification was made in the imitation-of-the-fittest protocol (IFP) in [Abrahao2017published].
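The integer-sharing step of such a protocol can be illustrated with a minimal sketch. It is not the SIFP itself: it assumes a static, connected, undirected graph given as an adjacency dictionary, synchronous rounds, and that each node simply keeps the largest integer seen among itself and its neighbors. The names `diffuse_max` and `diameter` and the toy network are hypothetical. Under these assumptions, every node holds the global maximum after a number of rounds equal to the graph diameter, while fewer rounds may not suffice.

```python
from collections import deque


def diffuse_max(adjacency, values, rounds):
    """Synchronously replace each node's value by the max over itself and its neighbors."""
    current = dict(values)
    for _ in range(rounds):
        current = {
            node: max([current[node]] + [current[nb] for nb in adjacency[node]])
            for node in adjacency
        }
    return current


def diameter(adjacency):
    """Diameter of a connected undirected graph, via breadth-first search from every node."""
    longest = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical toy network: a cycle on four nodes (assumption, not from the paper).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
initial = {0: 5, 1: 1, 2: 42, 3: 7}

d = diameter(ring)                      # 2 for a 4-cycle
final = diffuse_max(ring, initial, d)   # every node ends up holding 42
one_round = diffuse_max(ring, initial, 1)  # node 0 has only seen 7 so far
```

In this toy run, one round leaves node 0 with the value 7, whereas two rounds (the diameter) spread 42 to all nodes, which mirrors the role played by the small-diameter condition in the results.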
First, we proved that, if the population/network size is large enough, the network diameter is small compared to the population/network size, and enough communication rounds (i.e., node cycles) are expended (in particular, matching the network diameter), then every node is expected to solve the halting problem for any program with length dominated by a logarithmic order of the population/network size. In other words, nodes can emergently solve an increasing number of instances of the halting problem as the population grows indefinitely. This way, for algorithmic networks , all instances of the halting problem are statistically covered in the limit when the population grows indefinitely. This result shows that there is at least one fixed algorithm that can be run in a distributed manner on networked randomly generated universal Turing machines, so that the entire algorithmic network can compute a function in the Turing degree , if the population/network size is large enough. Therefore, besides computation time and memory, networked randomly generated environment-evaluable nodes (which we may call o-nodes) can be regarded as a third type of computational resource. Thus, for algorithmic networks , any set with Turing degree is indeed decidable if enough (but still finite) time, memory, and o-nodes are given. As has already been done for time hierarchies and space hierarchies, we propose as future research the investigation of o-node hierarchies. Furthermore, we also propose the investigation of resource-bounded versions of our present results, for example, in the case where nodes belong to a time complexity class and the environment (i.e., the machine in which each node is being simulated) belongs to a sufficiently higher time complexity class.
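To give a feel for the logarithmic bound, the following sketch computes the longest program length covered for a given population size, using a domain condition of the form lg(N) − C ≥ |x| as in the statement of Theorem 7.1. It assumes that lg denotes the base-2 logarithm and uses an arbitrary illustrative constant C = 3; the function name `max_covered_length` is hypothetical.

```python
import math


def max_covered_length(population_size, constant):
    """Largest integer length |x| with lg(N) - C >= |x|, or 0 if no positive length fits."""
    bound = math.floor(math.log2(population_size) - constant)
    return max(bound, 0)


# Illustrative population sizes (assumptions, not values from the paper).
for n in (2**10, 2**20, 2**30):
    print(n, max_covered_length(n, 3))
```

Under these assumptions, multiplying the population size by 1024 adds only 10 to the covered program length, which is why all instances of the halting problem are covered only in the limit as the population grows indefinitely.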
Second, we introduced two types of node centralities, in particular, one for distributed processing and one for network diffusion. From these and from the previous results, we proved that these two centralities can be intrinsically combined to show that, in the previously described conditions, there is one central node that can solve the halting problem in the minimum number of communication rounds and only if networked. This result may help to understand how node centralities in network science may be related to emergently privileged nodes in distributed processing.
Third, we introduced one type of algorithmic-informational measure of synergy, bridging previously studied concepts in multivariate information theory for stochastic processes to algorithmic information theory and algorithmic networks. In this respect, the general problem of synergy in networked computable systems can be translated as the problem of measuring the amount of algorithmic information in an arbitrary collection of nodes that is strictly necessary to calculate a function, but that could not be obtained by any individual isolated node or by any combination of proper subnetworks of the entire network. Then, narrowing our approach, we defined a measure of average algorithmic synergistic information for individual nodes in the specific case in which the totally networked case is compared with the totally isolated case. We call it local algorithmic synergy. Further, we showed that, for any chosen universal self-delimited programming language, one can make the algorithmic networks produce as much expected local algorithmic synergy of a node as one may want. In this way, we related the emergent algorithmic complexity in [Abrahao2017published] to a new type of emergent property, in this case, synergy. This shows how systemic properties commonly studied in complex systems science, such as synergy, can be formalized in the context of networked deterministic systems. Moreover, with respect to synergy and networked computable systems, our results may help unlock a formalism to find new mathematical phenomena in future work, such as new types of algorithmic measures of synergy for a comparison of the fully networked case with proper subnetworks.
The present article follows a general pursuit of an abstract mathematical theory for systemic properties in complex systems (especially, living systems), such as evolution [Chaitin2012, Hernandez-Orozco2018, Hernandez-Orozco2018a, Chaitin2018], emergence of complexity [Abrahao2017published, Abrahao2018publishedAMS, Hernandez-Orozco2018], and emergence of creativity [Abrahao2017published, Abrahao2018c, Chaitin2018]. This way, as with the synergy and centralities presented in this article, the theory of algorithmic networks moves toward establishing formal theories for other common systemic properties usually attributed to complex systems. In this direction, if one assumes the hypothesis that hypercomputation is possible in Nature, our results in this article have shown how Life might have found a way to synergistically harness the power of selection of individuals in a sufficiently random population of individuals, even if every living being remains a computable system. This way, the present work may also “open the gate” for the study of other systemic properties in future work, such as self-organization and autopoiesis, within the context of distributed deterministic systems.
Acknowledgments
The authors acknowledge the partial support from CNPq through their individual grants: F. S. Abrahão (313.043/2016-7), K. Wehmuth (312599/2016-1), and A. Ziviani (308.729/2015-3). The authors acknowledge the INCT in Data Science – INCT-CiD (CNPq 465.560/2014-8). The authors also acknowledge the partial support from CAPES/STIC-AmSud (18-STIC-07), FAPESP (2015/24493-1), and FAPERJ (E-26/203.046/2017). We also thank Hector Zenil and Mikhail Prokopenko for their comments and critiques.
References
