Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networks

Andrea Soltoggio; Kenneth O. Stanley; Sebastian Risi

arXiv:1703.10371·cs.NE·August 9, 2018

TL;DR

This paper reviews the development and progress of Evolved Plastic Artificial Neural Networks (EPANNs), highlighting their potential to autonomously discover adaptive algorithms inspired by biological neural plasticity.

Contribution

It provides a comprehensive overview of EPANNs, discussing their methods, recent progress, and future opportunities in creating flexible, adaptive neural network systems.

Findings

01. EPANNs can autonomously discover novel adaptive algorithms.

02. Recent advances enable more flexible and innovative neural network solutions.

03. EPANNs have shown significant progress over the last two decades.

Abstract

Biological plastic neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifetime learning. The interplay of these elements leads to the emergence of adaptive behavior and intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) use simulated evolution in-silico to breed plastic neural networks with a large variety of dynamics, architectures, and plasticity rules: these artificial systems are composed of inputs, outputs, and plastic components that change in response to experiences in an environment. These systems may autonomously discover novel adaptive algorithms, and lead to hypotheses on the emergence of biological adaptation. EPANNs have seen considerable progress over the last two decades. Current scientific and technological advances in artificial neural networks are now…

Equations

\Delta w = f(x, \theta)

x_{i} = \sigma\big(\sum (w_{ji} \cdot x_{j})\big)

\Delta w = \eta \cdot x_{j} \cdot x_{i}

\Delta w = m \cdot f(x, \theta)

\Delta w = m \cdot (A x_{i} x_{j} + B x_{j} + C x_{i} + D)
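The update rules listed above can be written as a few small functions. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, tanh is an assumed choice for \sigma (the equations only name a generic \sigma), and x_{j} and x_{i} are read as presynaptic and postsynaptic activities, as suggested by the index order in w_{ji}.

```python
import math


def sigma(a: float) -> float:
    """Squashing function; tanh is an assumption, the equations only name sigma."""
    return math.tanh(a)


def activation(weights_in, x_pre) -> float:
    """x_i = sigma(sum(w_ji * x_j)) for one postsynaptic neuron i."""
    return sigma(sum(w * x for w, x in zip(weights_in, x_pre)))


def hebbian_delta(eta: float, x_j: float, x_i: float) -> float:
    """Plain Hebbian update: delta_w = eta * x_j * x_i."""
    return eta * x_j * x_i


def modulated_delta(m: float, x_j: float, x_i: float,
                    A: float, B: float, C: float, D: float) -> float:
    """Modulated rule: delta_w = m * (A*x_i*x_j + B*x_j + C*x_i + D)."""
    return m * (A * x_i * x_j + B * x_j + C * x_i + D)


# Usage: one postsynaptic neuron with two incoming connections.
x_pre = [1.0, 1.0]
w_in = [0.5, 0.5]
x_post = activation(w_in, x_pre)
dw_plain = hebbian_delta(0.1, x_pre[0], x_post)
dw_gated = modulated_delta(0.0, x_pre[0], x_post, 1.0, 0.0, 0.0, 0.0)  # m = 0 blocks the update
```

Under this reading, the modulated rule is one instance of f(x, \theta) with \theta = (A, B, C, D): setting m = 1, A = \eta and B = C = D = 0 recovers the plain Hebbian update, while m = 0 switches plasticity off.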

Full text

Andrea Soltoggio, Kenneth O. Stanley, Sebastian Risi

Department of Computer Science, Loughborough University, LE11 3TU, Loughborough, UK, [email protected]
Department of Computer Science, University of Central Florida, Orlando, FL, USA, [email protected]
University of Copenhagen, Copenhagen, Denmark, [email protected]

Abstract

Biological neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifelong learning. The interplay of these elements leads to the emergence of biological intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) employ simulated evolution in-silico to breed plastic neural networks with the aim to autonomously design and create learning systems. EPANN experiments evolve networks that include both innate properties and the ability to change and learn in response to experiences in different environments and problem domains. EPANNs’ aims include autonomously creating learning systems, bootstrapping learning from scratch, recovering performance in unseen conditions, testing the computational advantages of particular neural components, and deriving hypotheses on the emergence of biological learning. Thus, EPANNs may include a large variety of different neuron types and dynamics, network architectures, plasticity rules, and other factors. While EPANNs have seen considerable progress over the last two decades, current scientific and technological advances in artificial neural networks are setting the conditions for radically new approaches and results. Exploiting the increased availability of computational resources and of simulation environments, the often challenging task of hand-designing learning neural networks could be replaced by more autonomous and creative processes. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main methods and results are reviewed. Finally, new opportunities and possible developments are presented.

Index Terms:

Artificial Neural Networks, Lifelong Learning, Plasticity, Evolutionary Computation.

I Introduction

Over the course of millions of years, evolution has led to the emergence of innumerable biological systems, and intelligence itself, crowned by the evolution of the human brain. Evolution, development, and learning are the fundamental processes that underpin biological intelligence. Thus, it is no surprise that scientists have tried to engineer artificial systems to reproduce such phenomena (Sanchez et al., 1996; Sipper et al., 1997; Dawkins, 2003). The fields of artificial intelligence (AI) and artificial life (AL) (Langton, 1997) are inspired by nature and biology in their attempt to create intelligence and forms of life from human-designed computation: the main idea is to abstract the principles from the medium, i.e., biology, and utilize such principles to devise algorithms and devices that reproduce properties of their biological counterparts.

One possible way to design complex and intelligent systems, compatible with our natural and evolutionary history, is to simulate natural evolution in-silico, as in the field of evolutionary computation (Holland, 1975; Eiben and Smith, 2015). Sub-fields of evolutionary computation such as evolutionary robotics (Harvey et al., 1997; Nolfi and Floreano, 2000), learning classifier systems (Lanzi et al., 2003; Butz, 2015), and neuroevolution (Yao, 1999) specifically research algorithms that, by exploiting artificial evolution of physical, computational, and neural models, seek to discover principles behind intelligent and learning systems.

In the past, research in evolutionary computation, particularly in the area of neuroevolution, was predominantly focused on the evolution of static systems or networks with fixed neural weights: evolution was seen as an alternative to learning rules to search for optimal weights in an artificial neural network (ANN). Also, in traditional and deep ANNs, learning is often performed during an initial training phase, so that weights are static when the network is deployed. Recently, however, inspiration has originated more strongly from the fact that intelligence in biological organisms considerably relies on powerful and general learning algorithms, designed by evolution, that are executed during both development and continuously throughout life.

As a consequence, the field of neuroevolution is now progressively moving towards the design and evolution of lifelong learning plastic neural systems, capable of discovering learning principles during evolution, and thereby able to acquire knowledge and skills through the interaction with the environment (Coleman and Blair, 2012). This paper reviews and organizes the field that studies evolved plastic artificial neural networks, and introduces the acronym EPANN. EPANNs are evolved because parts of their design are determined by an evolutionary algorithm; they are plastic because parts of their structures or functions, e.g. the connectivity among neurons, change at various time scales while experiencing sensory-motor information streams. The final capabilities of such networks are autonomously determined by the combination of evolved genetic instructions and learning that takes place as the network interacts with an environment.

EPANNs’ ambitious motivations and aims, centered on the autonomous discovery and design of learning systems, also entail a number of research problems. One problem is how to set up evolutionary experiments that can discover learning, and then to understand the subsequent interaction of dynamics across the evolutionary and learning dimensions. A second open question concerns the appropriate neural model abstractions that may capture essential computational principles to enable learning and, more generally, intelligence. One further problem is the size of very large search spaces, and the high computational cost required to simulate even simple models of lifelong learning and evolution. Finally, experiments to autonomously discover intelligent learning systems have a wide range of performance metrics, as their objectives are sometimes loosely defined as the increase of behavioral complexity, intelligence, adaptability, evolvability (Miconi, 2008), and general learning capabilities (Tonelli and Mouret, 2011). Thus, EPANNs explore a larger search space, and address broader research questions, than machine learning algorithms specifically designed to improve performance on well-defined and narrow problems.

The power of EPANNs, however, derives from two autonomous search processes: evolution and learning, which arguably place them among the most advanced AI and machine learning systems in terms of open-endedness, autonomy, potential for discovery, creativity, and human-free design. These systems rely the least on pre-programmed instructions because they are designed to autonomously evolve while interacting with a real or simulated world. Plastic networks, in particular recurrent plastic networks, are known for their computational power (Cabessa and Siegelmann, 2014): evolution can be a valuable tool to explore the power of those computational structures.

In recent years, progress in a number of relevant areas has set the stage for renewed advancements of EPANNs: ANNs, in particular deep networks, are becoming increasingly more successful and popular; there has been a remarkable increase in available computational power by means of parallel GPU computing and dedicated hardware; a better understanding of search, complexity, and evolutionary computation allows for less naive approaches; and finally, neuroscience and genetics provide us with an increasingly large set of inspirational principles. This progress has changed the theoretical and technological landscape in which EPANNs first emerged, providing greater research opportunities than in the past.

Despite a considerable body of work, research in EPANNs has never been unified through a single description of its motivations and inspiration, achievements and ambitions. This paper aims firstly to outline the inspirational principles that motivate EPANNs (Section II). The main properties and aims of EPANNs, and suitable evolutionary algorithms are presented in Section III. The body of research that advanced EPANNs is brought together and described in Section IV. Finally, the paper outlines new research directions, opportunities, and challenges for EPANNs (Section V).

II Inspiration

EPANNs are inspired by a particularly large variety of ideas from biology, computer science, and other areas (Floreano and Mattiussi, 2008; Downing, 2015). It is also the nature of inspiration to be subjective, and some of the topics described in this section will resonate differently to different readers. We will touch upon large themes and research areas with the intent to provide the background and motivations to introduce the properties, the progress, and the future directions of EPANNs in the remainder of the paper.

The precise genetic make-up of an organism, acquired through millions of years of evolution, is now known to determine the ultimate capabilities of complex biological neural systems (Deary et al., 2009; Hopkins et al., 2014): different animal species manifest different levels of skills and intelligence because of their different genetic blueprint (Schreiweis et al., 2014). The intricate structure of the brain emerges from one single zygote cell through a developmental process (Kolb and Gibb, 2011), which is also strongly affected by input-output learning experiences throughout early life (Hensch et al., 1998; Kolb and Gibb, 2011). Yet high levels of plasticity are maintained throughout the entire lifespan (Merzenich et al., 1984; Kiyota, 2017). These dimensions, evolution, development and learning, also known as the phylogenetic (evolution), ontogenetic (development) and epigenetic (learning) (POE) dimensions (Sipper et al., 1997), are essential for the emergence of biological plastic brains.

The POE dimensions lead to a number of research questions. Can artificial intelligence systems be entirely engineered by humans, or do they need to undergo a less human-controlled process such as evolution? Do intelligent systems need to learn, or could they be born already knowing? Is there an optimal balance between innate and acquired knowledge? Opinions and approaches are diverse. Additionally, artificial systems do not need to implement the same constraints and limitations as biological systems (Bullinaria, 2003). Thus, inspiration is not simple imitation.

EPANNs assume that both evolution and learning, if not strictly necessary, are conducive to the emergence of a strongly bio-inspired artificial intelligence. While artificial evolution is justified by the remarkable achievements of natural evolution, the role of learning has gathered significance in recent years. We are now more aware of the high level of brain plasticity, and its impact on the manifestation of behaviors and skills (LeDoux, 2003; Doidge, 2007; Grossberg, 2012). Concurrently, recent developments in machine learning (Michalski et al., 2013; Alpaydin, 2014) and neural learning (Deng et al., 2013; LeCun et al., 2015; Silver et al., 2016), have highlighted the importance of learning from large input-output data and extensive training. Other areas of cognition such as the capabilities to make predictions (Hawkins and Blakeslee, 2007), to establish associations (Rescorla, 2014) and to regulate behaviors (Carver and Scheier, 2012) are also based on learning from experience. Interestingly, skills such as reading, playing a musical instrument, or driving a car, are mastered even if none of those behaviors existed during evolutionary time, and yet they are mostly unique to humans. Thus, human genetic instructions have evolved not to learn specific tasks, but to synthesize recipes to learn a large variety of general skills. We can conclude that the evolutionary search of learning mechanisms in EPANNs tackles both the long-running nature vs. nurture debate (Moore, 2003), and the fundamental AI research that studies learning algorithms. This review focuses on evolution and learning, and less on development, which can be interpreted as a form of learning if affected by sensory-motor signals. We refer to Stanley and Miikkulainen (2003) for an overview of artificial developmental theories.

Whilst the range of inspiring ideas is large and heterogeneous, the analysis in this review proposes that such ideas can be grouped under the following areas:

• natural and artificial evolutionary processes,
• plasticity in biological neural networks,
• plasticity in artificial neural networks, and
• natural and artificial learning environments.

Figure 1 graphically summarizes the topics described in Sections II-A to II-D, from which EPANNs take inspiration.

II-A Natural and artificial evolutionary processes

A central idea in evolutionary computation (Goldberg and Holland, 1988) is that evolutionary processes, similar to those that occurred in nature during the course of billions of years (Darwin, 1859; Dobzhansky, 1970), can be simulated with computer software. This idea led to the belief that intelligent computer programs could emerge with little human intervention by means of evolution in-silico (Holland and Reitman, 1977; Koza, 1992; Fogel, 2006).

The emergence of evolved intelligent software, however, did not occur as easily as initially hoped. The reasons for the slow progress are not completely understood, but a number of problems have been identified, likely related to the simplicity of the early implementations of evolutionary algorithms and the high computational requirements. Current topics of investigation focus on levels of abstraction, diversity in the population, selection criteria, the concepts of evolvability and scalability (Wagner and Altenberg, 1996; Pigliucci, 2008; Lehman and Stanley, 2013), the encoding of genetic information through the genotype-phenotype mapping processes (Wagner and Altenberg, 1996; Hornby et al., 2002), the deception of fitness objectives, and how to avoid them (Lehman and Stanley, 2008; Stanley and Lehman, 2015). It is also not clear yet which stepping stones were most challenging for natural evolution (Roff, 1993; Stanley and Lehman, 2015) in the evolutionary path to intelligent and complex forms of life. This lack of knowledge highlights that our understanding of natural evolutionary processes is incomplete, and thus the potential to exploit computational methods is not fully realized. In particular, EPANN research is concerned with those evolutionary algorithms that allow the most creative, open-ended and scalable design. Effective evolutionary algorithms and their desirable features for EPANNs are detailed later in Section III-C.

II-B Plasticity in biological neural networks

Biological neural networks demonstrate lifelong learning, from simple reflex adaptation to the acquisition of astonishing skills such as social behavior and learning to speak one or more languages. Those skills are acquired through experiencing stimuli and actions and by means of learning mechanisms not yet fully understood (Bear et al., 2007). The brief overview here outlines that adaptation and learning strongly rely on neural plasticity, understood as “the ability of neurons to change in form and function in response to alterations in their environment” (Kaas, 2001).

The fact that experiences guide lifelong learning was extensively documented in the works of behaviorism by scientists such as Thorndike (1911), Pavlov (1927), Skinner (1938, 1953), and Hull (1943) who started to test scientifically how experiences cause a change in behavior, in particular as a result of learning associations and observable behavioral patterns (Staddon, 1983). This approach means linking behavior to brain mechanisms and dynamics, an idea initially entertained by Freud (Køppe, 1983) and later by other illustrious scientists (Hebb, 1949; Kandel, 2007). A seminal contribution to link psychology to physiology came from Hebb (1949), whose principle that neurons that fire together, wire together is relevant to understanding both low level neural wiring and high level behaviors (Doidge, 2007). Much later, a Hebbian-compatible rule that regulates synaptic changes according to the firing times of the presynaptic and postsynaptic neurons was observed by Markram et al. (1997) and named Spike-Timing-Dependent Plasticity (STDP).

The seminal work of Kandel and Tauc (1965), and following studies (Clark and Kandel, 1984), were the first to demonstrate that changes in the strength of connectivity among neurons, i.e. plasticity, relate to behavior learning. Walters and Byrne (1983) showed that, by means of plasticity, a single neuron can perform associative learning such as classical conditioning, a class of learning that is observed in simple neural systems such as that of the Aplysia (Carew et al., 1981). Plasticity driven by local neural stimuli, i.e. compatible with the Hebb synapse (Hebb, 1949; Brown et al., 1990), is responsible not only for fine tuning, but also for building a working visual system in the cat’s visual cortex (Rauschecker and Singer, 1981).

Biological plastic neural networks are also capable of structural plasticity, which creates new pathways among neurons (Lamprecht and LeDoux, 2004; Chklovskii et al., 2004; Russo et al., 2010): it occurs primarily during development, but there is evidence that it continues well into adulthood (Pascual-Leone et al., 2005). Axon growth, known to be regulated by neurotrophic nerve growth factors (Tessier-Lavigne and Goodman, 1996), was also modeled computationally by Roberts et al. (2014). Developmental processes and neural plasticity are often indistinguishable (Kolb, 1989; Pascual-Leone et al., 2005) because the brain is highly plastic during development. Neuroscientific advances reviewed in Damasio (1999); LeDoux (2003); Pascual-Leone et al. (2005); Doidge (2007); Draganski and May (2008) outline the importance of structural plasticity in learning motor patterns, associations, and ways of thinking. Both structural and functional plasticity in biology are essential to acquiring long-lasting new skills, and for this reason appear to be an important inspiration for EPANNs.

Finally, an important mechanism for plasticity and behavior is neuromodulation (Marder and Thirumalai, 2002; Gu, 2002; Bailey et al., 2000). Modulatory chemicals such as acetylcholine (ACh), norepinephrine (NE), serotonin (5-HT) and dopamine (DA) appear to regulate a large variety of neural functions, from arousal and behavior (Harris-Warrick and Marder, 1991; Hasselmo and Schnell, 1994; Marder, 1996; Katz, 1995; Katz and Frost, 1996), to pattern generation (Katz et al., 1994), to memory consolidation (Kupfermann, 1987; Hasselmo, 1995; Marder, 1996; Hasselmo, 1999). Learning by reward in monkeys was linked to dopaminergic activity during the 1990s with studies by Schultz et al. (1993, 1997); Schultz (1998). For these reasons, neuromodulation is considered an essential element in cognitive and behavioral processes, and has been the topic of a considerable amount of work in EPANNs (Section IV-E).

This compact overview suggests that neural plasticity encompasses an important set of mechanisms, regulated by a rich set of signals and dynamics currently mostly ignored in ANNs. Thus, EPANNs can be used to explore, via evolutionary search, the potential of plasticity and to answer questions such as: (1) How does a brain-like structure form—driven both by genetic instructions and neural activity—and acquire functions and behaviors? (2) What are the key plasticity mechanisms from biology that can be applied to artificial systems such as EPANNs? (3) Can memories, skills, and behaviors be stored in plastic synaptic connections, in patterns of activities, or in a combination of both? Whilst neuroscience continues to provide inspiration and insight into plasticity in biological brains, EPANNs serve the complementary objective of seeking, implementing, and verifying designs of bio-inspired methods for adaptation, learning, and intelligent behavior.

II-C Plasticity in artificial neural networks

In EPANN experiments, evolution can be seen as a meta-learning process. Thus, established learning rules for ANNs are often used as ingredients that evolution uses to search for good parameter configurations, efficient combinations of rules and network topologies, new functions representing novel learning rules, etc. EPANN experiments are suited to include the largest possible variety of rules because of (1) the variety of possible tasks in a simulated behavioral experiment and (2) the flexibility of evolution to combine rules with no assumptions about their dynamics. The following gives a snapshot of the extent and scope of various learning algorithms for ANN that can be used as building blocks of EPANNs.

In supervised learning, backpropagation is the most popular learning rule used to train both shallow and deep networks (Rumelhart et al., 1988; Widrow and Lehr, 1990; LeCun et al., 2015) for classification or regression. Unsupervised learning is implemented in neural networks with self-organizing maps (SOM) (Kohonen, 1982, 1990), auto-encoders (Bourlard and Kamp, 1988), restricted Boltzmann machines (RBM) (Hinton and Salakhutdinov, 2006), Hebbian plasticity (Hebb, 1949; Gerstner and Kistler, 2002a; Cooper, 2005), generative adversarial networks (Goodfellow et al., 2014), and various combinations of the above. RBM learning is considered related to the free-energy principle, proposed by Friston (2009) as a central principle governing learning in the brain. Hebbian rules, given their biological plausibility and unsupervised nature, are a particularly important inspirational principle for EPANNs. Variations (Willshaw and Dayan, 1990) have been proposed to include, e.g., terms to achieve stability (Oja, 1982; Bienenstock et al., 1982) and various constraints (Miller and Mackay, 1994), or more advanced update dynamics such as dual weights for fast and slow decay (Levy and Bairaktaris, 1995; Hinton and Plaut, 1987; Bullinaria, 2009a; Soltoggio, 2015). Hebbian rules have recently been proposed to minimize defined cost functions (Pehlevan et al., 2015; Bahroun et al., 2017), and more advanced systems have used backpropagation as meta-learning to tune Hebbian rules (Miconi et al., 2018).
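To illustrate why stabilizing terms matter, the sketch below compares a plain Hebbian update with Oja's rule, \Delta w = \eta y (x - y w), on a single linear neuron. It is a toy illustration under stated assumptions rather than a reproduction of any cited model: the input patterns, the learning rate, the number of steps, and the function names are all hypothetical.

```python
import math


def train(inputs, eta, steps, use_oja):
    """Train one linear neuron y = w . x with a plain Hebbian or an Oja update."""
    w = [0.1, 0.1]
    for t in range(steps):
        x = inputs[t % len(inputs)]
        y = sum(wi * xi for wi, xi in zip(w, x))
        if use_oja:
            # Oja's rule: the Hebbian term y*x plus a decay term -y*y*w.
            w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
        else:
            # Plain Hebbian rule: weights grow without bound on correlated input.
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def norm(w):
    """Euclidean length of the weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))


# Hypothetical input patterns lying roughly along one direction.
patterns = [(1.0, 0.8), (0.9, 1.1), (-1.0, -0.9), (-0.8, -1.0)]
w_hebb = train(patterns, eta=0.05, steps=200, use_oja=False)
w_oja = train(patterns, eta=0.05, steps=200, use_oja=True)
```

With these settings the plain Hebbian weights keep growing, whereas the Oja weights settle close to unit length along the dominant input direction. In an EPANN, the choice between such variants, or the coefficients that define them, could be left to evolution.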

Neuromodulated plasticity (Fellous and Linster, 1998) is often used to implement reward-learning in neural networks. Such a modulation of signals, or gated learning (Abbott, 1990), allows for amplification or reduction of signals and has been implemented in numerous models (Baxter et al., 1999; Suri et al., 2001; Birmingham, 2001; Alexander and Sporns, 2002; Doya, 2002; Fujii et al., 2002; Suri, 2002; Ziemke and Thieme, 2002; Sporns and Alexander, 2003; Krichmar, 2008).

Plastic neural models are also used to demonstrate how behavior can emerge from a particular circuitry modeled after biological brains. Computational models of, e.g., the basal ganglia and modulatory systems may propose plasticity mechanisms and aim to demonstrate the computational relations among various nuclei, pathways, and learning processes (Krichmar, 2008; Vitay and Hamker, 2010; Schroll and Hamker, 2015).

Finally, plasticity rules for spiking neural networks (Maass and Bishop, 2001) aim to demonstrate unique learning mechanisms that emerge from spiking dynamics (Markram et al., 1997; Izhikevich, 2006, 2007), as well as model biological synaptic plasticity (Gerstner and Kistler, 2002b).

Plasticity in neural networks, when continuously active, was also observed to cause catastrophic forgetting (Robins, 1995). If learning occurs continuously, new information or skills can overwrite previously acquired information or skills, a problem also known as the plasticity-stability dilemma (Abraham and Robins, 2005; Finnie and Nader, 2012).
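The effect can be reproduced in a few lines. The sketch below is only an illustration, assuming a single linear unit trained with the delta rule on two hypothetical tasks in sequence; neither the rule nor the tasks are taken from the cited studies.

```python
def predict(w, x):
    """Output of a linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(w, task, eta=0.1, epochs=200):
    """Delta-rule updates on one task only; returns a new weight list."""
    w = list(w)
    for _ in range(epochs):
        for x, target in task:
            err = target - predict(w, x)
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w


def mse(w, task):
    """Mean squared error of the unit on a task."""
    return sum((t - predict(w, x)) ** 2 for x, t in task) / len(task)


# Two hypothetical tasks that share the same two weights.
task_a = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]
task_b = [((1.0, 1.0), 0.0)]

w = train([0.0, 0.0], task_a)
error_a_before = mse(w, task_a)   # close to 0: task A has been learned
w = train(w, task_b)              # plasticity stays active on task B only
error_a_after = mse(w, task_a)    # rises to about 0.25: task A is partly overwritten
error_b_after = mse(w, task_b)    # close to 0: task B has been learned
```

Because both tasks rely on the same weights, learning task B moves them from roughly (1, 0) to roughly (0.5, -0.5), which solves B at the expense of A.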

In conclusion, a large range of plasticity rules for neural networks have been proposed to solve different problems. In the majority of cases, a careful matching and engineering of rules, architectures and problems is necessary, requiring considerable design effort. The variety of algorithms also reflects the variety of problems and solutions. One key aspect is that EPANN systems can effectively play with all possible plasticity rules to offer a unique testing tool and assess the effectiveness and suitability of different models, or their combination, in a variety of different scenarios.

II-D Lifelong learning environments

One aspect of EPANNs is that they can continuously improve and adapt both at the evolutionary scale and at the lifetime scale in a virtually unlimited range of problems. Natural environments are an inspiration for EPANNs because organisms have evolved to adapt to, and learn in, a variety of conditions. Fundamental questions are: what makes an environment conducive to the evolution of learning and intelligence? What are the challenges faced by learning organisms in the natural world, and how does biological learning cope with those? How can those challenges be abstracted and ported to a simulated environment for EPANNs? EPANNs employ lifelong learning environments in the attempt to provide answers to such questions.

In the early phases of AI, logic and reasoning were thought to be the essence of intelligence (Crevier, 1993), so symbolic input-output mappings were employed as tests. Soon it became evident that intelligence is not only symbol manipulation, but resides also in subsymbolic problem solving abilities emerging from the interaction of brain, body, and environment (Steels, 1993; Sims, 1994). More complex simulators of real-life environments and body-environment interaction were developed to better represent the enactivist philosophy (Varela et al., 2017) and cognitive theories on the emergence of cognition (Butz and Kutter, 2016). Other environments focus on high-level planning and strategies required, e.g., when applying AI to games (Allis et al., 1994; Millington and Funge, 2016) or articulated robotic tasks. Planning and decision making with high bandwidth sensory-motor information flow such as those required for humanoid robots or self-driving vehicles are current benchmarks for lifelong learning systems. Finally, environments in which affective dynamics and feelings play a role are recognized as important for human well-being (De Botton, 2016; Lee and Narayanan, 2005). Those intelligence-testing environments are effectively the “worlds” in which EPANNs may evolve and live in embodied forms, and thus largely shape the EPANN design process.

Such different testing environments have very different features, dynamics, and goals that fall into different machine learning problems. For example, supervised learning can be mapped to a fitness function when a precise target behavior exists and is known. If it is useful to find relationships and regularities in the environment, unsupervised learning, representation learning, or modularity can be evolved (Bullinaria, 2007b). If the environment provides rewards, the objective may be to search for behavioral policies that lead to collecting rewards: algorithms specifically designed to do so are known as reinforcement learning algorithms (Sutton and Barto, 1998). While reinforcement learning maximizes a reward or fitness, recent advances in evolutionary computation (Lehman and Stanley, 2011; Stanley and Lehman, 2015) suggest that it is not always the fittest, but at times it is the novel individual or behavior that can exploit environmental niches, thus leading to creative evolutionary processes similar to those observed in nature. Temporal dynamics, i.e. when a system is required to behave over time according to complex dynamics, need different computational structures from functions with no temporal dynamics. This case is typical for EPANN experiments that may exhibit a large variety of time scales in complex behavioral tasks. With traditional approaches, all those different cases require careful manual design to solve each problem. In contrast, the evolution in EPANNs can be designed to address most problems by mapping a measure of success to a fitness value, thus searching for solutions in an increasingly large variety of problems and environments. In conclusion, lifelong learning environments of different types can be used with EPANNs to explore innovative and creative solutions with limited human intervention and design.
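The contrast between selecting for fitness and selecting for novelty can be made concrete with two scoring functions. The sketch below is illustrative only: the negative squared error used as fitness, the two-dimensional behavior descriptors, and the value of k are assumptions, and the novelty score follows the common formulation of novelty search as the mean distance to the k nearest behaviors.

```python
import math


def fitness(outputs, targets):
    """Fitness for a known target behavior: negative squared error (higher is better)."""
    return -sum((o - t) ** 2 for o, t in zip(outputs, targets))


def novelty(behavior, others, k=3):
    """Mean Euclidean distance to the k nearest behaviors; `others` must be non-empty."""
    nearest = sorted(math.dist(behavior, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


# Hypothetical behavior descriptors, e.g. final (x, y) positions of five agents.
population = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (2.0, 2.0)]
scores = [novelty(b, population[:i] + population[i + 1:])
          for i, b in enumerate(population)]
most_novel = population[scores.index(max(scores))]
```

Here the agent that ends far from the cluster receives the highest novelty score, whatever its fitness would be. Either score can serve as the selection criterion of the evolutionary algorithm.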

III Properties, aims, and evolutionary algorithms for EPANNs

Having introduced the inspirational principles of EPANNs, we now propose: a list of primary properties that define EPANNs (Section III-A); the principal aims of EPANN studies (Section III-B); and a list of desired properties of EAs for EPANNs (Section III-C).

III-A EPANN properties

EPANNs, as formalized in this review, are defined as artificial neural networks with the following properties:

Property 1 - Evolution: Parts of an EPANN are determined by an evolutionary algorithm. Inspired by natural and artificial evolution (Section II-A), such search dynamics in EPANNs implement a design process.

Property 2 - Plasticity: Parts of the functions that process signals within the network change in response to signals propagated through the network, and those signals are at least partially affected by stimuli. Inspired by biological findings on neural plasticity (Section II-B) and empowered by the effectiveness of plasticity in neural models (Section II-C), EPANNs either include such mechanisms or are set up with the conditions to evolve them.

Property 3 - Discovery of learning: Properties 1 and 2 are implemented to discover, through evolution, learning dynamics within an artificial neural network. Thus, an EPANN uses both evolution and plasticity in synergy to achieve learning. Such a property can be present in different degrees, from the highest degree in which no learning occurs before evolution and it is therefore discovered from scratch, to the lowest degree in which learning is fine-tuned and optimized, e.g., when evolution is seeded with proven learning structures. Given the very diverse interpretations of learning in different domains, we refer to Michalski et al. (2013) for an overview, or otherwise assume the general machine learning definition by Mitchell (1997) (footnote 1: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.").

Property 4 - Generality: Properties 1 to 3 are independent from the learning problem(s) and from the plasticity mechanism(s) that are implemented or evolved in an EPANN. Exploiting the flexibility of evolution and learning, (1) EPANNs can evolve to solve problems of different nature, complexity, and time scales (Section II-D); (2) EPANNs are not limited to specific learning dynamics because often it is the aim of the experiment to discover the learning mechanism throughout evolution and interaction with the environment.

In summary, the EPANNs’ properties indicate that within simple assumptions, i.e., using plasticity and evolution, EPANNs are set to investigate the design of learning in creative ways for a large variety of learning problems.

III-B Aims

Given the above properties, EPANN experiments can be set up to achieve the following aims.

Aim 1: Autonomously design learning systems: in an EPANN experiment, it is essential to delegate some design choices of a learning system to the evolutionary process, so that the design is not entirely determined by the human expert and can be automated. The following sub-aims can then be identified.

Aim 1.1: Bootstrap of learning from scratch: in an EPANN experiment, it may be desirable to initialize the system with no learning capabilities before evolution takes place, so that the best learning dynamics for a given environment is evolved rather than human-designed.

Aim 1.2: Optimize performance: as opposed to Aim 1.1, it may be desirable to initialize the system with well-known learning capabilities, so that evolution can autonomously optimize the system, e.g., for final performance after learning.

Aim 1.3: Recover performance in unseen conditions: in an EPANN experiment, the desired outcome may be to enable the learning system to autonomously evolve from solving a set of problems to another set without human intervention.

Aim 2: Test the computational advantages of particular neural components: the aim of an EPANN experiment might be to test whether particular neural dynamics or components have an evolutionary advantage when implementing particular learning functions. The presence of particular neural components may be fostered by evolutionary selection.

Aim 3: Derive hypotheses on the emergence of biological learning: an aim may be to draw similarities or suggest hypotheses on how learning evolved in biological systems, particularly in combination with Aim 1.1 (bootstrap of learning).

Aim 1 is always present in any EPANN because it derives from the EPANN properties. The other aims may be present in different EPANN studies and can be expanded into more detailed and specific research hypotheses.

III-C Evolutionary algorithms for EPANNs

In contrast to parameter optimization (Bäck and Schwefel, 1993) in which search spaces are often of fixed dimension and static, EPANNs evolve in dynamic search spaces in which learning further increases the complexity of the evolutionary search and of the problem itself. Evolutionary algorithms (Holland, 1975; Michalewicz, 1994) for EPANNs often require additional advanced features to cope with the challenges of the evolution of learning and open evolutionary design (Bentley, 1999). The analysis in this review suggests that evolutionary algorithms (EAs) for EPANNs may implement the following desirable properties.

III-C1 Variable genotype length and growing complexity

For some learning problems, the size and properties of a network that can solve them are not known in advance. Therefore, a desirable property of the EA for EPANNs is that of increasing the length of the genotype, and thus the information contained in it, as evolution may discover increasingly more complex strategies and solutions that may require larger networks (see, e.g., Stanley and Miikkulainen (2002)).

III-C2 Indirect genotype to phenotype encoding

In nature, phenotypes are expressions of a more compact representation: the genetic code. Similarly, EAs may represent genetic information in a compact form, which is then mapped to a larger phenotype. Although EPANNs do not require such a property, such an approach promises better scalability to large networks (see, e.g., Risi and Stanley (2010)).

III-C3 Expressing regularities, repetitions, and patterns

Indirect encodings are beneficial when they can use one set of instructions in the genotype to generate more parts in the phenotype. This may involve expressing regularities like symmetry (e.g., symmetrical neural architectures), repetition (e.g., neural modules), repetition with variation (e.g., similar neural modules), and patterns, e.g., motifs in the neural architecture (see, e.g., Stanley (2007)).

III-C4 Effective exploration via mutation and recombination

Genetic mutation and sexual reproduction in nature allow for the expression of a variety of phenotypes and for the exploration of new solutions, but seldom lead to highly unfit individuals (Ay et al., 2007). Similarly, EAs for EPANNs need to be able to effectively mutate and recombine genomes without destroying the essential properties of the solutions. EAs may use recombination to generate new solutions from two parents: how to effectively recombine genetic information from two EPANNs is still an open question (see, e.g., tracking genes through historical marking in NEAT (Stanley and Miikkulainen, 2002)).

III-C5 Genetic encoding of plasticity rules

Just as neural networks need a genetic encoding to be evolved, so do plasticity rules. EPANN algorithms require the integration of such a rule in the genome. The encoding may be restricted to a simple parameter search, or evolution may search a larger space of arbitrary and general plasticity rules. Plasticity may also be applied to all or parts of the network, thus effectively implementing the evolution of learning architectures.
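One way to picture such an encoding is a genome that carries both per-connection genes and a shared vector of rule parameters. The following is only a minimal sketch under stated assumptions: the class names `ConnectionGene` and `Genome`, the `plastic` flag, the five-element `rule_params` tuple, and the Gaussian mutation operator are all hypothetical choices made for illustration, not the encoding of any particular study cited here.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ConnectionGene:
    source: int
    target: int
    weight: float      # initial weight at birth (assumed encoding)
    plastic: bool      # whether the rule updates this connection during lifetime

@dataclass(frozen=True)
class Genome:
    connections: tuple   # tuple of ConnectionGene
    rule_params: tuple   # parameters theta of the plasticity rule (hypothetical layout)

def mutate(genome, rng, sigma=0.1, flip_prob=0.05):
    """Return a mutated copy: perturb weights and theta, occasionally toggle plasticity."""
    connections = tuple(
        replace(
            gene,
            weight=gene.weight + rng.gauss(0.0, sigma),
            plastic=(not gene.plastic) if rng.random() < flip_prob else gene.plastic,
        )
        for gene in genome.connections
    )
    rule_params = tuple(p + rng.gauss(0.0, sigma) for p in genome.rule_params)
    return Genome(connections, rule_params)

rng = random.Random(0)
parent = Genome(
    connections=(ConnectionGene(0, 2, 0.3, True), ConnectionGene(1, 2, -0.1, False)),
    rule_params=(1.0, 0.0, 0.0, 0.0, 0.1),
)
child = mutate(parent, rng)
```

Toggling the `plastic` flag per connection is one simple way to let evolution decide which parts of the network learn; richer encodings could instead evolve the functional form of the rule itself.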

III-C6 Diversity, survival criteria, and low selection pressure

The variety of solutions in nature seems to suggest that diversity is a key aspect of natural evolution. EAs for EPANNs are likely to perform better when they can maintain diversity in the population, both at the genotype and phenotype levels. Local selection mechanisms were shown to perform well in EPANN experiments (Soltoggio, 2008c). Niche exploration and behavioral diversity (Lehman and Stanley, 2011) could also play a key role for creative design processes. Low selection pressure and survival criteria might be crucial to evolve learning in deceptive environments (see Section IV-D).

III-C7 Open-ended evolution

Evolutionary optimization aims to quickly optimize parameters and reach a fitness plateau. On the contrary, EAs for EPANNs often seek a more open-ended evolution that can evolve indefinitely more complex solutions given sufficient computational power (Taylor et al., 2016; Stanley et al., 2017).

III-C8 Implementations

Many EAs include one or more of these desirable properties (Fogel, 2006). Due to the complexity of neural network design, the field of neuroevolution was the first to explore most of those extensions of standard evolutionary algorithms. Popular algorithms include early work of Angeline et al. (1994) and Yao and Liu (1997) to evolve fixed weights (i.e., weights that do not change while the agent interacts with its environment) and the topology of arbitrary neural networks, e.g., recurrent (addressing III-C1 and III-C4). Neuroevolution of Augmenting Topologies (NEAT) (Stanley and Miikkulainen, 2002) leverages three main aspects: a recombination operator intended to preserve network function (addressing III-C4); speciation (addressing III-C6); and evolution from small to larger networks (addressing III-C1). Similarly, EPANN-tailored EAs in Soltoggio (2008c) employ local selection mechanisms to maintain diversity. Analog Genetic Encoding (AGE) (Mattiussi and Floreano, 2007) is a method for indirect genotype to phenotype mapping that can be used in combination with evolution to design arbitrary network topologies (addressing III-C1 and III-C2) and was used with EPANNs. HyperNEAT (Stanley et al., 2009) is an indirect representation method that combines the NEAT algorithm with compositional pattern producing networks (CPPNs) (Stanley, 2007) (addressing III-C1, III-C2 and III-C3). Novelty search (Lehman and Stanley, 2008, 2011) was introduced as an alternative to the survival of the fittest as a selection mechanism (addressing III-C6). Initially, the majority of these neuroevolution algorithms were not devised to evolve plastic networks, but rather fixed networks in which the final synaptic weights were encoded in the genome. To operate with EPANNs, these algorithms need to integrate additional genotypical instructions to evolve plasticity rules (III-C5).

By adding these EA features (III-C1 - III-C7) to standard evolutionary algorithms, EPANNs aim to search extremely large search spaces in fundamentally different and more creative ways than traditional heuristic searches of parameters and hyper-parameters.

The process of evolving plastic neural networks is depicted in Fig. 2.
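As a rough illustration of that process, the sketch below nests a lifetime-learning loop inside an evolutionary loop. Everything in it is an assumption made for illustration: the toy task (a single plastic weight must learn a target sign that differs between lifetimes, so it cannot be stored in the genome), the rule `w += eta * target * x` with `eta` as the only evolved parameter, and the truncation selection with Gaussian mutation. It is not the method of any study cited in this review.

```python
import math
import random

def lifetime_fitness(eta, steps=10):
    """Inner loop: lifetime learning with one plastic weight (toy task, assumed)."""
    total_error = 0.0
    for target in (+1.0, -1.0):        # the fact to learn changes between lifetimes
        w = 0.0                        # born without knowing the target
        for _ in range(steps):
            x = 1.0                    # constant presynaptic input
            y = math.tanh(w * x)       # output of the single neuron
            total_error += (y - target) ** 2
            w += eta * target * x      # evolved rule: teaching signal times input
    return -total_error / (2 * steps)  # higher is better

def evolve(pop_size=20, generations=30, sigma=0.1, seed=0):
    """Outer loop: truncation selection plus Gaussian mutation on eta."""
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 0.1) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lifetime_fitness, reverse=True)
        parents = ranked[: pop_size // 4]            # keep the best quarter
        children = [rng.choice(parents) + rng.gauss(0.0, sigma)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lifetime_fitness)

best_eta = evolve()
```

In this toy setting an individual with `eta = 0` cannot learn and scores exactly -1, whereas any positive `eta` moves the output toward the target during the lifetime, so selection favors learning rather than a fixed response.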

IV Progress on evolving artificial plastic neural networks

This section reviews studies that have evolved plastic neural networks (EPANNs). The survey is divided into six sections that mirror the analysis of the field up to this point: the evolution of plasticity rules, the evolution of neural architectures, EPANNs in evolutionary robotics, the evolutionary discovery of learning, the evolution of neuromodulation, and the evolution of indirectly encoded plasticity. Accordingly, Figure 3 provides our perspective on the organization of the field, reflected in the structure of this paper.

IV-A Evolving plasticity rules

Early EPANN experiments evolved the parameters of learning rules for fixed or hand-designed ANN architectures. Learning rules are functions that change the connection weight $w$ between two neurons, and are generally expressed as

$$\Delta w = f(\mathbf{x}, \mathbf{\theta}), \tag{1}$$

where $\mathbf{x}$ is a vector of neural signals and $\mathbf{\theta}$ is a vector of fixed parameters that can be searched by evolution. The incoming connection weights $\mathbf{w}$ to a neuron $i$ are used to determine the activation value

$$x_{i} = \sigma\Big(\sum_{j} w_{ji} \cdot x_{j}\Big), \tag{2}$$

where $x_{j}$ are the activation values of presynaptic neurons that connect to neuron $i$ with the weights $w_{ji}$ , and $\sigma$ is a nonlinear function such as the sigmoid or the hyperbolic tangent. The vector $\mathbf{x}$ may provide local signals such as pre and postsynaptic activities, the value of the weight $w$ , and modulatory or error signals.
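The two definitions above can be made concrete with a small numerical sketch. It assumes the hyperbolic tangent for $\sigma$ and, purely as a hypothetical instance of $f(\mathbf{x}, \mathbf{\theta})$, a correlation term minus a weight-decay term with $\mathbf{\theta} = (\eta, \text{decay})$; the function names are illustrative only.

```python
import math

def activation(weights, inputs):
    """Eq. 2 with sigma = tanh: x_i = sigma(sum_j w_ji * x_j)."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)))

def delta_w(x_pre, x_post, w, theta):
    """One hypothetical f(x, theta) for Eq. 1: correlation term minus weight decay."""
    eta, decay = theta
    return eta * x_pre * x_post - decay * w

inputs = [1.0, 0.5]     # presynaptic activities x_j
weights = [0.5, 0.5]    # incoming weights w_ji of neuron i
theta = (0.1, 0.01)     # fixed rule parameters; evolution would search these

x_i = activation(weights, inputs)
new_weights = [w + delta_w(x_j, x_i, w, theta) for w, x_j in zip(weights, inputs)]
```

Here the local signals passed to the rule are the presynaptic activity, the postsynaptic activity, and the current weight; a modulatory or error signal would simply be an additional argument.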

Bengio et al. (1990, 1992) proposed the optimization of the parameters $\theta$ of generic learning rules with gradient descent, simulated annealing, and evolutionary search for problems such as conditioning, boolean function mapping, and classification. Those studies are also among the first to include a modulatory term in the learning rules. The optimization was shown to improve the performance in those different tasks with respect to manual parameter settings. Chalmers (1990) evolved a learning rule that applied to every connection and had a teaching signal. He found that, in 20% of the evolutionary runs, the algorithm rediscovered the well-known delta rule, or Widrow-Hoff rule (Widrow et al., 1960), used in backpropagation, thereby demonstrating the validity of evolution as an autonomous tool to discover learning. Fontanari and Meir (1991) used the same approach as Chalmers (1990) but constrained the weights to binary values. Also in this case, evolution autonomously rediscovered a hand-designed rule, the directed drift rule by Venkatesh (1993). They also observed that the performance on new tasks was better when the network evolved on a larger set of tasks, possibly encouraging the evolution of more general learning strategies.

With backpropagation of errors (Widrow and Lehr, 1990), the input vector $\mathbf{x}$ of Eq. 1 requires an error signal between each input/output pair. In contrast, rules that use only local signals have been a more popular choice for EPANNs, though this is changing with the rise in effectiveness of deep learning using back-propagation and related methods. In the simplest form, the product of presynaptic ( $x_{j}$ ) and postsynaptic ( $x_{i}$ ) activities, and a learning rate $\eta$

$$\Delta w = \eta \cdot x_{j} \cdot x_{i} \tag{3}$$

is known as Hebbian plasticity (Hebb, 1949; Cooper, 2005). More generally, any function as in Eq. 1 that uses only local signals is considered a local plasticity rule for unsupervised learning. Baxter (1992) evolved a network that applied the basic Hebbian rule in Eq. 3 to a subset of weights (determined by evolution) to learn four functions of one variable. The network, called Local Binary Neural Net (LBNN), evolved to change its weights to one of two possible values ( $\pm 1$ ), or have fixed weights. The experiment proved that learning can evolve when rules are optimized and applied to individual connections.
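The basic Hebbian rule can be traced by hand on a few activity pairs. The sketch below is illustrative only: the function name, the learning rate of 0.1, and the sequence of activities are assumptions chosen to show that correlated activity strengthens the weight and anticorrelated activity weakens it.

```python
def hebbian_update(w, x_pre, x_post, eta=0.1):
    """Basic Hebbian rule: delta_w = eta * x_j * x_i."""
    return w + eta * x_pre * x_post

w = 0.0
history = []
# Three correlated pairs (same sign) followed by one anticorrelated pair.
for x_pre, x_post in [(1.0, 1.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]:
    w = hebbian_update(w, x_pre, x_post)
    history.append(w)
```

Because the update depends only on the two activities at the ends of the connection, no error or teaching signal is needed, which is why such rules are described as local and unsupervised.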

Nolfi and Parisi (1993) evolved networks with “auto-teaching” inputs, which could then provide an error signal for the network to adjust weights during lifetime. The implication is that error signals do not always need to be hand-designed but can be discovered by evolution to fit a particular problem. A set of eight different local rules was used in Rolls and Stringer (2000) to investigate the evolution of rules in combination with the number of synaptic connections for each neuron, different neuron classes, and other network parameters. They found that evolution was effective in selecting specific rules from a large set to solve simple linear problems. In Maniadakis and Trahanias (2006), co-evolution was used to evolve agents (each being a network) that could use ten different types of Hebbian-like learning rules for simple navigation tasks: the authors reported that, despite the increase in the search space, using many different learning rules results in better performance but, understandably, a more difficult analysis of the evolved systems. Meng et al. (2011) evolved a gene regulatory network that in turn determined the learning parameters of the Bienenstock-Cooper-Munro (BCM) rule (Bienenstock et al., 1982), showing promising performance in time series classification and other supervised tasks.

One general finding from these studies is that evolution operates well within large search spaces, particularly when a large set of evolvable rules is used.

IV-B Evolving learning architectures

The interdependency of learning rules and neural architectures led to experiments in which evolution had more freedom on the network’s design. The evolution of architectures in ANNs may involve searching for an optimal number of hidden neurons, the number of layers in a network, particular topologies or modules, the type of connectivity, and other properties of the network’s architecture. In EPANNs, evolving learning architectures means, more specifically, discovering a combination of architectures and learning rules whose synergetic matching enables particular learning dynamics. As opposed to biological networks, EPANNs do not have the neurophysiological constraints, e.g., short neural connections, sparsity, brain size limits, etc., that impose limitations on the natural evolution of biological networks. Thus, biologically implausible artificial systems may nevertheless be evolved in computer simulations (Bullinaria, 2007b, 2009b).

One seminal early study by Happel and Murre (1994) proposed the evolutionary design of modular neural networks, called CALM (Murre, 1992), in which modules could perform unsupervised learning, and the intermodule connectivity was shaped by Hebbian rules. The network learned categorization problems (simple patterns and hand written digits recognition), and showed that the use of evolution led to enhanced learning and better generalization capabilities in comparison to hand-designed networks. In Arifovic and Gencay (2001), the authors used evolution to optimize the number of inputs and hidden nodes, and allowed connections in a feedforward neural network to be trained with backpropagation. Abraham (2004) proposed a method called Meta-Learning Evolutionary Artificial Neural Networks (MLEANN) in which evolution searches for initial weights, neural architectures and transfer functions for a range of supervised learning problems to be solved by evolved networks. The evolved networks were tested in time series prediction and compared with manually designed networks. The analysis showed that evolution consistently found networks with better performance than the hand-designed structures. Khan et al. (2008) proposed an evolutionary developmental system that created an architecture that adapted with learning: the network had a dynamic morphology in which neurons could be inserted or deleted, and synaptic connections formed and changed in response to stimuli. The networks were evolved with Cartesian genetic programming and appeared to improve their performance while playing checkers over the generations. Downing (2007) looked at different computational models of neurogenesis to evolve learning architectures. The proposed evolutionary developmental system focused in particular on abstraction levels and principles such as Neural Darwinism (Edelman and Tononi, 2000). 
A combination of evolution of recurrent networks with a linear learner in the output was proposed in Schmidhuber et al. (2007), showing that the evolved RNNs were more compact and resulted in better learning than randomly initialized echo state networks (Jaeger and Haas, 2004). In Khan et al. (2011b, a); Khan and Miller (2014), the authors introduced a large number of bio-inspired mechanisms to evolve networks with rich learning dynamics. The idea was to use evolution to design a network that was capable of advanced plasticity such as dendrite branch and axon growth and shrinkage, neuron insertion and destruction, and many others. The system was tested on the Wumpus World (Russell and Norvig, 2013), a fairly simple problem with no learning required, but the purpose was to show that evolution can design working control networks even within a large search space.

In summary, learning mechanisms and neural architectures are strongly interdependent, but a large set of available dynamics seems to facilitate the evolution of learning. Thus, EPANNs become more effective precisely when manual network design becomes less practical because of complexity and rich dynamics.

IV-C EPANNs in Evolutionary Robotics

Evolutionary robotics (ER) (Cliff et al., 1993; Floreano and Mondada, 1994, 1996; Urzelai and Floreano, 2000; Floreano and Nolfi, 2004) contributed strongly to the development of EPANNs, providing a testbed for applied controllers in robotics. Although ER had no specific assumptions on neural systems or plasticity (Smith, 2002), robotics experiments suggested that neural control structures evolved with fixed weights perform less well than those evolved with plastic weights (Nolfi and Parisi, 1996; Floreano and Urzelai, 2001b). In a conditional phototaxis robotic experiment (footnote 2: the fitness value was the time spent by a two-wheeled robot in one particular area of the arena when a light was on, divided by the total experiment time), Floreano and Urzelai (2001a) reported that networks evolved faster when synaptic plasticity and neural architectures were evolved simultaneously. In particular, plastic networks were shown to adapt better in the transition from simulation to real robots. The better simulation-to-hardware transition, and the increased adaptability in changing ER environments, appeared intuitive and supported by evidence (Nolfi and Parisi, 1996). However, the precise nature and magnitude of the changes from simulation to hardware are not always easy to quantify: those studies do not clearly outline the precise principles, e.g., better or adaptive feedback control, minimization principles, etc., that are discovered by evolution with plasticity to produce those advantages. In fact, the behavioral changes required to switch behaviors in simple ER experiments can also take place with non-plastic recurrent neural networks because evolution can discover recurrent units that act as switches. A study in 2003 observed similar performance in an associative learning task (food foraging) when comparing plastic and non-plastic recurrent networks (Stanley et al., 2003).
Recurrent networks with leaky integrators as neurons (Beer and Gallagher, 1992; Funahashi and Nakamura, 1993; Yamauchi and Beer, 1994) were also observed to achieve similar performance to plastic networks (Blynel and Floreano, 2002, 2003). These early studies indicate that the evolution of learning with plastic networks was at that point still a proof-of-concept rather than a superior learning tool: aided by evolutionary search, networks with recurrent connections and fixed weights could create recurrent nodes, retain information and achieve similar learning performance to networks with plastic weights.

Nevertheless, ER maintained a focus on plasticity as demonstrated, e.g., in The Cyber Rodent Project (Doya and Uchibe, 2005) that investigated the evolution of learning by seeking to implement a number of features such as (1) evolution of neural controllers, (2) learning of foraging and mating behaviors, (3) evolution of learning architectures and meta-parameters, (4) simultaneous learning of multiple agents in a body, and (5) learning and evolution in a self-sustained colony. Plasticity in the form of modulated neural activation was used in Husbands et al. (1998) and Smith et al. (2002) with a network that adapts its activation functions according to the diffusion of a simulated gas spreading to the substrate of the network. Although the robotic visual discrimination tasks did not involve learning, the plastic networks appeared to evolve faster than a network evolved with fixed activation functions. Similar conclusions were reached in Di Paolo (2003) and Federici (2005). Di Paolo (2002, 2003) evolved networks with STDP for a wheeled robot to perform positive and negative phototaxis, depending on a conditioned stimulus, and observed that networks with fixed weights could learn but had inferior performance with respect to plastic networks. Federici (2005) evolved plastic networks with STDP and an indirect encoding, showing that plasticity helped performance even if learning was not required. Stability and evolvability of simple robotic controllers were investigated in Hoinville et al. (2011) who focused on EPANNs with homeostatic mechanisms.

Experiments in ER in the 1990s and early 2000s revealed the extent, complexity, and multitude of ideas behind the evolutionary design of learning neuro-robotics controllers. They generally indicate that plasticity helps evolution under a variety of conditions, even when learning is not required, thereby promoting further interest in more specific topics. Among those are the evolutionary discovery of learning, the evolution of neuromodulation, and the evolution of indirectly encoded plasticity, as described in the following.

IV-D Evolutionary discovery of learning

When evolution is used to search for learning mechanisms, two main cases can be distinguished: (1) when learning is used to acquire constant facts about the agent or environment, and (2) when learning is used to acquire changeable facts. The first case, that of static or stationary environments, is known to be affected by the Baldwin effect (Baldwin, 1896) that suggests an acceleration of evolution when learning occurs during lifetime. A number of studies showed that the Baldwin effect can be observed with computational simulations (Smith, 1986; Hinton and Nowlan, 1987; Boers et al., 1995; Mayley, 1996; Bullinaria, 2001). With static environments, learning causes a faster transfer of knowledge into the genotype, which can happen when facts are stationary (or constant) across generations. Eventually, a system in those conditions can perform well without learning because it can be born knowing how to perform well. However, one limitation is that the genome might grow very large to hold a large amount of information, and might, as a result, become less easy to evolve further. A second limitation is that such solutions might not perform well in non-stationary environments.

In the second case, that of variable or non-stationary environments, facts cannot be embedded in the genotype because those are changeable as, e.g., the location of food in a foraging problem. This case requires the evolution of learning for the performance to be maximized. For this reason, non-stationary reward-based environments, in which the behaviors to obtain rewards may change, are more typically used to study the evolution of learning in EPANNs.

EPANN experiments have been used to observe the advantages of combining learning and evolution, and the complex interaction dynamics that derive from this combination (Nolfi and Floreano, 1999). Stone (2007) showed that distributed neural representations accelerate the evolution of adaptive behavior because learning part of a skill induced the automatic acquisition of other skill components. One study in a non-stationary environment (a foraging problem with variable rewards) (Soltoggio et al., 2007) suggested that evolution discovers learning before optimizing it, in a process that is revealed by discrete fitness stepping stones. At first, non-learning solutions are present in the population. When evolution discovers by chance a weak mechanism of learning, it is sufficient to create an evolutionary advantage, so the neural mechanism is subsequently optimized: Fig. 4 shows a sudden jump in the fitness when one agent suddenly evolves a learning strategy. Such jumps in fitness graphs are common in evolutionary experiments in which learning is discovered from scratch (Aim 1.1), rather than optimized (Aim 1.2), and were observed as early as in Fontanari and Meir (1991). When an environment changes over time, the frequency of those changes plays a role because it determines the time scales that are required from the learning agent. With time scales comparable to a lifetime, evolution may lead to phenotypic plasticity, which is the capacity for a genotype to express different phenotypes in response to different environmental conditions (Lalejini and Ofria, 2016). The frequency of environmental changes was observed experimentally in plastic neural networks to affect the evolution of learning (Ellefsen, 2014), revealing a complex relationship between environmental variability and evolved learning. One conclusion is that evolution copes with non-stationary environments by evolving the specific learning that better matches those changes.

The use of reward to guide the discovery of neural learning through evolution was shown to be inherently deceptive in Risi et al. (2010) and Lehman and Miikkulainen (2014). In Risi et al. (2009, 2010), EPANN-controlled simulated robots, evolved in a discrete T-Maze domain, revealed that the stepping stones towards discovering learning are often not rewarded by objective-based performance measures. Those stepping stones to learning receive a lower fitness score than more brittle solutions with no learning but effective behaviors. A solution to this problem was devised in Risi et al. (2009, 2010), in which novelty search (Lehman and Stanley, 2008, 2011) was adopted as a substitute for performance in the fitness objective with the aim of finding novel behaviors. Novelty search was observed to perform significantly better in the T-Maze domain. Lehman and Miikkulainen (2014) later showed that novelty search can encourage the evolution of more adaptive behaviors across a variety of variations of the T-Maze learning tasks. As a consequence, novelty search contributed to a philosophical change by questioning the centrality of objective-driven search in current evolutionary algorithms (Stanley and Lehman, 2015). By rewarding novel behaviors, novelty search validates the importance of exploration or curiosity, previously proposed in Schmidhuber (1991, 2006), also from an evolutionary viewpoint. With the aim of validating the same hypothesis, Soltoggio and Jones (2009) devised a simple EPANN experiment in which exploration was more advantageous than exploitation in the absence of reward learning; to do this, the reward at a particular location depleted itself if continuously visited, so that changing location at random in a T-maze became beneficial. Evolution discovered exploratory behavior before discovering reward-learning, which in turn, and surprisingly, led to an earlier evolution of reward-based learning.
Counterintuitively, this experiment suggests that a stepping stone to evolve reward-based learning is to encourage reward-independent exploration.
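To make the idea of rewarding novelty instead of performance concrete, the sketch below scores a behavior by its mean distance to its k nearest neighbors among previously seen behaviors, which is the sparseness measure commonly associated with novelty search. The behavior characterization used here (a 2-D point such as a final position), the value of k, and the function name are assumptions for illustration; the T-Maze studies discussed above may characterize behavior differently.

```python
import math

def novelty(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors in `others`."""
    if not others:
        return float("inf")            # nothing to compare against: maximally novel
    dists = sorted(math.dist(behavior, other) for other in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# Hypothetical archive of behaviors seen so far, e.g. final (x, y) positions.
archive = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

familiar = novelty((0.5, 0.5), archive, k=2)   # lies in the middle of known behaviors
unusual = novelty((5.0, 5.0), archive, k=2)    # far from anything seen before
```

Selection would then favor individuals with a high novelty score regardless of the reward they collected, which is how stepping stones that score poorly on the objective can still survive.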

The seminal work in Bullinaria (2003, 2007a, 2009c) proposes the more general hypothesis that learning requires the evolution of long periods of parental protection and late onset of maturity. Similarly, Ellefsen (2013b, a) investigates sensitive and critical periods of learning in evolved neural networks. This fascinating hypothesis has wider implications for experiments with EPANNs, and more generally for machine learning and AI. It is therefore foreseeable that future EPANNs will have a protected childhood during which parental guidance may be provided (Clutton-Brock, 1991; Klug and Bonsall, 2010; Eskridge and Hougen, 2012).

IV-E Evolving neuromodulation

Growing neuroscientific evidence on the role of neuromodulation (previously outlined in Section II-B) inspired the design of experiments with neuromodulatory signals to evolve control behavior and learning strategies (Section II-C). One particular case is when neuromodulation gates plasticity. Eq. 1 can be rewritten as

$$\Delta w = m \cdot f(\mathbf{x}, \mathbf{\theta}) \tag{4}$$

to emphasize the role of $m$, a modulatory signal used as a multiplicative factor that can enhance or reduce plasticity (Abbott, 1990). A network may produce many independent modulatory signals $\mathbf{m}$ targeting different neurons or areas of the network. Thus, modulation can vary in space and time. Modulation may also affect other aspects of the network dynamics, e.g., modulating activations rather than plasticity (Krichmar, 2008). Graphically, modulation can be represented as a different type of signal affecting various properties of the synaptic connections of an afferent neuron $i$ (Fig. 5).

Evolutionary search was used to find the parameters of a neuromodulated Hebbian learning rule in a reward-based armed-bandit problem in Niv et al. (2002). The same problem was used later in Soltoggio et al. (2007) to evolve arbitrary learning architectures with a bio-inspired gene representation method called Analog Genetic Encoding (AGE) (Mattiussi and Floreano, 2007). In that study, evolution was used to search both modulatory topologies and parameters of a particular form of Eq. 4:

[TABLE]

where the parameters $A$ to $D$ determined the influence of four factors in the rule: a multiplicative Hebbian term $A$, a presynaptic term $B$, a postsynaptic term $C$, and a pure modulatory, or heterosynaptic, term $D$. Such a rule is not dissimilar from those presented in previous studies (see Section 3.2). However, when used in combination with modulation and a search for network topologies, evolution seems to be particularly effective at solving reward-based problems. Kondo (2007) proposed an evolutionary design and behavior analysis of neuromodulatory neural networks for mobile robot control, validating the potential of the method.
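A rule with these four terms can be sketched as follows. This is a minimal illustration, assuming the four terms are summed and then multiplied by the modulatory signal $m$ and a learning rate `eta`. The function name, `eta`, and the example values are assumptions made for illustration, not the implementation of the cited studies.

```python
def modulated_update(w, x_pre, x_post, m, A, B, C, D, eta=1.0):
    """Return the updated weight of one synapse under a modulated Hebbian rule.

    Assumed form (illustrative):
        dw = m * eta * (A * x_pre * x_post + B * x_pre + C * x_post + D)
    A: Hebbian (correlation) term, B: presynaptic term,
    C: postsynaptic term, D: pure modulatory (heterosynaptic) term.
    """
    dw = m * eta * (A * x_pre * x_post + B * x_pre + C * x_post + D)
    return w + dw


# m = 0 gates plasticity off: the weight stays unchanged whatever the activity.
w_gated = modulated_update(0.5, x_pre=1.0, x_post=1.0, m=0.0,
                           A=1.0, B=0.0, C=0.0, D=0.0)

# m = 1 with only the Hebbian term active: correlated activity strengthens the weight.
w_hebb = modulated_update(0.5, x_pre=1.0, x_post=1.0, m=1.0,
                          A=1.0, B=0.0, C=0.0, D=0.0, eta=0.1)
```

In an evolutionary setting, `A` to `D` and `eta` would be part of the genome, while `m` would be produced during the lifetime by modulatory neurons in the network.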

Soltoggio et al. (2008) tested whether modulatory dynamics held an evolutionary advantage in T-Maze environments with changing reward locations (in reward-based T-Maze environments, it is often assumed that the fitness function is the sum of all rewards collected during a lifetime). In their algorithm, modulatory neurons were freely inserted or deleted by random mutations, effectively allowing the evolutionary selection mechanism to autonomously pick those networks with advantageous computational components (Aim 2). After evolution, the best performing networks had modulatory neurons regulating learning, and evolved faster than a control evolutionary experiment that could not employ modulatory neurons. Modulatory neurons were maintained in the networks in a second phase of the experiment when genetic operators allowed for the deletion of such neurons but not for their insertion, thus demonstrating their essential function in maintaining learning in that particular experiment. In another study, Soltoggio (2008b) suggested that evolved modulatory topologies may be essential to separate the learning circuitry from the input-output controller and to shorten the input-output pathways, which sped up decision processes. Soltoggio (2008a) showed that the learning dynamics are affected by tight coupling between rules and architectures in a search space with many equivalent but different control structures. Fig. 5 also suggests that modulatory networks require evolution to find two essential topological structures: what signals or combination of signals trigger modulation, and what neurons are to be targeted by modulatory signals. In other words, a balance between fixed and plastic architectures, or selective plasticity (DARPA-L2M, 2017), is an intrinsically emergent property of evolved modulated networks.

A number of further studies on the evolution of neuromodulatory dynamics confirmed the evolutionary advantages in learning scenarios (Soltoggio, 2008c). Silva et al. (2012a) used simulations of 2-wheel robots performing a dynamic concurrent foraging task, in which scattered food items periodically changed their nutritive value or became poisonous, similarly to the setup in Soltoggio and Stanley (2012). The results showed that when neuromodulation was enabled, learning evolved faster than when neuromodulation was not enabled, also with multi-robot distributed systems (Silva et al., 2012b). Nogueira et al. (2013, 2016) also reported evolutionary advantages in foraging behavior of an autonomous virtual robot when equipped with neuromodulated plasticity. Harrington et al. (2013) demonstrated how evolved neuromodulation applied to a gene regulatory network consistently generalized better than agents trained with fixed parameter settings. Interestingly, Arnold et al. (2013b) showed that neuromodulatory architectures provided an evolutionary advantage also in reinforcement-free environments, validating the hypothesis that plastic modulated networks have higher evolvability in a large variety of tasks. The evolution of social representations in neural networks was shown to be facilitated by neuromodulatory dynamics in Arnold et al. (2013a). An artificial life simulation environment called Polyworld (Yoder and Yaeger, 2014) helped to assess the advantage of neuromodulated plasticity in various scenarios. The authors found that neuromodulation may be able to enhance or diminish foraging performance in a competitive, dynamic environment.

Neuromodulation was evolved in Ellefsen et al. (2015) in combination with modularity to address the problem of catastrophic forgetting. In Gustafsson (2016), networks evolved with AGE (Mattiussi and Floreano, 2007) for video game playing were shown to perform better with the addition of neuromodulation. Norouzzadeh and Clune (2016) showed that neuromodulation produced forward models that could adapt to changes significantly better than the controls. They verified that evolution exploited variable learning rates to perform adaptation when needed. In Velez and Clune (2017), diffusion-based modulation, i.e., targeting entire parts of the network, evolved to produce task-specific localized learning and functional modularity, thus reducing the problem of catastrophic forgetting.

The evidence in these studies suggests that neuromodulation is a key ingredient to facilitate the evolution of learning in EPANNs. They also indirectly suggest that neural systems with more than one type of signal, e.g., activation and other modulatory signals, might be beneficial in the neuroevolution of learning.

IV-F Evolving indirectly encoded plasticity

An indirect genotype to phenotype mapping means that evolution operates on a compact genotypical representation (analogous to the DNA) that is then mapped into a fully fledged network (analogous to a biological brain). Learning rules may undergo a similar indirect mapping, so that compact instructions in the genome expand to fully fledged plasticity rules in the phenotype. One early study (Gruau and Whitley, 1993) encoded plasticity and development with a grammar tree, and compared different learning rules on a simple static task (parity and symmetry), demonstrating that learning provided an evolutionary advantage in a static scenario. In non-static contexts, and using a T-Maze domain as learning task, Risi and Stanley (2010) showed that HyperNEAT, which usually implements a compact encoding of weight patterns for large-scale ANNs (Fig. 6a), can also encode patterns of local learning rules. The approach, called adaptive HyperNEAT, can encode arbitrary learning rules for each connection in an evolving ANN based on a function of the ANN’s geometry (Fig. 6b). Further flexibility was added in Risi and Stanley (2012) to simultaneously encode the density and placement of nodes in substrate space. The approach, called adaptive evolvable-substrate HyperNEAT, makes it possible to indirectly encode plastic ANNs with thousands of connections that exhibit regularities and repeating motifs. Adaptive ES-HyperNEAT allows each individual synaptic connection, rather than neuron, to be standard or modulatory, thus introducing further design flexibility. Risi and Stanley (2014) showed how adaptive HyperNEAT can be seeded to produce a specific lateral connectivity pattern, thereby allowing the weights to self-organize to form a topographic map of the input space. The study shows that evolution can be seeded with specific plasticity mechanisms that can facilitate the evolution of specific types of learning.
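The idea of encoding plasticity as a function of network geometry can be illustrated with a small sketch. Here `toy_cppn` is a hand-written, hypothetical stand-in for an evolved pattern-producing network: it takes the coordinates of the two endpoints of a connection and returns an initial weight together with two plasticity parameters. The specific functions, the parameter names (`eta`, `hebb`), and the substrate layout are assumptions chosen for illustration, not the encoding used in the cited studies.

```python
import math


def toy_cppn(x1, y1, x2, y2):
    """Hypothetical stand-in for an evolved CPPN: geometry in, synapse parameters out."""
    dx, dy = x2 - x1, y2 - y1
    dist = math.hypot(dx, dy)
    weight = math.sin(3.0 * dx) * math.exp(-dist)  # initial weight pattern
    eta = 0.1 * math.exp(-dist * dist)             # learning rate fades with distance
    hebb = math.cos(dy)                            # coefficient of a Hebbian term
    return weight, eta, hebb


def build_plastic_substrate(inputs, outputs):
    """Query the function once per connection: {(src, dst): (weight, eta, hebb)}."""
    return {
        (src, dst): toy_cppn(*src, *dst)
        for src in inputs
        for dst in outputs
    }


# Node positions in a 2D substrate (illustrative layout).
inputs = [(-1.0, -1.0), (0.0, -1.0), (1.0, -1.0)]
outputs = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
synapses = build_plastic_substrate(inputs, outputs)
```

Because this particular toy function depends only on the offset between the two endpoints, connections with the same relative geometry receive identical weights and learning parameters, which is one simple way regularities and repeating motifs can arise from a compact encoding.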

The effect of indirectly encoded plasticity on the learning and on the evolutionary process was investigated by Tonelli and Mouret (2011, 2013). Using an operant conditioning task, i.e., learning by reward, the authors showed that indirect encodings that produced more regular neural structures also improved the general EPANN learning abilities when compared to direct encodings. In an approach similar to adaptive HyperNEAT, Orchard and Wang (2016) encoded the learning rule itself as an evolving network. They named the approach neural weights and bias update (NWB), and observed that increasing the search space of the possible plasticity rules created more general solutions than those based on only Hebbian learning.

V Future directions

The progress of EPANNs reviewed so far is based on rapidly developing theories and technologies. In particular, new advances in AI, machine learning, neural networks and increased computational resources are currently creating a new fertile research landscape, and are setting the groundwork for new directions for EPANNs. This section presents promising research themes that have the potential to extend and radically change the field of EPANNs and AI as a whole.

V-A Levels of abstraction and representations

Choosing the right level of abstraction and the right representation (Bengio et al., 2013) are themes at the heart of many problems in AI. In ANNs, low levels of abstraction are more computationally expensive, but might be richer in dynamics. High levels are faster to simulate, but require an intuition of the essential dynamics that are necessary in the model. Research in EPANNs is well placed to address the problem of levels of abstraction because it can reveal evolutionary advantages for different components, structures and representations.

Similarly to abstractions, representations play a critical role. Compositional Pattern Producing Networks (CPPNs) (Stanley, 2007), and also the previous work of Sims (1991), demonstrated that structured phenotypes can be generated through a function without going through the dynamic developmental process typical of multicellular organisms. Relatedly, Hornby et al. (2002) showed that different phenotypical representations led to considerably different results in the evolution of regular structures with patterns and repetitions. Miller (2014) explicitly discussed the effect of abstraction levels for evolved developmental learning networks, in particular in relation to two approaches that model development at the neuron level or at the network level.

Finding appropriate abstractions and representations, just as it was fundamental in the advances in deep learning to represent input spaces and hierarchical features (Bengio et al., 2013; Oquab et al., 2014), can also extend to representations of internal models, learning mechanisms, and genetic encodings, affecting the algorithms’ capabilities of evolving learning abilities.

V-B Evolving general learning

One challenge in the evolution of learning is that evolved learning may simply result in a switch among a finite set of evolved behaviors, e.g., turning left or right in a T-Maze in a finite sequence, which is all that evolving solutions encounter during their lifetime. A challenge for EPANNs is to acquire general learning abilities in which the network is capable of learning problems not encountered during evolution. Mouret and Tonelli (2014) propose the distinction between the evolution of behavioral switches and the evolution of synaptic general learning abilities, and suggest conditions that favor these types of learning. General learning can be intuitively understood as the capability to learn any association among input, internal, and output patterns, both in the spatial and temporal dimensions, regardless of the complexity of the problem. Such an objective clearly poses practical and philosophical challenges. Although humans are considered better at general learning than machines, human learning skills are also specific and not unlimited (Ormrod and Davis, 2004). Nevertheless, moving from behavior switches to more general learning is a desirable feature for EPANNs. Encouraging the emergence of general learners may likely involve (1) an increased computational cost for testing in rich environments that include a large variety of uncertain and stochastic scenarios with problems of various complexity, and (2) an increased search space to explore the evolution of complex strategies and avoid deception.

V-C Incremental and social learning

An important open challenge for machine learning in general is the creation of neural systems that can continuously integrate new knowledge and skills without forgetting what they previously learned (Parisi et al., 2018), thus solving the stability-plasticity dilemma. A promising approach is progressive neural networks (Rusu et al., 2016), in which a new network is created for each new task, and lateral connections between networks allow the system to leverage previously learned features. In the presence of time delays among stimuli, actions and rewards, a rule called hypothesis testing plasticity (HTP) (Soltoggio, 2015) implements fast and slow decay to consolidate weights and suggests neural dynamics to avoid catastrophic forgetting. A method to find the best shared weights across multiple tasks, called elastic weight consolidation (EWC), was proposed in Kirkpatrick et al. (2017). Plasticity rules that implement weight consolidation, given their promise to prevent catastrophic forgetting, are likely to become standard components in EPANNs.
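The weight consolidation idea behind EWC can be sketched as a quadratic penalty that anchors each parameter to the value it had after an earlier task, with a per-parameter importance estimate. The sketch below assumes a diagonal importance vector `fisher` and a strength `lam`; the function names and example values are illustrative assumptions.

```python
def ewc_penalty(theta, theta_star, fisher, lam):
    """Quadratic penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for t, ts, f in zip(theta, theta_star, fisher)
    )


def ewc_loss(task_loss, theta, theta_star, fisher, lam=1.0):
    """Loss on the new task plus the consolidation penalty for the old task."""
    return task_loss + ewc_penalty(theta, theta_star, fisher, lam)


theta_star = [0.0, 2.0]   # parameters after the earlier task
fisher = [4.0, 10.0]      # importance of each parameter for the earlier task
theta = [1.0, 2.0]        # candidate parameters while learning the new task

# Only the first parameter moved, so only its importance contributes.
penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)  # -> 4.0
```

Parameters with high importance are effectively less plastic, while unimportant ones remain free to change, which is the sense in which such rules implement selective consolidation.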

Encouraging modularity (Ellefsen et al., 2015; Durr et al., 2010) or augmenting evolving networks with a dedicated external memory component (Lüders et al., 2016) have been proposed recently. An evolutionary advantage is likely to emerge for networks that can elaborate on previously learned sub-skills during their lifetime to learn more complex tasks.

One interesting case in which incremental learning may play a role is social learning (Best, 1999). EPANNs may learn both from the environment and from other individuals, from scratch or incrementally (Offerman and Sonnemans, 1998). In an early study, McQuesten and Miikkulainen (1997) showed that neuroevolution can benefit from parent networks teaching their offspring through backpropagation. When social, learning may involve imitation, language or communication, or other social behaviors. Bullinaria (2017) proposes an EPANN framework to simulate the evolution of culture and social learning. It is reasonable to assume that future AI learning systems, whether based on EPANNs or not, will acquire knowledge through different modalities. These will involve direct experience with the environment, but also social interaction, and possibly complex incremental learning phases.

V-D Fast learning

Animal learning does not always require a myriad of trials. Humans can very quickly generalize from only a few given examples, possibly leveraging previous experiences and a long learning process during infancy. This type of learning, advocated in AI and robotics systems (Thrun and Mitchell, 1995), is currently still missing in EPANNs. Inspiration for new approaches could come from complementary learning systems (McClelland et al., 1995; Kumaran et al., 2016) that humans seem to possess, which include fast and slow learning components. Additionally, approaches such as probabilistic program induction seem to be able to learn concepts in one shot at a human level in some tasks (Lake et al., 2015). Fast learning is likely to derive not just from trial-and-error, but also from mental models that can be applied to diverse problems, similarly to transfer learning (Thrun and Mitchell, 1995; Thrun and O’Sullivan, 1996; Pan and Yang, 2010). Reusable mental models, once learned, will allow agents to make predictions and plan in new and uncertain scenarios with similarities to previously learned ones. If EPANNs can discover neural structures or learning rules that allow for generalization, the evolutionary advantage of such a discovery will lead to its full emergence and further optimization.

A rather different approach to accelerate learning was proposed in Fernando et al. (2008); de Vladar and Szathmáry (2015) and called Evolutionary Neurodynamics. According to this theory, replication and selection might happen in a neural system as it learns, thus mimicking an evolutionary dynamics at the much faster time scale of a lifetime. We refer to Fernando et al. (2012); de Vladar and Szathmáry (2015) for an overview of the field. The appeal of this method is that evolutionary search can be accelerated by implementing its dynamics at both the evolution’s and life’s time scales.

V-E Evolving memory

The consequence of learning is memory, both explicit and implicit (Anderson, 2013), and its consolidation (Dudai, 2012). For a review of computational models of memory see Fusi (2017). EPANNs may reach solutions in which memory evolved in different fashions, e.g., preserved as self-sustained neural activity, encoded by connection weights modified by plasticity rules, stored with an external memory (e.g. Neural Turing Machine), or a combination of these approaches. Recurrent neural architectures based on long short-term memory (LSTM) allow very complex tasks to be solved through gradient descent training (Greff et al., 2015; Hochreiter and Schmidhuber, 1997) and have recently shown promise when combined with evolution (Rawal and Miikkulainen, 2016). Neuromodulation and weight consolidation could also be used to target areas of the network where information is stored.

Graves et al. (2014) introduced the Neural Turing Machine (NTM), networks augmented with an external memory that allows long-term memory storage. NTMs have shown promise when trained through evolution (Greve et al., 2016; Lüders et al., 2016, 2017) or gradient descent (Graves et al., 2014, 2016). The Evolvable Neural Turing Machine (ENTM) showed good performance in solving the continuous version of the double T-Maze navigation task (Greve et al., 2016), and avoided catastrophic forgetting in a continual learning domain (Lüders et al., 2016, 2017) because memory and control are separated by design. Research in this area will reveal which computational systems are more evolvable and how memories will self organize and form in EPANNs.
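To make the notion of an external memory concrete, the sketch below shows a content-based read in the style of the NTM: a key vector is compared to every memory row by cosine similarity, the similarities are sharpened by a factor `beta` and normalized with a softmax, and the read vector is the resulting weighted sum of rows. It is a simplified illustration in plain Python; the function names and the value of `beta` are assumptions, and the evolvable variants cited above may use different access mechanisms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)  # small constant avoids division by zero


def content_read(memory, key, beta=5.0):
    """Return (read_vector, weights) for a soft, content-based memory read."""
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)                                # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(memory[0])
    read = [sum(w * row[c] for w, row in zip(weights, memory)) for c in range(width)]
    return read, weights


memory = [[1.0, 0.0], [0.0, 1.0]]   # two stored rows
read, weights = content_read(memory, key=[1.0, 0.0])
```

The controller network only produces the key (and possibly `beta`), while the stored content lives outside its weights, which illustrates the separation of memory and control mentioned above.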

V-F EPANNs and deep learning

Deep learning has shown remarkable results in a variety of different fields (Krizhevsky et al., 2012; Schmidhuber, 2015; LeCun et al., 2015). However, the model structures of these networks are mostly hand-designed, include a large number of parameters, and require extensive experiments to discover optimal configurations. With increased computational resources, it is now possible to search design aspects with evolution, and set up EPANN experiments with the aim of optimizing learning (Aim 1.2).

Koutník et al. (2014) used evolution to design a controller that combined evolved recurrent neural networks, for the control part, and a deep max-pooling convolutional neural network to reduce the input dimensionality. The study does not use evolution on the deep preprocessing networks itself, but demonstrates nevertheless the evolutionary design of a deep neural controller. Young et al. (2015) used an evolutionary algorithm to optimize two parameters of a deep network: the size (range [1,8]) and the number (range [16,126]) of the filters in a convolutional neural network, showing that the optimized parameters could vary considerably from the standard best-practice values. An established evolutionary computation technique, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen and Ostermeier, 2001), was used in Loshchilov and Hutter (2016) to optimize the parameters of a deep network to learn to classify the MNIST dataset. The authors reported performance close to the state-of-the-art using 30 GPU devices.
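A search over two integer hyperparameters of this kind can be sketched with a simple elitist evolutionary loop. The ranges below are those reported above for the filter size and the number of filters. Everything else is an assumption for illustration: `surrogate_fitness` is a hypothetical stand-in for training a network and measuring validation accuracy, and the mutation steps and population size are arbitrary choices.

```python
import random

SIZE_RANGE = (1, 8)      # filter size range reported in the text
COUNT_RANGE = (16, 126)  # filter count range reported in the text


def surrogate_fitness(size, count):
    """Hypothetical stand-in for validation accuracy; peaks at size=5, count=96."""
    return -((size - 5) ** 2) - ((count - 96) / 10.0) ** 2


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def mutate(individual, rng):
    """Perturb both hyperparameters and keep them inside their ranges."""
    size, count = individual
    size = clamp(size + rng.choice([-1, 0, 1]), *SIZE_RANGE)
    count = clamp(count + rng.randint(-8, 8), *COUNT_RANGE)
    return size, count


def evolve(generations=50, pop_size=10, seed=0):
    rng = random.Random(seed)
    pop = [(rng.randint(*SIZE_RANGE), rng.randint(*COUNT_RANGE))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(ind, rng) for ind in pop]
        # (mu + lambda) selection: keep the best pop_size of parents and children.
        ranked = sorted(pop + children,
                        key=lambda ind: surrogate_fitness(*ind), reverse=True)
        pop = ranked[:pop_size]
    return pop[0]


best = evolve()  # (filter size, number of filters) with the highest surrogate fitness
```

In a real experiment each fitness evaluation would require training a network, so the number of evaluations, not the evolutionary loop itself, dominates the computational cost.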

Real et al. (2017) and Miikkulainen et al. (2017) showed that evolutionary search can be used to determine the topology, hyperparameters and building blocks of deep networks trained through gradient descent. The performance was shown to rival that of hand-designed architectures in the CIFAR-10 classification task and a language modeling task (Miikkulainen et al., 2017), while Real et al. (2017) also tested the method on the larger CIFAR-100 dataset. Desell (2017) proposes a method called evolutionary exploration of augmenting convolutional topologies, inspired by NEAT (Stanley and Miikkulainen, 2002), which evolves progressively more complex unstructured convolutional neural networks using genetically specified feature maps and filters. This approach is also able to co-evolve neural network training hyperparameters. Results were obtained using 5,500 volunteered computers at the Citizen Science Grid, which were able to evolve competitive results on the MNIST dataset in under 12,500 trained networks in a period of approximately ten days. Liu et al. (2017) used evolution to search for hierarchical architecture representations showing competitive performance on the CIFAR-10 and ImageNet databases. The Evolutionary DEep Networks (EDEN) framework (Dufourq and Bassett, 2017) aims to generalize deep network optimization to a variety of problems and is interfaced with TensorFlow (Abadi et al., 2016). A number of similar software frameworks are currently being developed.

Fernando et al. (2017) used evolution to determine a subset of pathways through a network that are trained through backpropagation, allowing the same network to learn a variety of different tasks. Fernando et al. (2016) were also able to rediscover convolutional networks by means of evolution of Differentiable Pattern Producing Networks (Stanley, 2007).

So far, EPANN experiments in deep learning have focused primarily on the optimization of learning (Aim 1.2) in supervised classification tasks, e.g. optimizing final classification accuracy. In the future, evolutionary search may be used with deep networks to evolve learning from scratch, recover performance, or combine different learning rules and dynamics in an innovative and counter-intuitive fashion (Aims 1.1, 1.3 or 2 respectively).

V-G GPU implementations and neuromorphic hardware

The progress of EPANNs will crucially depend on implementations that take advantage of the increased computational power of parallel computation with GPUs and neuromorphic hardware (Jo et al., 2010; Monroe, 2014). Deep learning greatly benefited not only from GPU-accelerated machine learning but also from standardized tools (e.g. Torch, TensorFlow, Theano, etc.) that made it easy for anybody to download, experiment with, and extend promising deep learning models.

EPANNs have shown promise with hardware implementations. Howard et al. (2011, 2012, 2014) devised experiments to evolve plastic spiking networks implemented as memristors for simulated robotic navigation tasks. Memristive plasticity was observed consistently to enable higher performance than constant-weighted connections in both static and dynamic reward scenarios. Carlson et al. (2014) used GPU implementations to evolve plastic spiking neural networks with an evolution strategy, which resulted in an efficient and automated parameter tuning framework.

In the context of newly emerging technologies, it is worth noting that, just as GPUs were not developed initially for deep learning, so novel neural computation tools and hardware systems, not developed for EPANNs, can now be exploited to enable more advanced EPANN setups.

V-H Measuring progress

The number of platforms and environments for testing the capabilities of intelligent systems is constantly growing, e.g., the Atari or General Video Game Playing Benchmark (GVGAI, 2017), the Project Malmo (Microsoft, 2017), or the OpenAI Universe (OpenAI, 2017). Because EPANNs are often evolved in reward-based, survival, or novelty-oriented environments to discover new, unknown, or creative learning strategies or behaviors, measuring progress is not straightforward. Desired behaviors or errors are not always defined. Moreover, the goal for EPANNs is often not to be good at solving one particular task, but rather to test the capability to evolve the learning required for a range of problems, to generalize to new problems, or to recover performance after a change in the environment. Therefore, EPANNs will require the community to devise and accept new metrics based on one or more objectives such as the following:

• the time (in the evolutionary scale) to evolve the learning mechanisms in one or more scenarios;

• the time (in the lifetime scale) for learning in one or more scenarios;

• the number of different tasks that an EPANN evolves to solve;

• a measure of the variety of skills acquired by one EPANN;

• the complexity of the tasks and/or datasets, e.g., variations in distributions, stochasticity, etc.;

• the robustness and generalization capabilities of the learner;

• the recovery time in front of high-level variations or changes, e.g., data distribution, type of problem, stochasticity levels, etc.;

• computational resources used, e.g., number of lifetime evaluations, length of a lifetime;

• size, complexity, and computational requirements of the solution once deployed;

• novelty or richness of the behavior repertoire from multiple solutions, e.g., the variety of different EPANNs and their strategies that were designed during evolution.

Few of those metrics are currently used to benchmark machine learning algorithms. Research in EPANNs will foster the adoption of such criteria as wider performance metrics for assessing lifelong learning capabilities (Thrun and Pratt, 2012; DARPA-L2M, 2017) of evolved plastic networks.
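As an example of how one of the metrics listed above might be made operational, the sketch below computes a recovery time from a lifetime reward trace. It assumes that the environment changes at a known step `change_at` and that recovery means reaching a given `fraction` of the average pre-change reward. The function name, the threshold, and the example trace are illustrative assumptions, not an established benchmark definition.

```python
def recovery_time(rewards, change_at, fraction=0.9):
    """Steps after an environment change until reward recovers.

    rewards:   per-step (or per-trial) rewards over one lifetime
    change_at: index of the first step after the change (must be > 0)
    fraction:  share of the pre-change average reward that counts as recovered
    Returns the number of steps after the change, or None if never recovered.
    """
    if change_at <= 0:
        raise ValueError("change_at must leave at least one pre-change step")
    baseline = sum(rewards[:change_at]) / change_at
    target = fraction * baseline
    for step, reward in enumerate(rewards[change_at:]):
        if reward >= target:
            return step
    return None


# Reward drops when the environment changes at step 3, then climbs back.
trace = [10, 10, 10, 2, 4, 7, 9, 10]
steps_to_recover = recovery_time(trace, change_at=3)  # -> 3
```

Averaging this quantity over many changes and many evolved individuals would give a population-level measure that can be compared across algorithms.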

VI Conclusion

The broad inspiration and aspirations of evolved plastic artificial neural networks (EPANNs) strongly motivate this field, drawing from large, diverse, and interdisciplinary areas. In particular, the aspirations reveal ambitious and long-term research objectives related to the discovery of neural learning, with important implications for artificial intelligence and biology.

EPANNs saw considerable progress in the last two decades, primarily pointing to the potential of the autonomous evolution and discovery of neural learning. We now have: (i) advanced evolutionary algorithms to promote the evolution of learning, (ii) a better understanding of the interaction dynamics between evolution and learning, (iii) assessed advantages of multi-signal networks such as modulatory networks, and (iv) explored evolutionary representations of learning mechanisms.

Recent scientific and technical progress has set the foundation for a potential step change in EPANNs. Concurrently with the increase of computational power and a resurgence of neural computation, the need for more flexible algorithms and the opportunity to explore new design principles could make EPANNs the next AI tool capable of discovering new principles and systems for general adaptation and intelligent systems.

Acknowledgements

We thank John Bullinaria, Kris Carlson, Jeff Clune, Travis Desell, Keith Downing, Dean Hougen, Joel Lehman, Jeff Krichmar, Jay McClelland, Robert Merrison-Hort, Julian Miller, Jean-Baptiste Mouret, James Stone, Eors Szathmary, and Joanna Turner for insightful discussions and comments on earlier versions of this paper.

Bibliography


1. Abadi et al. (2016) Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
2. Abbott (1990) Abbott, L. F. Modulation of Function and Gated Learning in a Network Memory. Proceedings of the National Academy of Sciences of the United States of America, 87(23):9241–9245, 1990.
3. Abraham (2004) Abraham, A. Meta learning evolutionary artificial neural networks. Neurocomputing, 56:1–38, 2004.
4. Abraham and Robins (2005) Abraham, W. C. and Robins, A. Memory retention–the synaptic stability versus plasticity dilemma. Trends in Neurosciences, 28(2):73–78, 2005.
5. Alexander and Sporns (2002) Alexander, W. H. and Sporns, O. An Embodied Model of Learning, Plasticity, and Reward. Adaptive Behavior, 10(3-4):143–159, 2002.
6. Allis et al. (1994) Allis, L. V. et al. Searching for solutions in games and artificial intelligence. Ponsen & Looijen, 1994.
7. Alpaydin (2014) Alpaydin, E. Introduction to machine learning. MIT Press, 2014.
8. Anderson (2013) Anderson, J. R. The architecture of cognition. Psychology Press, 2013.