Nonvolatile Spintronic Memory Cells for Neural Networks

Andrew W. Stephan; Qiuwen Lou; Michael Niemier; X. Sharon Hu; Steven J. Koester

arXiv:1905.12679·cs.ET·May 31, 2019

TL;DR

This paper introduces a novel nonvolatile spintronic memory cell designed for neural networks, demonstrating improved energy efficiency and performance in image classification tasks through simulation-based evaluation.

Contribution

It proposes a new spintronic memory cell architecture and a dual-circuit neural network design that leverages these devices for efficient neural computing.

Findings

01 Spintronic cells outperform their charge-based counterparts in energy efficiency.

02 The proposed architecture consumes about 100 pJ per image processed.

03 Simulations show effective classification performance across a range of nanomagnet sizes and stabilities.

Abstract

A new spintronic nonvolatile memory cell analogous to 1T DRAM with non-destructive read is proposed. The cells can be used as neural computing units. A dual-circuit neural network architecture is proposed to leverage these devices against the complex operations involved in convolutional networks. Simulations based on HSPICE and Matlab were performed to study the performance of this architecture when classifying images as well as the effect of varying the size and stability of the nanomagnets. The spintronic cells outperform a purely charge-based implementation of the same network, consuming about 100 pJ total per image processed.

Tables

Table 1: Simulation Parameters

Symbol	Quantity	Value
$K$	crystalline anisotropy	$4.5 \cdot 10^{4}$ J/m³
$V$	ferromagnet volume	1530 - 5400 nm³
$\Delta$	thermal stability factor	16 - 59 kT
$\eta$	spin injection efficiency	0.8 [17, 18, 19]
$M_{S}$	saturated magnetization	1.7 MA/m
$\alpha$	Gilbert damping	0.01
$C$	ME capacitance	1 fF
$\alpha_{ME}$	ME coefficient	$10/c$* [9, 20]
$\lambda$	spin conversion length	6 nm [21, 22, 23]
$\rho$	IR material resistivity	10 mΩ·cm [24]
$R_{IR}$	IR source resistance	20 kΩ [6]
$V_{D}$	neuron drive voltage	$\leq$ 100 mV
$V_{S}$	synapse supply voltage	$\pm$ 500 mV
$V_{T}$	transistor threshold	0.2 V
*$c$ is the speed of light.
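
As a quick consistency check on Table 1, the thermal stability factor of a uniaxial macrospin is commonly written as $\Delta = KV/(k_{B}T)$. The sketch below evaluates it at the two volume extremes. The temperature of 300 K is an assumption, since the table does not state it, and the function name is purely illustrative.

```python
# Sketch: thermal stability factor Delta = K*V / (k_B*T) for the Table 1 extremes.
# T = 300 K is an assumption; the table does not state the temperature.
K_B = 1.380649e-23   # Boltzmann constant, J/K
K_ANIS = 4.5e4       # crystalline anisotropy from Table 1, J/m^3

def thermal_stability(volume_nm3, temperature_k=300.0):
    """Return Delta in units of k_B*T for a ferromagnet volume given in nm^3."""
    volume_m3 = volume_nm3 * 1e-27   # 1 nm^3 = 1e-27 m^3
    return K_ANIS * volume_m3 / (K_B * temperature_k)

for v in (1530, 5400):
    print(f"V = {v} nm^3 -> Delta = {thermal_stability(v):.1f} kT")
```

Under that assumption the two results fall close to the tabulated range of 16 - 59 kT.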

Equations

$$\frac{d\boldsymbol{m}}{dt}=-\gamma\mu_{0}\Big((\boldsymbol{m}\times\boldsymbol{H_{Eff}})-\alpha\big(\boldsymbol{m}\times(\boldsymbol{m}\times\boldsymbol{H_{Eff}})\big)\Big),$$

$$V_{1} = V_{D} \frac{R_{IR}}{R_{FM} + R_{gnd}} \frac{\eta}{\lambda}.$$

Full text

Manuscript submitted. This work was supported by Seagate Technology PLC. The authors acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing resources that contributed to the research results reported within this paper. URL: http://www.msi.umn.edu

A. W. Stephan is with the College of Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA.
Q. Lou is with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 USA.
M. Niemier is with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 USA.
X. Sharon Hu (Fellow, IEEE) is with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 USA.
S. J. Koester (Fellow, IEEE) is with the College of Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA.

Index Terms

Cellular Neural Network, Convolutional Neural Network, Spintronics, CMOS, Magnetoelectric, Rashba-Edelstein, MNIST, Nonvolatile Memory.

1 Introduction

A hardware implementation of feed-forward neural networks must incorporate three basic functionalities: a dot-product engine that can be used to compute convolution and fully-connected layer operations, memory elements that can be used to store intermediate inter- and intra-layer results, and components that can compute some non-linear activation function. Many purely charge-based implementations of these, with varying levels of efficiency, have been proposed. The dot-product has been successfully implemented in hardware in various ways [1, 2, 3, 4]. Using these dot-product circuits, a cellular neural network-based (CeNN) convolutional neural network (CoNN) accelerator was designed in [5]. We propose a CeNN cell, based on recently proposed spintronic elements with high energy efficiency, that can be used as analog memory with a built-in activation function. The performance of these cells is simulated in a CeNN-accelerated CoNN performing image classification based on [5]. The spintronic cells significantly reduce the energy and time consumption relative to their charge-based counterparts, needing only $\approx$ 100 pJ and $\approx$ 42 ns to compute all but the final fully-connected CoNN layer while maintaining a high accuracy.

2 IRMEN

2.1 IRMEN Neurons

The Inverse Rashba-Edelstein Magnetoelectric Neuron (IRMEN) is thoroughly described, including its relationship to standard CeNN cells, in [6]. A brief summary is given here. The cell resembles the standard cell of a cellular neural network in that it is based around a capacitor (see Fig. 1) [7]. However, the capacitor represents an input mechanism rather than the true state. A magnetoelectric material within the capacitor is coupled to the ferromagnet that makes up one of its electrodes, thereby allowing control of the ferromagnet via electric charge on the capacitor [8, 9, 10, 11, 6]. Readout is accomplished by driving a charge current first through the ferromagnet to spin-polarize the current and then through an inverse spin-orbit stack that transduces from spin current to charge potential along an axis orthogonal to both current flow and spin orientation [12, 13, 14]. This is modeled as a voltage source with the inverse Rashba potential $V_{IR}$ and internal resistance $R_{IR}$ [6]. The IRMEN cells natively compute a nonlinear transfer function on their inputs by virtue of the M-H curve dictated by the anisotropy of the nanomagnet. This transfer function can be tuned by varying the anisotropy characteristics. Example transfer functions are shown in Fig. 2, assuming uniaxial crystalline anisotropy along the length dimension.

2.2 IRMEN Nonvolatile Memory

Memory is a crucial component of any hardware-based implementation of a neuromorphic computing scheme. In anything more complex than a simple fully-connected network, such as a CoNN, the input values to any given layer need to be referenced multiple times [15]. This necessitates the inclusion of some form of analog memory, or of digital memory accompanied by analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). The simplest solution is to use 1T DRAM cells. However, standard 1T DRAM suffers from loss of charge upon read. If the state is not refreshed, the leakage of repeated read cycles will degrade the stored state. The device discussed above, besides being able to realize neuromorphic CeNN-like operations, can also function as a memory. Because it serves a dual purpose as a neural computing unit and analog memory, the IRMEN offers enhanced efficiency. Here the state is stored on the capacitor. The IRMEN readout mechanism, crucially, does not involve accessing the charge stored on the cell capacitor. This charge may remain safely locked in place while the cell state is read out through injection of current into the ferromagnet (FM). While this does temporarily disturb the electrical field across the capacitor and therefore the steady-state magnetization, a sufficiently swift read pulse will ensure the read completes while the FM is just beginning to move, depending on the relative time-scales of electrical and magnetic motion. More importantly, the cell will return to the same state after any perturbations caused by the read without any loss of information. Subsequent reads will obtain the same value as the first. The perturbation during read, even if non-negligible, will be consistent and can be accounted for. We note that the intrinsic write energy of the IRMEN memory is equal to the charging energy of the magnetoelectric capacitor. Using values from Table 1, we estimate the write energy to be no more than 10 aJ, putting it on par with typical 1T DRAM values. This value scales with the capacitor area. This estimate does not include the energy dissipated by external circuitry during the write process. Again referencing Table 1, we estimate the total write energy at between 0.24 and 0.79 fJ for one cell. While this is higher than some other nonvolatile memories such as STT-RAM, this is compensated for by the analog, as opposed to digital, nature of the IRMEN cell as well as its ability to function as a neural computing unit in addition to memory [16].
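
To put the intrinsic write-energy figure in context, the sketch below uses the standard stored-energy expression $E = CV^{2}/2$ with $C$ = 1 fF from Table 1. The capacitor voltage during a write is not stated in the text, so the code instead solves for the voltage at which the stored energy would reach the quoted 10 aJ bound. Treating that bound as stored energy on an ideal capacitor is an assumption, and the function names are illustrative.

```python
import math

C_ME = 1e-15       # ME capacitance from Table 1, F
E_BOUND = 10e-18   # quoted upper bound on intrinsic write energy, J

def stored_energy(capacitance, voltage):
    """Energy stored on an ideal capacitor, E = C*V^2/2."""
    return 0.5 * capacitance * voltage ** 2

def voltage_for_energy(capacitance, energy):
    """Invert E = C*V^2/2 for the voltage magnitude."""
    return math.sqrt(2.0 * energy / capacitance)

v_max = voltage_for_energy(C_ME, E_BOUND)
print(f"10 aJ corresponds to about {v_max * 1e3:.0f} mV across 1 fF")
print(f"At 100 mV the stored energy is {stored_energy(C_ME, 0.1) * 1e18:.1f} aJ")
```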

A basic simulated demonstration of the functionality of the IRMEN memory cell is shown in Fig. 3. The magnetization is switched between two different levels, held for an extended period of 12 ns by charge stored on the capacitor, and switched again. The drive current is pulsed for periods of 200 ps to generate a readout potential at various points throughout the simulation. This indicates the ability of the IRMEN to store a value, encoded as charge, and provide a transformed readout value on command. We note that there is some noise in the readouts visible in Fig. 3 (d). In order to quantify this error we performed more exhaustive simulations. For nanomagnets of several different sizes, Monte Carlo simulations of 12300 iterations were performed. In each iteration an input current was applied and the neuron was then prompted to provide a readout. The results approximated a saturated linear transfer function, as in the ideal CeNN cell [7], with some error. After normalizing according to the base input current and output voltage values as described in Sec. 4, the error is calculated as the difference between an idealization of the transfer function and the actual readout. The average magnitude of the normalized error is shown in Fig. 4 (a). Histograms indicating the range of errors for the largest and smallest nanomagnets are shown in Fig. 4 (b), displaying the significant increase in thermal broadening with smaller and more unstable magnets.
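
The error metric described above can be sketched as follows. The ideal response uses the saturated linear output function of the standard CeNN cell [7], $y = \frac{1}{2}(|x+1| - |x-1|)$. The noisy readouts are a hypothetical stand-in (ideal response plus Gaussian noise with an arbitrarily chosen spread), not the output of the authors' Monte Carlo simulation, and the function names are illustrative.

```python
import numpy as np

def saturated_linear(x):
    """Ideal CeNN output: linear for |x| <= 1, clipped to +/-1 outside."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def mean_error_magnitude(inputs, readouts):
    """Average |readout - ideal| over a set of normalized trials."""
    return float(np.mean(np.abs(readouts - saturated_linear(inputs))))

# Hypothetical stand-in for simulated readouts: ideal response plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=12300)
y = saturated_linear(x) + rng.normal(0.0, 0.05, size=x.shape)
print(mean_error_magnitude(x, y))
```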

The following section will describe a network of IRMEN cells for CoNN implementation.

3 Convolutional Networks with IRMEN Cells

Lou et al. proposed using standard, purely charge-based CeNN cells with weighted, programmable Operational Transconductance Amplifier-based current sources (OTAs) to produce an in-hardware implementation of a convolutional neural network [5, 1]. Using CeNN weight template schemes, the Rectified Linear Unit (ReLU) activation function and pooling function were approximated so that all but the fully-connected output layer of a CoNN could be implemented via simple CeNN cells. We propose to improve on the purely charge-based CeNN/CoNN model by replacing the purely charge-based cells with a more energy-efficient IRMEN cell. The CoNN structure to be emulated is shown in Fig. 5.

3.1 Coupled CeNNs for Memory and Computing

To fully leverage the IRMEN capabilities we propose a dual-functionality operation mode. Each stage corresponds to one CeNN operation, represented by a specific CeNN template imposed upon the data as it is slowly transformed from the initial input to the final output. Each stage of computation is implemented by one of a pair of identical IRMEN CeNNs similar to the structure in [5]. One of the pair provides the input, obtained from the previous stage and locally stored, to the other of the pair for processing and subsequent storing (see Fig. 6).

The proposed dual-CeNN design allows us to do away with a dedicated memory module for neuron state storage and most of the associated peripheral circuits such as ADCs and DACs. Weight storage must still be implemented, although additional IRMEN memory cells could be included for this purpose. We expect significant savings in operational energy and possibly also speed as a result. Although the delay between stages must be on the order of nanoseconds to allow sufficient time for the magnetizations to respond to the charge placed on the capacitors, each stage only needs to be powered for about 130 ps, which is the estimated combined delay of the OTAs[1] and other electrical components. The total operational delay between each stage is 1.5 ns, which is mostly due to the magnetic switching delay as previously mentioned.
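
The alternating roles of the two CeNNs can be pictured as a ping-pong buffer. The following is a minimal software sketch of that control flow only, not of the circuit: the two "banks" are plain Python lists, and the templates are hypothetical callables standing in for CeNN operations. The per-stage delay of 1.5 ns is the value quoted above.

```python
STAGE_DELAY_NS = 1.5  # per-stage delay quoted in the text

def run_dual_cenn(image, templates):
    """Alternate two banks: one holds the stored input, the other receives the result."""
    banks = [image, None]          # bank 0 starts out holding the input
    src = 0
    for template in templates:
        dst = 1 - src
        banks[dst] = template(banks[src])   # read src without modifying it, write dst
        src = dst                           # roles swap for the next stage
    return banks[src], len(templates) * STAGE_DELAY_NS

# Hypothetical templates standing in for CeNN operations.
double = lambda cells: [2 * c for c in cells]
negate = lambda cells: [-c for c in cells]
result, delay_ns = run_dual_cenn([1, 2, 3], [double, negate, double])
print(result, delay_ns)
```

Because each stage reads one bank and writes the other, no separate memory module is needed to hold intermediate states between stages.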

4 Simulation

This section will describe the methods by which we simulate IRMEN CeNNs emulating an image classification CoNN as previously described. The results of this simulation as measured by the predicted accuracy and energy cost will also be presented herein.

4.1 Simulation Setup

In order to evaluate the utility of the proposed IRMEN-based architecture we tested it on the MNIST handwritten digits dataset. The structure of the CoNN to be implemented via IRMEN CeNNs, shown in Fig. 5, is identical to one of the test cases in [5]. In the interest of time, the training of the weights was implemented using TensorFlow with a custom transfer function representing a close numerical approximation to the hysteresis curve of the IRMENs' nanomagnets. The trained weights were then given to a Matlab simulator which used the fourth-order Runge-Kutta method to simultaneously solve the Landau-Lifshitz-Gilbert equation and the electrical circuit equations associated with each neuron. The IRMEN magnetodynamical equations used were identical to those of [6]. The FM is modeled using the macrospin approximation. The Landau-Lifshitz-Gilbert equation relates the motion of the unit magnetization $\boldsymbol{m}$ to the net field

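$$\frac{d\boldsymbol{m}}{dt}=-\gamma\mu_{0}\Big((\boldsymbol{m}\times\boldsymbol{H_{Eff}})-\alpha\big(\boldsymbol{m}\times(\boldsymbol{m}\times\boldsymbol{H_{Eff}})\big)\Big),$$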

where $\gamma$ is the gyromagnetic ratio, $\mu_{0}$ is the vacuum permeability, $\alpha$ is the Gilbert damping and $\boldsymbol{H_{Eff}}$ is the effective field, calculated according to the methods in [6]. The simulation parameters are given in Table 1.
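
A minimal Python sketch of a fourth-order Runge-Kutta step for the macrospin equation may help illustrate the numerical method. It is not the authors' Matlab simulator and it omits the coupled circuit equations. Several things here are assumptions: the effective field is held constant (the paper computes it according to [6]), the field is passed as $\mu_{0}\boldsymbol{H_{Eff}}$ in tesla, $\gamma$ is given the negative sign of the electron gyromagnetic ratio so that the equation as written relaxes toward the field, and the magnetization is renormalized after each step as a numerical convenience. The field value, time step and step count are hypothetical.

```python
import numpy as np

GAMMA = -1.76e11  # rad/(s*T); negative electron sign convention (assumption)
ALPHA = 0.01      # Gilbert damping from Table 1

def llg_rhs(m, b_eff):
    """dm/dt = -gamma*mu0*(m x H - alpha * m x (m x H)), with b_eff = mu0*H in tesla."""
    m_cross_b = np.cross(m, b_eff)
    return -GAMMA * (m_cross_b - ALPHA * np.cross(m, m_cross_b))

def rk4_step(m, b_eff, dt):
    """One classical fourth-order Runge-Kutta step, followed by renormalization."""
    k1 = llg_rhs(m, b_eff)
    k2 = llg_rhs(m + 0.5 * dt * k1, b_eff)
    k3 = llg_rhs(m + 0.5 * dt * k2, b_eff)
    k4 = llg_rhs(m + dt * k3, b_eff)
    m_new = m + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return m_new / np.linalg.norm(m_new)   # keep |m| = 1

def relax(m0, b_eff, dt=5e-13, steps=4000):
    """Integrate a unit magnetization in a constant field for steps*dt seconds."""
    m = np.asarray(m0, dtype=float)
    m = m / np.linalg.norm(m)
    b = np.asarray(b_eff, dtype=float)
    for _ in range(steps):
        m = rk4_step(m, b, dt)
    return m

# Hypothetical case: m starts along x in a 1 T field along z, integrated for 2 ns.
m_final = relax([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
print(m_final)
```

In this toy case the magnetization precesses about the field while the damping term slowly pulls it toward the field direction.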

In order to map from the TensorFlow network to an IRMEN simulation we must translate the arbitrary state and weight values into electrical values associated with the IRMEN cells. A neuron state value can vary between $\pm$ 1, depending on the magnetization of the associated nanomagnet, and is read by measuring $V_{IR}$. This value depends on the driving potential in addition to the state of the nanomagnet. We define the maximum output potential $V_{1}$ to be the neuron state of 1:

$$V_{1} = V_{D} \frac{R_{IR}}{R_{FM} + R_{gnd}} \frac{\eta}{\lambda}.$$

The normalized output of a neuron is $V_{IR}/V_{1}$. This voltage is applied to the input of an OTA which consequently produces up to 1 $\mu$A of current to inject through the shunt resistor of a subsequent cell (see Fig. 1). The normalized input to a neuron is thus $I_{IN}/(10^{-6}\,\mathrm{A})$. We use a two-stage OTA design similar to that of [5]. The first stage in the OTA is a differential input pair. The second stage is a current mirror that subtracts two branches of current to obtain a single-ended output while providing a large output impedance. The ratio of the current mirrors between the two stages is set to 2 to save power in the first stage of the OTA. We use multiple OTAs, enabled by power gating, to represent weights with different numbers of bits. By reprogramming these OTAs, the desired weights are achieved in each step.
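
The normalization described above amounts to two unit conversions, sketched below. The 1 µA full-scale current comes from the text. The `ota_current` helper is a hypothetical ideal OTA model (output current proportional to weight times state) introduced only to connect the two conversions, and the numeric values in the usage lines are placeholders.

```python
I_UNIT = 1e-6   # A; full-scale OTA output current quoted in the text

def normalized_output(v_ir, v_1):
    """Neuron state read out as V_IR / V_1, nominally within [-1, 1]."""
    return v_ir / v_1

def normalized_input(i_in):
    """Neuron input expressed in units of the 1 uA full-scale current."""
    return i_in / I_UNIT

def ota_current(state, weight):
    """Hypothetical ideal OTA: weighted normalized state scaled to amperes."""
    return weight * state * I_UNIT

# Placeholder values: a readout at half of V_1 passed through a weight of 0.25.
state = normalized_output(0.5e-3, 1.0e-3)
print(normalized_input(ota_current(state, 0.25)))
```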

4.2 Image Classification Results

The simulated CoNN with IRMEN-based convolution, activation and max-pooling templates was tested against 10000 images withheld from the training stage and achieved high image classification accuracy (see Fig. 7). The energy required to perform all the steps involved in the classification of each image, except the final fully-connected layer, was estimated, including cycling the OTA and weight storage access transistors, cycling the OTA gate capacitors, powering the OTAs themselves and driving the current through the IRMEN cells. This total varied depending on IRMEN FM length, as greater lengths require larger magnetoelectric fields to saturate the FM along the competing axis (see Fig. 2). The relationship between energy and length is shown in Fig. 7 (a). The classification accuracy is plotted against FM length in Fig. 7 (b), showing that greater thermal stability improves overall network performance. However, this comes at the cost of increased energy and area. The correlation between accuracy and energy expenditure is displayed in Fig. 7 (c). The origin of this increased accuracy is the reduced error in the saturated linear transfer function computed by each IRMEN cell (see Fig. 2). The classification accuracy is plotted directly against the mean transfer function error magnitude in Fig. 7 (d). We also consider the effect of using weights with limited resolution. Although the specific weight storage mechanism is not considered here, it is reasonable to assume a limited representation accuracy. Thus in Fig. 8 we show the image classification accuracy vs. FM length with the number of bits used to encode a weight value as a parameter, starting with 4-bit precision. Although lowering the precision to 4 bits is detrimental, the performance is still quite good for lengths above 30 nm.
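
The effect of limited weight resolution can be explored in software with a simple quantizer. The sketch below assumes a signed, uniform grid with symmetric clipping at a hypothetical full-scale value `w_max`. The paper does not specify its quantization scheme, so this is only one plausible choice, and the example weights are made up.

```python
import numpy as np

def quantize_weights(weights, n_bits, w_max=1.0):
    """Round weights to a signed uniform grid with 2**(n_bits-1) - 1 levels per sign."""
    levels = 2 ** (n_bits - 1) - 1
    clipped = np.clip(weights, -w_max, w_max)
    return np.round(clipped / w_max * levels) / levels * w_max

# Made-up example weights quantized to 4 bits.
w = np.array([-0.9, -0.33, 0.0, 0.12, 0.77])
print(quantize_weights(w, 4))
```

With this scheme the worst-case rounding error for in-range weights is half a grid step, `w_max / (2 * levels)`.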

Compared to the previously implemented purely charge-based version, using the IRMEN cells provides significant energy and time savings. According to Table 3 of [5], the charge-based CeNN requires over 12 nJ to compute all convolution, pooling and activation stages, while the IRMEN CeNN needs less than 0.14 nJ. We also note that the Convolution/ReLU and Pooling layers require two and twelve individual CeNN stages, respectively, each with a delay of 1.5 ns (see Sec. 3.1), so the overall IRMEN CeNN/CoNN delay is 42 ns. The CeNN in [5] takes 240.5 ns to perform the same computations.
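
The figures quoted above imply the following ratios, computed here as plain arithmetic. Because the energies are stated as bounds ("over 12 nJ" and "less than 0.14 nJ"), the energy ratio is a lower bound. The stage count is simply the total delay divided by the per-stage delay.

```python
ENERGY_CHARGE_NJ = 12.0    # "over 12 nJ" for the charge-based CeNN
ENERGY_IRMEN_NJ = 0.14     # "less than 0.14 nJ" for the IRMEN CeNN
DELAY_CHARGE_NS = 240.5    # delay of the CeNN in [5]
DELAY_IRMEN_NS = 42.0      # overall IRMEN CeNN/CoNN delay
STAGE_DELAY_NS = 1.5       # delay per CeNN stage

energy_ratio = ENERGY_CHARGE_NJ / ENERGY_IRMEN_NJ   # lower bound, given the inequalities
delay_ratio = DELAY_CHARGE_NS / DELAY_IRMEN_NS
n_stages = DELAY_IRMEN_NS / STAGE_DELAY_NS

print(f"energy ratio >= {energy_ratio:.0f}x")
print(f"delay ratio  = {delay_ratio:.1f}x")
print(f"stages       = {n_stages:.0f}")
```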

5 Conclusion

With the growing importance of neuromorphic computing and beyond-CMOS computation, the search for new devices to fill these roles is crucial. We have proposed a novel magnetoelectric analog memory element with a built-in transfer function that also allows it to act as the cell in a CeNN. Simulations of a CeNN-friendly CoNN implemented with these IRMEN cells predict that highly accurate and low-power image classification can be achieved with this device. These results are superior to those predicted for purely charge-based implementations of the same network. This demonstrates the benefits of applying spintronics to neuromorphic computing.

Bibliography

[1] L. Qiuwen, I. Palit, A. Horvath, X. Hu, M. Niemier and J. Nahas, “TFET-Based Operational Transconductance Amplifier Design for CNN Systems,” Proc. of 25th Edition on Great Lakes Symp. on VLSI, vol. 20-22, pp. 277-282 (2015).
[2] I. Nahlus, E. P. Kim, N. R. Shanbhag and D. Blaauw, “Energy-Efficient Dot Product Computation Using a Switched Analog Circuit Architecture,” 2014 IEEE/ACM Int. Symp. on Low Power Electronics and Design, vol. 2015, pp. 315-318.
[3] N. C. Wang, S. J. Gonugondla, I. Nahlus, N. R. Shanbhag and E. Pop, “GDOT: A Graphene-Based Nanofunction for Dot-Product Computation,” 2016 IEEE Symp. VLSI Tech., vol. 2016, pp. 1-2.
[4] A. Sengupta and K. Roy, “Encoding Neural Functionalities in Electron Spin: A Pathway to Efficient Neuromorphic Computing,” Dec. 2017, arXiv:1711.02235v4.
[5] Q. Lou, C. Pan, J. McGuinness, A. Horvath, A. Naeemi, M. Niemier and X. S. Hu, “A Mixed Signal Architecture for Convolutional Neural Networks,” Oct. 2018, arXiv:1811.02636v1.
[6] A. Stephan, J. Hu and S. J. Koester, “Performance Estimate of Inverse Rashba-Edelstein Magnetoelectric Devices for Neuromorphic Computing,” IEEE JXCDC, Mar. 2019, DOI: 10.1109/JXCDC.2019.2903286.
[7] L. Chua and L. Yang, “Cellular Neural Networks: Theory,” IEEE Trans. on Circ. and Syst., vol. 35, no. 10, pp. 1257–1272, Oct. 1988, DOI: 10.1109/31.7600.
[8] J. T. Heron, M. Trassin, K. Ashraf, M. Gajek, Q. He, S. Y. Yang, D. E. Nikonov, Y.-H. Chu, S. Salahuddin, and R. Ramesh, “Electric-Field-Induced Magnetization Reversal in a Ferromagnetic Multiferroic Heterostructure,” Phys. Rev. Lett., vol. 107, no. 21, pp. 217202–1–4, Nov. 2011, DOI: 10.1103/PhysRevLett.107.217202.