Low Barrier Magnet Design for Efficient Hardware Binary Stochastic Neurons

Orchi Hassan, Rafatul Faria, Kerem Y. Camsari, Jonathan Z. Sun, and Supriyo Datta

arXiv:1902.03650·cs.ET·April 23, 2019


TL;DR

This paper investigates low barrier in-plane magnetic anisotropy magnets for hardware binary stochastic neurons, showing they enable faster response times and lower energy consumption than perpendicular magnets, with implications for MRAM-based neural hardware.

Contribution

It provides analytical and simulation-based evidence that in-plane magnets significantly improve the response time and energy efficiency of low barrier magnetic neurons compared to perpendicular magnets.

Findings

1. In-plane magnets have two orders of magnitude smaller correlation times than perpendicular magnets.

2. In-plane magnets enable sub-nanosecond response times and energy consumption of a few femtojoules.

3. Results suggest in-plane magnets are more suitable than perpendicular magnets for low barrier magnetic neuron hardware.

Abstract

Binary stochastic neurons (BSN's) form an integral part of many machine learning algorithms, motivating the development of hardware accelerators for this complex function. It has been recognized that hardware BSN's can be implemented using low barrier magnets (LBM's) by minimally modifying present-day magnetoresistive random access memory (MRAM) devices. A crucial parameter that determines the response of these LBM based BSN designs is the correlation time of magnetization, $\tau_{c}$. In this letter, we show that for magnets with low energy barriers ($\Delta \approx k_{B} T$ and below), circular disk magnets with in-plane magnetic anisotropy (IMA) lead to $\tau_{c}$ values that are two orders of magnitude smaller compared to $\tau_{c}$ for magnets having perpendicular magnetic anisotropy (PMA) and provide analytical descriptions. We show that this striking difference in $\tau_{c}$ is due…

Equations

$m_{i}(n+1)=\mathrm{sgn}\left[\tanh I_{i}(n)-r_{i}\right]$

$E=\frac{1}{2}H_{kp}M_{s}\Omega\,(1-m_{x}^{2})+\frac{1}{2}H_{ki}M_{s}\Omega\,(1-m_{z}^{2})$

PMA: $C(t)$, $\tau_{c}$

$C(t)=\int_{-1}^{1}dm_{x}\,\cos(\gamma H_{D}m_{x}t)\,\rho(m_{x})\bigg/\int_{-1}^{1}dm_{x}\,\rho(m_{x})$

IMA: $C(t)$, $\tau_{c}$

PMA: $I_{P}=\frac{6q}{\hbar}\,\alpha k_{B}T$

IMA: $I_{P}=\frac{2q}{\hbar}\sqrt{\frac{2}{\pi}H_{D}M_{S}\Omega k_{B}T}$

$\frac{V_{i}(t)}{V_{DD}/2}=(\pm)\frac{R_{MTJ}(t)-R_{0}}{R_{MTJ}(t)+R_{0}}$

$\frac{V_{OUT}(t+t_{0})}{V_{OUT0}}\approx\mathrm{sgn}\left[\tanh\frac{V_{IN}(t)}{V_{IN0}}-r(t)\right]$


Full text

Low Barrier Magnet Design for Efficient Hardware Binary Stochastic Neurons

Orchi Hassan (1), Rafatul Faria (1), Kerem Y. Camsari (1), Jonathan Z. Sun (2), and Supriyo Datta (1)

(1) School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906, USA. (2) IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA.

Abstract

Binary stochastic neurons (BSN’s) form an integral part of many machine learning algorithms, motivating the development of hardware accelerators for this complex function. It has been recognized that hardware BSN’s can be implemented using low barrier magnets (LBM’s) by minimally modifying present-day magnetoresistive random access memory (MRAM) devices. A crucial parameter that determines the response of these LBM based BSN designs is the correlation time of magnetization, $\tau_{c}$ . In this letter, we show that for magnets with low energy barriers ( $\Delta\approx k_{B}T$ and below), circular disk magnets with in-plane magnetic anisotropy (IMA) lead to $\tau_{c}$ values that are two orders of magnitude smaller compared to $\tau_{c}$ for magnets having perpendicular magnetic anisotropy (PMA) and provide analytical descriptions. We show that this striking difference in $\tau_{c}$ is due to a precession-like fluctuation mechanism that is enabled by the large demagnetization field in IMA magnets. We provide a detailed energy-delay performance evaluation of previously proposed BSN designs based on Spin-Orbit-Torque (SOT) MRAM and Spin-Transfer-Torque (STT) MRAM employing low barrier circular IMA magnets by SPICE simulations. The designs exhibit sub-ns response times leading to energy requirements of $\sim$ a few fJ to evaluate the BSN function, orders of magnitude lower than digital CMOS implementations with a much larger footprint. While modern MRAM technology is based on PMA magnets, results in this paper suggest that low barrier circular IMA magnets may be more suitable for this application.

Index Terms:

Binary stochastic neuron, hardware implementation, low barrier magnet, embedded MTJ, probabilistic computing

I Introduction

Many inference and machine learning algorithms are based on networks of binary stochastic neurons (BSN’s) [1, 2, 3, 4, 5, 6]. The response $m_{i}$ of each neuron at time step $n+1$ is determined by its input $I_{i}$ at time step $n$, where $r_{i}$ is a random number between $-1$ and $+1$:

$$m_{i}(n+1)=\mathrm{sgn}\left[\tanh I_{i}(n)-r_{i}\right] \tag{1}$$

In the absence of an input $I_{i}$, the output $m_{i}$ fluctuates randomly between the two values $-1$ and $+1$. A positive $I_{i}(n)$ makes $+1$ more likely, while a negative $I_{i}(n)$ makes $-1$ more likely [7]. Each BSN described by Eq. 1 receives its input as a weighted sum of the outputs of other BSN’s, obtained from a “synapse”: $I_{i}(n)=\sum_{j}W_{ij}\,m_{j}(n)$. A wide variety of functions can be implemented by properly designing or learning the weights $W_{ij}$ [8, 9, 10].
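The update rule and synapse above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware discussed in this letter. It assumes that $r_{i}$ is drawn uniformly from $[-1, +1]$ (the text only says "between $-1$ and $+1$"), and the names `bsn_update`, `synapse`, and the 2x2 weight matrix `W` are hypothetical. Under the uniform assumption the average output for a fixed input $I$ should approach $\tanh I$.

```python
import numpy as np

def bsn_update(inputs, rng):
    """One BSN step: m = sgn(tanh(I) - r), with an independent r per neuron."""
    # ASSUMPTION: r is uniform on [-1, +1]; the text does not name the distribution.
    r = rng.uniform(-1.0, 1.0, size=np.shape(inputs))
    return np.where(np.tanh(inputs) - r >= 0.0, 1.0, -1.0)

def synapse(weights, m):
    """Weighted sum I_i = sum_j W_ij * m_j feeding each neuron."""
    return weights @ m

rng = np.random.default_rng(0)

# Average response of many independent neurons driven by the same input.
I_fixed = 0.5
samples = bsn_update(np.full(100_000, I_fixed), rng)
mean_output = samples.mean()
print("mean output:", mean_output, "tanh(I):", np.tanh(I_fixed))

# One step of a hypothetical two-neuron network.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
m = np.array([1.0, -1.0])
m_next = bsn_update(synapse(W, m), rng)
```

With $r$ uniform, the probability of $m=+1$ is $(1+\tanh I)/2$, which is why the sample mean tracks $\tanh I$.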

The BSN function (Eq. 1) is evaluated repeatedly in modern algorithms, but it is typically implemented in software. Considerable effort has gone into developing suitable hardware for accelerating the evaluation of this function. Many of these efforts are based on magnetoresistive random access memory (MRAM) technology, a major contender in the field of non-volatile memory that uses stable magnets to store information in the form of $0$’s and $1$’s. By contrast, BSN’s can be built out of nanomagnets designed to have low energy barriers [11, 12, 13, 14, 15, 16, 17, 18]. The performance of such BSN designs depends largely on the magnetization fluctuation rate of the LBM, making it important to design the low barrier magnet to have a high fluctuation rate.

Stable magnets could be redesigned to have low energy barriers by scaling the magnetic anisotropy [19]. The energy associated with a magnet is given by

$$E=\frac{1}{2}H_{kp}M_{s}\Omega\,(1-m_{x}^{2})+\frac{1}{2}H_{ki}M_{s}\Omega\,(1-m_{z}^{2})$$

where $H_{kp}=2K_{s}/t-4\pi M_{s}$ is the perpendicular anisotropy field along the x-axis, $K_{s}$ is the surface anisotropy density, $H_{ki}$ is the in-plane anisotropy field along the z-axis, $M_{s}$ is the saturation magnetization, and $\Omega$ is the volume of the magnet. Low barrier magnets can be obtained either by adjusting the thickness $t$ of perpendicular anisotropy (PMA) magnets so that $H_{kp}\approx 0$, making $\Delta_{PMA}=H_{kp}M_{s}\Omega/2\approx 0$, or by making the shape of in-plane anisotropy (IMA) magnets circular so that $H_{ki}\approx 0$, making $\Delta_{IMA}=H_{ki}M_{s}\Omega/2\approx 0$. Such magnets with diameters of less than about 100 nm have been shown to exhibit monodomain behavior [19, 20, 21]. It is important to note that while modifying existing interfacial PMA free layers by modulating the thickness to make them IMA seems relatively straightforward, replacing highly optimized fixed PMA layers [22] with IMA stacks could prove more challenging.

The time scale of fluctuations can be very different for the two categories of low barrier magnets, as shown in Fig. 1b and c. In a PMA magnet with a vanishing perpendicular anisotropy field ($\Delta\rightarrow 0$), thermal noise makes the magnetization fluctuate randomly anywhere on the Bloch sphere. In a circular IMA magnet, which has no preferred easy axis, the large effective demagnetization field ($H_{D}=4\pi M_{s}$) restricts the fluctuations to a compressed region near the equator (i.e. an in-plane moment), making more rapid fluctuations possible.

In this letter, we present a distinction between the fluctuation dynamics of low barrier PMA and IMA magnets, providing analytical expressions for two very important parameters for the performance evaluation of hardware BSN’s: the correlation time $\tau_{c}$ and the pinning current $I_{p}$ for $\Delta\approx k_{B}T$ and below. Circular IMA magnets have a correlation time two orders of magnitude smaller than that of PMA magnets and a pinning current that is much higher. We also present a device-level performance evaluation of two previously proposed compact BSN designs [23, 24] using circular IMA magnets and show that the sub-ns operation results in an energy requirement of only $\sim$ a few fJ for evaluating the BSN function, which is orders of magnitude lower than that of its CMOS implementation [25, 26].

II Low barrier magnets

A binary stochastic neuron can be viewed as a tunable random number generator, and a key parameter defining its performance is the rate at which it produces random numbers. For an LBM BSN, this rate is related to the magnetization fluctuation rate of the low barrier magnet. The correlation time $\tau_{c}$, the time it takes for the magnet to lose its memory, is defined as the full width at half maximum (FWHM) of the temporal auto-correlation function $C(t)$ of the magnetization, and can be used to characterize the relevant time scale of BSN operation.
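To make the FWHM definition concrete, the sketch below estimates $C(t)$ from a sampled time series and reads off its full width at half maximum. It is a generic signal-processing illustration, not the authors' simulation code: the input is a synthetic first-order autoregressive signal standing in for a magnetization trace (an assumption made only so the expected answer is known), and the helper names `autocorrelation` and `fwhm_correlation_time` are hypothetical.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k) for lags 0..max_lag-1, computed via FFT."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    spectrum = np.fft.rfft(x, n=2 * n)   # zero-padding avoids circular wrap-around
    acf = np.fft.irfft(spectrum * np.conj(spectrum))[:max_lag]
    return acf / acf[0]

def fwhm_correlation_time(acf, dt):
    """FWHM of a symmetric C(t): twice the lag where C first falls below 1/2."""
    below = acf < 0.5
    if not np.any(below):
        raise ValueError("C(t) never drops below 1/2 within max_lag")
    k = int(np.argmax(below))            # first lag below half maximum (k >= 1)
    # Linear interpolation between lags k-1 and k for the crossing point.
    t_half = (k - 1) + (acf[k - 1] - 0.5) / (acf[k - 1] - acf[k])
    return 2.0 * t_half * dt

# Synthetic stand-in signal with C(t) = exp(-|t| / tau_steps) (ASSUMPTION, not s-LLG data).
rng = np.random.default_rng(1)
tau_steps = 20.0
a = np.exp(-1.0 / tau_steps)
noise = rng.standard_normal(400_000)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, len(noise)):
    x[i] = a * x[i - 1] + np.sqrt(1.0 - a * a) * noise[i]

acf = autocorrelation(x, max_lag=200)
tau_c = fwhm_correlation_time(acf, dt=1.0)
expected = 2.0 * tau_steps * np.log(2.0)  # FWHM of exp(-|t|/tau)
print("estimated FWHM:", tau_c, "expected:", expected)
```

The same two helpers could be applied to any sampled magnetization trace, provided the trace is much longer than the correlation time being estimated.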

In low barrier magnets where the energy barrier is well below the thermal energy ($\Delta\ll k_{B}T$), the magnetization becomes a continuous variable. The Arrhenius law, which describes the thermal fluctuations of high barrier magnets ($\Delta\gg k_{B}T$) with two distinct magnetic states, thus does not hold for LBM’s [17, 27]. Instead, thermal fluctuations in monodomain low barrier magnets can be characterized starting from the Fokker-Planck equation (FPE) [28, 29] or from the Landau-Lifshitz-Gilbert (LLG) equation including a Langevin term describing the thermal fluctuations [27, 30].

Coffey et al. [29] analyze the magnetic fluctuations in a PMA magnet due to thermal noise in detail using the Fokker-Planck equation (FPE) derived by W. F. Brown [28]. The analysis presented in these references focuses on high-barrier magnets but is not limited to them, and can thus be evaluated for $\Delta\rightarrow 0$ to describe the dynamics of low barrier PMA magnets, giving expressions that agree well with numerical results.

[TABLE]

In low barrier circular IMA magnets, when thermal noise kicks the magnetization out of the plane, the in-plane magnetization starts precessing because of the absence of an easy axis and the presence of a large orthogonal demagnetization field $H_{D}$. If we consider an ensemble of such magnets, each with a different precession frequency due to thermal noise, the average magnetization vector quickly decays. The auto-correlation function of the in-plane magnetization $m_{z}=\cos(\phi(t))$ can be expressed as:

$$C(t)=\int_{-1}^{1}dm_{x}\,\cos(\gamma H_{D}m_{x}t)\,\rho(m_{x})\bigg/\int_{-1}^{1}dm_{x}\,\rho(m_{x})$$

where the in-plane precession dynamics is described by $\phi(t)\approx\gamma H_{D}m_{x}t$ [30] for low damping $\alpha$ . The perpendicular magnetization $m_{x}$ follows a Boltzmann distribution with $\rho(m_{x})\approx\exp(-H_{D}M_{S}\Omega m_{x}^{2}/2k_{B}T)$ . For large values of $H_{D}$ the integral could be extended to $\pm\infty$ and evaluated to give an expression for the auto-correlation function and correlation time as follows:

[TABLE]
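As a worked check of the step just described (a sketch under the stated approximations, not a reproduction of the typeset equation), extending the limits to $\pm\infty$ turns the ensemble average of $\cos(\gamma H_{D}m_{x}t)$ over $\rho(m_{x})$ into a Gaussian integral. The symbols $\sigma^{2}=k_{B}T/(H_{D}M_{S}\Omega)$, the variance of $m_{x}$ implied by $\rho(m_{x})$, and $t_{1/2}$, the half-maximum time, are introduced here for illustration only:

```latex
\begin{align}
C(t) &\approx
  \frac{\int_{-\infty}^{\infty} dm_x\,\cos(\gamma H_D m_x t)\,e^{-m_x^2/2\sigma^2}}
       {\int_{-\infty}^{\infty} dm_x\,e^{-m_x^2/2\sigma^2}}
  = \exp\!\left(-\tfrac{1}{2}\gamma^2 H_D^2 \sigma^2 t^2\right)
  = \exp\!\left(-\frac{\gamma^2 H_D k_B T}{2 M_S \Omega}\,t^2\right), \\
C(t_{1/2}) &= \tfrac{1}{2}
  \;\Rightarrow\;
  t_{1/2} = \frac{1}{\gamma}\sqrt{\frac{2\ln 2\; M_S\Omega}{H_D k_B T}}, \\
\tau_c &= 2\,t_{1/2}
  = \frac{\sqrt{8\ln 2}}{\gamma}\sqrt{\frac{M_S\Omega}{H_D k_B T}} .
\end{align}
```

In this low-damping approximation $\tau_{c}$ grows only as $\sqrt{\Omega}$ and contains no dependence on the damping $\alpha$, which fits the picture of memory loss driven by precession in the demagnetization field.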

In numerical simulations, we observe essentially the same auto-correlation behavior, even when the correlation function is obtained from the time-dependent fluctuations of a single magnet fluctuating for long time periods as shown in Fig. 2a. In PMA no such precessional fluctuation mechanism exists as the internal fields are compensated.

Another important parameter for evaluating the performance of an LBM based stochastic device is its sensitivity to spin current. To maintain stochasticity, MRAM-type devices should be immune to the read current; the amount of current required to bias BSN devices is also relevant for power considerations. For high barrier magnets, the relevant concept is the switching current [31]. For low barrier magnets, we refer instead to the pinning current, which can be mathematically defined as $I_{P}=(\langle m\rangle/I_{S})^{-1}$, as shown in Fig. 3. The pinning current for PMA magnets can be derived from the steady-state Fokker-Planck equation as described in Ref. [32], while for IMA magnets with $\Delta\rightarrow 0$ and low damping, the pinning current can be approximated from the relation $I_{P}\equiv qN_{S}C(0)\big/\int_{0}^{\infty}dt\,C(t)$. Fig. 3 shows that the numerical results are well described by the obtained expressions:

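$$\text{PMA: } I_{P}=\frac{6q}{\hbar}\,\alpha k_{B}T \tag{4}$$

$$\text{IMA: } I_{P}=\frac{2q}{\hbar}\sqrt{\frac{2}{\pi}H_{D}M_{S}\Omega k_{B}T} \tag{5}$$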
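The IMA pinning current can be cross-checked against the relation $I_{P}\equiv qN_{S}C(0)\big/\int_{0}^{\infty}dt\,C(t)$ quoted above. The sketch below assumes the Gaussian form of $C(t)$ that follows from the ensemble average over $m_{x}$, and it further assumes $N_{S}=M_{S}\Omega/\mu_{B}$ for the number of spins and $\gamma=2\mu_{B}/\hbar$ (a g-factor of 2); neither identification is stated explicitly in the text.

```latex
\begin{align}
\int_0^\infty dt\,C(t)
  &= \int_0^\infty dt\,\exp\!\left(-\frac{\gamma^2 H_D k_B T}{2 M_S \Omega}\,t^2\right)
   = \frac{1}{\gamma}\sqrt{\frac{\pi}{2}\,\frac{M_S\Omega}{H_D k_B T}}, \\
I_P &= \frac{q N_S\, C(0)}{\int_0^\infty dt\,C(t)}
   = q\,\frac{M_S\Omega}{\mu_B}\,\gamma\,\sqrt{\frac{2}{\pi}\,\frac{H_D k_B T}{M_S\Omega}}
   = \frac{2q}{\hbar}\sqrt{\frac{2}{\pi}\,H_D M_S \Omega\, k_B T} .
\end{align}
```

The square root makes the expression dimensionally a charge divided by $\hbar$ times an energy, which is a current, and it implies that the IMA pinning current grows as $\sqrt{\Omega}$ while the PMA expression is independent of volume.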

The derivations of Eq. 4 and Eq. 5 assume zero energy barriers, but numerically we observe that these equations remain approximately valid for barriers up to $\Delta\approx k_{B}T$. In practice, obtaining near-zero barrier circular magnets could be challenging due to process variation. For interconnected networks of p-bits, a distribution of correlation times across the p-bits needs to be considered, as shown in Ref. [33].

Note that IMA-based designs can achieve sub-nanosecond correlation times even with fairly large volumes, provided that monodomain behavior can be preserved with a small enough diameter, while PMA-based designs tend to be much slower, making IMA magnets more suitable for BSN applications. This is accompanied by fairly large pinning currents for IMA compared to PMA, which minimizes read disturb effects.

In the following section, for the performance evaluation of two LBM based hardware BSN designs, we use circular IMA magnets M1 and M2 with volumes of $800\pi$ and $20480\pi\ \mathrm{nm^{3}}$, respectively.
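For a rough feel of the numbers, the sketch below evaluates the zero-barrier IMA correlation time obtained by carrying out the Gaussian integral, $\tau_{c}=\sqrt{8\ln 2}\,\sqrt{M_{S}\Omega/(H_{D}k_{B}T)}/\gamma$, together with the pinning currents of Eq. 4 and Eq. 5, for the volumes of M1 and M2. The material parameters `MS`, `ALPHA`, and `TEMP` are hypothetical placeholders, since the text does not state them, so the printed values are indicative only. The formulas are written in SI units, where the CGS product $H_{D}M_{S}\Omega$ becomes $\mu_{0}M_{S}^{2}\Omega$ and $\gamma H_{D}$ becomes $\gamma\mu_{0}M_{S}$.

```python
import math

# Physical constants (SI units)
Q = 1.602176634e-19      # elementary charge, C
HBAR = 1.054571817e-34   # reduced Planck constant, J s
KB = 1.380649e-23        # Boltzmann constant, J/K
MU0 = 4e-7 * math.pi     # vacuum permeability, T m/A
GAMMA = 1.76085963e11    # electron gyromagnetic ratio, rad/(s T)

# ASSUMED parameters (hypothetical; not given in the text)
MS = 1.1e6               # saturation magnetization, A/m
ALPHA = 0.01             # Gilbert damping
TEMP = 300.0             # temperature, K

def demag_energy(volume_m3):
    """SI counterpart of the CGS product H_D * M_S * Omega, in joules."""
    return MU0 * MS**2 * volume_m3

def tau_c_ima(volume_m3):
    """FWHM correlation time for a zero-barrier circular IMA magnet (Gaussian C(t))."""
    ratio = demag_energy(volume_m3) / (KB * TEMP)
    return math.sqrt(8.0 * math.log(2.0) * ratio) / (GAMMA * MU0 * MS)

def pinning_current_ima(volume_m3):
    """Eq. 5: I_P = (2q/hbar) * sqrt((2/pi) * H_D M_S Omega * k_B T)."""
    return (2.0 * Q / HBAR) * math.sqrt(
        (2.0 / math.pi) * demag_energy(volume_m3) * KB * TEMP)

def pinning_current_pma():
    """Eq. 4: I_P = (6q/hbar) * alpha * k_B T (independent of volume)."""
    return (6.0 * Q / HBAR) * ALPHA * KB * TEMP

NM3 = 1e-27  # one cubic nanometre in cubic metres
volumes = {"M1": 800.0 * math.pi * NM3, "M2": 20480.0 * math.pi * NM3}

for name, vol in volumes.items():
    print(name, "tau_c [s]:", tau_c_ima(vol), "I_P(IMA) [A]:", pinning_current_ima(vol))
print("I_P(PMA) [A]:", pinning_current_pma())
```

Because both $\tau_{c}$ and the IMA pinning current scale as $\sqrt{\Omega}$, the ratio between M2 and M1 is $\sqrt{25.6}\approx 5$ for either quantity regardless of the assumed $M_{S}$.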

III Performance Evaluation of Hardware BSN using circular IMA LBM

In this section we evaluate the steady-state and time response of two hardware BSN designs proposed in the past [23, 24] shown in Fig. 4 and measure the energy and delay associated with each.

The designs make use of a magnetic tunnel junction (MTJ) whose free layer is a low barrier magnet with a fluctuating magnetization $m_{z}(t)$, resulting in a fluctuating resistance $R_{\mathrm{MTJ}}(t)$ given by $R_{\mathrm{MTJ}}(t)^{-1}=G_{0}[1+m_{z}(t)\,\mathrm{TMR}/(2+\mathrm{TMR})]$, where $G_{0}$ is the average conductance and TMR is the tunneling magnetoresistance. The fluctuating resistance $R_{\mathrm{MTJ}}(t)$ is converted to a fluctuating voltage $V_{i}(t)$ by the potential divider:

$$\frac{V_{i}(t)}{V_{DD}/2}=(\pm)\frac{R_{MTJ}(t)-R_{0}}{R_{MTJ}(t)+R_{0}} \tag{6}$$
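The MTJ resistance model and the potential divider can be evaluated directly for the two extreme states $m_{z}=\pm 1$. The sketch below uses $P=0.6$, $G_{0}=(25\ \mathrm{k\Omega})^{-1}$, and $V_{DD}=0.8\ \mathrm{V}$ as quoted elsewhere in this letter, and it assumes $R_{0}=1/G_{0}$, i.e. a series resistance matched to the average MTJ resistance, which is an illustrative choice rather than a stated design value. The function names are hypothetical.

```python
def tmr_from_polarization(p):
    """TMR = 2 P^2 / (1 - P^2)."""
    return 2.0 * p**2 / (1.0 - p**2)

def mtj_resistance(m_z, g0, tmr):
    """R_MTJ from 1/R_MTJ = G0 * [1 + m_z * TMR / (2 + TMR)]."""
    return 1.0 / (g0 * (1.0 + m_z * tmr / (2.0 + tmr)))

def divider_voltage(r_mtj, r0, vdd, sign=+1):
    """V_i = (+/-) (V_DD / 2) * (R_MTJ - R0) / (R_MTJ + R0)."""
    return sign * (vdd / 2.0) * (r_mtj - r0) / (r_mtj + r0)

P = 0.6            # spin polarization
VDD = 0.8          # supply voltage in volts, from V_DD / 2 = 0.4 V
G0 = 1.0 / 25e3    # average MTJ conductance in siemens
R0 = 1.0 / G0      # ASSUMPTION: series resistance matched to 1 / G0

tmr = tmr_from_polarization(P)
v_parallel = divider_voltage(mtj_resistance(+1.0, G0, tmr), R0, VDD)
v_antiparallel = divider_voltage(mtj_resistance(-1.0, G0, tmr), R0, VDD)

# Half of the peak-to-peak swing at the inverter input.
amplitude = abs(v_antiparallel - v_parallel) / 2.0
estimate = P**2 * VDD / (4.0 - P**4)
print("TMR:", tmr, "amplitude [V]:", amplitude, "P^2 VDD/(4-P^4) [V]:", estimate)
```

Under the matched-resistance assumption, $\mathrm{TMR}/(2+\mathrm{TMR})$ reduces to $P^{2}$ and the half-swing equals $P^{2}V_{DD}/(4-P^{4})$, roughly 75 mV for these values.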

The fluctuations are controlled by two different mechanisms in the two designs. BSN-A is a spin-orbit-torque controlled device [23] that uses the input spin current (in the y direction) from the giant spin Hall effect (GSHE) layer to pin the free layer magnetization of the MTJ (in the z direction), thereby pinning $R_{MTJ}$; it implements the ($+$) configuration of Eq. 6. BSN-B is a series resistance controlled device [24] that uses the input voltage to control the transistor resistance $R_{0}$; it implements the ($-$) configuration of Eq. 6. Ideally $R_{MTJ}$ remains unchanged in this design, though in actual designs it may be important to consider unintended pinning effects of the current. Both designs use a minimum-sized CMOS inverter to convert the fluctuating $V_{i}$ into a rail-to-rail output $V_{OUT}$. In each case we use SPICE simulations based on state-of-the-art stochastic Landau-Lifshitz-Gilbert (s-LLG) models [34] for the LBM free layer of the MTJ, with $G_{0}\simeq(25\ \mathrm{k\Omega})^{-1}$ and $\mathrm{TMR}=2P^{2}/(1-P^{2})=110\%$ for a polarization $P\simeq 0.6$, coupled with 14 nm HP FinFET’s [35], to show that the output voltage $V_{OUT}$ of a specific BSN is approximately related to its input $V_{IN}$ by an equation that mimics Eq. 1:

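$$\frac{V_{OUT}(t+t_{0})}{V_{OUT0}}\approx\mathrm{sgn}\left[\tanh\frac{V_{IN}(t)}{V_{IN0}}-r(t)\right] \tag{7}$$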

with scaling factors $V_{OUT0},V_{IN0},t_{0}$ characterizing the specific hardware design.

III-A Steady-State Response

Fig. 5 shows the individual steady-state responses of designs A and B using magnets M1 and M2, which all collapse onto the same curve with appropriate scaling parameters. The output scaling quantity $V_{OUT0}\simeq V_{DD}/2=0.4\ \mathrm{V}$ is the same for all cases, as it is defined entirely by the output voltage swing of the CMOS inverter. The input scaling parameters, on the other hand, are strongly design dependent. For BSN-A, $I_{IN0}$ is determined by the pinning currents of magnets M1 and M2; indeed, the scaling parameters in Fig. 5b were obtained from Eq. 5. For BSN-B, $V_{IN0}\sim 50\ \mathrm{mV}$ for both magnets, determined by the transistor characteristics. Note that the SPICE simulations include the read disturb current, but its effect is minimal due to the high pinning currents of low barrier IMA magnets compared to PMA magnets, as can be seen from Eq. 4 and Eq. 5.

III-B Time Response

Fig. 6 shows the two relevant timescales associated with BSN operation. First is the correlation time of the output voltage which is determined by the magnet parameters. Indeed, the FWHM of the autocorrelation function corresponds well to Eq. 3, which is expected since circuit related times are much shorter in this case. Second is the response time which is very design dependent. For BSN-A it is determined by magnet physics while for BSN-B it is determined by transistor physics [36]. Our analysis shows that the response time $\rm{t_{0}}$ of a single BSN-B neuron is independent of magnet parameters. However, the response of an interconnected network of such neurons would also involve the magnet correlation time $\tau_{c}$ .

III-C Power Consumption

Fig. 7 shows the power drawn from the sources $\pm V_{DD}/2$ individually by the MTJ branch and the inverter branch as $V_{IN}$ is stepped at $t=0$ from different initial to final values as indicated. The steady-state values of the power dissipated in both the MTJ and inverter branches agree quantitatively with the simple estimate $V_{DD}^{2}/R$ (see dashed lines in the figures), where $R$ is the appropriate resistance, namely $R_{MTJ}+R_{0}$ for the MTJ branch and $R_{\rm NMOS}+R_{\rm PMOS}$ for the inverter branch. For the MTJ branch, the power dissipated is $\sim 10$-$20\ \mu\mathrm{W}$ for all cases except the middle panel for BSN-B. In that case the final state involves a large negative input voltage $V_{IN}$ for which the series transistor is turned OFF, making the resistance $R$ extremely large, so that $V_{DD}^{2}/R\rightarrow 0$. In all other cases, the total $R$ is of the order of the MTJ resistance $\sim 25\ \mathrm{k\Omega}$, so that $V_{DD}^{2}/R\sim 25\ \mu\mathrm{W}$. For the inverter branch, BSN-A dissipates $\sim 10\ \mu\mathrm{W}$, since the voltage at the inverter input remains close to the threshold value in all cases, making both the NMOS and PMOS branches fairly conducting. For BSN-B, on the other hand, the PMOS and NMOS get turned off for large positive and large negative input $V_{IN}$, respectively, making the effective $R$ very large. Only for input voltages $\sim 0$ are both the PMOS and NMOS branches conducting, giving rise to a steady-state power of $\sim 10\ \mu\mathrm{W}$, like BSN-A. This number could be lowered if we can engineer larger voltage fluctuations at the inverter input, $\lvert\delta V_{i}\rvert\sim P^{2}V_{DD}/(4-P^{4})$. Our assumed TMR of $110\%$ corresponds to $P\sim 0.6$, giving $\lvert\delta V_{i}\rvert\sim 75\ \mathrm{mV}$.

Note that this analysis does not consider the power drawn from $V_{IN}$, which is expected to be very different for a low input impedance design (BSN-A) compared to a high input impedance design (BSN-B) and will depend on the driving mechanism and circuitry. Overall, both designs suffer from significant steady-state power losses and would need to be turned off when not in use. This can be done straightforwardly for BSN-B using a large negative input voltage $V_{IN}$. The key point to note is that the energy dissipated during the evaluation of the BSN function is $\sim 20\ \mu\mathrm{W}\times 50\ \mathrm{ps}=1\ \mathrm{fJ}$, which is orders of magnitude smaller than CMOS implementations of the same function [25, 26], as noted earlier from system-level simulations in [37]. The device-level analysis presented here elucidates the role of proper magnet design in achieving the sub-nanosecond response times that are crucial for fast and low-energy operation. The analysis also suggests that low barrier IMA magnets are a more suitable candidate for BSN-type applications due to their fast fluctuation dynamics, while modern non-volatile MRAM technology is largely based on PMA magnets [38].
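The back-of-the-envelope estimates in this section can be reproduced with a few lines of arithmetic. The sketch below uses the figures quoted in the text ($V_{DD}=0.8\ \mathrm{V}$ from $V_{DD}/2=0.4\ \mathrm{V}$, a branch resistance of about $25\ \mathrm{k\Omega}$, roughly $20\ \mu\mathrm{W}$ of power, and a $50\ \mathrm{ps}$ evaluation time); the function names are hypothetical, and the numbers are order-of-magnitude estimates rather than simulation results.

```python
VDD = 0.8          # supply voltage in volts (V_DD / 2 = 0.4 V)
R_BRANCH = 25e3    # total MTJ-branch resistance in ohms (order of magnitude)
P_EVAL = 20e-6     # power during evaluation in watts (about 20 uW)
T_EVAL = 50e-12    # evaluation time in seconds (about 50 ps)

def branch_power(vdd, resistance):
    """Steady-state estimate V_DD^2 / R for a resistive branch."""
    return vdd**2 / resistance

def evaluation_energy(power, duration):
    """Energy dissipated while the BSN function is evaluated."""
    return power * duration

mtj_power = branch_power(VDD, R_BRANCH)
energy = evaluation_energy(P_EVAL, T_EVAL)
print("MTJ branch power [uW]:", mtj_power * 1e6)
print("evaluation energy [fJ]:", energy * 1e15)
```

The first number matches the $\sim 25\ \mu\mathrm{W}$ quoted for the MTJ branch and the second the $\sim 1\ \mathrm{fJ}$ per evaluation.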

Acknowledgment

This work was supported in part by the Center for Probabilistic Spin Logic for Low-Energy Boolean and Non-Boolean Computing (CAPSL), one of the Nanoelectronic Computing Research (nCORE) Centers as task 2759.005, a Semiconductor Research Corporation (SRC) program sponsored by the NSF through ECCS 1739635.

Bibliography

[1] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for Boltzmann machines,” Cognitive Science, vol. 9, no. 1, pp. 147–169, 1985. [Online]. Available: https://doi.org/10.1016/S0364-0213(85)80012-4
[2] D. J. Amit, Modeling Brain Function: The World of Attractor Neural Networks. Cambridge University Press, 1992. [Online]. Available: https://doi.org/10.1016/0166-2236(90)90155-4
[3] Binary stochastic neurons in TensorFlow (https://r2rt.com/binary-stochastic-neurons-in-tensorflow.html).
[4] A. Alaghi and J. P. Hayes, “Survey of stochastic computing,” ACM Transactions on Embedded Computing Systems (TECS), vol. 12, no. 2s, p. 92, 2013. [Online]. Available: https://doi.org/10.1145/2465787.2465794
[5] S. K. Esser, A. Andreopoulos, R. Appuswamy, P. Datta, D. Barch, A. Amir, J. Arthur, A. Cassidy, M. Flickner, P. Merolla et al., “Cognitive computing systems: Algorithms and applications for networks of neurosynaptic cores,” in Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013, pp. 1–10. [Online]. Available: https://doi.org/10.1109/IJCNN.2013.6706746
[6] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura et al., “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, 2014. [Online]. Available: https://doi.org/10.1126/science.1254642
[7] Note that we are using a bipolar representation $\pm 1$ instead of the binary representation (0,1). This is reflected in the use of the tanh function in Eq. 1 instead of the usual logistic function.
[8] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, “Nanoscale memristor device as synapse in neuromorphic systems,” Nano Letters, vol. 10, no. 4, pp. 1297–1301, 2010. [Online]. Available: http://doi.org/10.1021/nl904092h