Clockless Spin-based Look-Up Tables with Wide Read Margin
Soheil Salehi, Ramtin Zand, Ronald F. DeMara

TL;DR
This paper introduces a novel clockless spin-based lookup table using SHE-MTJ technology, achieving wider read margins, error-free operation under variations, and significant reductions in power and area compared to traditional SRAM and STT-MTJ LUTs.
Contribution
It presents a new fracturable 6-input non-volatile C-LUT with differential MTJ design, offering improved read margin, error resilience, and lower power and area consumption.
Findings
No read/write errors under process variations
5.4-fold reduction in standby power
Area reduction of 1.3x over SRAM and 2x over STT-MTJ
Table 1. Power consumption and delay of the SRAM LUT and the MRAM C-LUT.
| Design | Logic value | Read power | Write power | Standby power | Read delay | Write delay |
|---|---|---|---|---|---|---|
| SRAM LUT | Logic “0” | 2.58 | 28.4 | 1.5 | 30 ps | 20 ps |
| SRAM LUT | Logic “1” | 7.55 | 27.7 | 1.85 | 30 ps | 20 ps |
| SRAM LUT | Average | 5.06 | 25.08 | 1.67 | 30 ps | 20 ps |
| MRAM C-LUT | Logic “0” | 14.38 | 81.16 | 0.31 | 20 ps | 2 ns |
| MRAM C-LUT | Logic “1” | 19.91 | 81.25 | 0.31 | 60 ps | 2 ns |
| MRAM C-LUT | Average | 17.15 | 81.18 | 0.31 | 40 ps | 2 ns |
Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, 32816 USA
(2019)
Abstract.
In this paper, we develop a 6-input fracturable non-volatile Clockless LUT (C-LUT) using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a detailed comparison between the SHE-MTJ-based C-LUT and the Spin Transfer Torque (STT)-MTJ-based C-LUT. The proposed C-LUT offers an attractive alternative to previous spin-based LUT designs in the literature for implementing combinational as well as sequential logic. Foremost, the C-LUT eliminates the sense amplifier typically employed by using a differential-polarity dual-MTJ design, as opposed to a static reference-resistance MTJ. This realizes a much wider read margin, and Monte Carlo simulation of the proposed fracturable C-LUT indicates no read or write errors across a variety of process variation scenarios involving MOS transistors as well as MTJs. Additionally, simulation results indicate that the proposed C-LUT reduces the standby power dissipation by 5.4-fold compared to the SRAM-based LUT. Furthermore, the proposed SHE-MTJ-based C-LUT reduces the area by 1.3-fold and 2-fold compared to the SRAM-based LUT and the STT-MTJ-based C-LUT, respectively.
Keywords: Reconfigurable Logic, Fracturable LUT, Magnetic Tunnel Junction, Spin-based Memory Cell, Spin Hall Effect, Spin Transfer Torque.
Published in: Great Lakes Symposium on VLSI 2019 (GLSVLSI ’19), May 9–11, 2019, Tysons Corner, VA, USA. DOI: 10.1145/3299874.3318038. ISBN: 978-1-4503-6252-8/19/05.
CCS Concepts: Hardware: Spintronics and magnetic technologies; Emerging architectures; Asynchronous circuits; Combinational circuits; Programmable logic elements; Process, voltage and temperature variations.
1. Introduction
Flexibility and runtime adaptability are two of the main motivations for the wide adoption of reconfigurable fabrics. Among the most commonly used reconfigurable fabrics, Field Programmable Gate Arrays (FPGAs) have been the primary focus because their flexibility allows logic elements to be realized at medium and fine granularities while incurring low non-recurring engineering costs and enabling rapid deployment to market. Additionally, FPGAs have been researched as a promising platform that can be utilized effectively to increase reliability under process-voltage-temperature variation (Al-Haddad et al., 2015). The main challenge of static random access memory (SRAM)-based FPGAs is the increased area and power consumption required to achieve a flexible design. The main components of FPGAs are Look-Up Tables (LUTs) and switch boxes, which consist mainly of SRAM cells (Kuon et al., 2008). However, SRAM-based LUTs incur limitations such as high static power, volatility, and low logic density.
Innovations using emerging devices within FPGAs have been sought to overcome the limitations of SRAM-based FPGAs. High-endurance non-volatile spin-based LUTs have been studied in the literature as promising alternatives to SRAM-based LUTs, Flash-based LUTs, and other state-of-the-art emerging LUTs such as resistive random access memory (RRAM)-based LUTs and phase change memory (PCM)-based LUTs (Zand and DeMara, 2017; Tang et al., 2016; Huang et al., 2014; Attaran et al., 2018; Suzuki and Hanyu, 2019; Suzuki et al., 2013). Spin-based devices offer non-volatility, near-zero static power, high endurance, and high integration density (Salehi et al., 2017; Yoda et al., 2017). The spin-based LUTs presented in the literature (Zand and DeMara, 2017; Tang et al., 2016; Huang et al., 2014; Attaran et al., 2018; Suzuki and Hanyu, 2019; Suzuki et al., 2013) require separate read and write operations as well as a clock, which makes them suitable candidates for sequential logic operations. However, the main challenge that has not been addressed in the literature is providing a spin-based LUT design for combinational logic operation without the need for a clock. Additionally, the spin-based LUTs proposed in the literature fail to maintain a wide sense margin and high reliability without incurring significant area and power dissipation overheads (Zand and DeMara, 2017; Tang et al., 2016; Huang et al., 2014; Attaran et al., 2018; Suzuki and Hanyu, 2019; Suzuki et al., 2013). In this paper, to address the aforementioned challenges, we develop a clockless 6-input fracturable non-volatile Combinational LUT (C-LUT) with a wide read margin using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a detailed comparison between the SHE-MRAM and Spin Transfer Torque (STT)-MRAM C-LUTs. Additionally, we provide a detailed analysis of the reliability of our proposed C-LUT in the presence of Process Variation (PV).
2. Realizing Fracturable 6-Input Clockless LUT
The primary goal of using LUTs in reconfigurable fabrics is to implement combinational logic. Generally, an $n$-input Boolean function is implemented using a LUT that can be viewed as a memory with $2^n$ memory cells. The inputs are applied through a select tree, which is constructed with Pass Transistors and Transmission Gates (TGs) (Zand et al., 2016). Most contemporary FPGAs utilize fracturable 6-input LUTs in their design in order to implement either one 6-input Boolean function or two 5-input Boolean functions (Percey, 2007). Fig. 1 depicts our proposed 6-input fracturable SHE-MRAM C-LUT together with the 6-input fracturable STT-MRAM C-LUT. In Fig. 1, red indicates the write path and black indicates the read path. When the WWL signal and its complement are asserted, the Write TGs of each memory cell, TGW1 and TGW2, turn on, and using the Bit Lines and Source Lines we write into both MTJs in each memory cell so that they hold complementary values: if one MTJ is in the parallel state, the other is in the anti-parallel state, and vice versa. This results in a wide read margin during the read operation.
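The view of a LUT as a small memory addressed by a select tree can be illustrated with a short behavioural sketch. It is not the circuit of Fig. 1: the function names `make_lut` and `read_lut` are hypothetical, and the assumption that the first input drives the most significant select level is chosen only for illustration.

```python
from itertools import product


def make_lut(func, n):
    """Return the 2**n configuration bits (truth table) of an n-input function.

    Cells are ordered so that the first input is the most significant
    address bit (an illustrative convention, not taken from the paper).
    """
    return [int(func(*bits)) for bits in product((0, 1), repeat=n)]


def read_lut(cells, inputs):
    """Model the select tree: each input halves the set of candidate cells."""
    lo, hi = 0, len(cells)
    for bit in inputs:
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if bit else (lo, mid)
    return cells[lo]


# A 6-input OR needs 2**6 = 64 memory cells.
or6 = make_lut(lambda *x: any(x), 6)
```

Each level of the loop in `read_lut` plays the role of one rank of pass transistors or TGs in the select tree, so six inputs reduce 64 candidate cells to one.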
After the write operation has terminated, the data stored in the MTJs is read by enabling the RWL signal and its complement, which activates the Read TG of each memory cell, TGR. During the read operation, the PR and NR transistors are turned on, which provides the read path from VDD to GND. The source of PR, which is a PMOS transistor, is connected to VDD to provide a strong one, and the source of NR, which is an NMOS transistor, is connected to GND to provide a strong zero. The resistance difference between the two MTJs forms a voltage divider, and the divided voltage can be observed at the divider nodes shown in Fig. 1. The select tree input signals, shown as A, B, C, D, E, and F in Fig. 1, choose one of these nodes, and two inverters amplify its voltage to generate the required output. Since the values stored in the two MTJ devices are complementary, using one MTJ device to retain the data value and the other as the reference value results in a wide read margin (Salehi and DeMara, 2018), which we leverage herein to increase the reliability of the read operation.
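The benefit of the complementary pair over a static reference can be seen with an idealized divider calculation. The sketch below is only illustrative: the supply voltage, the normalized parallel resistance, the 100% TMR ratio, and the mid-point reference resistance are all assumed values, and the on-resistances of PR, NR, and the TGs are ignored.

```python
VDD = 1.0            # assumed supply voltage in volts (illustrative)
R_P = 1.0            # assumed parallel-state resistance (normalized)
TMR = 1.0            # assumed tunnel magnetoresistance ratio (100%)
R_AP = R_P * (1.0 + TMR)   # anti-parallel-state resistance


def divider(r_top, r_bottom, vdd=VDD):
    """Voltage at the node between r_top (to VDD) and r_bottom (to GND)."""
    return vdd * r_bottom / (r_top + r_bottom)


def differential_swing():
    """Node-voltage difference between the two complementary MTJ states."""
    return divider(R_P, R_AP) - divider(R_AP, R_P)


def reference_swing():
    """Same difference when one branch is a fixed mid-point reference."""
    r_ref = (R_P + R_AP) / 2.0
    return divider(R_P, r_ref) - divider(R_AP, r_ref)
```

Under these assumptions the complementary pair separates the two stored values by about VDD/3, roughly twice the separation obtained with the fixed reference, which is the intuition behind the wide read margin.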
The proposed C-LUT design needs neither an external clock nor a large sense amplifier circuit. Furthermore, the proposed fracturable C-LUT can operate as a single 6-input LUT or as two 5-input LUTs. The operation mode of the proposed LUT is controlled using the S5 and S6 signals. If S5 is enabled and S6 is disabled, the C-LUT operates as two 5-input LUTs and its outputs are OUT0 and OUT2. On the other hand, if S5 is disabled and S6 is enabled, the C-LUT operates as a 6-input LUT and OUT1 is its output. The proposed fracturable C-LUT provides significantly higher functional flexibility at the expense of slightly more power consumption, as studied in Section 3.
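The two operating modes can be summarized with a small behavioural model. This is a sketch under assumptions that the text does not confirm: the 64 cells are taken to split into a lower and an upper half of 32 cells, inputs A to E are assumed to be shared by both halves, and F is assumed to choose between the halves in 6-input mode. The function name `fracturable_lut` is hypothetical.

```python
def fracturable_lut(cells, inputs, s5, s6):
    """Behavioural model of a fracturable 6-input LUT.

    cells  : 64 configuration bits (assumed: lower 32 feed OUT0, upper 32 feed OUT2)
    inputs : (A, B, C, D, E, F), with A assumed most significant
    s5, s6 : mode select signals; exactly one must be enabled
    """
    if len(cells) != 64 or len(inputs) != 6:
        raise ValueError("expected 64 configuration bits and 6 inputs")
    if bool(s5) == bool(s6):
        raise ValueError("exactly one of S5 and S6 must be enabled")
    *shared, f = inputs
    index5 = 0
    for bit in shared:
        index5 = (index5 << 1) | bit
    low, high = cells[index5], cells[32 + index5]
    if s5:
        # Two independent 5-input functions of A..E.
        return {"OUT0": low, "OUT2": high}
    # One 6-input function: F selects between the two halves.
    return {"OUT1": high if f else low}


# Lower half holds a 5-input AND, upper half a 5-input XOR.
and5 = [int(i == 31) for i in range(32)]
xor5 = [bin(i).count("1") % 2 for i in range(32)]
cells = and5 + xor5
two_luts = fracturable_lut(cells, (1, 0, 0, 0, 0, 0), s5=True, s6=False)
one_lut = fracturable_lut(cells, (1, 0, 0, 0, 0, 1), s5=False, s6=True)
```

In this model the same 64 stored bits serve both modes, so switching between them only changes which outputs are observed.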
3. Simulation Framework, Results, and Analysis
Herein, we use the HSPICE circuit simulator to validate the functionality of the proposed C-LUT using a nanometer-scale CMOS technology and the STT-MRAM model developed by Kim et al. (Kim et al., 2015). Fig. 2 shows the transient response of the C-LUT implementing an OR operation for two different input signal combinations. To generate the current required to meet the target write delay, the write transistors must be enlarged beyond the minimum size. As shown, the HSPICE simulations verify the correct functionality of our proposed C-LUT.
Table 1 lists comparison results between the SRAM-LUT and the proposed C-LUT in terms of power consumption and delay. The results show that the average standby power drops from 1.67 for the SRAM-LUT to 0.31 for the C-LUT (Table 1), a reduction of roughly 5.4-fold, at the cost of increased write power, which can be tolerated because write operations occur infrequently in LUTs. There are three energy profiles in FPGA LUT circuits: read energy, consumed during normal FPGA operation; standby energy, consumed by the LUTs that are not on the active datapath, which can constitute a significant portion of the FPGA fabric; and write energy, consumed during the LUTs’ configuration operation, which occurs rarely.
Table 2 provides an area and energy consumption comparison between the SRAM-LUT and the C-LUT. As listed, the MRAM-based C-LUT requires more MOS transistors than the conventional SRAM-LUT, in addition to its MTJs, which can be fabricated on top of the CMOS transistors and therefore incur low area overhead. The resulting area overhead of the C-LUT relative to the SRAM-LUT is primarily induced by the write circuits. Thus, innovations are sought to reduce the area and energy consumption of the MRAM cell’s write circuit to mitigate these issues.
Recently, SHE-MRAM cells have attracted considerable attention as an alternative to conventional STT-MRAMs. Herein, we have used the SHE-MRAM device model proposed by Camsari et al. (Camsari et al., 2015) to realize a circuit-level simulation of our SHE-MRAM C-LUT. The results show that a TG-based write circuit with minimum-sized MOS transistors can produce the write current amplitude required to switch the SHE-MRAM’s state within the same write delay. Thus, Table 3 provides an iso-delay comparison between the STT-MRAM and SHE-MRAM C-LUTs in terms of device count and write energy. As listed, the SHE-MRAM C-LUT achieves an area reduction of roughly 2-fold while realizing comparable write energy consumption. Moreover, the SHE-MRAM C-LUT also reduces the device count compared to the SRAM-LUT, which corresponds to an area reduction of roughly 1.3-fold.
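The standby and write power trade-off can be checked directly from the Average rows of Table 1. The short calculation below uses only those listed values; the dictionary layout and variable names are illustrative, and the power unit is left unspecified because only ratios are computed.

```python
# Average rows of Table 1 (power values as listed; the unit cancels in the ratios).
sram_lut = {"write": 25.08, "standby": 1.67}
mram_clut = {"write": 81.18, "standby": 0.31}

# How many times lower the C-LUT standby power is than the SRAM-LUT standby power.
standby_ratio = sram_lut["standby"] / mram_clut["standby"]

# How many times higher the C-LUT write power is than the SRAM-LUT write power.
write_ratio = mram_clut["write"] / sram_lut["write"]

print(f"standby power reduction: {standby_ratio:.1f}-fold")
print(f"write power increase:    {write_ratio:.1f}-fold")
```

The standby ratio rounds to the 5.4-fold reduction quoted for the C-LUT, while the write power is a little over three times higher, which is the cost accepted for rarely occurring configuration writes.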
Furthermore, to analyze the reliability of the read and write operations of the proposed C-LUT, Monte Carlo (MC) simulation is performed to cover a wide range of PV scenarios that may occur in the fabricated device. The MC simulation considers the effects of PV on both the CMOS peripheral circuit and the MTJs. In particular, variations in the MTJs’ dimensions, the transistors’ threshold voltage, and the transistors’ dimensions are assessed. Fig. 3(a) depicts the distribution of the MTJ switching times, Fig. 3(b) illustrates the distribution of the MTJ resistances in the parallel and anti-parallel states, and Fig. 3(c) shows the distribution of the read and write currents across the MC instances. According to the MC simulation results, the C-LUT provides reliable write performance, with no write errors observed across the MC instances. In particular, the average switching times fall under the duration of the write operation, as depicted in Fig. 3(a). Additionally, since the states of the MTJs are differential, they provide a wide read margin, and as a result no PV-induced read errors are observed in the MC simulation. Furthermore, our proposed C-LUT does not suffer from read disturbance because the read current is small compared to the write current, as shown in Fig. 3(c): according to our MC simulation results, the average read current is significantly lower than the average write current.
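The reasoning behind the error-free read result can be illustrated with a toy Monte Carlo experiment. It is not a substitute for the HSPICE analysis: the Gaussian resistance spread, the 5% sigma, the 100% TMR ratio, the 10,000 instances, and the VDD/2 inverter threshold are all hypothetical choices, and transistor variation is not modelled at all.

```python
import random


def count_read_errors(n_instances=10_000, sigma=0.05, tmr=1.0, seed=0):
    """Count idealized read errors for a complementary MTJ divider.

    Assumptions (illustrative only): resistances are Gaussian with relative
    spread `sigma`, the cell stores a "1" with the parallel MTJ on the VDD
    side, and the first inverter switches at VDD/2.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_instances):
        r_p = rng.gauss(1.0, sigma)                        # parallel state
        r_ap = rng.gauss(1.0 + tmr, sigma * (1.0 + tmr))   # anti-parallel state
        v_node = r_ap / (r_p + r_ap)                       # normalized to VDD
        if v_node <= 0.5:                                  # read as "0": error
            errors += 1
    return errors


errors_with_margin = count_read_errors()
errors_without_margin = count_read_errors(tmr=0.0, sigma=0.1)
```

A read error in this model requires the varied anti-parallel resistance to fall below the varied parallel resistance, which becomes likely only when the two states are barely separated, as in the second call with the TMR set to zero.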
4. Conclusion
To overcome conventional SRAM-LUT limitations such as high static power, volatility, and low logic density, we have proposed a novel LUT design using spin-based devices. The proposed C-LUT is a clockless design and a suitable candidate for combinational logic, which can also be combined with a flip-flop circuit to implement sequential logic. According to our simulation results, the standby power dissipation of the proposed C-LUT is reduced by 5.4-fold compared to the SRAM-based LUT. Moreover, the proposed SHE-MRAM-based C-LUT uses fewer transistors than both the SRAM-based LUT and the STT-MRAM-based C-LUT, reducing the area by 1.3-fold and 2-fold, respectively. Additionally, according to the process variation reliability analysis, the C-LUT circuit exhibits no read or write errors in the presence of variations spanning both transistors and MTJs.
Acknowledgement
This work was supported in part by the National Science Foundation (NSF) through ECCS-.
References
- Al-Haddad et al. (2015) Rawad Al-Haddad, Rashad S. Oreifej, Ramtin Zand, Abdel Ejnioui, and Ronald F. DeMara. 2015. Adaptive Mitigation of Radiation-Induced Errors and TDDB in Reconfigurable Logic Fabrics. In 2015 IEEE 24th North Atlantic Test Workshop. IEEE, 23–32. https://doi.org/10.1109/NATW.2015.14
- Attaran et al. (2018) Aliyar Attaran, Tyler David Sheaves, Praveen Kumar Mugula, and Hamid Mahmoodi. 2018. Static Design of Spin Transfer Torques Magnetic Look Up Tables for ASIC Designs. In Proceedings of the 2018 on Great Lakes Symposium on VLSI - GLSVLSI ’18. ACM Press, New York, New York, USA, 507–510. https://doi.org/10.1145/3194554.3194651
- Camsari et al. (2015) Kerem Yunus Camsari, Samiran Ganguly, and Supriyo Datta. 2015. Modular approach to spintronics. Scientific Reports 5, 1 (9 2015), 10571. https://doi.org/10.1038/srep10571
- Huang et al. (2014) Kejie Huang, Yajun Ha, Rong Zhao, Akash Kumar, and Yong Lian. 2014. A Low Active Leakage and High Reliability Phase Change Memory (PCM) Based Non-Volatile FPGA Storage Element. IEEE Transactions on Circuits and Systems I: Regular Papers 61, 9 (9 2014), 2605–2613. https://doi.org/10.1109/TCSI.2014.2312499
- Kim et al. (2015) Jongyeon Kim, An Chen, Behtash Behin-Aein, Saurabh Kumar, Jian-Ping Wang, and Chris H. Kim. 2015. A technology-agnostic MTJ SPICE model with user-defined dimensions for STT-MRAM scalability studies. In 2015 IEEE Custom Integrated Circuits Conference (CICC). IEEE, 1–4. https://doi.org/10.1109/CICC.2015.7338407
- Kuon et al. (2008) Ian Kuon, Russell Tessier, and Jonathan Rose. 2008. FPGA Architecture: Survey and Challenges. Foundations and Trends in Electronic Design Automation 2, 2 (2008), 135–253. https://doi.org/10.1561/1000000005
- Percey (2007) Andrew Percey. 2007. Advantages of the Virtex-5 FPGA 6-Input LUT Architecture. (2007). www.BDTIC.com/XILINX
