# A 40-nm 8T SRAM with Selective Source Line Control of Read Bitlines and Address Preset Structure

S. Yoshimoto<sup>1</sup>, S. Miyano<sup>2</sup>, M. Takamiya<sup>3</sup>, H. Shinohara<sup>2</sup>, H. Kawaguchi<sup>1</sup>, and M. Yoshimoto<sup>1</sup>

<sup>1</sup>Kobe University, <sup>2</sup>Semiconductor Technology Academic Research Center, <sup>3</sup>University of Tokyo

Email: yoshipy@cs28.cs.kobe-u.ac.jp

Abstract- This paper presents a 40-nm 8T SRAM in which bitlines are partially discharged by a selective source line control (SSLC) for low-power operation. The proposed SSLC scheme reduces a read bitline voltage swing in an unselected column with a floating source line (SL) of dedicated read ports. The SL is controlled by an additional NMOS switch that is turned on in a selected column, but the switch is kept off in the remaining unselected columns. The proposed scheme is effective for power reduction in successive address readouts through a single column. Furthermore, this paper introduces an address preset structure. The preset address enables the SRAM to be read out with no access time penalty for preferred use of the SSLC scheme. We fabricated a 16-Kb 8T SRAM test chip in a 40-nm CMOS process and observed that the proposed SSLC scheme with the address preset structure saves 38.1% of the readout power on average.

## I. INTRODUCTION

The scaling of CMOS processes has continually increased chip density and has enhanced SoC functionality. The ITRS predicts that the total memory size of SoCs will increase by a factor of ten by 2022, with the memory consuming up to 65% of operating power in a mobile processor [1]. A circuit operating near the threshold voltage ($V_t$) is expected to be a good candidate for decreasing the total power consumption [2], which would extend the battery charging cycle. A logic circuit can operate at a lower supply voltage near the threshold voltage, so energy optimization of the logic circuits is expected to be conducted for an extremely low-power SoC [3]. However, process scaling increases threshold voltage variation and degrades the operating margin [4]. The scaled SRAM cannot lower the operating voltage further because of the process variation. For this reason, near-threshold computing (NTC) is realized with a cluster of higher-voltage caches and lower-voltage processing units in an optimized multi-core processor [5].

Recent works have specifically examined reduction of the bitline swing in the SRAM rather than lowering of the operating voltage [6–8]. One report [7] describes how transistor variation increases the dynamic energy because faster cells fully discharge bitlines by the sense-amplifier-enable timing. The dynamic energy increase caused by the transistor variation is estimated as 82% at a supply voltage of 0.5 V. Charge collector circuits [6] leverage the charge on unselected local bitlines to drive a global bitline with charge sharing. Bitline amplitude limiting schemes [7, 8] reduce the unnecessary bitline swing of a faster cell. The limiter does not degrade the cell current, but stops discharging the bitline when the bitline level decreases to the threshold voltage of an NMOS. As earlier works have explained, it is important to decrease the bitline swing to achieve a low-power SRAM.

Instead of the conventional 6T SRAM, a single-ended 8T SRAM is widely used even as a single-port memory leveraging disturb-free dedicated read ports [9]. The 8T SRAM presents advantages in designing write and read circuits separately, for which a half-select-free write-back scheme [10, 11] is proposed. Another advantage of the 8T SRAM is that a "1" readout consumes no dynamic power because a read bitline maintains a precharging voltage [12].

An 8T sub-Vt SRAM [13] employs a footer line (= source line: SL) shared in the same row to achieve low-power operation. This SRAM can eliminate a leakage path through unselected rows. However, read bitlines are still discharged in unselected columns, which degrades power efficiency.

This paper proposes a partially discharging 8T SRAM with a selective source line control (SSLC) scheme. The proposed scheme cuts off the SLs of the dedicated read ports selectively according to a column address. The SSLC improves energy efficiency in successive read operations such as those of an instruction cache or video processing. In incremental address access, only the row address (= less significant address bits) changes frequently, whereas the column address changes only rarely, as presented in Fig. 1.



Fig. 1. Successive memory access in video processing.

## II. PROPOSED 8T SRAM

### A. Selective Source Line Control (SSLC) Scheme

Figure 2 illustrates an array of the 8T cells with a commonly used interleaving structure. The SLs of the dedicated read ports are always grounded in the conventional structure. Although only the selected local read bitline (RBL) is connected to a global RBL by a multiplexer, the local RBLs in the unselected columns are also discharged, which is not necessary for the read operation.

Figure 3 illustrates the concept of the proposed SSLC scheme. The SL is a shared virtual ground line of the dedicated read ports in a single column of the 8T SRAM array. An NMOS switch and an OR gate are inserted in every column. The switch is turned on selectively or is kept off according to a column address. In a standby mode, the SLs are grounded to prepare for upcoming random access; the SL might, however, be floated if one-clock wakeup is not needed. In the write operation, the SLs are grounded because the 8T SRAM employs a disturb mitigation scheme with write back to eliminate a half-select problem [14]. For that reason, the OR gate has a write enable (WE) input. Although the SSLC circuit must be implemented in every column, the area overhead is 0.7% in our design. Figures 4(a)-4(c) respectively show schematic, FEOL, and BEOL layouts of the proposed 8T cell with an SL. In the conventional 8T cell, a ground line of the dedicated read port can be shared with an adjacent cell. In the proposed 8T cell, however, the SL must be separated. Even so, adding the SL incurs no cell area overhead. In our design, the cell size is 1.01  $\mu$m<sup>2</sup> in a logic rule, which is slightly larger than the conventional one because the transistor length is relaxed for low-leakage operation.
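
The per-column switch control described above can be sketched as a small truth function. This is a minimal illustration rather than the authors' circuit: the names `sl_switch_on` and `sl_states`, the `standby` flag, and the 8-column example are assumptions made here; only the OR of the column select with WE and the grounding of the SLs in standby are taken from the text.

```python
def sl_switch_on(column, selected_column, write_enable, standby=False):
    """Return True when the NMOS switch grounds the SL of `column`.

    Hypothetical model: the OR gate combines the column-select signal
    with WE; in standby, all SLs are assumed to be grounded.
    """
    if standby:
        return True
    column_selected = (column == selected_column)
    return column_selected or write_enable


def sl_states(num_columns, selected_column, write_enable, standby=False):
    """List the SL state ("GND" or "FLOAT") of every column."""
    return [
        "GND" if sl_switch_on(c, selected_column, write_enable, standby)
        else "FLOAT"
        for c in range(num_columns)
    ]


# Read of column 3: only its SL is grounded, the others float.
read_states = sl_states(8, selected_column=3, write_enable=False)

# Write cycle: WE forces every SL to ground for the write-back.
write_states = sl_states(8, selected_column=3, write_enable=True)

print(read_states)
print(write_states)
```

In this sketch, a read leaves seven of the eight SLs floating, whereas a write grounds all of them, which mirrors the role of the WE input of the OR gate.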

Figure 5 shows operating waveforms of the wordline, RBL, and SL of selected and unselected columns, in which cells have all "0" data. In the selected columns, the bitlines are pulled down and discharged. They are then precharged for subsequent operations. In unselected columns, the bitline is not fully discharged. Its swing is suppressed because the SL is floated by the SSLC.

Fig. 2. Conventional 8T memory cell array. Bit "0" discharges a read bitline (RBL).

Fig. 3. Conceptual diagrams showing the proposed partially discharging 8T SRAM with the selective source line control (SSLC) scheme in read operation.



Fig. 4. (a) Schematic, (b) FEOL, and (c) BEOL layouts of the proposed 8T cell with a separated source line (SL).



Fig. 5. Waveforms of wordline, read bitline (RBL), and source line (SL) of selected and unselected columns in consecutive "0" read operations.

### B. Address Preset Structure

Figure 6 shows an important shortcoming of the SSLC scheme: the access time penalty. Before a read operation, the SL must be grounded in the selected column. This SL activation demands extra access time. In this subsection, an address preset structure is presented to eliminate the access time penalty caused by SL activation. The proposed structure leverages an access address (= an address accessed at the present cycle: ADD<sub>acc</sub>) and a preset address (= an address accessed at the next cycle: ADD<sub>pre</sub>) as shown in Figs. 7(a) and 7(b). In particular, in a video memory or a memory shared by many cores, the address accessed in the next cycle can be preset because the memory access is algorithmic or is stored in a queue. In such a case, the ADD<sub>pre</sub> can be fed on the negative edge of the clock, and the SL in the column accessed at the next cycle can be grounded in advance to prepare for the next positive edge. The address preset structure eliminates the access time penalty in the SSLC mode when the address accessed at the next cycle can be preset. The area overhead of the address preset structure is less than 1% in the SRAM macro.

In Fig. 7(a), the $ADD_{acc}$, which is the present address, receives an $ADD_n$ on the first positive edge. It can then preset an $ADD_{n+1}$ for the next cycle because it is fixed on the negative edge of the clock. The SL is always grounded before access in the successive read operation. Consequently, the SSLC with the address preset structure improves the energy efficiency with no access time penalty.
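
The timing argument above can be checked with a toy cycle-level model. This is a sketch under stated assumptions, not the authors' implementation: the function `sl_ready_at_access`, the choice to keep the current column grounded until the next positive edge, and the assumption that the first column's SL is grounded during standby are all hypothetical. It only tracks which SLs are grounded when each positive clock edge arrives.

```python
def sl_ready_at_access(columns, use_preset):
    """For each cycle, report whether the accessed column's SL was
    already grounded when the positive clock edge arrived.

    `columns` is the sequence of column addresses, one per cycle.
    """
    # Assumption: the SL of the first accessed column is grounded in standby.
    grounded = {columns[0]}
    ready = []
    for n, add_acc in enumerate(columns):
        # Positive edge: ADD_acc selects the column that is read now.
        ready.append(add_acc in grounded)
        grounded = {add_acc}
        # Negative edge: ADD_pre (the address of cycle n+1) is fed in,
        # so the next column's SL can be grounded half a cycle early.
        if use_preset and n + 1 < len(columns):
            grounded.add(columns[n + 1])
    return ready


# Hypothetical access sequence whose column changes twice.
columns = [0, 0, 0, 1, 1, 2]
with_preset = sl_ready_at_access(columns, use_preset=True)
without_preset = sl_ready_at_access(columns, use_preset=False)

print(with_preset)
print(without_preset)
```

With the preset enabled, the SL is grounded before every access in this model; without it, each column change meets a floating SL, which corresponds to the access time penalty.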



Fig. 7. (a) Waveforms and (b) timing behavior of the SSLC scheme with the address preset structure.

## III. CHIP IMPLEMENTATION AND MEASUREMENT RESULTS

We implemented a 16-Kb 8T SRAM test chip using a 40-nm CMOS process as presented in Fig. 8. The macro size is $125 \times 280 \ \mu\text{m}^2$. The 8T SRAM consists of 16 bits / word $\times$ 1K words (128 rows $\times$ 128 columns). An SRAM sub-array (128 rows $\times$ 8 columns) has a multiplexer that selects a column for an input or output datum. Figure 9 presents a schematic of the proposed 8T SRAM with the SSLC and the low-energy disturbance mitigation scheme [14]. A pair of write bitlines (WBL/WBLN) and an SL are shared by 128 cells in a column. A local RBL is shared by 16 cells, and a NAND gate transports the readout datum to a global RBL driver. In the write operation, the write-back driver drives the WBL pair according to the original readout data to prevent the half-select issue. In a write cycle, all SLs are grounded for the write-back operation, as described in the previous section.
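
The array organization above implies a 10-bit word address split into a row part and a column part. The sketch below illustrates one such mapping. It assumes that the row address occupies the seven less significant bits (consistent with the introduction) and that the sub-array of data bit `b` occupies physical columns `8*b` to `8*b + 7`; the latter ordering and the names `decode` and `physical_columns` are assumptions for illustration only.

```python
ROWS = 128              # rows per sub-array
COLS_PER_SUBARRAY = 8   # columns multiplexed onto one data bit
BITS_PER_WORD = 16      # one sub-array per data bit
WORDS = ROWS * COLS_PER_SUBARRAY  # 1K words


def decode(address):
    """Split a word address into (row, column).

    Assumption: the row address is the less significant part.
    """
    if not 0 <= address < WORDS:
        raise ValueError("address out of range")
    return address % ROWS, address // ROWS


def physical_columns(column):
    """Physical columns whose SL is grounded for a given column address.

    Assumption: sub-arrays are laid out side by side, one per data bit.
    """
    return [bit * COLS_PER_SUBARRAY + column for bit in range(BITS_PER_WORD)]


print(decode(127))           # last row of column 0
print(decode(128))           # first row of column 1
print(physical_columns(3))   # 16 of the 128 physical columns
```

Under this mapping, 128 consecutive addresses stay in one column, so 16 of the 128 SLs are grounded and the remaining 112 can float during such a burst.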

Figures 10 and 11 respectively present access patterns in the measurements and measurement results in the successive read operation. The gray and the black bars in Fig. 11 respectively show active and leakage energies per cycle. In the measurements, four data and access patterns are used.

- In the all-zero (ALL0) data with a fixed address pattern, only the selected RBLs are discharged because the access address is fixed. The SLs of the other columns remain floating because those columns are always unselected. In this case, the proposed SSLC effectively reduces the read energy by 57.2%.
- The RBL remains "1" in the all-one (ALL1) data pattern. One fixed address is accessed continuously. In this case, the SSLC has no effect because all the RBLs keep the precharged voltage. Therefore, the power reduction is 0.0%.
- The checkerboard pattern using an incremental row address (CKB X+) has 50% "0" data. The measurement result demonstrates that the SSLC decreases the read energy by 45.0% in this case.
- In the CKB using an incremental column address (CKB Y+), the column address is changed at every cycle. The SLs cannot be floated for a long time, so the power reduction is less effective than in the ALL0 and CKB X+ patterns. The reduction is 28.5% in this pattern.
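
The difference between the address patterns above can be illustrated by counting how often the column address changes. This is only a sketch: the exact address sequences of Fig. 10 are not reproduced here, so the generators `fixed`, `row_increment`, and `column_increment`, the 128-row by 8-column address space, and the 256-cycle length are assumptions.

```python
ROWS, COLUMNS = 128, 8  # assumed row and column address ranges


def fixed(n):
    """Fixed address pattern (as in ALL0 and ALL1): always (row 0, column 0)."""
    return [(0, 0)] * n


def row_increment(n):
    """X+ pattern: the row address increments every cycle."""
    return [(i % ROWS, (i // ROWS) % COLUMNS) for i in range(n)]


def column_increment(n):
    """Y+ pattern: the column address increments every cycle."""
    return [((i // COLUMNS) % ROWS, i % COLUMNS) for i in range(n)]


def column_changes(accesses):
    """Count cycles whose column differs from the previous cycle."""
    return sum(1 for prev, cur in zip(accesses, accesses[1:])
               if prev[1] != cur[1])


cycles = 256
changes = {
    "fixed": column_changes(fixed(cycles)),
    "X+": column_changes(row_increment(cycles)),
    "Y+": column_changes(column_increment(cycles)),
}
print(changes)
```

In this model the X+ pattern changes column once in 256 cycles while the Y+ pattern changes it on every cycle, which is consistent with the SLs having little time to stay floating in the CKB Y+ case.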

Averaged over the four patterns, the proposed SSLC reduces the energy consumption by 38.1% in the successive read operation. Table 1 presents the test chip characteristics.



Fig. 8. 16-Kb 8T SRAM test chip.



Fig. 9. Schematic of the proposed 8T SRAM with the SSLC and the disturbance mitigation scheme [14].



Fig. 10. Access patterns in the energy measurement. The proposed SSLC is more effective in the all-zero (ALL0) and the checkerboard X address increment (CKB X+) patterns than in the other two patterns.



Fig. 11. Measurement results of the implemented test chip in read operation.

Table 1. Features of a test chip.

| Feature                   | Value                             |
|---------------------------|-----------------------------------|
| Technology                | 40 nm bulk CMOS                   |
| Macro size                | 125 μm × 280 μm                   |
| Macro configuration       | 16 Kb (16 bits/word, 1 K words)   |
| Cell size                 | 1.01 μm<sup>2</sup> (logic rule)  |
| # of cells / BL, SL       | 16 (RBL), 128 (WBL), 128 (SL)     |
| Density                   | 457 Kb/mm<sup>2</sup>             |
| Write active energy (CKB) | 2.18 pJ @ 0.5 V, 10 MHz, RT       |
| Read active energy (CKB)  | 1.14 pJ @ 0.5 V, 10 MHz, RT       |
| Leakage energy (CKB)      | 0.12 pJ @ 0.5 V, 10 MHz, RT       |
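
The density entry in the table follows from the macro size and capacity, and the energy entries can be converted into average power. The short calculation below is a sketch that assumes the listed energies are per clock cycle, as stated for Fig. 11, so that power equals energy per cycle times clock frequency; the variable names are chosen here for illustration.

```python
# Values taken from the table of test chip features.
capacity_kb = 16                    # Kb
width_um, height_um = 125, 280      # macro size in micrometers
read_energy_pj = 1.14               # pJ per cycle at 0.5 V, 10 MHz
leak_energy_pj = 0.12               # pJ per cycle at 0.5 V, 10 MHz
freq_mhz = 10

# Density: capacity divided by macro area in mm^2.
area_mm2 = (width_um / 1000) * (height_um / 1000)
density_kb_per_mm2 = capacity_kb / area_mm2

# Average power: pJ/cycle times MHz gives microwatts.
read_power_uw = read_energy_pj * freq_mhz
leak_power_uw = leak_energy_pj * freq_mhz

print(f"area    = {area_mm2:.3f} mm^2")
print(f"density = {density_kb_per_mm2:.0f} Kb/mm^2")
print(f"read    = {read_power_uw:.1f} uW, leakage = {leak_power_uw:.1f} uW")
```

The computed density rounds to the 457 Kb/mm<sup>2</sup> listed in the table.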

## IV. CONCLUSION

In this paper, we presented the selective source line control (SSLC) scheme for an 8T SRAM. The RBL swing is suppressed in an unselected column because the SSLC disconnects the source line (SL) of the dedicated read ports and therefore does not fully discharge the unselected read bitlines (RBLs). In addition to the SSLC, the paper introduced the address preset structure, which eliminates the access time penalty of the SSLC when the next address can be preset. The 16-Kb 8T SRAM test chip implemented in a 40-nm bulk CMOS technology demonstrates that the SSLC with the address preset structure reduces read energy consumption by 57.2%, 0.0%, 45.0%, and 28.5% in the ALL0, ALL1, CKB with row address increment (CKB X+), and CKB with column address increment (CKB Y+) patterns, respectively. On average, the proposed scheme exhibits a 38.1% energy reduction in successive address accesses, compared with a conventional 8T SRAM.

## ACKNOWLEDGMENTS

This work was conducted as a part of the Extremely Low Power (ELP) project supported by METI and NEDO. The authors would like to thank Mr. Y. Yamamoto, Mr. Y. Okuma, and Mr. K. Hirairi with STARC, Prof. T. Sakurai and Prof. T. Hiramoto with The University of Tokyo, and Prof. K. Takeuchi and Dr. K. Miyaji with Chuo University.

## REFERENCES

- [1] ITRS Report 2009, http://www.itrs.net/
- [2] Y. Pu, X. Zhang, J. Huang, A. Muramatsu, M. Nomura, K. Hirairi, H. Takata, T. Sakurabayashi, S. Miyano, M. Takamiya, and T. Sakurai, "Misleading Energy and Performance Claims in Sub/Near Threshold Digital Systems," *IEEE International Conference on Computer-Aided Design*, pp. 625-631, 2010.
- [3] B. Zhai, R. G. Dreslinski, D. Blaauw, T. Mudge, and D. Sylvester, "Energy Efficient Near-threshold Chip Multi-processing," *IEEE International Symposium on Low Power Electronics and Design*, pp. 32-37, 2007.
- [4] R. Heald and P. Wang, "Variability in Sub-100 nm SRAM Designs," *IEEE International Conference on Computer-Aided Design*, pp. 347-352, 2004.
- [5] D. Fick, R. G. Dreslinski, B. Giridhar, G. Kim, S. Seo, M. Fojtik, S. Satpathy, Y. Lee, D. Kim, N. Liu, M. Wieckowski, G. Chen, T. Mudge, D. Sylvester, and D. Blaauw, "Centip3De: A 3930DMIPS/W Configurable Near-Threshold 3D Stacked System with 64 ARM Cortex-M3 Cores," *IEEE International Solid-State Circuits Conference*, pp. 190-191, 2012.
- [6] S. Moriwaki, Y. Yamamoto, A. Kawasumi, T. Suzuki, S. Miyano, T. Sakurai, and H. Shinohara, "A 13.8pJ/Access/Mbit SRAM with Charge Collector Circuits for Effective Use of Non-Selected Bit Line Charges," *IEEE Symposium on VLSI Circuits Digest of Technical Papers*, pp. 60-61, 2012.
- [7] A. Kawasumi, T. Suzuki, S. Moriwaki, and S. Miyano, "Energy Efficiency Degradation Caused by Random Variation in Low-Voltage SRAM and 26% Energy Reduction by Bitline Amplitude Limiting (BAL) Scheme," *IEEE Asian Solid-State Circuits Conference*, pp. 165-168, 2011.
- [8] S. Yoshimoto, M. Terada, Y. Umeki, S. Okumura, A. Kawasumi, T. Suzuki, S. Moriwaki, S. Miyano, H. Kawaguchi, and M. Yoshimoto, "A 40-nm 256-Kb Sub-10 pJ/Access 8T SRAM with Read Bitline Amplitude Limiting (RBAL) Scheme," *IEEE International Symposium on Low Power Electronics and Design*, pp. 85-90, 2012.
- [9] R. Houle, K. Batson, D. Rodko, P. Patel, W. Huott, R. Franch, Y. Chan, D. Plass, S. Wilson, and P. Wang, "6.6+ GHz Low Vmin, read and half select disturb-free 1.2 Mb SRAM," *IEEE Symposium on VLSI Circuits Digest of Technical Papers*, pp. 14-16, 2007.
- [10] Y. Morita, H. Fujiwara, H. Noguchi, Y. Iguchi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment," *IEEE Symposium on VLSI Circuits Digest of Technical Papers*, pp. 256-257, 2007.
- [11] J. Wu, Y. Chen, M. Chang, P. Chou, C. Chen, H. Liao, M. Chen, Y. Chu, W. Wu, and H. Yamauchi, "A Large σV<sub>TH</sub>/VDD Tolerant Zigzag 8T SRAM with Area-Efficient Decoupled Differential Sensing and Fast Write-Back Scheme," *IEEE Symposium on VLSI Circuits Digest of Technical Papers*, pp. 103-104, 2010.
- [12] N. Verma and A. P. Chandrakasan, "A 65 nm 8T Sub-Vt SRAM Employing Sense-Amplifier Redundancy," *IEEE International Solid-State Circuits Conference*, pp. 328-329, 2007.
- [13] H. Fujiwara, K. Nii, J. Miyakoshi, Y. Murachi, Y. Morita, H. Kawaguchi, and M. Yoshimoto, "A Two-Port SRAM for Real-Time Video Processor Saving 53% of Bitline Power with Majority Logic and Data-Bit Reordering," *IEEE International Symposium on Low Power Electronics and Design*, pp. 61-66, 2006.
- [14] S. Yoshimoto, M. Terada, S. Okumura, T. Suzuki, S. Miyano, H. Kawaguchi, and M. Yoshimoto, "A 40-nm 0.5-V 20.1-uW/MHz 8T SRAM with Low-Energy Disturb Mitigation Scheme," *IEEE Symposium on VLSI Circuits Digest of Technical Papers*, pp. 72-73, 2011.