# A 28-nm 484-fJ/writecycle 650-fJ/readcycle 8T Three-Port FD-SOI SRAM for Image Processor\*

# Haruki MORI<sup>†a)</sup>, Yohei UMEKI<sup>†</sup>, *Student Members*, Shusuke YOSHIMOTO<sup>†</sup>, Shintaro IZUMI<sup>†</sup>, Koji NII<sup>††,†††</sup>, Hiroshi KAWAGUCHI<sup>†</sup>, *and* Masahiko YOSHIMOTO<sup>†</sup>, *Members*

**SUMMARY** This paper presents a low-power and low-voltage 64-kb 8T three-port image memory using 28-nm FD-SOI process technology. Our proposed SRAM accommodates eight-transistor bit cells comprising one-write/two-read ports and a majority logic circuit to save active energy. The test chip operates at a supply voltage of 0.46 V and access time of 140 ns. The minimum energy point is a supply voltage of 0.54 V and an access time of 55 ns (= 18.2 MHz), at which 484 fJ/cycle in a write operation and 650 fJ/cycle in a read operation are achieved assisted by majority logic. These factors are 69% and 47% smaller than those in a conventional 6T SRAM using the 28-nm FD-SOI process technology.

key words: image memory, multi-port SRAM, 8T, FD-SOI, 28-nm, majority logic

## 1. Introduction

Applications of image recognition are being extended to various fields, such as automatic driving systems, robot vision, and augmented reality systems, along with improvements in image resolution. Image resolution enhancement leads to increased SRAM capacity, area, and power consumption because of the increased amount of image data. SRAM accounts for 43% of the power consumption of a whole image processor in a 65-nm CMOS process [1]. For wearable devices handling image information, energy-efficient SRAM will be needed, as presented in Fig. 1.

28-nm fully depleted SOI (FD-SOI) technology is promising for high-speed, low-voltage SRAM [2]. The 28-nm FD-SOI process has fully depleted transistors with an ultra-thin silicon body and a buried oxide (BOX) layer, giving them excellent electrostatic control. Therefore, it offers stable operation at low voltage. The BOX layer reduces the leakage current by controlling the current flow from the source node to the drain node of a transistor. Moreover, the BOX layer reduces the parasitic capacitance between the source node and the drain node. This feature of 28-nm FD-SOI enables the production of ultra-low-power SRAMs [3]–[8].

Manuscript received December 2, 2015.

<sup>†</sup>The authors are with the Graduate School of System Informatics, Kobe University, Kobe-shi, 657–8501 Japan.

<sup>††</sup>The author is with Renesas Electronics Corporation, Kodaira-shi, 187–8588 Japan.

<sup>†††</sup>The author is with the Graduate School of Natural Science & Technology, Kanazawa University, Kanazawa-shi, 920–1192 Japan.

\*This paper is the modified version of the original paper presented at the IEEE Custom Integrated Circuit Conference (CICC) 2015 [18].

a) E-mail: mori.haruki@cs28.cs.kobe-u.ac.jp

DOI: 10.1587/transele.E99.C.901

Input data for image processing are stored temporarily in SRAM. In an image processor, many processing cores access the SRAM for multi-thread processing, as presented in Fig. 1. Demand for multi-port SRAM has increased to accommodate high-speed and low-power image processing. Multi-port SRAM is suitable for parallel operation and improves the total chip performance. To date, multi-port SRAMs that support simultaneous write and read operations have been proposed for use in image processors [9], [10]. A three-port SRAM is reportedly suitable for an image processor [11], [12]. When the features of two images are compared, simultaneous read operations are issued to the SRAM cells. Furthermore, real-time processing requires a write operation for the next comparison at the same time as the read operations. Therefore, two read operations and one write operation must be performed simultaneously, which requires multi-port SRAMs that have two-read/one-write access ports for the image processor.

The bitcell layout of the conventional three-port SRAM needs a larger area than an 8T dual-port SRAM because of its larger number of transistors [11]. In particular, an image processor requires a large multi-port memory capacity, which has a serious impact on its cost. In this paper, we present an 8T three-port SRAM that is smaller than the conventional three-port one; its area is as small as that of the conventional 8T dual-port SRAM.

We designed a 28-nm FD-SOI 8T three-port SRAM for a low-power image processor and compared it with a conventional 28-nm FD-SOI 6T SRAM. We then demonstrated a high energy efficiency of sub-pJ/cycle in the proposed SRAM. The remainder of this paper is organized as follows. Section 2 presents the proposed 8T three-port SRAM design and its operation. Measurement results are shown in Sect. 3. The final section summarizes the findings.



Fig.1 Memory system in image processing.

Copyright © 2016 The Institute of Electronics, Information and Communication Engineers

Manuscript revised March 28, 2016.

## 2. Proposed 8T Three-Port SRAM

### 2.1 8T Three-Port SRAM Cell Design

A circuit schematic of the proposed 8T three-port SRAM cell is presented in Fig. 2. It has a pair of write bitlines and two single-ended read bitlines (one-write/two-read bitcell structure). The proposed cell has two pull-up PMOSs (load-PMOS), two pull-down NMOSs (drive-NMOS), and four transfer NMOSs (access-NMOS). In this circuit, the M7 and M8 transistors form the two single-ended read ports. The source nodes of the M7 and M8 transistors are connected to node QB, the drain nodes are connected to the read bitlines (RBL\_A, RBL\_B), and the gate nodes are connected to the read wordlines (RWL\_A, RWL\_B). This asymmetrical 8T three-port SRAM cell achieves high density. All transistor W/L sizes in the bitcell are shown in Table 1. The W/L size of the pull-down transistors in the bitcell is chosen to retain a sufficient static noise margin (SNM) even when both read ports are activated.

Figure 3(a) presents the FEOL of the proposed 8T three-port SRAM. The read ports comprising the M7 and M8 transistors are arranged separately from the 6T SRAM cell and share a common contact located in the middle as the QB node. This layout achieves a smaller cell area than a symmetrical layout in which the additional read ports are arranged at both ends [13].

Figure 3(b) shows the BEOL of the proposed SRAM. The SRAM cell size is determined by the number of horizontal and vertical wires. In our proposed SRAM, the two read ports consisting of the M7 and M8 transistors are configured as two single-ended read ports having three bitlines and three wordlines. The cell area is $0.56\,\mu\text{m}^2$ on a logic rule base, which is as small as the dual-port 8T bitcell [14], although the number of ports is increased.



Fig. 2 Schematic of proposed 8T three-port SRAM.

Table 1 Transistor W/L sizes in the proposed SRAM cell.

|             | Pull-up | Pull-down | Write pass gate | Read pass gate |
|-------------|---------|-----------|-----------------|----------------|
| Transistors | M1, M3  | M2, M4    | M5, M6          | M7, M8         |
| Width       | 80      | 142       | 80              | 80             |
| Length      | 30      | 30        | 30              | 30             |

The operating waveforms in the read operation are depicted in Fig. 4. No read current flows through the read bitlines (RBL\_A and RBL\_B) when the internal node QB is "1". Maximizing the number of "1"s at node QB is therefore important to reduce dynamic power in the read operation.

### 2.2 Precharge-Less Energy-Efficient Write Circuit

Figure 5 presents write schemes for the conventional 6T SRAM and the proposed 8T SRAM. Figure 5(a) depicts the conventional write circuit, in which a bitline pair must be precharged to maintain the stability of read operations because both read and write operations use the common bitline pair. Figure 5(b) depicts the precharge-less write circuit. Because the proposed 8T SRAM has dedicated read ports for the read operation, it needs no precharge scheme on the write bitlines, so successive writes of the same data consume less energy. However, it incurs the well-known half-select problem along the write wordline. The divided wordline structure is therefore adopted to avoid the half-select problem [15].

Fig. 3 Bitcell layout of proposed SRAM: (a) FEOL and (b) BEOL.

Fig. 4 Waveforms of proposed 8T three-port SRAM in read operation.

Fig. 5 Schematics of write circuits: (a) conventional circuit and (b) precharge-less circuit.

Fig. 6 Waveforms in write operation: (a) write wordline (WWL), write bitlines (WBL and WBLB) in (b) a conventional write circuit and (c) a precharge-less write circuit.

Figure 6 portrays simplified waveforms during write cycles. Figure 6(a) shows the waveform of the write wordline (WWL), which is common to the conventional SRAM and the proposed SRAM. Figure 6(b) shows waveforms of the write bitlines (WBL and WBLB) in the conventional write scheme. Charge/discharge power is consumed in every cycle by the precharge to the supply voltage. Figure 6(c) portrays waveforms of the write bitlines in the proposed SRAM. By virtue of the precharge-less write scheme, the charge/discharge power on WBL and WBLB is consumed only when the write datum changes, which reduces the write energy.
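The difference between the two write schemes can be illustrated with a toy event count. The following Python sketch is not taken from the paper: the function name `count_wbl_charge_events` and its simplifying assumptions (one full-swing charge event per cycle with precharge, one per data change without precharge, and bitlines that initially hold the first datum) are illustrative only.

```python
def count_wbl_charge_events(data, precharge):
    """Count full-swing charge events on one write-bitline pair (toy model).

    data      -- sequence of 0/1 values written in successive cycles
    precharge -- True for the conventional scheme, False for precharge-less
    """
    if precharge:
        # Conventional scheme: one bitline is discharged by every write and
        # precharged back to the supply voltage afterwards.
        return len(data)
    # Precharge-less scheme: the bitlines keep their previous levels, so a
    # charge event occurs only when the written datum differs from the last one.
    # Assumption: the bitlines initially hold the first datum.
    events = 0
    previous = None
    for bit in data:
        if previous is not None and bit != previous:
            events += 1
        previous = bit
    return events


all0 = [0] * 8      # successive "0" writes
pat01 = [0, 1] * 4  # alternating writes, the worst case

print(count_wbl_charge_events(all0, precharge=True))    # 8
print(count_wbl_charge_events(all0, precharge=False))   # 0
print(count_wbl_charge_events(pat01, precharge=True))   # 8
print(count_wbl_charge_events(pat01, precharge=False))  # 7
```

In this simplified count, the precharge-less scheme saves nothing for alternating data but removes all bitline activity for repeated data, which matches the qualitative behavior described for Fig. 6.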

### 2.3 Static Noise Margin in Proposed 8T Three-Port SRAM

A multiport SRAM supports simultaneous accesses from multiple cores through read and write ports. Particularly in a one-write/two-read (1W2R) three-port SRAM cell, both read ports are available at the same time, which implies that simultaneous readouts of a single cell can occur [16]. Figure 7 shows a variety of read situations in the 1W2R three-port SRAM cell when both read ports are enabled simultaneously. Figure 7(a) depicts two SRAM cells on different row addresses and different column addresses, designated independently. No access conflict arises in this case. However, simultaneous dual-port readouts from a single SRAM cell activate both RWL\_A and RWL\_B, as presented in Fig. 7(b), which might worsen the static noise margin (SNM) because of the doubled read current.

Fig. 7 Variety of access situations in the proposed 1W2R three-port SRAM.

Fig. 8 Simulated butterfly curves at several Vdd from 1.0 V down to 0.4 V: (a) single-port readout and (b) dual-port readout.

Figure 8 presents simulation results of the SNM in the proposed 1W2R 8T three-port SRAM cell at supply voltages of Vdd = 0.4–1.0 V. Figure 8(a) depicts the standard butterfly curves in the single-port read situation: an SNM of 171 mV is achieved at 1.0 V, which retains 85% of the SNM of the conventional 6T SRAM [2]. Figure 8(b) depicts the worst-case butterfly curves in simultaneous dual-port reads. The SNM is reduced to 101 mV at 1.0 V. Interestingly, the maximum SNM of 102 mV is observed at 0.8 V.

### 2.4 Combination with Majority Logic

Our earlier study demonstrated that a majority logic circuit can save charge/discharge power on the read bitlines [17]. Image data reflect luminance information: bright pixels have many "1"s, whereas dark pixels have many "0"s. To reduce the read energy, dark-pixel data having many "0"s should be inverted by the majority logic. In a write cycle, the majority logic circuit counts the "1"s in the input data and decides whether the data should be inverted so that "1"s are in the majority. The inversion information is stored in an additional flag bit. In a read cycle, the procedure is reversed: the output data are inverted if the flag bit is true, so that the original data are recovered. The majority logic does not reduce the write energy because the "1" write energy and the "0" write energy are the same. In the proposed SRAM, the majority logic effectively saves charge/discharge power on the read bitlines because the number of "1"s in the stored data is maximized.
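The write-side inversion and read-side recovery described above can be sketched in a few lines of Python. This is a behavioral illustration under stated assumptions, not the circuit in the paper: the names `encode` and `decode` are hypothetical, the 16-bit word width is taken from Table 2, and the handling of a tie (equal numbers of "1"s and "0"s, stored without inversion here) is an assumption.

```python
WORD_BITS = 16  # word width from Table 2 (16 bits/word)


def encode(word, width=WORD_BITS):
    """Write cycle: return (stored_word, flag) so that "1"s are in the majority."""
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones * 2 < width:
        # "0"s are in the majority: store the inverted word and set the flag bit.
        return (~word) & mask, 1
    # "1"s are already in the majority (or tied): store the word unchanged.
    return word, 0


def decode(stored, flag, width=WORD_BITS):
    """Read cycle: undo the inversion when the flag bit is set."""
    mask = (1 << width) - 1
    return (~stored) & mask if flag else stored & mask


dark_pixel_word = 0x0003  # mostly "0"s
stored, flag = encode(dark_pixel_word)
print(hex(stored), flag)           # 0xfffc 1
print(hex(decode(stored, flag)))   # 0x3
```

With this encoding, every stored 16-bit word contains at least eight "1"s, which is the property the read-energy saving relies on.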

## 3. Chip Implementation and Measurement Results

We fabricated a 64-kb 8T three-port SRAM macro using 28-nm FD-SOI process technology. Figure 9 shows a test chip micrograph. The proposed 64-kb macro consists of $2 \times 32$-kb sub-blocks. The macro area is 0.058 mm<sup>2</sup>. Figure 10 presents a measured read Shmoo plot of the proposed SRAM macro. We verified that it can operate at a supply voltage of 0.46 V with an access time of 140 ns. At room temperature (25 °C), the operating point that achieves the minimum energy per cycle is a supply voltage of 0.54 V and a cycle time of 55 ns (= 18.2 MHz). Figure 11 shows the Shmoo plot in write operations. The test chip can operate at a write pulse width of 4 ns. Figure 12 portrays a schematic of the proposed 8T three-port SRAM array and its peripheral circuits. Figure 13 shows the measured leakage and active energies.

Fig. 9 Test chip photograph.

Fig. 10 Read Shmoo plot.
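The relation between the quoted cycle times and frequencies, and the average power implied by an energy per cycle, can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the power values assume back-to-back accesses in every cycle, which the paper does not state.

```python
def cycle_time_to_mhz(cycle_ns):
    """Convert a cycle (access) time in nanoseconds to a frequency in MHz."""
    return 1e3 / cycle_ns


def average_power_uw(energy_fj_per_cycle, cycle_ns):
    """Average power in microwatts; 1 fJ per 1 ns equals 1 uW."""
    return energy_fj_per_cycle / cycle_ns


print(round(cycle_time_to_mhz(55), 1))       # 18.2 (minimum-energy point)
print(round(cycle_time_to_mhz(140), 2))      # 7.14 (0.46-V operation)
print(round(average_power_uw(650, 55), 1))   # 11.8 (650 fJ/cycle read, assumed continuous)
print(round(average_power_uw(484, 55), 1))   # 8.8  (484 fJ/cycle write, assumed continuous)
```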

In the write operation, the "ALL0" test pattern means successive "0" writes to all bitcells in the memory macro, and "ALL1" means successive "1" writes. In those cases, the bitcell data do not change, and the bitline charge/discharge energy is saved. The "01-pat." write pattern alternately writes "0"s and "1"s to the bitcells, so charge/discharge power is consumed on the WBLs. This is the worst case in the write operation. The worst-case write energy is 484 fJ/cycle, which is 69% smaller than that in the 6T SRAM (see Fig. 13).

Fig. 12 Schematic of proposed 8T three-port SRAM array and its peripheral circuits.

The BL lengths of the proposed three-port SRAM are 1.3 times longer than those of the conventional 6T SRAM because three WLs (one WWL and two RWLs) are drawn through the 8T bitcell. However, the proposed 8T three-port SRAM does not require the WBL precharge scheme used in the 6T SRAM. Furthermore, its WLs are divided every 16 rows. Therefore, the proposed SRAM can reduce needless energy consumed in the half-selected bitcells. As a result, the write energy is lower than that of the conventional 6T SRAM.

It is noteworthy that the read circuit must have an RBL precharge scheme because of the single-ended read ports. In the read operation, the "ALL0" and "ALL1" test patterns mean successive "0" and "1" read operations, respectively. The "01-pat." read pattern results in the average dynamic energy of "ALL0" and "ALL1". The respective "0" and "1" read energies are 1663.2 fJ/cycle (a read dynamic energy of 1449 fJ/cycle + a read leakage energy of 168.5 fJ/cycle) and 361.7 fJ/cycle (a read dynamic energy of 168.5 fJ/cycle + a read leakage energy of 168.5 fJ/cycle + a read leakage energy of 17%. The read energy saving in the "1" read operation is 77%. On average, however, the read energy improvement is merely 35% without the majority logic.
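Because the "01-pat." energy is described as the average of the "ALL0" and "ALL1" cases, a linear interpolation gives a rough feel for how the read energy depends on the share of "1"s that are read. The sketch below is an illustration under assumptions, not a model from the paper: `read_energy_fj` and the linear dependence on the fraction of "1"s are hypothetical, and only the two endpoint energies are measured values quoted in the text.

```python
E_READ0_FJ = 1663.2  # measured "0" read energy per cycle, from the text
E_READ1_FJ = 361.7   # measured "1" read energy per cycle, from the text


def read_energy_fj(ones_fraction):
    """Estimated read energy per cycle for a given fraction of "1" reads.

    Assumption: the energy scales linearly between the two measured endpoints.
    """
    if not 0.0 <= ones_fraction <= 1.0:
        raise ValueError("ones_fraction must be between 0 and 1")
    return ones_fraction * E_READ1_FJ + (1.0 - ones_fraction) * E_READ0_FJ


# Equal numbers of "0" and "1" reads, as in the "01-pat." case.
print(round(read_energy_fj(0.5), 2))  # 1012.45
```

Under this assumption, any mechanism that raises the fraction of "1"s above one half, such as the majority logic, moves the read energy toward the lower endpoint.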

Figure 14 portrays the impact of the majority logic on the read energy saving. In the bright Image 1, the read energy is reduced by 23%, whereas in the dark Image 6, the saving reaches 47%. As one might expect, the majority logic is more effective for the dark image. In this case, the read energy is 650 fJ/cycle. Table 2 presents the test chip characteristics.

Fig. 13 Measured write energies, read energies, and comparisons.

Figure 15 shows the estimated power consumption when the proposed 8T three-port SRAM with the majority logic is applied to our prior work, the ME264 motion estimation processor [18]; the values are scaled by the process node, supply voltage, and operating frequency (28-nm process node, 0.54-V supply voltage, and 50-MHz operating frequency). The ME264 processor has a SIMD systolic-array architecture, in which a 10T three-port SRAM is used as a search window and a template block. With the proposed SRAM, the power consumed by the SRAM is reduced by 290 µW, which signifies a 24% reduction in total over the conventional processor. Therefore, the proposed 8T three-port SRAM is suitable for the image processor.

Table 2 Test chip features.

| Item                                   | Value                             |
|----------------------------------------|-----------------------------------|
| Technology                             | 28-nm FD-SOI                      |
| Supply voltage (memory macro)          | 0.46–0.7 V                        |
| Supply voltage (I/O)                   | 1.8 V                             |
| Chip area                              | 1.0 × 1.0 mm<sup>2</sup>          |
| Macro size                             | 242 × 242 µm<sup>2</sup>          |
| Macro configuration                    | 64 kb (32 kb × 2), 16 bits/word   |
| Cell size                              | 0.384 × 1.457 µm<sup>2</sup>      |
| Frequency                              | 7.14 MHz @ 0.46 V, 50 MHz @ 0.7 V |
| Write active energy                    | 298 fJ @ 0.54 V, 18.2 MHz, RT     |
| Read active energy with majority logic | 650 fJ @ 0.54 V, 18.2 MHz, RT     |

Fig. 14 Read energies saved by majority logic in actual image data.

Fig. 15 Estimated power consumption of motion estimation image processor.

## 4. Conclusion

As described in this paper, we presented an 8T three-port SRAM for an image processor. The proposed SRAM comprises one-write/two-read ports and a majority logic circuit to save active energy. We fabricated a 64-kb 8T three-port SRAM using 28-nm FD-SOI process technology. The test chip exhibits 0.46 V operation and access time of 140 ns. The energy minimum point is a supply voltage of 0.54 V at a frequency of 18.2 MHz, at which 484 fJ/cycle in a write operation and 650 fJ/cycle in a read operation are achieved, assisted by the majority logic. These factors are 69% and 47% smaller than those in a 28-nm FD-SOI 6T SRAM.

## Acknowledgments

We would like to thank STMicroelectronics for chip implementation. This work was supported by STARC and the VLSI Design and Education Center (VDEC), The University of Tokyo, in collaboration with Cadence Corporation, Mentor Graphics Corporation, Synopsys Inc., and CMP Inc.

## References

- [1] J. Miyakoshi, Y. Murauchi, K. Hamano, M. Miyama, and M. Yoshimoto, "A Low-Power Systolic Array Architecture for Block-Matching Motion Estimation," IEICE Trans. Electron, vol.E88-C, no.4, pp.559–569, April 2005.
- [2] N. Planes, O. Weber, V. Barral, S. Haendler, D. Noblet, D. Croain, M. Bocat, P. Sassoulas, X. Federspiel, A. Cros, A. Bajolet, E. Richard, B. Dumont, P. Perreau, D. Petit, D. Golanski, C. Fenouillet-Beranger, N. Guillot, M. Rafik, V. Huard, S. Puget, X. Montagner, M.-A. Jaud, O. Rozeau, O. Saxod, F. Wacquant, F. Monsieur, D. Barge, L. Pinzelli, M. Mellier, F. Boeuf, F. Arnaud, and M. Haond, "28-nm FDSOI Technology Platform for High-Speed Low-Voltage Digital Applications," IEEE Symposium on VLSI Tech., pp.133–134, June 2012.
- [3] P. Flatresse, B. Giraud, J. Noel, B. Pelloux-Prayer, F. Giner, D. Arora, F. Arnaud, N. Planes, J. Le Coz, O. Thomas, S. Engels, G. Cesana, R. Wilson, and P. Urard, "Ultra-Wide Body-Bias Range LDPC Decoder in 28-nm UTBB FDSOI Technology," ISSCC Dig. of Tech. Papers, pp.424–425, Feb. 2013.
- [4] C. Fenouillet-Beranger, S. Denorme, B. Icard, F. Boeuf, J. Coignus, O. Faynot, L. Brevard, C. Buj, C. Soonekindt, J. Todeschini, J.C. Le-Denmat, N. Loubet, C. Gallon, P. Perreau, S. Manakli, B. Minghetti, L. Pain, V. Arnal, A. Vandooren, D. Aime, L. Tosti, C. Savardi, M. Broekaart, P. Gouraud, F. Leverd, V. Dejonghe, P. Brun, M. Guillermet, M. Aminpur, S. Barnola, F. Rouppert, F. Martin, T. Salvetat, S. Lhostis, C. Laviron, N. Auriac, T. Kormann, G. Chabanne, S. Gaillard, O. Belmont, E. Laffosse, D. Barge, A. Zauner, A. Tarnowka, K. Romanjec, H. Brut, A. Lagha, S. Bonnetier, F. Joly, N. Mayet, A. Cathignol, D. Galpin, D. Pop, R. Delsol, R. Pantel, F. Pionnier, G. Thomas, D. Bensahel, S. Deleonibus, T. Skotnicki, and H. Mingam, "Fully-Depleted SOI Technology using High-K and Single-Metal Gate for 32 nm Node LSTP Applications featuring 0.179 µm<sup>2</sup> 6T-SRAM bitcell," IEEE IEDM, pp.267–270, Dec. 2007.
- [5] O. Thomas, B. Zimmer, B. Pelloux-Prayer, N. Planes, K.-C. Akyel, L. Ciampolini, P. Flatresse, and B. Nikolić, "6T SRAM Design for Wide Voltage Range in 28-nm FDSOI," IEEE Int. SOI Conf., pp.1–2, Oct. 2012.

- [6] H. Fujiwara, T. Takeuchi, Y. Otake, M. Yoshimoto, and H. Kawaguchi, "An Inter-Die Variability Compensation Scheme for 0.42-V 486-kb FD-SOI SRAM using Substrate Control," IEEE Int. SOI Conf., pp.93–94, Oct. 2008.
- [7] L. Hutin, C. Le Royer, F. Andrieu, O. Weber, M. Casse, J.-M. Hartmann, D. Cooper, A. Béché, L. Brevard, L. Brunet, J. Cluzel, P. Batude, M. Vinet, and O. Faynot, "Dual Strained Channel Co-Integration into CMOS, RO and SRAM cells on FDSOI down to 17 nm Gate Length," IEEE IEDM Tech. Dig., pp.11.1.1–11.1.4, Dec. 2010.
- [8] H. Pilo, C.A. Adams, I. Arsovski, R.M. Houle, S.M. Lamphier, M.M. Lee, F.M. Pavlik, S.N. Sambatur, A. Seferagic, R. Wu, and M.I. Youns, "A 64 Mb SRAM in 22 nm SOI Technology Featuring Fine-Granularity Power Gating and Low-Energy Power-Supply-Partition Techniques for 37% Leakage Reduction," ISSCC Dig. of Tech. Papers, pp.322–323, Feb. 2013.
- [9] S.W. Keckler, W.J. Dally, B. Khailany, M. Garland, and D. Glasco, "GPUs AND THE FUTURE OF PARALLEL COMPUTING," IEEE Micro, vol.31, no.5, pp.7–17, Sept. 2011.
- [10] B.-C.C. Lai, H.-K. Kuo, and J.-Y. Jou, "A Cache Hierarchy Aware Thread Mapping Methodology for GPGPUs," IEEE Trans. Comput., vol.64, no.4, pp.884–898, April 2015.
- [11] M. Miyama, J. Miyakoshi, Y. Kuroda, K. Imamura, H. Hashimoto, and M. Yoshimoto, "A Sub-mW MPEG-4 Motion Estimation Processor Core for Mobile Video Application," IEEE J. Solid-State Circuits, vol.39, no.9, pp.1562–1570, Sept. 2004.
- [12] Y. Murachi, J. Miyakoshi, M. Hamamoto, T. Iinuma, T. Ishihara, F. Yin, J. Lee, H. Kawaguchi, and M. Yoshimoto, "A Sub 100 mW H.264 MP@L4.1 Integer-Pel Motion Estimation Processor Core for MBAFF Encoding with Reconfigurable Ring-Connected Systolic Array and Segmentation-Free, Rectangle-Access Search-Window Buffer," IEICE Trans. Electron, vol.E91-C, no.4, pp.465–478, April 2008.
- [13] Y. Ishii, H. Fujiwara, S. Tanaka, Y. Tsukamoto, K. Nii, Y. Kihara, and K. Yanagisawa, "A 28-nm Dual-Port SRAM Macro With Screening Circuitry Against Write-Read Disturb Failure Issues," IEEE J. Solid-State Circuits, vol.46, no.11, pp.2535–2544, Sept. 2011.
- [14] S. Yoshimoto, M. Terada, S. Okumura, T. Suzuki, S. Miyano, H. Kawaguchi, and M. Yoshimoto, "A 40-nm 0.5-V 20.1-µW/MHz 8T SRAM with Low-Energy Disturb Mitigation Scheme," IEEE Symposium on VLSI Circuits, pp.72–73, June 2011.
- [15] H. Fujiwara, M. Yabuuchi, M. Morimoto, and K. Tanaka, "A 20 nm 0.6 V 2.1 µW/MHz 128-kb SRAM with no half select issue by interleave wordline and hierarchical bitline scheme," IEEE Symposium on VLSI Circuits, pp.118–119, June 2013.
- [16] D.-P. Wang, H.-J. Lin, C.-T. Chuang, and W. Hwang, "Low-Power Multiport SRAM With Cross-Point Write Word-Lines, Shared Write Bit-Lines, and Shared Write Row-Access Transistors," IEEE Trans. Circuits Syst. II Exp. Briefs, vol.61, no.3, pp.188–192, March 2014.
- [17] H. Fujiwara, K. Nii, H. Noguchi, J. Miyakoshi, Y. Murachi, Y. Morita, H. Kawaguchi, and M. Yoshimoto, "Novel Video Memory Reduces 45% of Bitline Power Using Majority Logic and Data-Bit Reordering," IEEE Trans. VLSI Systems, vol.16, no.6, pp.620–627, June 2008.
- [18] H. Mori, T. Nakagawa, Y. Kitahara, Y. Kawamoto, K. Takagi, S. Yoshimoto, S. Izumi, K. Nii, H. Kawaguchi, and M. Yoshimoto, "A 298-fJ/writecycle 650-fJ/readcycle 8T Three-Port SRAM in 28-nm FD-SOI Process Technology for Image Processor," IEEE Custom Integrated Circuits Conference (CICC), Sept. 2015.



**Haruki Mori** received the B.E. degree in Computer Science and Systems Engineering from Kobe University, Kobe, Japan, in 2014. He is currently in the master's course at Kobe University. His current research examines low-power SRAM design and low-voltage MRAM design.



**Koji Nii** received the B.E. and M.E. degrees in electrical engineering from Tokushima University, Tokushima, Japan, in 1988 and 1990, respectively, and the Ph.D. degree in informatics and electronics engineering from Kobe University, Hyogo, Japan, in 2008. In 1990, he joined the ASIC Design Engineering Center, Mitsubishi Electric Corporation, Itami, Japan, where he worked on designing 0.8 µm to 130 nm embedded SRAMs and CAMs for CMOS ASICs, and on research into SOI SRAM development. In 2003, he was transferred to Renesas Technology Corporation, Itami, Japan, which is a joint company of Mitsubishi Electric Corp. and Hitachi Ltd. in the semiconductor field. He has been working on designing 45 nm to 90 nm embedded low-power and high-speed SRAM macros, and on research into 45 nm SRAM assist circuit techniques to enhance the functional margin against variations. He transferred his work location from Itami, Hyogo, to Kodaira, Tokyo, in April 2009, where he has been working on designing and researching 28 nm high-k/metal-gate and 16 nm FinFET SRAM macros. His current responsibility is Chief Professional. He currently works on the research and development of embedded SRAM/TCAM/ROM and low-power design techniques with power gating in advanced technology nodes (28 nm, 16 nm, 10 nm and beyond) in the 1st Solution Business Unit, Renesas Electronics Corporation, Kodaira, Tokyo, Japan. Dr. Nii holds 90 US patents, and has published 32 IEEE/IEICE papers and given 76 talks at major international conferences. He received the Best Paper Awards at the IEEE International Conference on Microelectronic Test Structures (ICMTS) in 2007 and the IEEE International Symposium on Quality Electronic Design (ISQED) in 2013. He also received the LSI IP Design Awards in 2007 and 2008, Japan. He is a Technical Program Committee member of the IEEE CICC and IEEE IEDM and an Associate Editor of the IEEE Trans. on VLSI Systems. He is a senior member of the IEEE Solid-State Circuits Society and the IEEE Electron Devices Society. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE), Japan. He is also a Visiting Professor of the Graduate School of Natural Science and Technology, Kanazawa University, Ishikawa, Japan.



**Hiroshi Kawaguchi** received B.Eng. and M.Eng. degrees in electronic engineering from Chiba University, Chiba, Japan, in 1991 and 1993, respectively, and earned a Ph.D. degree in electronic engineering from The University of Tokyo, Tokyo, Japan, in 2006. He joined Konami Corporation, Kobe, Japan, in 1993, where he developed arcade entertainment systems. He moved to The Institute of Industrial Science, The University of Tokyo, as a Technical Associate in 1996, and was appointed as a Research Associate in 2003. In 2005, he moved to Kobe University, Kobe, Japan. Since 2007, he has been an Associate Professor with The Department of Information Science at that university. He is also a Collaborative Researcher with The Institute of Industrial Science, The University of Tokyo. His current research interests include low-voltage SRAM, RF circuits, and ubiquitous sensor networks. Dr. Kawaguchi was a recipient of the IEEE ISSCC 2004 Takuo Sugano Outstanding Paper Award and the IEEE Kansai Section 2006 Gold Award. He has served as a Design and Implementation of Signal Processing Systems (DISPS) Technical Committee Member for IEEE Signal Processing Society, as a Program Committee Member for IEEE Custom Integrated Circuits Conference (CICC) and IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips), and as an Associate Editor of IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences and IPSJ Transactions on System LSI Design Methodology (TSLDM). He is a member of the IEEE, ACM, IEICE, and IPSJ.



**Yohei Umeki** was born on December 20, 1985. He earned a B.E. degree in Computer and Systems Engineering from Kobe University, Hyogo, Japan, in 2012. He is currently in the doctoral course at Kobe University. His current research is on low-power SRAM and low-voltage MRAM designs.



**Shusuke Yoshimoto** received B.E. and M.E. degrees in Computer and Systems Engineering from Kobe University, Hyogo, Japan, in 2009 and 2011, respectively. He earned a Ph.D. degree in Engineering from the university in 2013. He was a JSPS research fellow from 2013 to 2014. He worked in the Department of Electrical Engineering at Stanford University as a postdoctoral researcher from 2013 to 2015. Since 2015, he has been an Assistant Professor in The Institute of Scientific and Industrial Research at Osaka University. His current research interests include biomedical signal processing, flexible electronics, organic circuit design, nano-electronics, soft error, and low-power and robust memory design. He was a recipient of 2011 and 2012 IEEE SSCS Japan Chapter Academic Research Awards, 2013 IEEE SSCS Kansai Chapter IMFEDK Student Paper Award, and 2013 Intel/Analog Devices/Catalyst Foundation/Cirrus Logic CICC Student Scholarship Award. He has served as a program committee student member in IEICE Integrated Circuit Design.



**Shintaro Izumi** received his B.Eng. and M.Eng. degrees in Computer Science and Systems Engineering from Kobe University, Hyogo, Japan, in 2007 and 2008, respectively. He received his Ph.D. degree in Engineering from Kobe University in 2011. He was a JSPS research fellow at Kobe University from 2009 to 2011. Since 2011, he has been an Assistant Professor in the Organization of Advanced Science and Technology at Kobe University. His current research interests include biomedical signal processing, communication protocols, low-power VLSI design, and sensor networks. He has served as a Vice Chair of IEEE Kansai Section Young Professional Affinity Group, as a Student Activity Committee Member for IEEE Kansai Section, as a Program Committee Member for IEEE Symposium on Low-Power and High-Speed Chips (COOL Chips), and as a Guest Associate Editor of IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences. He was a recipient of 2010 IEEE SSCS Japan Chapter Young Researchers Award.



**Masahiko Yoshimoto** joined the LSI Laboratory, Mitsubishi Electric Corporation, Itami, Japan, in 1977. From 1978 to 1983 he was engaged in the design of NMOS and CMOS static RAM. From 1984 he was involved in the research and development of multimedia ULSI systems. He earned a Ph.D. degree in Electrical Engineering from Nagoya University, Nagoya, Japan, in 1998. From 2000, he was a professor of the Dept. of Electrical & Electronic System Engineering at Kanazawa University, Japan. Since 2004, he has been a professor of the Dept. of Computer and Systems Engineering at Kobe University, Japan. His current activity is focused on the research and development of ultra-low-power multimedia and ubiquitous media VLSI systems and a dependable SRAM circuit. He holds 70 registered patents. He served on the program committee of the IEEE International Solid State Circuit Conference from 1991 to 1993. He also served as Guest Editor for special issues on Low-Power System LSI, IP and Related Technologies of IEICE Transactions in 2004. He was a chair of the IEEE SSCS (Solid State Circuits Society) Kansai Chapter from 2009 to 2010. He was also a chair of The IEICE Electronics Society Technical Committee on Integrated Circuits and Devices from 2011 to 2012. He received the R&D100 awards from the R&D magazine for the development of the DISP and the development of the realtime MPEG2 video encoder chipset in 1990 and 1996, respectively. He also received the 21st TELECOM System Technology Award in 2006.