### A FEED-FORWARD DYNAMIC VOLTAGE FREQUENCY MANAGEMENT FOR POWER-MINIMUM MOTION VIDEO COMPRESSION IN SUB-DECIMICRON ERA

## Kentaro KAWAKAMI, Kanazawa University, Japan, kawakami@mics.cc.t.kanazawa-u.ac.jp Miwako KANAMORI, Kanazawa University, Japan Yasuhiro MORITA, Kanazawa University, Japan Jun TAKEMURA, Kanazawa University Junichi MIYAKOSHI, Kanazawa University, Japan Hideo OHIRA, Kanazawa University, Japan Masayuki MIYAMA, Kanazawa University, Japan Masahiko YOSHIMOTO, Kanazawa University, Japan

## ABSTRACT

In this paper, a feed-forward dynamic voltage frequency management method for power-minimum motion video compression is described. This method, which controls the operating voltage/frequency and the body bias voltage of a RISC processor, reduces the power consumption of software-based video compression processing. SPICE simulation indicates that the proposed method reduces power consumption by 65% to 82%, depending on the characteristics of the input sequences.

KEYWORDS: MPEG4 visual compression, low power, feed-forward dynamic voltage/frequency control, body bias control

### 1. INTRODUCTION

Third-generation wireless communication services have started, and rich media services, mainly audio/visual communication and streaming, are expected to be key applications on mobile phone terminals. Video compression requires high processing performance, around several hundred mega operations per second (MOPS), so dedicated hardware has been the major approach because of its low-power advantage on mobile terminals. However, a video compression LSI must offer not only low power but also flexibility for future systems. In the coming ubiquitous era, a video compression LSI has to provide flexibility across various video compression standards, extensibility toward higher resolution and frame rate, and re-usability as an intellectual property (IP) core. To satisfy these three requirements, software-based processing is the best approach. A recent embedded RISC processor fabricated in sub-decimicron technology achieves several hundred MOPS and can handle real-time video compression software. However, the power consumption of software-based processing is not adequately low.

With the progress of technology scaling beyond 90[nm], the threshold voltage is lowered, which increases leakage power. Figure 1 shows the simulated power consumption of a 256KB SRAM in 90[nm] technology. It indicates that suppression of leakage power is indispensable in the coming sub-decimicron technology era.

This paper proposes a feed-forward dynamic voltage and frequency management method to minimize the total power of software-based video compression processing. This method cooperatively controls the operating voltage/frequency and the body bias voltage to reduce both dynamic power and leakage power.

## 2. POWER CONSUMPTION OF 32BITS RISC PROCESSOR IN 90NM TECHNOLOGY

In sub-decimicron technology, both subthreshold leakage power and dynamic power should be taken into account. A $V_{dd}$-hopping scheme [1] and a $V_{bb}$-hopping scheme [2] were proposed to reduce the dynamic power and the leakage power, respectively. These schemes dynamically control the operating voltage or the body bias voltage in association with the operating frequency. Consider, however, the reduction of the total power consumption, including both the dynamic and the leakage power: scaling down $V_{dd}$ to reduce the dynamic power increases the leakage power. Scaling down $V_{dd}$ degrades the operating frequency, so the body bias voltage has to be controlled toward lowering the threshold voltage to compensate for the operating frequency, which results in an increase of the leakage power. For this reason, balancing the supply voltage and the body bias voltage is desirable to minimize the total power consumption [3].

Figure 2 shows the block diagram of an example 32-bit RISC processor for implementing software video compression, which has a 32-bit data-path, a 16KB data-cache (D-cache), a 16KB instruction-cache (I-cache), a 256KB internal SRAM, DMA controllers, peripherals and so on. Figure 3 shows the simulated power consumption of the RISC processor. The "Common Design Rules for 0.1 micron" recommended by the Semiconductor Technology Academic Research Center (STARC) are used as the SPICE model of a 90[nm] process technology. The toggle rate of the logic portion is set to 15[%]. The access rates of the D-cache, the I-cache and the internal SRAM are set to 50[%], 90[%] and 13[%], respectively. These values are estimated by HDL (hardware description language) level simulation with MPEG4 visual compression software.

Power consumption values obtained from the SPICE simulation are plotted in Fig. 3. The values beside the plotted points represent the NMOS body bias voltage ($V_{bbn}$) or the operating frequency. The PMOS body bias voltage is always set to $V_{dd} - V_{bbn}$. There are eight lines in Fig. 3. The five broken lines represent the operating voltage versus power consumption characteristics at constant operating frequencies of 50, 100, 150, 200 and 250[MHz]. Along these lines, $V_{bbn}$ is adjusted to compensate for the degradation of the operating frequency. The three continuous lines represent the power of the RISC processor under the $V_{dd}$-hopping scheme, the $V_{bb}$-hopping scheme and the $V_{dd}$-$V_{bb}$-hopping scheme.

The simulation results indicate that an operating voltage of 0.7[V] is needed for 250[MHz] at $V_{bbn}=0[V]$, the frequency required for real-time MPEG4 video compression (QCIF, 15[frame/s]). If the $V_{dd}$-hopping or the $V_{bb}$-hopping scheme is applied, the RISC processor at 50[MHz] operation consumes 5.10 or 3.55[mW], respectively. However, power-minimum conditions exist along the five broken lines. Therefore, balancing both $V_{dd}$ and $V_{bb}$ (the $V_{dd}$-$V_{bb}$-hopping scheme) results in a further reduction of power.
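As a minimal sketch of the balancing idea above, the following Python picks, among candidate ($V_{dd}$, $V_{bbn}$) pairs that all sustain the same operating frequency, the one with the lowest total power. The candidate values are placeholders invented for illustration and are not read from Fig. 3; `OperatingPoint` and `min_power_point` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    vdd: float         # supply voltage [V]
    vbbn: float        # NMOS body bias voltage [V]
    dynamic_mw: float  # dynamic power [mW]
    leakage_mw: float  # leakage power [mW]

    @property
    def total_mw(self) -> float:
        # Total power is the sum of the dynamic and leakage components.
        return self.dynamic_mw + self.leakage_mw


def min_power_point(candidates):
    """Return the candidate with the lowest total (dynamic + leakage) power."""
    return min(candidates, key=lambda p: p.total_mw)


# Placeholder candidates assumed to meet the same target frequency.
# Lower Vdd cuts dynamic power but needs a lower threshold voltage
# (more body bias) to keep the frequency, which raises leakage.
CANDIDATES_SAME_FREQ = [
    OperatingPoint(vdd=0.7, vbbn=-0.4, dynamic_mw=4.0, leakage_mw=0.3),
    OperatingPoint(vdd=0.6, vbbn=0.0, dynamic_mw=2.9, leakage_mw=0.8),
    OperatingPoint(vdd=0.5, vbbn=0.4, dynamic_mw=2.0, leakage_mw=2.6),
]

best = min_power_point(CANDIDATES_SAME_FREQ)
```

With these placeholder numbers the middle candidate wins: neither the highest $V_{dd}$ (dynamic power dominates) nor the lowest $V_{dd}$ (leakage dominates) gives the minimum.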

#### 3. PROCESSING PERFORMANCE FOR MPEG4-VISUAL ENCODING

MPEG4 QCIF 15[frame/s] video compression requires approximately 200-300[MOPS]. However, these are average values: a high-motion sequence requires more performance, and a low-motion sequence requires less. The required performance depends strongly on the video sequence activity. Figure 5 shows a block diagram of MPEG4 processing. The shaded blocks in Fig. 5 are the processing functions whose performance is affected by video sequence activity. The required performance of motion compensation (MC), inverse DCT (IDCT), inverse quantization (IQ) and variable length coding (VLC) varies with the characteristics of the video sequence. Each of these is a computationally intensive function, and MC, IDCT, IQ and VLC together account for approximately eighty percent of the total MPEG4 performance. Consequently, the total required MPEG4 processing performance varies considerably with the sequence activity.

## 4. DYNAMIC VOLTAGE/FREQUENCY MANAGEMENT METHOD BY FORWARD ANALYSIS

As mentioned in Section 3, the required performance of MPEG4 visual compression changes dynamically according to the sequence activity. Also, as simulated in Section 2, operating voltage/frequency management drastically reduces the power of RISC processors, and power-minimum $V_{dd}$-$V_{bb}$ combinations exist. Combining these characteristics, the proposed feed-forward voltage/frequency management, using our forward analysis algorithm, can minimize the power of software-based MPEG4 visual compression.

# 4.1 DETAILS OF OUR PROPOSED FEED-FORWARD DYNAMIC VOLTAGE/FREQUENCY MANAGEMENT METHOD

The control flow of the feed-forward dynamic management method is shown in Fig. 4, and the timing sequence of the MPEG4 process adopting the feed-forward dynamic voltage/frequency management method is shown in Fig. 6. MPEG4 visual compression processing using the proposed method proceeds as follows:

- 1) Prediction of required performance for MPEG4 visual compression per frame using the parameters of motion activity or other parameters.
- 2) Calculation of the required operating frequency  $(F_n)$  from the forward-analysis prediction.
- 3) Controlling the frequency at predicted value and setting  $V_{dd}$  and  $V_{bb}$  which minimize the power at that frequency.
- 4) Compression of the new frame at the modified voltage and frequency.

The above processes 1) - 3) correspond to the feed-forward dynamic voltage/frequency management method, and they require less than 1[MHz] of processing cycles, which is negligible compared to MPEG4 compression. The time needed for the voltage and frequency to stabilize after the voltage is changed ("B" in Fig. 6) is on the order of microseconds, which is also negligible compared to the time allocated to a frame (e.g., 66.7[ms] per frame at 15[frame/s]).
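The per-frame steps 2) and 3) can be sketched as below, assuming the five frequency levels of Fig. 3 (50 to 250[MHz]) are the available settings. Only the 250[MHz] entry (0.7[V], $V_{bbn}=0$[V]) comes from the text; the other voltages in `LEVELS` are placeholders, and `select_level` and `plan_frame` are hypothetical names.

```python
FRAME_RATE = 15                         # [frame/s], as in the text
FRAME_BUDGET_MS = 1000.0 / FRAME_RATE   # time allocated to one frame [ms]

# Frequency level [MHz] -> (Vdd [V], Vbbn [V]).
# Placeholder voltages except for the 250 MHz entry quoted in Section 2.
LEVELS = {
    50: (0.40, 0.3),
    100: (0.50, 0.2),
    150: (0.55, 0.1),
    200: (0.65, 0.0),
    250: (0.70, 0.0),
}


def select_level(f_required_mhz):
    """Step 2: smallest available frequency level that covers the prediction."""
    for level in sorted(LEVELS):
        if level >= f_required_mhz:
            return level
    raise ValueError("required frequency exceeds the highest level")


def plan_frame(predicted_mhz):
    """Step 3: frequency, Vdd and Vbbn to apply before encoding the frame."""
    level = select_level(predicted_mhz)
    vdd, vbbn = LEVELS[level]
    return {"freq_mhz": level, "vdd": vdd, "vbbn": vbbn}


plan = plan_frame(120.0)  # a prediction of 120 MHz maps to the 150 MHz level
```

Rounding up rather than to the nearest level is a deliberate choice in this sketch: a level below the prediction would not leave enough cycles within the frame budget.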

# 4.2 PREDICTION OF REQUIRED PERFORMANCE BY FORWARD ANALYSIS

The forward analysis algorithm predicts the processing performance required for a future frame. Table 1 lists the parameters that affect the processing functions. The forward analysis algorithm predicts the required performance from the following parameters:

- 1) Number of MB block matching : N
- 2) Number of valid DCT coefficients: VC
- 3) Number of valid blocks: VB

The parameters N, VB and VC are assumed to be predicted by the following equations, respectively:

$$N = a \times N' + b \times ABS_{f} + c \times \Delta Q \tag{1}$$

$$VB = d \times VB' + e \times ABS_{f} + f \times \Delta Q \tag{2}$$

$$VC = g \times VC' + h \times ABS_{f} + i \times \Delta Q \tag{3}$$

where $ABS_f$ is the sum of absolute differences of luminance between the current frame and the previous frame, and N', VB' and VC' are the actual number of MB matching operations, the actual number of valid blocks, and the actual number of valid DCT coefficients in the previous frame, respectively. To predict N, three parameters (N', $ABS_f$ and $\Delta Q$) are chosen as affecting parameters.

- N': Video sequences have good correlation between frames. When the number of MB matching operations is large in a frame, it tends to be large in the next frame as well.
- $ABS_f$: $ABS_f$ indicates the difference between frames; when $ABS_f$ is large, N will be large.
- $\Delta Q$: An increase of $\Delta Q$ increases the prediction error, which in turn increases *N*.

VB and VC are also assumed to be predicted from three parameters under the same assumption. Furthermore, in this paper, the required processing performance for motion compensation ($F_{mc}$), IQ ($F_{iq}$), IDCT ($F_{idct}$) and VLC ($F_{vlc}$) is assumed to be predicted by the following equations, respectively:

$$F_{mc} = j + A \times N \tag{4}$$

$$F_{iq} = k + B \times VC \tag{5}$$

$$F_{idct} = l + C \times VB \tag{6}$$

$$F_{vlc} = m + D \times VC \tag{7}$$

where A is the processing performance for one MB matching, B is the processing performance for IQ processing, C is the processing performance for IDCT processing, D is the processing performance for VLC processing, and j, k, l, m are constant parameters. The total required performance $F_{p}$ is:

$$F_p = F_{mc} + F_{iq} + F_{idct} + F_{vlc} + F_{others} \tag{8}$$

where $F_{others}$ is the rest of the MPEG4 processing. Substituting Eq. (1) - (7) into Eq. (8), $F_p$ is predicted from the parameters N', VB', VC', $ABS_f$ and $\Delta Q$, as given by Eq. (9).

$$F_{p} = n + \alpha \times N' + \beta \times VB' + \gamma \times VC' + \delta \times ABS_{f} + \varepsilon \times \Delta Q \tag{9}$$

where  $\alpha, \beta, \gamma, \delta, \varepsilon$  are coefficients, and *n* is constant value.
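For reference, the substitution can be written out term by term. Taking Eq. (4) - (7) as linear in N, VC, VB and VC respectively, and treating $F_{others}$ as a constant (an assumption of this sketch), collecting the coefficients of each parameter gives:

$$n = j + k + l + m + F_{others}, \quad \alpha = A a, \quad \beta = C d, \quad \gamma = (B + D) g$$

$$\delta = A b + C e + (B + D) h, \quad \varepsilon = A c + C f + (B + D) i$$

Since only these lumped coefficients appear in Eq. (9), they can be fitted directly without determining a to i or A to D individually.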

# 5. SIMULATION RESULTS

# 5.1 CLOCK FREQUENCY PREDICTION

In order to determine the constant parameters in Eq. (9), simulation was executed on the reference kit of a commercial 32-bit RISC processor. The simulation was run with constant Q on 17 sequences, each of which originally has 150 frames (5 seconds). The simulation flow is as follows:

- 1) Monitoring the actually required performance $(F_a)$, N', VB', and VC' from MPEG4 software running on the 32-bit RISC processor.
- 2) Determination of the values of the coefficients n, $\alpha$, $\beta$, $\gamma$, and $\delta$ in Eq. (9) by the regression analysis method.
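Step 2) can be illustrated with an ordinary least-squares fit. The text only says "regression analysis", so ordinary least squares via the normal equations is an assumption here, and the sample data below are synthetic, not measurements from the paper. `solve_linear` and `fit_linear_model` are hypothetical helper names.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_model(features, targets):
    """Least squares via the normal equations (X^T X) w = X^T y.

    Each row of `features` is (N', VB', VC', ABS_f). A constant 1 is
    prepended so that w[0] plays the role of n in Eq. (9).
    """
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic, noise-free samples generated from known coefficients.
TRUE_W = [2.0, 0.5, 1.5, -1.0, 3.0]  # n, alpha, beta, gamma, delta
FEATURES = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
            (0, 0, 0, 1), (1, 1, 1, 1), (2, 1, 0, 3)]
TARGETS = [TRUE_W[0] + sum(w * x for w, x in zip(TRUE_W[1:], f))
           for f in FEATURES]

fitted = fit_linear_model(FEATURES, TARGETS)
```

Because the synthetic targets contain no noise, the fit recovers `TRUE_W` up to rounding; with measured data the residual would reflect the prediction error discussed below.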

Also, from the simulation results and Eq. (9), the predicted frequency $F_p$[MHz] is defined as:

$$F_{p} = (89.94 + 0.0114 \times N' + 0.0666 \times VB' + 0.0031 \times VC' + 0.4150 \times ABS_{f} \times 10^{5}) / 10^{6} \tag{10}$$

Eq. (10) is obtained by the regression analysis method from 1018 points in 17 sequences. Figure 7 shows the correlation between the frequency predicted by Eq. (10) $(F_p)$ and the actually required frequency $(F_a)$. The measured $F_p$ lies between 92[MHz] and 188[MHz] for a high-quality implementation, depending on the characteristics of the video sequences. These values are reasonable for a single RISC architecture without an additional DSP core. From Fig. 7, Eq. (10) predicts the actually required frequency well. However, a predicted frequency lower than the actually required frequency results in a failure, so Eq. (10) should be modified to avoid frequent errors. Prediction mismatch does not occur in the region $F_p > F_a$ of Fig. 7. With Eq. (11), which multiplies Eq. (10) by a margin factor of 1.1, 99.9% of the points satisfy the condition $F_p > F_a$.

$$F_{p} = (89.94 + 0.0114 \times N' + 0.0666 \times VB' + 0.0031 \times VC' + 0.4150 \times ABS_{f} \times 10^{5}) \times 1.1 / 10^{6} \tag{11}$$
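The effect of the 1.1 margin can be checked with a small helper that counts how many frames satisfy $F_p > F_a$ after scaling. This is only a sketch: the frequency lists are made-up examples rather than the 1018 points of Fig. 7, and `coverage` is a hypothetical name.

```python
def coverage(predicted_mhz, actual_mhz, margin=1.0):
    """Fraction of frames for which margin * F_p exceeds F_a."""
    if len(predicted_mhz) != len(actual_mhz) or not predicted_mhz:
        raise ValueError("inputs must be non-empty and of equal length")
    ok = sum(1 for fp, fa in zip(predicted_mhz, actual_mhz) if margin * fp > fa)
    return ok / len(predicted_mhz)


# Made-up example frames [MHz]; three of the four are under-predicted.
F_PRED = [100.0, 120.0, 150.0, 180.0]
F_ACT = [105.0, 118.0, 160.0, 185.0]

without_margin = coverage(F_PRED, F_ACT)           # raw prediction
with_margin = coverage(F_PRED, F_ACT, margin=1.1)  # margin as in Eq. (11)
```

The margin trades a slightly higher operating frequency on every frame for fewer frames where the prediction falls short.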

### 5.2 POWER CONSUMPTION REDUCTION

The required maximum frequency is roughly 230[MHz], assuming the maximum number of block matchings N, valid blocks VB and valid DCT coefficients VC. The power consumption reduction ratio r is defined as follows:

$$r = \frac{1}{M}\sum_{i=1}^{M}\frac{p_{a}}{p_{b}} \tag{12}$$

where *M* is the number of frames in a sequence, $p_{a}$ is the power consumption per frame controlled by our method, and $p_{b}$ is the power consumption per frame with the conventional method. When the predicted frequency is greater than 100[MHz] and less than or equal to 150[MHz], the operating frequency is set to 150[MHz]. In this range, for example, $p_{a}$ of the V<sub>dd</sub>-hopping scheme equals 12.3[mW] from Fig. 3. Figure 8 shows the power consumption reduction ratio *r* for four sequences, including the best case (low-motion sequence "Akiyo") and the worst case (high-motion sequence "Bus") among the 17 sequences. A power reduction of 65[%] to 82[%] is estimated with the V<sub>dd</sub>-V<sub>bb</sub>-hopping scheme.
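A minimal sketch of the ratio computation follows, reading Eq. (12) as the per-frame average of controlled power over conventional power. That reading, the function name `reduction_ratio`, and the power values are assumptions made for illustration; the values are not taken from Fig. 3 or Fig. 8.

```python
def reduction_ratio(p_controlled_mw, p_conventional_mw):
    """Mean over frames of controlled power divided by conventional power."""
    if len(p_controlled_mw) != len(p_conventional_mw) or not p_controlled_mw:
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(p_controlled_mw)
    return sum(pa / pb for pa, pb in zip(p_controlled_mw, p_conventional_mw)) / m


# Illustrative per-frame power [mW]: the controlled encoder switches between
# two levels, while the conventional one always runs at the highest setting.
P_CONTROLLED = [5.0, 10.0, 5.0, 10.0]
P_CONVENTIONAL = [20.0, 20.0, 20.0, 20.0]

r = reduction_ratio(P_CONTROLLED, P_CONVENTIONAL)
```

Averaging the per-frame ratios, rather than dividing total energies, weights every frame equally regardless of how much power it draws.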

#### 6. SUMMARY

A low-power approach for MPEG4 visual compression processing that applies feed-forward dynamic voltage frequency management has been presented. By combining feed-forward dynamic management based on the predicted processing performance with the characteristics of video compression processing, the proposed method minimizes power consumption. The simulation results indicate that the forward analysis algorithm predicts the actual processing performance well. By dynamically controlling the voltage and frequency of a RISC processor every frame, a power reduction of 65% to 82% can be achieved. In the 90[nm] technology era and beyond, feed-forward dynamic voltage frequency management adopting the forward analysis algorithm effectively reduces the total power consumption of visual compression processing.

### 7. REFERENCES

[1] S. Lee, et al., "Run-time voltage hopping...," in IEEE/ACM Proc. Design Automation Conf., 2000, pp. 806-809.

[2] K. Nose, et al., "Vth-Hopping Scheme to Reduce Subthreshold Leakage...," IEEE J. of Solid-State Circuits, Vol. 37, No. 3, 2002, pp. 413-419.

[3] J. Kao, et al., "A 175-mV Multiply-Accumulate Unit...," IEEE J. of Solid-State Circuits, Vol. 37, No.11, 2002, pp. 1545-1554.





Figure 1. Simulated power consumption of 256KB SRAM in 90[nm] technology.

Figure 2. Block diagram of simulated 32bits RISC.

| Processing function    | Affecting parameter for the processing power | Parameter |
|------------------------|----------------------------------------------|-----------|
| Motion compensation    | Number of MB matchings                       | N         |
| Inverse quantization   | Number of valid DCT coefficients             | VC        |
| Inverse DCT            | Number of valid blocks                       | VB        |
| Variable length coding | Number of valid blocks                       | VB        |

Table 1. Processing function.



Figure 7. Predicted frequency $(F_p)$ vs. actual frequency $(F_a)$.