Pareto Points in SRAM Design Using the Sleepy Stack Approach

Jun Cheol Park^ and Vincent J. Mooney III*
*Associate Director, ^Center for Research on Embedded Systems and Technology (CREST), http://www.crest.gatech.edu
*Associate Professor, ^School of Electrical and Computer Engineering
*Adjunct Associate Professor, College of Computing
*Founder, Hardware/Software Codesign Lab, http://codesign.ece.gatech.edu
Georgia Institute of Technology, Atlanta, GA, USA
IFIP VLSI-SoC, October 2005

Outline
- Introduction
- Related work
- Sleepy stack structure
- Sleepy stack SRAM
- Conclusion

CREST Faculty & Research
[Figure: layered diagram of CREST research areas, labeled Embedded System Developer, Software Architecture and Modeling, and Physical Layer. Faculty named: M. Egerstedt, K. Palem, S. Yalamanchili, V. Mooney, D. Anderson, S.-K. Lim, A. Chatterjee]

Power consumption
- Power consumption of VLSI is a fundamental problem for mobile devices as well as high-performance computers
  - Limited operation (battery life)
  - Heat
  - Operation cost
- Power = dynamic + static
  - Dynamic power is more than 90% of total power (0.18µ technology and above)
- Dynamic power reduction:
  - Technology scaling
  - Frequency scaling
  - Voltage scaling
[Figure: IBM PowerPC 970*]
*N. Rohrer et al., "PowerPC 970 in 130nm and 90nm Technologies," IEEE International Solid-State Circuits Conference, Vol. 1, pp. 68-69, February 2004.
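The split "Power = dynamic + static" can be written with the standard first-order CMOS power model. The symbols below (activity factor \alpha, switched capacitance C_L, clock frequency f, leakage current I_leak) are textbook notation assumed here for illustration; they do not appear on the slides.

```latex
P_{\text{total}} = P_{\text{dynamic}} + P_{\text{static}}, \qquad
P_{\text{dynamic}} \approx \alpha\, C_L\, V_{dd}^{2}\, f, \qquad
P_{\text{static}} \approx I_{\text{leak}}\, V_{dd}
```

Under this model, dynamic power falls quadratically with Vdd, which is one reason voltage scaling appears among the dynamic power reduction options.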

Leakage power
- Leakage power has become important as the feature size shrinks
- Subthreshold leakage (our research focus)
  - Scaling down of Vth: leakage increases exponentially as Vth decreases
  - Short-channel effect: channel controlled by drain
- Gate-oxide leakage
  - Gate tunneling due to thin oxide
  - High-k dielectric could be a solution
[Figure: experimental result for a 4-bit adder*: dynamic power and leakage power on a log scale from 1.00E-10 to 1.00E-04, for 0.18µ, 0.13µ, 0.10µ and 0.07µ technologies]
[Figure: NFET cross-section (source, gate, drain, n+ regions, p-substrate) showing the subthreshold leakage current and the gate-oxide leakage current]
*Berkeley Predictive Technology Model (BPTM). [Online]. Available http://www-device.eecs.berkeley.edu/~ptm.
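The exponential dependence of subthreshold leakage on Vth can be illustrated with a first-order model in which leakage is proportional to 10^(-Vth/S), where S is the subthreshold swing. This is a minimal sketch: the function name and the swing of 100 mV/decade are assumptions chosen for illustration, not values from the slides.

```python
def leakage_ratio(delta_vth_volts, swing_volts_per_decade=0.1):
    """Factor by which subthreshold leakage grows when Vth drops by delta_vth_volts.

    First-order model: I_sub is proportional to 10 ** (-Vth / S), where S is the
    subthreshold swing. S = 0.1 V/decade is an assumed, illustrative value.
    """
    return 10 ** (delta_vth_volts / swing_volts_per_decade)


if __name__ == "__main__":
    # Print the leakage growth for a few hypothetical Vth reductions.
    for delta in (0.05, 0.10, 0.20):
        factor = leakage_ratio(delta)
        print(f"Vth lowered by {delta * 1000:.0f} mV -> leakage x{factor:.1f}")
```

With the assumed swing, every 100 mV removed from Vth costs roughly one decade of leakage, which is why techniques that raise the effective Vth in standby are attractive.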

Low-leakage SRAM
- Auto-Backgate-Controlled Multi-Threshold CMOS (ABC-MTCMOS) [Nii98]
  - Reverse source-body bias during sleep mode
  - Slow transition and large dynamic power to charge n-wells
- Gated-Vdd [Powell00] (Prof. K. Roy)
  - Isolates SRAM cells using a sleep transistor
  - Loses state during sleep mode
- Drowsy cache [Flautner02]
  - Scales Vdd dynamically
  - Smaller leakage reduction (<86%) (we will show a reduction of 3 orders of magnitude)
[Figure: ABC-MTCMOS: PFET cross-section (gate, source, drain, p+ regions, n-well, p-substrate) with Vdd and High-Vdd connections]

Low-leakage SRAM: Gated-Vdd
[Figure: Gated-Vdd SRAM cell with bitlines, wordline, VDD, VGND and the Gated-VDD control]
*Intel introduces 65-nm sleep transistor SRAM; from Intel.com, "65-nm process technology extends the benefit of Moore's law"

Low-leakage SRAM: drowsy cache
[Figure: drowsy cache SRAM cell (transistors P1, P2, N1 to N4) with bit lines, wordline, and LowVolt control signals selecting between VDDH and VDDL]

Low-leakage SRAM comparison
Sleepy stack SRAM cell:
- No need to charge n-wells (unlike ABC-MTCMOS)
- State-saving (unlike gated-Vdd)
- Larger leakage power savings (than drowsy cache)

Introduction of sleepy stack
- New state-saving ultra-low-leakage technique
- Combination of the sleep transistor and forced stack techniques
- Applicable to generic VLSI structures as well as SRAM
- Target applications require long standby with fast response, e.g., cell phones

Sleepy stack structure
[Figure: conventional CMOS inverter (W/L = 6 and W/L = 3) beside a sleepy stack inverter (three W/L = 3 transistors and three W/L = 1.5 transistors, with sleep inputs S and S')]
- First, break down each transistor, similar to the forced stack technique
- Then add sleep transistors

Sleepy stack operation
[Figure: sleepy stack inverter in active mode (sleep transistors on: S = 0, S' = 1) and in sleep mode (sleep transistors off: S = 1, S' = 0); labels: stack effect, low-Vth, high-Vth]
- During active mode, the sleep transistors are on; the reduced resistance increases current and so reduces delay
- During sleep mode, the sleep transistors are off; the stacked transistors suppress leakage current while saving state
- High-Vth transistors can be applied, which the forced stack technique does not use because of the dramatic delay increase (>6.2X)
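The active-mode claim (a sleep transistor in parallel lowers resistance relative to a plain forced stack) can be checked with a rough resistor model of the pull-down network. This is only a sketch under strong assumptions: on-resistance is taken as inversely proportional to W/L, all devices share one Vth, and the helper names are hypothetical. The W/L values (3 for the conventional pull-down, 1.5 for each split transistor) are the ones labeled on the structure slide.

```python
def on_resistance(w_over_l):
    """Unit on-resistance, assumed inversely proportional to W/L."""
    return 1.0 / w_over_l


def series(*resistances):
    """Equivalent resistance of devices in series."""
    return sum(resistances)


def parallel(r1, r2):
    """Equivalent resistance of two devices in parallel."""
    return r1 * r2 / (r1 + r2)


# Conventional inverter pull-down: one transistor with W/L = 3.
conventional = on_resistance(3.0)

# Forced stack: the transistor is split into two W/L = 1.5 devices in series.
forced_stack = series(on_resistance(1.5), on_resistance(1.5))

# Sleepy stack, active mode: a W/L = 1.5 sleep transistor is on and sits in
# parallel with one of the two stacked W/L = 1.5 devices.
sleepy_active = series(
    on_resistance(1.5),
    parallel(on_resistance(1.5), on_resistance(1.5)),
)

if __name__ == "__main__":
    print(f"forced stack / conventional: {forced_stack / conventional:.2f}")
    print(f"sleepy stack / conventional: {sleepy_active / conventional:.2f}")
```

The model ignores the higher resistance of high-Vth devices, so it only indicates the direction of the effect: the sleepy stack path in active mode is less resistive than the forced stack path.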

Sleepy stack for logic
- Apply sleepy stack to a chain of 4 inverters
- Targeting 0.07µ technology
- Compared to forced stack, the best prior state-saving low-leakage technique, sleepy stack with dual-Vth achieves a 215X reduction in leakage power with a 6% decrease in delay
- Sleepy stack is 51% larger than forced stack
- Published in PATMOS 2004

Sleepy stack SRAM cell
- Sleepy stack technique achieves ultra-low leakage power while saving state
- Apply the sleepy stack technique to SRAM cell design
  - Large leakage power saving expected in cache
  - State-saving
- 6-T SRAM cell is based on coupled inverters
- SRAM cell leakage paths
  - Cell leakage
  - Bitline leakage

Sleepy stack SRAM cell
- Sleepy stack SRAM cell variants (PD = pull-down, PU = pull-up, WL = wordline transistors)
  - PD sleepy stack
  - PD, WL sleepy stack
  - PU, PD sleepy stack
  - PU, PD, WL sleepy stack
- Area, delay and leakage power tradeoffs

Experimental methodology
- Estimate area by scaling down a 0.18µ layout
- Estimate dynamic power, static power and cell read time using BPTM 0.07µ technology
[Figure: tool flow. Layout (Cadence Virtuoso) with the NCSU Cadence design kit* and TSMC 0.18µ, scaled down for area estimation; schematics from layout simulated in HSPICE (Synopsys HSPICE) with BPTM** 0.07µ for power and delay estimation]
*NC State University Cadence Tool Information. [Online]. Available http://www.cadence.ncsu.edu.
**Berkeley Predictive Technology Model (BPTM). [Online]. Available http://www-device.eecs.berkeley.edu/~ptm.

Experimental methodology
- Base case and three techniques are compared: high-Vth, forced stack, and sleepy stack
- 64x64-bit SRAM array designed
- Area estimated by scaling down the 0.18µ layout: area of 0.18µ layout × ((0.07µ/0.18µ)² + 10%)
- Power and read time using HSPICE targeting 0.07µ
  - 1.5xVth and 2.0xVth
  - 25 °C and 110 °C

Case     Technique                 Description
Case1    Low-Vth Std               Conventional 6T SRAM
Case2    PD high-Vth               High-Vth applied to PD
Case3    PD, WL high-Vth           High-Vth applied to PD, WL
Case4    PU, PD high-Vth           High-Vth applied to PU, PD
Case5    PU, PD, WL high-Vth       High-Vth applied to PU, PD, WL
Case6    PD stack                  Stack applied to PD
Case7    PD, WL stack              Stack applied to PD, WL
Case8    PU, PD stack              Stack applied to PU, PD
Case9    PU, PD, WL stack          Stack applied to PU, PD, WL
Case10   PD sleepy stack           Sleepy stack applied to PD
Case11   PD, WL sleepy stack       Sleepy stack applied to PD, WL
Case12   PU, PD sleepy stack       Sleepy stack applied to PU, PD
Case13   PU, PD, WL sleepy stack   Sleepy stack applied to PU, PD, WL

Experimental methodology (continued)
- Area estimated by scaling down the 0.18µ layout: area of 0.18µ layout × ((0.07µ/0.18µ)² + 10%)
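The area-scaling rule can be sketched as a small helper. The slide's "+10%" is ambiguous; the sketch below assumes it means a 10% overhead applied on top of the quadratic shrink factor, and that reading is an assumption rather than something the slides state. The function name and the 100.0 input are hypothetical.

```python
def scale_area(area_018, old_node=0.18, new_node=0.07, overhead=0.10):
    """Scale a layout area from old_node to new_node (same length unit for both).

    Assumption: the slide's "+10%" is read as a 10% overhead applied on top of
    the quadratic shrink factor (new_node / old_node) ** 2.
    """
    shrink = (new_node / old_node) ** 2
    return area_018 * shrink * (1.0 + overhead)


if __name__ == "__main__":
    # Hypothetical 0.18µ cell area of 100.0 square microns, for illustration only.
    print(f"scaled area: {scale_area(100.0):.2f}")
```

Setting overhead=0.0 gives the pure quadratic shrink, which makes it easy to see how much of the estimate comes from the overhead term.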

Area
[Figure: bar chart of SRAM cell area (unit: µm², axis from 0.0E+00 to 4.0E+01) for the 13 cases, from Low-Vth Std to PU, PD, WL sleepy stack]
- PU, PD, WL sleepy stack is 113% and 83% larger than the base case and PU, PD, WL forced stack, respectively

Cell read time
[Figure: bar chart of cell read time (unit: sec, axis from 1.0E-10 to 1.8E-10) for the 13 cases at 1xVth, 1.5xVth and 2xVth, 110 °C]
- Delay: high-Vth < sleepy stack < forced stack

Leakage power
[Figure: bar chart of leakage power (unit: W, log axis from 1.0E-06 to 1.0E-02) for the 13 cases at 1xVth, 1.5xVth and 2xVth, 110 °C]
- At 110 °C, the worst case, leakage power: forced stack > high-Vth 2xVth > sleepy stack 2xVth

Tradeoffs: 1.5xVth at 110 °C

Case     Technique                 Leakage (W)  Delay (s)  Area (µm²)  Norm. leakage  Norm. delay  Norm. area
Case1    Low-Vth Std               1.254E-03    1.05E-10   17.21       1.000          1.000        1.000
Case2    PD high-Vth               7.159E-04    1.07E-10   17.21       0.571          1.020        1.000
Case6    PD stack                  7.071E-04    1.41E-10   16.22       0.564          1.345        0.942
Case10*  PD sleepy stack*          6.744E-04    1.15E-10   25.17       0.538          1.102        1.463
Case10   PD sleepy stack           6.621E-04    1.32E-10   22.91       0.528          1.263        1.331
Case4    PU, PD high-Vth           5.042E-04    1.07E-10   17.21       0.402          1.020        1.000
Case8    PU, PD stack              4.952E-04    1.40E-10   15.37       0.395          1.341        0.893
Case12*  PU, PD sleepy stack*      4.532E-04    1.15E-10   31.30       0.362          1.103        1.818
Case12   PU, PD sleepy stack       4.430E-04    1.35E-10   29.03       0.353          1.287        1.687
Case3    PD, WL high-Vth           3.203E-04    1.17E-10   17.21       0.256          1.117        1.000
Case7    PD, WL stack              3.202E-04    1.76E-10   19.96       0.255          1.682        1.159
Case11*  PD, WL sleepy stack*      2.721E-04    1.16E-10   34.40       0.217          1.111        1.998
Case11   PD, WL sleepy stack       2.451E-04    1.50E-10   29.87       0.196          1.435        1.735
Case5    PU, PD, WL high-Vth       1.074E-04    1.16E-10   17.21       0.086          1.110        1.000
Case9    PU, PD, WL stack          1.043E-04    1.75E-10   19.96       0.083          1.678        1.159
Case13*  PU, PD, WL sleepy stack*  4.308E-05    1.16E-10   41.12       0.034          1.112        2.389
Case13   PU, PD, WL sleepy stack   2.093E-05    1.52E-10   36.61       0.017          1.450        2.127

- * marks a sleepy stack cell whose delay is matched to Case5, the best prior work
- Sleepy stack SRAM provides new Pareto points (blue rows on the slide)
- Compared to Case5, Case13 achieves 5.13X leakage reduction (with 32% delay increase); alternatively, Case13* achieves 2.49X leakage reduction while matching the delay of Case5
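One way to make "Pareto points" concrete is to filter the 1.5xVth table for rows that no other row beats on leakage, delay and area at once. The sketch below is an illustration, not the authors' procedure: it assumes all three metrics are to be minimized, and the names ROWS, dominates and pareto_front are hypothetical. The numbers are copied from the table above.

```python
# (case, leakage in W, delay in s, area) copied from the 1.5xVth, 110 C table.
ROWS = [
    ("Case1", 1.254e-03, 1.05e-10, 17.21),
    ("Case2", 7.159e-04, 1.07e-10, 17.21),
    ("Case6", 7.071e-04, 1.41e-10, 16.22),
    ("Case10*", 6.744e-04, 1.15e-10, 25.17),
    ("Case10", 6.621e-04, 1.32e-10, 22.91),
    ("Case4", 5.042e-04, 1.07e-10, 17.21),
    ("Case8", 4.952e-04, 1.40e-10, 15.37),
    ("Case12*", 4.532e-04, 1.15e-10, 31.30),
    ("Case12", 4.430e-04, 1.35e-10, 29.03),
    ("Case3", 3.203e-04, 1.17e-10, 17.21),
    ("Case7", 3.202e-04, 1.76e-10, 19.96),
    ("Case11*", 2.721e-04, 1.16e-10, 34.40),
    ("Case11", 2.451e-04, 1.50e-10, 29.87),
    ("Case5", 1.074e-04, 1.16e-10, 17.21),
    ("Case9", 1.043e-04, 1.75e-10, 19.96),
    ("Case13*", 4.308e-05, 1.16e-10, 41.12),
    ("Case13", 2.093e-05, 1.52e-10, 36.61),
]


def dominates(a, b):
    """True if row a is no worse than row b on every metric and better on one."""
    metrics_a, metrics_b = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(metrics_a, metrics_b))
    strictly_better = any(x < y for x, y in zip(metrics_a, metrics_b))
    return no_worse and strictly_better


def pareto_front(rows):
    """Rows that are not dominated by any other row (all metrics minimized)."""
    return [r for r in rows if not any(dominates(other, r) for other in rows)]


front_names = [row[0] for row in pareto_front(ROWS)]

if __name__ == "__main__":
    print(front_names)
```

Under this definition Case13 and Case13* are non-dominated because no other design reaches their leakage, while a point such as Case7 drops out because Case5 is better on all three metrics.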

Tradeoffs: 2.0xVth at 110 °C

Case     Technique                 Leakage (W)  Delay (s)  Area (µm²)  Norm. leakage  Norm. delay  Norm. area
Case1    Low-Vth Std               1.25E-03     1.05E-10   17.21       1.000          1.000        1.000
Case6    PD stack                  7.07E-04     1.41E-10   16.22       0.564          1.345        0.942
Case2    PD high-Vth               6.65E-04     1.11E-10   17.21       0.530          1.061        1.000
Case10   PD sleepy stack           6.51E-04     1.31E-10   22.91       0.519          1.254        1.331
Case10*  PD sleepy stack*          6.51E-04     1.31E-10   22.91       0.519          1.254        1.331
Case8    PU, PD stack              4.95E-04     1.40E-10   15.37       0.395          1.341        0.893
Case4    PU, PD high-Vth           4.42E-04     1.10E-10   17.21       0.352          1.048        1.000
Case12*  PU, PD sleepy stack*      4.31E-04     1.33E-10   29.48       0.344          1.270        1.713
Case12   PU, PD sleepy stack       4.31E-04     1.38E-10   29.03       0.344          1.319        1.687
Case7    PD, WL stack              3.20E-04     1.76E-10   19.96       0.255          1.682        1.159
Case3    PD, WL high-Vth           2.33E-04     1.32E-10   17.21       0.186          1.262        1.000
Case11*  PD, WL sleepy stack*      2.29E-04     1.30E-10   32.28       0.183          1.239        1.876
Case11   PD, WL sleepy stack       2.28E-04     1.62E-10   29.87       0.182          1.546        1.735
Case9    PU, PD, WL stack          1.04E-04     1.75E-10   19.96       0.083          1.678        1.159
Case5    PU, PD, WL high-Vth       8.19E-06     1.32E-10   17.21       0.007          1.259        1.000
Case13*  PU, PD, WL sleepy stack*  3.62E-06     1.32E-10   38.78       0.003          1.265        2.253
Case13   PU, PD, WL sleepy stack   2.95E-06     1.57E-10   36.61       0.002          1.504        2.127

- * marks a sleepy stack cell whose delay is matched to Case5, the best prior work
- Sleepy stack SRAM provides new Pareto points (blue rows on the slide)
- Case13 achieves 2.77X leakage reduction (with 19% delay increase over Case5); alternatively, Case13* achieves 2.26X leakage reduction compared to Case5 (while matching delay to Case5)
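The headline ratios for the 2.0xVth case follow directly from the Case5, Case13 and Case13* rows. The sketch below recomputes them from the rounded table entries, so the results agree with the quoted 2.77X, 19% and 2.26X only to within rounding. The dictionary layout and function names are hypothetical.

```python
# Leakage (W) and delay (s) copied from the 2.0xVth, 110 C table.
case5 = {"leakage_w": 8.19e-06, "delay_s": 1.32e-10}
case13 = {"leakage_w": 2.95e-06, "delay_s": 1.57e-10}
case13_matched = {"leakage_w": 3.62e-06, "delay_s": 1.32e-10}  # Case13*


def leakage_reduction(baseline, candidate):
    """How many times lower the candidate's leakage is than the baseline's."""
    return baseline["leakage_w"] / candidate["leakage_w"]


def delay_increase_pct(baseline, candidate):
    """Delay penalty of the candidate relative to the baseline, in percent."""
    return (candidate["delay_s"] / baseline["delay_s"] - 1.0) * 100.0


if __name__ == "__main__":
    print(f"Case13  vs Case5: {leakage_reduction(case5, case13):.2f}X leakage, "
          f"{delay_increase_pct(case5, case13):.1f}% delay")
    print(f"Case13* vs Case5: {leakage_reduction(case5, case13_matched):.2f}X leakage, "
          f"{delay_increase_pct(case5, case13_matched):.1f}% delay")
```

The same two helpers can be pointed at the 1.5xVth rows to compare against the 5.13X and 2.49X figures quoted for that table.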

Static noise margin

Case     Technique                 Active mode SNM (V)  Sleep mode SNM (V)
Case1    Low-Vth Std               0.299                N/A
Case10   PD sleepy stack           3.167                0.362
Case11   PD, WL sleepy stack       0.324                0.363
Case12   PU, PD sleepy stack       0.299                0.384
Case13   PU, PD, WL sleepy stack   0.299                0.384

- Noise immunity is measured using static noise margin (SNM)
- SNM of the sleepy stack is similar to or better than that of the base case

Conclusion
- Sleepy stack SRAM cell provides new Pareto points in ultra-low leakage power consumption
- At 2.0xVth and 110 °C: 2.77X leakage reduction over high-Vth with a 19% delay increase, or 2.26X without delay increase
- Sleepy stack SRAM cell shows the same or better SNM than the base case