
Continuous Optimization and Coordinated Power Management

IBM University Faculty Award Program Proposal

Prof. Stephen W. Keckler

Department of Computer Sciences

The University of Texas at Austin

IBM Technical Sponsor: Ron Kalla (Systems Group)

March 10, 2005

1 Technical Areas of Research

VLSI design, power management, continuous optimization

2 Project Description

As computer chip designers have pushed aggressively for higher performance processes, circuits, and systems, design margins have shrunk dramatically. For example, in the relatively recent past, peak power consumption was well below packaging limits for high-performance systems. However, today's package limitations on both power consumption and heat dissipation have risen to first-order design constraints. While most chips are designed for a particular maximum thermal operating point, they typically either operate far from that point or are throttled in a coarse-grain fashion to prevent thermal violations. A typical workload for a computer system, and how that workload uses the system components, changes drastically over time. Web and e-commerce servers see varying load depending on the time of day. The ambient temperature seen by a computer in a machine room may vary not just with load, but also when other systems are added to or removed from the room. Applications with large data sets and irregular access patterns often exhibit poor cache behavior, leaving the processor stalled for extended periods of time, while other applications may be more processor or disk intensive. Even a single application goes through phases which place varying burdens on the system [1]. A system designed for the maximum activity rates of all its components would certainly not exceed its power limits, but would typically operate far from its true capabilities. The key challenge is not solely to reduce power consumption, but instead to deliver energy to where it is most useful in the system at any given time.

We propose to use on-line continuous optimization to dynamically tune the system to meet power, temperature, and energy constraints. While substantial opportunities for reducing power consumption are available, extending the current strategy of localized control of individual power management techniques is not viable. Without coordinated control, a collection of individual techniques may be enabled in destructive or ineffective combinations. Existing open-loop control techniques do not guarantee effective operation throughout the wide range of process variability, application space, and operating conditions. A simple reactive approach of enabling a power (or other metric) saving technique based on a pre-defined set of events, such as "after 1000 cycles of inactivity, transition to sleep mode," does not take into account the effectiveness of the action at runtime. Techniques that are applied globally, such as Intel's recently announced "Demand Based Switching," which allows fine-grain adjustments to chip-wide voltage and frequency, lack the ability to channel the energy to different parts of the chip at different times. We propose to examine and evaluate closed-loop power management mechanisms that monitor and measure their effectiveness over time and across applications, as well as allocate energy to the most critical resources at any given time.

Opportunities for power management: In our prior work, we investigated opportunities for reducing both dynamic power [2] and static power [3] in the context of an out-of-order microprocessor. In our studies of dynamic power, we tracked dynamic power consumption throughout a pipeline model of the Alpha 21264 processor, noting the power tax of mis-prediction and over-provisioning. We found that mis-prediction accounted for approximately 6% of pipeline energy, while over-provisioned structures that are designed for maximum throughput but not fully used by typical programs accounted for about 17% of pipeline energy. In our study of static power, we compared the effectiveness (from the microarchitectural perspective) of different mechanisms that reduce static power consumption in caches, including power gating and dynamic threshold voltage modulation. We found that these varying techniques provided different benefits to different caches, and could improve the energy-delay product of these caches by a factor of 20-50, depending on the cache and the technique.

The literature on microarchitectural mechanisms is already large and continues to grow. Dynamic management techniques include clock gating, dynamic voltage/frequency scaling, pipeline gating, pipeline throttling, and dynamic microarchitectural structure size modulation. Additional strategies to dynamically manage leakage energy include instruction-cache resizing and drowsy caches, each of which places a portion of a cache into a low-power state.

Challenges for combining techniques: Simply extending the existing class of microarchitectural management techniques to encompass power, energy, and temperature constraints falls short of a robust management system. Employing multiple simultaneous power management techniques poses two main concerns. First, power management parameters are typically determined with incomplete knowledge of the physical environment, operating conditions, and application characteristics. If code profiling and pre-fabrication processor simulations do not accurately match actual runtime conditions, the mismatch can lead to ineffective management. For example, changing the frequency and voltage settings based on recent program behavior via a performance monitor may provide excellent control for the test benchmark suite yet result in a pathological case for a customer's proprietary software.

Second, runtime events could repeatedly trigger conflicts between management policies. For example, an energy-saving policy might set the frequency at a fast rate for a program so that it can complete the task quickly and then power down to conserve static energy. A separate temperature policy might set a lower frequency to cool the chip in the event of excessive heat dissipation. During program execution, the chip could breach a temperature threshold, causing oscillations between management mechanisms that trigger a slower frequency for cooling and a faster frequency to optimize leakage. Avoiding such conflicts requires testing each combination of techniques, adding to the cost and complexity of processor verification.
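To make the conflict concrete, the following toy sketch (in Python, with invented frequencies, temperatures, and a deliberately crude thermal model; none of these numbers come from our experiments) shows how two independently controlled policies can oscillate when they share a single frequency knob.

```python
# Two independently controlled policies fighting over the same knob.
# The energy policy races to a high frequency to finish quickly and sleep;
# the thermal policy forces a low frequency whenever a threshold is breached.
# Each is sensible alone; together, without coordination, the frequency
# oscillates every interval or two.

TEMP_LIMIT = 85.0       # degrees C, assumed threshold
FAST, SLOW = 3.0, 1.5   # GHz settings, illustrative only

def energy_policy(freq, temp):
    """Race-to-idle: always asks for the fast setting."""
    return FAST

def thermal_policy(freq, temp):
    """Clamp to the slow setting when the chip is too hot."""
    return SLOW if temp > TEMP_LIMIT else freq

def thermal_model(freq, temp):
    """Toy model: the fast setting heats the chip, the slow setting cools it."""
    return temp + (2.0 if freq == FAST else -2.0)

freq, temp = FAST, 84.0
for interval in range(8):
    freq = energy_policy(freq, temp)    # policy 1 acts independently...
    freq = thermal_policy(freq, temp)   # ...policy 2 then overrides it
    temp = thermal_model(freq, temp)
    print(f"interval {interval}: freq={freq} GHz, temp={temp:.1f} C")
# Output alternates between 3.0 GHz and 1.5 GHz as the temperature crosses
# the threshold, the oscillation described in the paragraph above.
```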

Coordinated Power Management: We propose to control the power management mechanisms in a coordinated fashion, adjusting them in concert to achieve the desired performance goals within the constraints of limited power, energy, and temperature levels. The infrastructure for coordinated power management includes a collection of sensors (which could include temperature sensors as well as activity counters), a set of actuators for adjusting the various power management parameters, and a controller that makes policy decisions. We expect that the algorithms and changing policies may require programmability in the form of a simple embedded processor. While we will initially focus on a single-chip microprocessor with an embedded power manager, we also foresee this approach complementing a system-level strategy (such as one applied across an SMP) in which the different nodes in the system run at different frequencies and power levels according to load [4].

Current power managers react to specific events with pre-determined responses, such as the Pentium 4 thermal control policy, paraphrased as "if temperature exceeds the threshold, then enable intermittent clock gating." A goal-driven management approach adapts to a wider range of operating conditions and resource use, allowing processors to run closer to the edge of power, temperature, and energy limits. A goal-seeking approach is flexible where trigger-driven decisions are not. For example, a goal-driven controller facing an impending thermal emergency selects the most effective choice for the situation, choosing the best combination of clock gating, thread migration, voltage and frequency scaling, or other options. It can provide safer operating conditions for run-time environments and configurations not expected during the design and validation phases.
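As a rough illustration of the difference, the sketch below contrasts a fixed trigger rule with a goal-driven selection over a set of candidate actions. The actions, their estimated cooling and performance-loss figures, and the threshold are illustrative assumptions, not measured values or a description of any shipping policy.

```python
# Trigger-driven rule vs. goal-driven selection (all numbers invented).

TEMP_LIMIT = 85.0  # degrees C

def trigger_driven(temp):
    """Fixed rule: one pre-determined response to one event."""
    return "intermittent clock gating" if temp > TEMP_LIMIT else "none"

# Candidate actions with assumed (cooling in C, fractional performance loss).
ACTIONS = {
    "intermittent clock gating": (3.0, 0.30),
    "migrate hot thread":        (4.0, 0.10),
    "drop one V/f step":         (5.0, 0.15),
}

def goal_driven(temp):
    """Pick the action that meets the thermal goal with the least estimated
    performance loss; do nothing if no emergency is pending."""
    if temp <= TEMP_LIMIT:
        return "none"
    needed = temp - TEMP_LIMIT
    feasible = [(loss, name) for name, (cool, loss) in ACTIONS.items()
                if cool >= needed]
    return min(feasible)[1] if feasible else "drop one V/f step"

print(trigger_driven(88.0))  # always: intermittent clock gating
print(goal_driven(88.0))     # migrate hot thread: cools enough, cheapest
```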

Our coordinated approach would supply a goal to the power manager, such as "maximum performance within set temperature and energy limits," which would then select the appropriate mechanisms to achieve the goal. The manager maintains a model of the system and understands the first-order sensitivity of performance, temperature, and power to the management actuators at its disposal. We will explore a family of algorithms, including constrained-optimization approaches, which can use gradient descent techniques to drive the configuration toward the desired goal using feedback from the sensors. Consequently, the manager can track system behavior and shift goal objectives in synchrony with changing application demands and energy resources. This closed-loop feedback system is a very powerful paradigm for empirically finding good configurations, but good control systems engineering methods must be applied.
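A minimal sketch of this feedback loop appears below, using a simple hill-climbing rule along a single voltage/frequency actuator as a stand-in for the more general constrained, gradient-based algorithms we intend to study. The sensor interface, the actuator range, and the step rule are assumptions made only for illustration.

```python
# One iteration of a closed-loop controller: climb toward higher performance
# along one actuator (a discrete V/f step) while respecting a temperature
# constraint. read_sensors() and apply_vf() stand in for the real sensor
# network and voltage regulator interface.

TEMP_LIMIT = 85.0        # degrees C, assumed constraint
VF_STEPS = range(0, 8)   # assumed discrete voltage/frequency settings

def control_step(vf, read_sensors, apply_vf):
    """Advance the feedback loop by one sampling interval.

    read_sensors() -> (instructions_per_cycle, temperature)
    apply_vf(setting) commits a new voltage/frequency setting.
    """
    ipc, temp = read_sensors()
    if temp > TEMP_LIMIT:                  # constraint violated: back off
        vf = max(vf - 1, min(VF_STEPS))
    elif temp < TEMP_LIMIT - 5.0:          # thermal headroom: climb
        vf = min(vf + 1, max(VF_STEPS))
    apply_vf(vf)                           # otherwise hold the setting
    return vf
```

In the full design the same loop structure would weigh several actuators at once, using the manager's sensitivity model to choose which knob to move at each interval.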

As an example of continuous optimization for power, consider the following scenario. The operating system notifies the coordinated manager to seek the goal of high throughput within a strict upper bound on temperature and moderate power and energy thresholds. The processor is currently operating at a mid-range voltage level; sensor data indicate that the temperature is within an acceptable range and that the performance is less than the goal. The manager directs the voltage regulator to step up the supply voltage, monitors the temperature rise and performance counters, and continues to raise the frequency and voltage until achieving the desired performance target. If a running application causes a thermal spike, the manager takes immediate action to coordinate a response between the voltage, frequency, and activity migration controls, while postponing a cache leakage policy that would have created a temporary increase in write-back traffic at an inopportune moment. With coordinated information from multiple sources and a goal-driven algorithm, a hierarchical power/energy/temperature manager can adapt to the system environment and push the operating conditions to the edge of acceptable limits.

The coordinated manager design integrates the fundamental principles of closed-loop and goal-driven control through the following basic mechanisms (a structural sketch in code follows the list):

(1) Sensors: The manager requires access to temperature sensors and event counters (collectively referred to as sensors) throughout the chip at appropriate sampling intervals. The manager can also use activity counter data to track decision effectiveness and determine cost functions for knob settings.

(2) Actuators: A coordinated manager requires useful knobs to turn, such as DVFS, pipeline width modulation, and the sleep mode techniques in our experiments. A selection of knobs that encompasses a range of options, from coarse-grain global control to fine-grain localized control, provides the resolution for tuning the processor's operation to its goal state.

(3) Feedback algorithms: A robust algorithm directs knob settings by synthesizing information from sensors and counters. The algorithm must be stable over a wide range of input and goal functions in order to prevent system failure from errant control decisions.

(4) Hierarchy: The manager will span hardware and software for a combination of immediate control and flexibility. A hierarchy within the manager distributes decisions according to the required response time: quick responses in hardware for phenomena with short time constants, such as a jump in leakage power when a unit exits sleep mode, and software to handle longer intervals between decisions for slow-moving trends like gradual chip warming.

(5) Granularity: Some responses, such as a universal clock reduction, are applied at a global level, while others, such as cache sleep modes, target only a localized area. The advent of techniques such as voltage islands and globally asynchronous, locally synchronous (GALS) designs will enable techniques such as DVFS to be applied non-uniformly across the chip. A coordinated manager can tune a wide range of coarse- and fine-grain management techniques to efficiently manage resources.
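The structural sketch below ties items (1) through (5) together. All class and method names are placeholders invented for this sketch; only the overall shape, sensors feeding a feedback policy that drives global and localized actuators, with a fast hardware loop beneath a slower software loop, reflects the mechanisms listed above.

```python
from dataclasses import dataclass

@dataclass
class Sensors:                      # (1) temperature sensors + event counters
    temperature: float
    activity_counters: dict

class Actuators:                    # (2) knobs from global to localized
    def set_vf(self, step): ...               # chip- or island-wide DVFS
    def set_pipeline_width(self, width): ...  # pipeline throttling
    def set_cache_sleep(self, bank, on): ...  # localized leakage control

class FeedbackPolicy:               # (3) turns sensor data into knob settings
    def decide(self, sensors: Sensors, goal: str) -> dict:
        raise NotImplementedError

class HardwareLoop:                 # (4) fast loop for short time constants
    def __init__(self, policy, actuators):
        self.policy, self.actuators = policy, actuators

    def tick(self, sensors, goal):
        settings = self.policy.decide(sensors, goal)
        # (5) apply both global and localized knobs each sampling interval
        self.actuators.set_vf(settings.get("vf"))
        self.actuators.set_cache_sleep(*settings.get("cache", (0, False)))

class SoftwareLoop:                 # (4) slow loop for long time constants
    def retune(self, history, goal):
        """Adjust policy parameters for slow trends (e.g., gradual warming)."""
```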

Evaluation: We have completed the development of an architectural simulation infrastructure to quantify the effect of power, temperature, and energy management decisions. Our infrastructure combines our detailed and validated microarchitectural simulator (sim-alpha) with the Wattch power model and the HotSpot temperature model. We have already extended the simulator to include power management techniques such as dynamic frequency and voltage scaling, pipeline throttling, and cache leakage control. Our initial experiments compared a system with no power management, uncoordinated power management, and fixed power management (trying all possible power management parameter settings and picking the best one, a technique not feasible in reality) [5]. The results show that the best fixed power management settings outperform both the uncoordinated manager and no power management by a wide margin on a subset of the SPEC2000 benchmarks.
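For concreteness, the offline "fixed power management" baseline amounts to the exhaustive sweep sketched below. The knob values are invented and the evaluate() routine is a placeholder; in our infrastructure each evaluation corresponds to a full simulation run (sim-alpha with Wattch and HotSpot) rather than a cheap function call.

```python
# Exhaustive offline search for the best static knob setting per benchmark.
# Infeasible in practice; it serves only as an upper bound for comparison.

from itertools import product

VF_STEPS    = [0, 1, 2, 3]       # assumed DVFS settings
PIPE_WIDTHS = [2, 4]             # assumed pipeline throttle settings
CACHE_SLEEP = [False, True]      # assumed cache leakage-control setting

def evaluate(benchmark, vf, width, sleep):
    """Placeholder: return (performance, peak_temperature) for one full run."""
    raise NotImplementedError

def best_fixed_setting(benchmark, temp_limit=85.0):
    best = None
    for vf, width, sleep in product(VF_STEPS, PIPE_WIDTHS, CACHE_SLEEP):
        perf, peak_temp = evaluate(benchmark, vf, width, sleep)
        if peak_temp <= temp_limit and (best is None or perf > best[0]):
            best = (perf, (vf, width, sleep))
    return best
```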

In the coming year, we will evaluate algorithms for dynamically managing and allocating energy subject to temperature, power, and performance constraints. We hope to surpass the performance of the optimal off-line algorithm with a good dynamic on-line algorithm that operates in conjunction with application execution. We will extend our simulation infrastructure to include performance counters and a sensor network to provide data for the coordinated manager's online algorithm and to measure the system response at realistic sampling intervals. Future work may examine the viability of using finer-grained voltage/frequency modulation, as afforded through fabrication and circuit techniques such as voltage/frequency islands.

References:

[1] "Discovering and Exploiting Program Phases," T. Sherwood, E. Perelman, G. Hamerly, S. Sair, and B. Calder, IEEE Micro, 23(6), pp. 84-93, November/December 2003.

[2] "Microprocessor Pipeline Energy Analysis," R. Natarajan, H. Hanson, S. W. Keckler, C. R. Moore, and D. Burger, IEEE International Symposium on Low Power Electronics and Design (ISLPED), pp. 282-287, August 2003.

[3] "Static Energy Reduction Techniques for Microprocessor Caches," H. Hanson, M. S. Hrishikesh, V. Agarwal, S. W. Keckler, and D. Burger, IEEE Transactions on VLSI Systems, 11(3), pp. 303-313, June 2003.

[4] "Scheduling for Heterogeneous Processors in Server Systems," S. Ghiasi, T. Keller, and F. Rawson (IBM Austin Research Laboratory), Computing Frontiers Conference, May 2005.

[5] "A Case for Coordinated Management of Performance, Power, Energy, and Temperature," H. Hanson and S. W. Keckler, submitted to the IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2005.

3 Project Objectives and Goals

Our primary goals are to answer the following research questions:

What are the limits of individual power management techniques applied in isolation?

How do these different power management techniques interact when applied simultaneously, but controlled independently? Are the interactions complementary or confrontational?

What are the appropriate metrics for power/thermal optimization (temperature, power consumption?), and what are the most appropriate means of measuring them on-line?

What are the natural time constants of the power management techniques? How long does it take to invoke each technique, what is the overhead, and how long does it take for the technique to take effect?

What are the limits of a coordinated approach to power management, in which all of the power management techniques are controlled in a cooperative fashion?

How closely can real control algorithms approach the optimal limits of power management? What are the benefits of feedback control algorithms over open-loop algorithms?

What is the right balance between hardware and software in implementing an embedded power manager?

How do the best power management policies on an aggressive conventional architecture compare to a more conservative, simpler (and perhaps inherently more power-efficient) architecture with less extensive power management, in terms of power and performance?

In addition, we expect that the insights developed during this study will be of interest to the IBM Systems Group. We expect to interact with Ron Kalla and Carl Anderson (among others at IBM) to ensure the relevance of the work to IBM and to provide a conduit for the insights back into IBM.

4 Long Term Impact

Aggressive power management is necessary to limit the packaging and system costs for power delivery and cooling in both high-end and low-end systems. On-line continuous optimization represents a departure from the conventional approach of designing a chip/system for the worst case. Such optimization will allow designs to potentially exceed their power and temperature limits, but will rely on on-line mechanisms to ensure safe operating conditions. This approach will allow the system to run closer to the edge of the power/performance envelope than current strategies that overly restrict the system at design time. IBM has recognized the need for power management and has established a corporate-wide low power initiative centered at the IBM Austin Research Laboratory (ARL). In addition, new IBM initiatives in autonomic computing are well matched with the notion of continuous optimization described in this proposal. Combining the circuits and systems work from the ARL with the expected microarchitectural results from this research will likely prove beneficial to future designs within the IBM Systems Group. We are uniquely positioned to investigate this area because of our strong ties to IBM in both the TRIPS and PERCS projects.
