Power-Performance Models for Runtime Reconfiguration and Power Capping

Pietro Cicotti, Ananta Tiwari (AT), Laura Carrington
EP Analytics, Inc.
PMaC/SDSC
MODSIM 2015
Presenter: Ananta Tiwari ([email protected])
Corresponding author: Pietro Cicotti ([email protected])

Motivation
•  Goals:
   –  Support proactive run-time decisions
   –  Create integrated power/performance models
•  Requirement:
   –  Incur little run-time overhead
      •  Use little information
      •  Be queried quickly
•  Approaches:
   –  Instruction-Level Modeling
      •  Constructed using single-instruction benchmarks
      •  Correlates instructions in compute phase to performance/power
   –  Statistical Modeling
      •  Constructed on micro-benchmarks
      •  Correlates performance hardware counters to performance/power

Use-case
•  Run-time system activated or informed before a compute phase
   –  Compute phases identified in the source code
      •  Run-time system API calls added to the source code
   –  Compute phases identified in the binary
      •  Run-time system API calls added by binary instrumentation
   –  Run-time system queries models and selects optimal configuration
      •  Performance/power locally optimized
      •  Power cap globally imposed (out of scope)

Instruction-Level Models
•  Instruction-level: measure cost in terms of performance and energy for all instructions
   –  Benchmark the contribution of individual instructions
      •  add r1_64b,r2_64b -> 1 cycle, 1.4nJ (2.6GHz), 1.2nJ (2.5GHz), …
   –  Create a model that aggregates the contributions
      •  [email protected]
         –  Time = 2.5×10^-9 × α × ∑ cycles_i
         –  Energy = ∑ energy(2.5GHz)_i
         –  Power = Energy/Time
   –  Offline
      •  Measure the contribution of single instructions
      •  Create the model
      •  Analyze/instrument code
   –  Online
      •  Use information from static analysis before the compute phase starts and invoke the run-time system
      •  Use information from dynamic execution at run time
         –  E.g. tune α for expected hit rate
      •  Search performance/power space at different frequencies
      •  Optimize: e.g. power limit and Energy Delay Product

Benchmarking Instructions
•  Reduce benchmarking space
   –  300+ instructions in x86_64 ISA
   –  Some instructions are overloaded
      •  Different data types, number of operands, etc.
•  Group instructions in equivalence classes
   –  Members of an equivalence class have the same latency and energy cost
   –  E.g. [add]={add, sub, and, …}
•  Approximate energy at different frequencies

Benchmarks
•  Arbitrarily long sequence of embedded asm

    for(i=0;i<n;++i)
        UNROLL(asm volatile ("subsd %%xmm1, %%xmm0\t\n"::));

    0000000000400860 <main>:
        ...
        400900:  f2 0f 5c c1    subsd  %xmm1,%xmm0
        400904:  f2 0f 5c c1    subsd  %xmm1,%xmm0
        400908:  f2 0f 5c c1    subsd  %xmm1,%xmm0
        …

•  Power/Energy measured for system (Watts), package (RAPL) and DRAM (RAPL)

Memory Operands
•  Load/store instructions
•  Instructions with memory operands
•  Latency and energy depend on the memory level servicing the request
   –  E.g. latency=4 cycles, 12 cycles, 54 cycles, 375 cycles
•  Ad-hoc benchmark to target a single level
•  Need an estimate of hit rates at run time

Integration with Tools
•  Compile time or static binary analysis
   –  Determine instruction mix
•  Tools (or programmer)
   –  Identify compute phases and insert calls to the run-time system
   –  Set up an ad-hoc model for a given compute phase
      •  Run-time parameters: hit rates, optimization target, and power cap
   –  If the parameters are known or can be reasonably estimated, the model is statically tuned
•  Run-time system
   –  Receives parameters before the computation phase
   –  Runs the model and selects the optimal DVFS setting

Machine-Learning Approach
•  Develop a machine-learning-based model to inform power capping decisions
   –  Models are trained using hardware counters
   –  Explore the performance and power sensitivity of different computations when power-related hardware parameters change

Enabling Components
•  Main enabling components
   –  Modeling methodology that can encapsulate the relationship between hardware power states, application characteristics derived from hardware counters, and power/performance responses
   –  A set of computational kernels that are representative of most of the computations we see in HPC
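
The instruction-level model from the slides (aggregate per-instruction cycles and energy, derive power, then search frequencies under a power cap) can be illustrated with a minimal sketch in C. Everything named here is hypothetical and not taken from the slides: the types `ClassCost`, `FreqModel`, and `Estimate`, and the functions `expected_mem_cycles`, `estimate_phase`, and `select_frequency`. The sketch also makes three assumptions: cycles are converted to seconds by dividing by the clock frequency, memory-operand latency is the hit-rate-weighted average of the per-level latencies, and the optimization target is the lowest Energy Delay Product among settings that respect the cap. The slides name power limit and Energy Delay Product only as examples.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of one instruction equivalence class at a fixed frequency.
 * In the slides these values come from offline single-instruction
 * benchmarks; the struct layout itself is an assumption. */
typedef struct {
    double cycles;     /* latency in cycles */
    double energy_nj;  /* energy per instruction, in nanojoules */
} ClassCost;

/* Model for one frequency: one ClassCost per equivalence class. */
typedef struct {
    double freq_ghz;
    const ClassCost *costs;
} FreqModel;

typedef struct {
    double time_s;
    double energy_j;
    double power_w;
    double edp;        /* Energy Delay Product, in J*s */
} Estimate;

/* Expected latency of a memory operand: per-level latencies weighted
 * by estimated hit rates (assumed to sum to 1). */
double expected_mem_cycles(const double *latency, const double *hit_rate,
                           size_t n_levels)
{
    double cycles = 0.0;
    for (size_t i = 0; i < n_levels; ++i) {
        assert(hit_rate[i] >= 0.0 && hit_rate[i] <= 1.0);
        cycles += hit_rate[i] * latency[i];
    }
    return cycles;
}

/* Aggregate the contributions of all classes for one compute phase.
 * counts[i] is the number of executed instructions in class i;
 * alpha is the tuning factor from the slides. */
Estimate estimate_phase(const FreqModel *m, const unsigned long long *counts,
                        size_t n_classes, double alpha)
{
    assert(m->freq_ghz > 0.0);
    double cycles = 0.0, energy_nj = 0.0;
    for (size_t i = 0; i < n_classes; ++i) {
        cycles += (double)counts[i] * m->costs[i].cycles;
        energy_nj += (double)counts[i] * m->costs[i].energy_nj;
    }
    Estimate e;
    e.time_s = alpha * cycles / (m->freq_ghz * 1e9);  /* cycles -> seconds */
    e.energy_j = energy_nj * 1e-9;                    /* nJ -> J */
    e.power_w = e.time_s > 0.0 ? e.energy_j / e.time_s : 0.0;
    e.edp = e.energy_j * e.time_s;
    return e;
}

/* Return the index of the frequency with the lowest EDP whose estimated
 * power does not exceed power_cap_w, or -1 if no setting fits the cap. */
int select_frequency(const FreqModel *models, size_t n_freqs,
                     const unsigned long long *counts, size_t n_classes,
                     double alpha, double power_cap_w)
{
    int best = -1;
    double best_edp = 0.0;
    for (size_t f = 0; f < n_freqs; ++f) {
        Estimate e = estimate_phase(&models[f], counts, n_classes, alpha);
        if (e.power_w > power_cap_w)
            continue;  /* violates the power cap */
        if (best < 0 || e.edp < best_edp) {
            best = (int)f;
            best_edp = e.edp;
        }
    }
    return best;
}
```

As a usage example under these assumptions, a single [add] class costing 1 cycle and 1.2 nJ at 2.5 GHz (the figures quoted on the slide), executed 10^9 times with α = 1, yields 0.4 s, 1.2 J, and 3 W from `estimate_phase`. A run-time system could call `select_frequency` once before each compute phase, passing the instruction mix from static analysis and the current power cap.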
