perf is a tool to profile a process using hardware performance counters. Each counter counts
events in the CPU such as cycles, executed instructions, loads from a given level of the memory caches,
branches…
perf list
lists the events that are predefined inside the tool.
The easiest way to use perf is to profile the whole application (say ./a.out) using a default set of events:
perf stat -d ./a.out
One can also choose a set of events and list them on the command line with the -e option (a comma-separated list of event names taken from perf list).
For large applications more details can be obtained by running perf record, which produces a file containing all sampled events and their location in the application.
perf report can then be used to display the detailed profile.
A wrapper defining more user-friendly names for Intel counters can be downloaded into your home directory with
cd; git clone https://github.com/andikleen/pmu-tools.git
and executed in place of perf as
~/pmu-tools/ocperf.py
Try
~/pmu-tools/ocperf.py list
to get a list of ALL available counters (and their meaning).
For an example see
doOCPerf
Exercise 1
Exchange the order of the loops in the matrix multiplication
Use matmul.cpp
Compile
c++62 -O2 -fopt-info-vec -march=native
Measure. What’s happening?
perf stat -d ./a.out
Recompile with
* -O3 (aggressive optimization and vectorization)
* -Ofast (allow reordering of math operations)
* Add -funroll-loops (force loop unrolling)
Change the product into a division
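As a reminder of what "exchanging the order of the loops" means, here is a minimal sketch of two loop orders for a square, row-major matrix product. The names `Matrix`, `matmul_ijk` and `matmul_ikj` are hypothetical and the element type `float` is an assumption; matmul.cpp may be organised differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<float>;

// i-j-k order: the innermost loop reads b down a column (stride n).
void matmul_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      float sum = 0.f;
      for (std::size_t k = 0; k < n; ++k)
        sum += a[i * n + k] * b[k * n + j];
      c[i * n + j] = sum;
    }
}

// i-k-j order: the innermost loop reads b and updates c along a row (stride 1).
void matmul_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
  std::fill(c.begin(), c.end(), 0.f);  // c is accumulated into, so clear it first
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k) {
      const float aik = a[i * n + k];
      for (std::size_t j = 0; j < n; ++j)
        c[i * n + j] += aik * b[k * n + j];
    }
}
```

Both functions compute the same product; only the memory access pattern of the innermost loop differs, which is what the cache-related counters of perf stat -d should make visible.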
Exercise 2
Compare Horner's method with Estrin's scheme
Use PolyTest.cpp
Compile, measure performance and, if needed, change compiler options as in Exercise 1
Try also pipeline.cpp
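For reference, the two evaluation schemes can be sketched as follows. The degree (7), the type `double` and the names `Coeffs`, `horner` and `estrin` are assumptions made for illustration; PolyTest.cpp may use different ones.

```cpp
#include <array>

// Coefficients of c[0] + c[1]*x + ... + c[7]*x^7 (degree 7 is an arbitrary choice).
using Coeffs = std::array<double, 8>;

// Horner: a single chain of 7 multiply-adds, each waiting for the previous one.
double horner(const Coeffs& c, double x) {
  double r = c[7];
  for (int i = 6; i >= 0; --i) r = r * x + c[i];
  return r;
}

// Estrin: group terms in pairs so that independent multiply-adds
// can be in flight in the pipeline at the same time.
double estrin(const Coeffs& c, double x) {
  const double x2 = x * x;
  const double x4 = x2 * x2;
  const double p01 = c[0] + c[1] * x;
  const double p23 = c[2] + c[3] * x;
  const double p45 = c[4] + c[5] * x;
  const double p67 = c[6] + c[7] * x;
  const double p0123 = p01 + p23 * x2;
  const double p4567 = p45 + p67 * x2;
  return p0123 + p4567 * x4;
}
```

Estrin performs more multiplications in total than Horner but has a shorter dependency chain, so comparing cycles and instructions in perf stat is a natural way to see which effect wins on a given machine.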
Exercise 3
Branch predictor in OO code
Use Virtual.cpp
Compile, measure performance and, if needed, change compiler options as in Exercise 1
Measure in various conditions
* Remove “random_shuffle”
* Increase the number of Derived Classes
* Try to change the order in the vector of pointers
* Try to see if using an ad-hoc type identification makes a difference
* Compare with a SOA
* Try “AnyOf”
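The kind of code under study can be sketched as below: a vector of pointers to a base class, filled either grouped by type or in shuffled order, and a loop calling a virtual function on each element. All names (`Base`, `A`, `B`, `C`, `make_objects`, `sum_all`) are hypothetical and not taken from Virtual.cpp; the sketch uses std::shuffle because std::random_shuffle was removed in C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

struct Base {
  virtual ~Base() = default;
  virtual double comp() const = 0;
};
struct A : Base { double comp() const override { return 1.0; } };
struct B : Base { double comp() const override { return 2.0; } };
struct C : Base { double comp() const override { return 3.0; } };

// Build n objects: first third A, second third B, last third C.
// With shuffled == true the pointers are then put in random order.
std::vector<std::unique_ptr<Base>> make_objects(std::size_t n, bool shuffled) {
  std::vector<std::unique_ptr<Base>> v;
  v.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    if (3 * i < n)          v.push_back(std::make_unique<A>());
    else if (3 * i < 2 * n) v.push_back(std::make_unique<B>());
    else                    v.push_back(std::make_unique<C>());
  }
  if (shuffled) {
    std::mt19937 gen(42);  // fixed seed so that runs are reproducible
    std::shuffle(v.begin(), v.end(), gen);
  }
  return v;
}

// One indirect call per element: its target is what the predictor has to guess.
double sum_all(const std::vector<std::unique_ptr<Base>>& v) {
  double s = 0.0;
  for (const auto& p : v) s += p->comp();
  return s;
}
```

With the grouped order the call target changes only twice over the whole loop, while with the shuffled order it changes unpredictably; the branch-misses line of perf stat is the first number to compare between the two cases.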
Exercise 4
Different forms of “Branching” in conditional code
Use Branch.cpp
Compile, measure performance and, if needed, change compiler options as in Exercise 1
Measure in various conditions
* Remove “random_shuffle”
* Change the way the conditions are expressed
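As an illustration of "changing the way the conditions are expressed", here are three ways to write the same conditional sum. The function names and the threshold test are hypothetical and not taken from Branch.cpp; whether the compiler emits a conditional jump, a conditional move or vector code for each variant depends on the compiler and its options, so the generated code should be checked rather than assumed.

```cpp
#include <vector>

// Variant 1: explicit if. May compile to a data-dependent conditional jump.
long sum_if(const std::vector<int>& v, int threshold) {
  long s = 0;
  for (int x : v) {
    if (x >= threshold) s += x;
  }
  return s;
}

// Variant 2: ternary expression. Often compiled to a conditional move or select.
long sum_ternary(const std::vector<int>& v, int threshold) {
  long s = 0;
  for (int x : v) s += (x >= threshold) ? x : 0;
  return s;
}

// Variant 3: arithmetic on the comparison result (0 or 1), no control flow in the source.
long sum_mask(const std::vector<int>& v, int threshold) {
  long s = 0;
  for (int x : v) s += x * static_cast<int>(x >= threshold);
  return s;
}
```

All three return the same value; on sorted versus shuffled input, comparing branches and branch-misses from perf stat shows whether a variant still contains an unpredictable jump.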