perf
is a tool to profile a process using Hardware Performance Counters
. Each counter counts
Events in the CPU such as cycles, executed instructions, load from a given level of the memory caches,
branches…
perf list
lists the event that are predefined inside the tool.
the easiest way to use perf is to profile the whole application (say ./a.out) using a default set of events
perf stat -d ./a.out
One can choose a set of events and list them on the command line as in
For large applications more details can be obtained running perf record
that will produce a file containing all sampled events and their location in the application.
perf report
can be used to display the detailed profile
a wrapper defining more user-friedly name for INTEL counters can be downloaded
cd;git clone https://github.com/andikleen/pmu-tools.git
in your home directory
and executed in place of perf
as
~/pmu-tools/ocperf.py
try
~/pmu-tools/ocperf.py list
to have a list of ALL available counters (and their meaning)
for an example see
doOCPerf
Excercise 1
Exchange the order of the loops in the matrix multiplication
Use matmul.cpp
Compile
c++ -O2 -fopt-info-vec -march=native
Measure. What’s happening?
perf stat -d ./a.out
Recompile with
-O3 (aggressive optimization and vectorization)
-Ofast (allow reordering of math operation)
Add -ffunroll-loops (force loop unrolling)
Change the product in a division
Excercise 2
Compare Horner Method with Estrin
Use PolyTest.cpp
compile, measure performance and eventually change compiler options as in Exercise 1
try also pipeline.cpp
Excercise 3
Branch predictor in OO code
Use Virtual.cpp
compile, measure performance and eventually change compiler options as in Exercise 1
Measure in various conditions
- Remove “random_shuffle”
- Increase number of Derived Classes
- Try to change the order in the vector of pointers
- Try to see if using an ad-hoc type identification makes a difference
- Compare with a SOA
- Try “AnyOf”
Excercise 4
Different form of “Braching” in conditional code
Use Branch.cpp
compile, measure performance and eventually change compiler options as in Exercise 1
Measure in various conditions
- Remove “random_shuffle”
- change the way the conditions are expressed