In this exercise you will work on a small test program, word-list.cc, which sorts a filtered list of unique words, then prints the middle N of them. The program is inefficient and even has bugs. You will use basic tools such as top, ps, pmap and pstack, and the profiling and debugging tools valgrind and igprof, to understand, correct and optimise the program.
-
Make sure your environment is correct:
c++ -v 2>&1 | grep version   # should say 'gcc version 4.9.0 (GCC)'
valgrind --version           # should say 'valgrind-3.10.0'
igprof -h                    # should print a simple help message
-
Go to the exercise directory:
cd esc15/hands-on/memory
-
Make yourself a test dataset of eight random 25'000-line word files (the commands below take 100'000 shuffled dictionary words, write each one twice, and split the result):
perl -ne '
  chomp; $W{$_}++;
  END {
    use List::Util "shuffle";
    print map { "$_\n$_\n" } (shuffle keys %W)[0 .. 99999]
  }
' < /usr/share/dict/words > words-random
split -a 1 -l 25000 words-random words-part-
head -12 words-random > words-short
-
Compile the program:
c++ -g -W -Wall -Werror -ansi -pedantic -o word-list word-list.cc -ldl -lpthread
-
Run the program with different arguments:
time ./word-list N 'gle' words-part-*
time ./word-list N '[ceix]' words-part-*
time ./word-list N '[badrx}' words-part-*
time ./word-list N 'gle' words-short
-
While the program is running, use other terminal windows to examine it:
top
ps wuf -C word-list
pmap $(ps -o pid= -C word-list)
pstack $(ps -o pid= -C word-list)
-
Check for memory access errors and leaks. Translate the diagnostics to the specific problems in the source code:
valgrind --num-callers=50 --leak-check=full \
  ./word-list N 'gle' words-part-[abc]
valgrind --num-callers=50 --leak-check=full \
  ./word-list N '[ceix]' words-part-a
valgrind --num-callers=50 --leak-check=full \
  ./word-list N '[badrx}' words-part-[abc]
valgrind --num-callers=50 --leak-check=full \
  ./word-list N 'gle' words-short
-
Profile where the program spends its time. Again, translate the profile information into specific issues in the source code:
igprof -o perf-simple.gz -d -pp ./word-list N 'gle' words-part-*
igprof -o perf-pattern.gz -d -pp ./word-list N '[ceix]' words-part-*
igprof-analyse -d -v -g perf-simple.gz > perf-simple.res 2>&1
igprof-analyse -d -v -g perf-pattern.gz > perf-pattern.res 2>&1
less -X perf-*.res
-
Profile the program's total memory allocations, leaks and heap snapshots. For one stage, sort_words_uniq, compare the difference profile between the pre and post heap snapshots:
igprof -o mem-simple.gz -d -mp ./word-list Y 'gle' words-part-*
igprof -o mem-pattern.gz -d -mp ./word-list Y '[ceix]' words-part-*
igprof -o mem-bad.gz -d -mp ./word-list Y '[badrx}' words-part-*
igprof-analyse -d -v -g -r MEM_TOTAL mem-simple.gz > mem-simple.res 2>&1
igprof-analyse -d -v -g -r MEM_TOTAL mem-pattern.gz > mem-pattern.res 2>&1
igprof-analyse -d -v -g -r MEM_LIVE mem-bad.gz > mem-bad.res 2>&1
less -X mem-*.res
for f in igprof.*.gz; do
  igprof-analyse -d -v -g -r MEM_LIVE $f > $f.res 2>&1
done
less -X igprof*.res
for f in igprof.*.sort_words_uniq.pre.gz; do
  igprof-analyse -d -v -g -r MEM_LIVE \
    -b $f $(echo $f | sed 's/\.pre\./.post./') \
    > $f.res.sortdiff 2>&1
done
less -X igprof*.res.sortdiff
-
Optimise the program and remove its memory leaks as you see fit. Always cross-check your results by running:
grep -iv <rx> <file> | uniq | wc -l
-
Submit as your final result: the output of the command from step #5, the valgrind messages from step #7, an overall timing comparison with grep/uniq (% faster/slower), the output of
head -25 perf-*.res mem-*.res
and a note on how you might further improve performance or quality, or why that would no longer be possible or make sense.
References
- IgProf: http://igprof.org
- Valgrind: http://valgrind.org