|
||
---|---|---|
|
||
KAI C++ improves the performance of certain programming styles. This tutorial explains what kinds of styles KAI C++ optimizes best, and in particular those styles that can make you a more productive programmer.
The most important part of performance is a good algorithm. The KAI C++ optimizer cannot turn a bad algorithm into a good one. However, it does enables you to write good algorithms more quickly by allowing you to use higher-level abstractions without incurring a performance penalty.
For example, here is a good algorithm for a common operation in linear-algebra written at a low level of abstraction.
void caxpy1( Complex y[], const Complex x[], Complex a, int n ) { for( int i=0; i<n; i++ ) { y[i].re += a.re * x[i].re - a.im * x[i].im; y[i].im += a.re * x[i].im + a.im * x[i].re; } }
What does it do? Quick inspection reveals that this algorithm
does the complex-vector computation y[i] += a*x[i]
.
Though a perfectly good algorithm, it tediously writes out complex
operations in terms of real arithmetic. If class Complex has overloaded
operators, the example can be written more simply as:
void caxpy2( Complex y[], const Complex x[], Complex a, int n ) { for( int i=0; i<n; i++ ) y[i] += a*x[i]; }
This higher-level version is simpler to understand. However, with typical C++ compilers, it may run more than 1.5 times slower than the low-level version. Consequently, programmers concerned about performance would be tempted to rewrite it as the lower-level version. We call such rewriting hand lowering. Hand lowering is fine only for a tiny piece of code, since it adds complexity. Not only is hand lowering tedious, it introduces more chances for error and breaks down programming abstractions.
With KAI C++, the second version should run as fast as the
first. The only restriction is that the definitions for operator+
and operator*
on class Complex
be inline
functions. In fact, the second version sometimes runs faster
on some machines, because the complex numbers are loaded and stored
into memory as objects instead of pieces.
There are even higher-level ways to write the example. Given
some kind of numeric-vector class Vector<T>
we might be able to hide the loop and just write:
void caxpy3( Vector<Complex>& y, Vector<Complex>& x, Complex a ) { y += a*x; }
KAI C++ makes this version run about 50% faster than it would
if you had used another C++ compiler, though it will still not
be as fast as the lower-level versions caxpy1
and
caxpy2
. KAI C++ uses fairly general principles to
optimize programs. It does not recognize any class as special.
The directory examples/caxpy/ contains complete source-code for some examples. Try compiling them with both your C++ compiler and with KAI C++. Then note the difference in execution times.
This section is a tutorial on what sorts of transformations KAI C++ does from the programmer's point of view. By improving the performance of C++ code, these transformations improve your performance as a programmer, because you use object-oriented techniques where performance issues might have precluded them in the past.
One of KAI C++'s great strengths is its ability to optimize lightweight objects. A lightweight object is one that is fairly small and so short-lived that in a non-OOP language, you would just write it out as a bunch of scalars. Examples of lightweight objects are smart pointers, complex numbers, array descriptors, and geometrical vectors.
KAI C++'s optimizer works best on lightweight objects when
all methods applied to the object are declared inline
.
This allows the KAI C++ optimizer to "see" in complete
detail what happens to the object.
KAI C++'s optimizer can inline just about anything. For example, it can handle functions that contain loops, switch statements, and gotos. Some other C++ compilers fail to inline functions that contain these constructs, or they generate inefficient code when inlining them.
Excessive inlining can cause code bloat. Therefore KAI C++ has four ``knobs'' that allow you to adjust inlining, which corresponding to the four ways that a function can be declared (or not declared) inline. The command line options for the knobs and their default values are summarized below.
--inline_keyword_space_time=75
inline
.
--inline_implicit_space_time=20
inline
.
--inline_generated_space_time=4.0
--inline_auto_space_time=0
inline
.
Here is an example:
class Trit { public: // Function defined inside class definition that is implicitly declared inline. // Controlled by --inline_implicit_space_time. Trit( bool b ) : value(b?YES:NO) {} // Function that is not declared inline. // Controlled by --inline_auto_space_time. Trit operator=( const Trit& t ); // Function inside class definition that is explicitly declared inline. // Controlled by --inline_keyword_space_time=(infinity). inline bool operator==( const Trit&t ) {return value==t.value;} // Function defined outside class definition that is explicitly declared inline. // Controlled by --inline_keyword_space_time=(infinity). inline Trit operator|=( const Trit& t ); // Destructor ~Trit is implicitly generated by compiler. // Controlled by --inline_generated_space_time. private: // Aside: values for rep_t form 3-point linear lattice under bitwise ops. enum rep_t {NO=0, MAYBE=1, YES=3} value; }; inline Trit Trit::operator|=( const Trit& t ) { value = (enum rep_t)(value | t.value); return *this; } Trit Trit::operator=( const Trit& t ) { value = t.value; return *this; }
The knob values are space-time parameters that indicate how much expansion in code you are willing to accept in trade for an increase in speed. The bigger the value, the more the potential for code bloat. An example of the command-line option is shown below.
--inline_keyword_space_time=2.0
Here, the space-time parameter is 2.0, which tells KAI C++ to inline functions explicitly declared inline when it estimates that inlining will cause an increase in code size that is not more than 2x the increase in code speed. E.g., if the increase in speed from inlining is estimated to be 5x at the call site, the function wll be inlined only if the corresponding increase in code size is no more than 10x.
0.0 |
reject all inlining |
---|---|
1.0 |
inline if code bloat does not exceed speed improvement |
10.0 |
inline allowing a reasonable amount of code bloat |
1000.0 |
inline practically everything |
The space and time estimates are specific to each call site, not for the whole program. That is, when KAI C++ is deciding whether to inline the function call, it looks at the space and time for that specific call only. Consequently, it is unlikely that inlining with a parameter value of 2.0 is really going to blow up the code space of your executable by 2x, because much of your code will be constructs other than function calls.
Older versions of KCC had only two knobs -- one for functions declared inline and one of functions declared as not inline. The problem with this was that many programmers did not realize that functions declared inside class definitions and compiler-generated functions are implicitly inline. These programmers were unaware of the code bloat they were (implicitly) requesting. In particular, old versions of GNU C++ required that all member functions for a template class be defined within the class. Furthermore, the compiler-generated functions were generating further surprise. For example, consider the following innocent looking class:
class Innocent { private: Guilty a, b, c, d; };
The destructor for this class is generated by the compiler, and at first glance, would appear to do nothing. But the destructor for class Innocent must call the destructors for members a, b, c, and d. These destructors may in turn call destructors for their members, and so forth. If at the innermost level the destructors are non-trivial, the code for the Innocent destructor can grow geometrically. Hence the four knobs. Their default values yield minimum surprise to inexperienced users, and experienced users can tune them. Additionally, experienced users are also free to explictly declared functions defined inside classes as inline.
To summarize, KAI C++ gives you separate control over inlining of functions depending upon whether they are declared inline, implicitly declared inline, generated by the compiler, or declared not inline. The controls allow you to trade code space against performance. The controls are used by KAI C++ when evaluating whether to inline a function at a specific call site. By default, almost all functions explicitly declared inline, small functions implicitly declared inline, and compiler-generated functions are inlined, and functions declared as not inline are never inlined.
KAI C++ has some remarkable abilities to simplify complicated branching. This reduces the penalty for two styles of programming.
The first style of interest is using boolean flags instead
of gotos or breaks. For instance, consider the following code
fragment that advances pointer s
to the first left
delimiter.
int found = 0; while( *s && !found ) { switch( *s ) { case '(': case '[': case '{': found = true; break; default: s++; break; } } if( found ) { ... }
The flag variable found
is used to break out of
the loop. An alternative would be to use a goto to jump directly
to the .... With most compilers, the flag variable method is slightly
slower, but frequently considered better style than using a goto.
The KAI C++ optimizer makes both versions equally fast.
Programmers perhaps use flag variables more than they realize, because inlining generates hidden flags. For example, inlining of functions that return boolean results sometimes create implicit flag variables. Such a function might look like this:
inline int Between( int a, int b, int c ) { if( a<b ) if( b<c ) return 1; return 0; }
Inlining of this function may cause the result to be assigned
to a hidden flag variable. (Do not try this example with Cfront!
Cfront quits compiling and issues a message: "Sorry, not
implemented".) KAI C++ removes the flag variable in most
cases.
KAI C++ can remove redundant comparisons. For example, it knows how to remove the redundant test in the following:
if( p!=NULL ) if( p!=NULL ) ...
Although programmers do not deliberately write redundant code, data abstractions sometimes introduce such redundancies. For example, consider the following fragment for a class String:
class String { private: char * my_data; public: int is_empty() const {return my_data==NULL;} char operator[]( int i ) const { if( my_data==NULL ) internal_error(); return my_data[i]; } };
Now suppose that the programmer inspects the ith character of string s by writing:
if( !s.is_empty() ) return s[i];
There are now two comparisons of the form my_data==NULL
one for evaluating s.is_empty()
and one hidden inside
operator[]
. After inlining, KAI C++ spots and removes
the redundant check.
KAI C++ understands comparisons against integer constants and
the NULL pointer very well. For example, it knows that x>3
implies x>2
It is smart enough to understand nuances
where machine arithmetic differs from conventional mathematics.
For instance, it knows that x>0
does not
imply x+1>0
because x
might be the
maximum allowed integer, and therefore x+1
might
"wrap around" and be the minimum allowed integer.
KAI C++ has some understanding of more complicated comparisons.
If you are depending upon branch simplification, try to always
write more complicated comparisons in the same consistent way.
For instance, if you write a test as x!=y+1
try to
always write it that way and not as x-1!=y
sometimes.
Constant folding refers to reducing constant expressions at compile time. Nearly all C++ compilers fold arithmetic expressions such as ``2+2'' at compile time. KAI C++ goes further by doing constant folding on the following transcendental functions for float, double, and long double at +K2 and higher:
One of the math header files (<math.h>, <cmath>
or <cmath>) must be included for these functions to be folded.
Function std::abs
is the draft-standard way to write
what was fabs
in C.
KAI C++ also does constant folding on complex numbers, namely the following classes at +K2 and higher: complex<float>, complex<double>, and complex<long double>. The complex functions folded are these:
KAI C++ also folds all complex arithmetic operators. So, for example, at +K2, KAI C++ optimizes this expression:
abs(sqrt(exp(complex<float>(0,1))))
to 1.0f at compilation time. Particularly complicated expressions may require +K3 to be used.
Partial constant folding is done too. Perhaps the most interesting case is the common practice of computing a point at angle alpha on the unit circle in the complex plane with the expression:
exp(complex<float>(0,alpha))
KAI C++ optimizes away the intermediate exponential calculation, leaving only the computation of sin and cos.