High Performance C++

Table of Contents	Introduction	Getting Up and Running
	Debugging	New C++ Features
	External Interface	Tutorials
	Man Page	FAQ

KAI C++ improves the performance of certain programming styles. This tutorial explains what kinds of styles KAI C++ optimizes best, and in particular those styles that can make you a more productive programmer.

3.1 Using Higher-Level Abstractions

The most important part of performance is a good algorithm. The KAI C++ optimizer cannot turn a bad algorithm into a good one. However, it does enables you to write good algorithms more quickly by allowing you to use higher-level abstractions without incurring a performance penalty.

For example, here is a good algorithm for a common operation in linear-algebra written at a low level of abstraction.

    void caxpy1( Complex y[], const Complex x[],
                                         Complex a, int n )
    {
       for( int i=0; i<n; i++ ) {
          y[i].re += a.re * x[i].re - a.im * x[i].im;
                 y[i].im += a.re * x[i].im + a.im * x[i].re;
               }
    }

What does it do? Quick inspection reveals that this algorithm does the complex-vector computation y[i] += a*x[i]. Though a perfectly good algorithm, it tediously writes out complex operations in terms of real arithmetic. If class Complex has overloaded operators, the example can be written more simply as:

    void caxpy2( Complex y[], const Complex x[],
                 Complex a, int n ) 
    {
        for( int i=0; i<n; i++ ) y[i] += a*x[i];
    }

This higher-level version is simpler to understand. However, with typical C++ compilers, it may run more than 1.5 times slower than the low-level version. Consequently, programmers concerned about performance would be tempted to rewrite it as the lower-level version. We call such rewriting hand lowering. Hand lowering is fine only for a tiny piece of code, since it adds complexity. Not only is hand lowering tedious, it introduces more chances for error and breaks down programming abstractions.

With KAI C++, the second version should run as fast as the first. The only restriction is that the definitions for operator+ and operator* on class Complex be inline functions. In fact, the second version sometimes runs faster on some machines, because the complex numbers are loaded and stored into memory as objects instead of pieces.

There are even higher-level ways to write the example. Given some kind of numeric-vector class Vector<T> we might be able to hide the loop and just write:

    void caxpy3( Vector<Complex>& y,
                 Vector<Complex>& x, Complex a ) 
    {
        y += a*x;
    }

KAI C++ makes this version run about 50% faster than it would if you had used another C++ compiler, though it will still not be as fast as the lower-level versions caxpy1 and caxpy2. KAI C++ uses fairly general principles to optimize programs. It does not recognize any class as special.

The directory examples/caxpy/ contains complete source-code for some examples. Try compiling them with both your C++ compiler and with KAI C++. Then note the difference in execution times.

3.2 KAI C++ Transformations

This section is a tutorial on what sorts of transformations KAI C++ does from the programmer's point of view. By improving the performance of C++ code, these transformations improve your performance as a programmer, because you use object-oriented techniques where performance issues might have precluded them in the past.

3.2.1 Lightweight Objects

One of KAI C++'s great strengths is its ability to optimize lightweight objects. A lightweight object is one that is fairly small and so short-lived that in a non-OOP language, you would just write it out as a bunch of scalars. Examples of lightweight objects are smart pointers, complex numbers, array descriptors, and geometrical vectors.

3.2.2 Inlining

KAI C++'s optimizer works best on lightweight objects when all methods applied to the object are declared inline. This allows the KAI C++ optimizer to "see" in complete detail what happens to the object.

KAI C++'s optimizer can inline just about anything. For example, it can handle functions that contain loops, switch statements, and gotos. Some other C++ compilers fail to inline functions that contain these constructs, or they generate inefficient code when inlining them.

Excessive inlining can cause code bloat. Therefore KAI C++ has four ``knobs'' that allow you to adjust inlining, which corresponding to the four ways that a function can be declared (or not declared) inline. The command line options for the knobs and their default values are summarized below.

--inline_keyword_space_time=75: Affects functions explicitly declared inline with the keyword inline.
--inline_implicit_space_time=20: Affects functions implicitly declared inline because their definition appears inside a class definition but are not otherwise marked with the keyword inline.
--inline_generated_space_time=4.0: Affects functions created by the compiler when there is no user declaration for such. E.g., compiler-generated default constructor, copy constructor, destructor, or assignment.
--inline_auto_space_time=0: Affects all other functions -- those functions explicitly defined by the programmer outside of a class definition that are not marked with the keyword inline.

Here is an example:

    class Trit {
    public:
	// Function defined inside class definition that is implicitly declared inline.
	// Controlled by --inline_implicit_space_time.
	Trit( bool b ) : value(b?YES:NO) {}

	// Function that is not declared inline.	
	// Controlled by --inline_auto_space_time.
	Trit operator=( const Trit& t );

	// Function inside class definition that is explicitly declared inline.
	// Controlled by --inline_keyword_space_time=(infinity).
	inline bool operator==( const Trit&t ) {return value==t.value;}

	// Function defined outside class definition that is explicitly declared inline.
	// Controlled by --inline_keyword_space_time=(infinity).
	inline Trit operator|=( const Trit& t );

	// Destructor ~Trit is implicitly generated by compiler.
	// Controlled by --inline_generated_space_time.

    private:
	// Aside: values for rep_t form 3-point linear lattice under bitwise ops.
	enum rep_t {NO=0, MAYBE=1, YES=3} value;
    };

    inline Trit Trit::operator|=( const Trit& t ) {
	value = (enum rep_t)(value | t.value);
	return *this;
    }

    Trit Trit::operator=( const Trit& t ) {
	value = t.value;
	return *this;
    }

The knob values are space-time parameters that indicate how much expansion in code you are willing to accept in trade for an increase in speed. The bigger the value, the more the potential for code bloat. An example of the command-line option is shown below.

--inline_keyword_space_time=2.0

Here, the space-time parameter is 2.0, which tells KAI C++ to inline functions explicitly declared inline when it estimates that inlining will cause an increase in code size that is not more than 2x the increase in code speed. E.g., if the increase in speed from inlining is estimated to be 5x at the call site, the function wll be inlined only if the corresponding increase in code size is no more than 10x.

0.0	reject all inlining
1.0	inline if code bloat does not exceed speed improvement
10.0	inline allowing a reasonable amount of code bloat
1000.0	inline practically everything

The space and time estimates are specific to each call site, not for the whole program. That is, when KAI C++ is deciding whether to inline the function call, it looks at the space and time for that specific call only. Consequently, it is unlikely that inlining with a parameter value of 2.0 is really going to blow up the code space of your executable by 2x, because much of your code will be constructs other than function calls.

Older versions of KCC had only two knobs -- one for functions declared inline and one of functions declared as not inline. The problem with this was that many programmers did not realize that functions declared inside class definitions and compiler-generated functions are implicitly inline. These programmers were unaware of the code bloat they were (implicitly) requesting. In particular, old versions of GNU C++ required that all member functions for a template class be defined within the class. Furthermore, the compiler-generated functions were generating further surprise. For example, consider the following innocent looking class:

	class Innocent {
	private:
	    Guilty a, b, c, d;
	};

The destructor for this class is generated by the compiler, and at first glance, would appear to do nothing. But the destructor for class Innocent must call the destructors for members a, b, c, and d. These destructors may in turn call destructors for their members, and so forth. If at the innermost level the destructors are non-trivial, the code for the Innocent destructor can grow geometrically. Hence the four knobs. Their default values yield minimum surprise to inexperienced users, and experienced users can tune them. Additionally, experienced users are also free to explictly declared functions defined inside classes as inline.

To summarize, KAI C++ gives you separate control over inlining of functions depending upon whether they are declared inline, implicitly declared inline, generated by the compiler, or declared not inline. The controls allow you to trade code space against performance. The controls are used by KAI C++ when evaluating whether to inline a function at a specific call site. By default, almost all functions explicitly declared inline, small functions implicitly declared inline, and compiler-generated functions are inlined, and functions declared as not inline are never inlined.

3.2.3 Branch Simplification

KAI C++ has some remarkable abilities to simplify complicated branching. This reduces the penalty for two styles of programming.

Flags for Structured Control Flow

The first style of interest is using boolean flags instead of gotos or breaks. For instance, consider the following code fragment that advances pointer s to the first left delimiter.

int found = 0;
while( *s && !found ) {
    switch( *s ) {
	case '(':
	case '[':
	case '{':
	    found = true;
	    break;
	default:
	    s++;
	    break;
    }
}
if( found ) {
    ...
}

The flag variable found is used to break out of the loop. An alternative would be to use a goto to jump directly to the .... With most compilers, the flag variable method is slightly slower, but frequently considered better style than using a goto. The KAI C++ optimizer makes both versions equally fast.

Programmers perhaps use flag variables more than they realize, because inlining generates hidden flags. For example, inlining of functions that return boolean results sometimes create implicit flag variables. Such a function might look like this:

inline int Between( int a, int b, int c ) {
   if( a<b )
      if( b<c )
         return 1;
      return 0;
}

Inlining of this function may cause the result to be assigned to a hidden flag variable. (Do not try this example with Cfront! Cfront quits compiling and issues a message: "Sorry, not implemented".) KAI C++ removes the flag variable in most cases.

Redundant Tests

KAI C++ can remove redundant comparisons. For example, it knows how to remove the redundant test in the following:

if( p!=NULL )
   if( p!=NULL )
      ...

Although programmers do not deliberately write redundant code, data abstractions sometimes introduce such redundancies. For example, consider the following fragment for a class String:

class String {
private:
   char * my_data;
public:
   int is_empty() const {return my_data==NULL;}
   char operator[]( int i ) const {
      if( my_data==NULL )
         internal_error();
      return my_data[i];
   }
};

Now suppose that the programmer inspects the ith character of string s by writing:

if( !s.is_empty() )
    return s[i];

There are now two comparisons of the form my_data==NULL one for evaluating s.is_empty() and one hidden inside operator[]. After inlining, KAI C++ spots and removes the redundant check.

KAI C++ understands comparisons against integer constants and the NULL pointer very well. For example, it knows that x>3 implies x>2 It is smart enough to understand nuances where machine arithmetic differs from conventional mathematics. For instance, it knows that x>0 does not imply x+1>0 because x might be the maximum allowed integer, and therefore x+1 might "wrap around" and be the minimum allowed integer.

KAI C++ has some understanding of more complicated comparisons. If you are depending upon branch simplification, try to always write more complicated comparisons in the same consistent way. For instance, if you write a test as x!=y+1 try to always write it that way and not as x-1!=y sometimes.

3.2.4 Constant Folding

Constant folding refers to reducing constant expressions at compile time. Nearly all C++ compilers fold arithmetic expressions such as ``2+2'' at compile time. KAI C++ goes further by doing constant folding on the following transcendental functions for float, double, and long double at +K2 and higher:

fabs
std::abs
sqrt
exp
sin
cos

One of the math header files (<math.h>, <cmath> or <cmath>) must be included for these functions to be folded. Function std::abs is the draft-standard way to write what was fabs in C.

KAI C++ also does constant folding on complex numbers, namely the following classes at +K2 and higher: complex<float>, complex<double>, and complex<long double>. The complex functions folded are these:

real
imag
norm
conj
polar
abs
exp
sqrt

KAI C++ also folds all complex arithmetic operators. So, for example, at +K2, KAI C++ optimizes this expression:

    abs(sqrt(exp(complex<float>(0,1))))

to 1.0f at compilation time. Particularly complicated expressions may require +K3 to be used.

Partial constant folding is done too. Perhaps the most interesting case is the common practice of computing a point at angle alpha on the unit circle in the complex plane with the expression:

    exp(complex<float>(0,alpha))

KAI C++ optimizes away the intermediate exponential calculation, leaving only the computation of sin and cos.

E-Mail KAI Technical Support E-Mail KAI Contact KAI

This file last updated on 4 June 1999.