Helping the Optimizer

Table of Contents	Introduction	Getting Up and Running
	Debugging	New C++ Features
	External Interface	Tutorials
	Man Page	FAQ

This chapter describes how you can help KAI C++ better optimize your program. Depending upon your code and optimization concerns, your effort can range from simply using the default settings of KAI C++ to adding extra keywords to performance-critical parts of your programs.

4.1 Writing Optimizable Code

Poor optimizers of the past have lead C programmers to believe that clear, readable code leads to poor performance. The opposite is true with most modern optimizers, and especially true of KAI C++. The general rule is: The easier it is for you to understand, the easier it is for KAI C++ to optimize. In general, the following constructs confuse both people and the KAI C++ optimizer.

Casts that result in "undefined behavior" according to the C++ standard. For instance, casting a "pointer to struct" to a "pointer to int" and accessing the structure members (whatever their type) with the "pointer to int" is going to make the KAI C++ optimizer punt. If you are trying to write machine-independent C++, you should avoid these sorts of casts anyway.
Modifying const objects (by casting away constness).
Gotos that jump from outside of a loop into a loop.

Although these constructs may cause KAI C++ to do less optimization, or to produce different "undefined behavior", they never cause KAI C++ to generate incorrect code.

4.2 Inline Functions

In general, declaring small functions as inline helps the KAI C++ optimizer, because the optimizer can see exactly what will happen when the function is evaluated at a particular call site. When the KAI C++ optimizer encounters a call to a function that is not inlined, it must make very conservative assumptions. Any pointer or reference parameters are assumed to be "captured" by the call and possibly used sometime later to inspect or modify their targets. All global variables or variables accessible to captured pointers are presumed to be both inspected and modified. Pointers are also considered to be captured if they are assigned to structures. Note that non-static member functions implicitly pass the pointer this.

If you want KAI C++ to do a good job of optimizing a local lightweight object, you should avoid passing pointers or references to it (or its members) as arguments to non-inline routines. The surest way is to simply mark all functions as inline that will be called on the object, or always pass the object (or its members) by value. If some of the functions seem too large to make inline then in all probability the object is a heavyweight anyway.

For example, if you have a class Complex for complex numbers, you probably want to make at least the constructor, destructor, and common arithmetic operations as inline functions. That way, KAI C++ can optimize common arithmetic on complex numbers without difficulty. On the other hand, a function like "hyperbolic arctangent" is probably so slow anyway for complex numbers that it is not worth making inline.

Compiler-generated functions are implicitly inline. So, for instance, if no copy-constructor is defined for class Complex, then the compiler-generated definition works just fine.

4.3 Aliasing

Aliasing means using two different names to refer to the same storage location. Aliasing (or a conservative assumption that there might be aliasing) sometimes limits the way KAI C++ optimizes your program. Though programs make infrequent use of the sort of aliasing that inhibits optimization, the KAI C++ optimizer must presume the worst case to avoid generating incorrect results. The rest of this section explains aliasing issues in detail and how you can improve the way KAI C++ deals with aliases.

4.3.1 Aliasing in FORTRAN vs. C++

Although this is a tutorial on C++, it is good to discuss FORTRAN here, since FORTRAN and C++ differ dramatically in issues of aliasing. Many programmers believe that the FORTRAN language inherently delivers higher performance than C++. The FORTRAN mystique arises for many reasons. Programmers will cite advanced FORTRAN implementations, its rules for arithmetic evaluation, and rumor. Most of the cited reasons have minor or nonexistent impact on performance. In fact, in some respects the KAI C++ optimizer advances the state-of-the-art well past the best FORTRAN compilers.

Unfortunately, there remains one major reason why FORTRAN can deliver higher performance than C++.

No two names in a given FORTRAN subroutine may refer to the same storage location unless explicitly specified by an EQUIVALENCE statement.

In contrast, C and C++ are quite liberal about aliases. Addresses of variables can be captured as pointers and references (references are just another term for pointers), and pointers may be freely assigned to other pointers. Unless there is evidence to the contrary, KAI C++ must presume that just about any two pointers can point to the same location, and that any pointer can point to any variable that might have had its address captured.

Aliases greatly restrict potential optimizations. For example, consider the following code that sets each element of array y to 1.0f-*a.

    void set_opposite( float y[],
                       const float * a, 
                       int n ) {
        for( int i=0; i<n; i++ )
            y[i] = 1.0f - *a;
    }

The author might expect that *a will remain invariant during execution of the loop. If this were always so, the KAI C++ optimizer could remove n-1 subtractions by hoisting the subtraction out of the loop, in effect rewriting the code to look like this.

    void faster_set_opposite( float y[],
                              const float * a, 
                              int n ) {
        float temp = 1.0f - *a;
        for( int i=0; i<n; i++ )
            y[i] = temp;
    }

However, in C and C++, it is legal to call function set_opposite as follows:

    float y[100];
    y[50] = 1.0;
    set_opposite( y, &y[50], 100 );

What happens when set_opposite is called here? The first 50 iterations set a[0]...a[49] to 0.0 The 51st iteration changes a[50] from 1.0 to 0.0. Since *a is aliased to y[50] the subsequent iterations do something different: they set a[51]...[99] to 1.0. This use of aliasing is tricky to understand, and so is a poor coding practice. Nonetheless it is legal C++, so the KAI C++ optimizer cannot hoist the subtraction out of the loop. Situations like this will limit optimization when you use pointer indirection and reference types unnecessarily.

It is not that pointers are "bad", in fact they are a powerful tool for defining complex dynamic data structures. The same structures are awkward (or even painful) to define in FORTRAN. It is just that in return for the expressiveness of C++, you may give up performance in some cases. By understanding aliasing issues, you can often write your code in a way that makes clear that no aliasing is present. In most cases, removing the potential for aliasing clarifies your intended algorithm, speeds it up, and makes it less likely to harbor obscure bugs. For instance, writing the example function set_opposite so that a is passed by value instead of via a pointer removes any possibility of aliasing.

    void set_opposite( float y[], 
                       const float a, 
                       int n ) {
        for( int i=0; i<n; i++ )
            y[i] = 1.0f - a;
    }

A general rule is that scalar and small structure inputs to a function should pass by value, not by reference or via pointer. This rule is not only good for the KAI C++ optimizer, it is usually cited as a good programming practice because it makes it two things clear: 1) the argument is input-only (also true of const float *a) and 2) the argument is not merely a path to a modifiable location.

4.3.2 Abstract Float/Pointer

Though the C++ standard permits many forms of aliasing, some of those forms are useless in real programs. KAI C++ allows you to promise not to use certain forms of aliasing, and in return it may optimize better.

The specifics are listed below.

Option Default Command-line

abstract float on --abstract_float

abstract pointer off --abstract_pointer

Option	Default	Command-line
abstract float	on	--abstract_float
abstract pointer	off	--abstract_pointer

The default settings allow almost all C++ programs to run correctly. If the program you are compiling is truly machine-independent, you should turn on both abstract float and abstract pointer to obtain best optimization.

The rest of this section explains exactly what you are promising when you turn on the options. If your programs do not depend upon details of machine architecture, you can simply turn on both options and skip reading the rest of this section.

When you compile with abstract float on, you are promising that your program does not depend upon storing a floating-point number into memory and reading it back as non-floating type via an alias, or vice-versa. You are promising to treat floating-point as abstract data types wherever aliasing is involved.

When you compile with abstract pointer on, you are promising a similar thing for pointers. Specifically, you are promising that your program does not depend upon storing a pointer into memory and reading it back as a non-pointer type via an alias. In particular, you are promising not to depend upon aliasing of pointers and integers. Even with this option set, you are still allowed to store a pointer of one type and read it back as a pointer of another type.

Neither option prohibits the practice (indispensable to some programmers) of storing objects as one type and loading them back as another type. They simply prohibit doing so via aliases.

4.3.3 Abstract Float Example

For example, the following code might not be safe to compile with abstract float.

    float f;
    int * i = (int*)&f;
    f = 3.0;
    int ieee_exponent = (*i) >> 23 & 0xFF

The code stores the value 3.0 into f but loads it via the alias *i. Obviously, this code may not be portable because it depends on the particular bit-pattern of a floating-point number.

It is possible to inspect floating-point bit-patterns in a way that is compatible with abstract float. You just have to use unions instead of aliasing. For example, the following code is safe to compile with abstract float.

    union {
        float f;
        int i;
    } convert;
    convert.f = 3.0;
    int ieee_exponent = convert.i >> 23 & 0xFF;

KAI C++ knows that union members overlap in memory, but does not consider them to be aliased. Consequently, the last example works correctly even with abstract float.

The abstract float option does not concern aliases when there is no attempt to "communicate" though the alias. For example, the following code employs aliasing to store and load both a floating-point value and an integer value in a memory location at different times.

    float f;
    int *i = (int*) &f;
    f = 3.0;
    printf( "%g", f );
    *i = 2;
    printf( "%d", *i );

The example is safe to compile with abstract float turned on since values are always loaded as the same type as they were written.

4.3.4 Restrict Keyword

This section is for advanced programmers who understand the optimization implications of aliasing.

KAI C++ provides a keyword restrict that allows programmers to promise that a pointer or reference has no aliases. By default, the keyword is rejected by KAI C++ because it is not part of standard C and C++. (The keyword restrict is also recognized by Cray and Convex compilers, and is being proposed as part of the C-9x standard.) To enable recognition of restrict, add the following command-line option:

--restrict

For instance, the example function set_opposite from section 3.3.1 might be rewritten with restrict. Here is one way to rewrite it.

    void set_opposite( float * restrict y,
                       const float * a, 
                       int n ) {
        for( int i=0; i<n; i++ )
            y[i] = 1.0f - *a;
    }

Notice the position of the keyword restrict.

It is a type specifier that qualifies a type, similar to the way that const or volatile qualify a type.
It qualifies the pointer y, not the target of y. In the example, the keyword restrict says that there are no secret aliases for the target of y. The type qualified must be a pointer. For instance, do not declare y like this:
restrict float * n /* WRONG! */
This declaration mistakenly qualifies the float pointed to by y, not the pointer y. KAI C++ produces an error message for such illegal uses of restrict. In this way, it differs from const and volatile, which can qualify any type. The reason that restrict is different is that it is describing access to an object, not properties of the object itself.

It is crucial to understand restrict if you use it, because it is essentially a contract between you and KAI C++. The contract for pointer y qualified as restrict, is:

For the lifetime of pointer y, only y or pointers copied directly or indirectly from y will be used to reference the sub-object pointed to by y.

There are four important constraints in the contract:

"for the lifetime of pointer y" -- This means that the restriction lasts only between the time the pointer is created and destroyed. In the example above, the pointer is a formal parameter, which means that the restriction lasts only over the invocation of function set_opposite.
"pointers copied directly or indirectly" -- Tell the compiler to presume that pointers created before the restricted pointer are not used to access the target of the restricted pointer. It is okay to use copies of a restricted pointer as aliases. What is not allowed is aliases that did not "originate" as copies of the restricted pointer.
"used to reference" -- This says that aliases of a restricted pointer are okay if they are not used to inspect or modify the target of y (for the lifetime of y).
"sub-object pointed to by y" -- This says that you can use multiple restricted pointers to point to different parts of the same object. The restriction is only that you cannot use aliases to refer to the same part of an object.

Constraint #1 allows nested scoping for restricted pointers. Restricted pointers need not be parameters. Thus constraint #1 permits our example to be written like this:

    void set_opposite( float * restrict y,
                       const float * a, 
                       int n ) {
      {   float * restrict yeven = y; // Begin lifetime of yeven
          for( int i=0; i<n; i+=2 )
              yeven[i] = 1.0f - *a;
      }                               // End lifetime of yeven
      {   float * restrict
          yodd = y;                   // Begin lifetime of yodd
          for( int i=1; i<n; i+=2 )
              yodd[i] = 1.0f - *a;
      }                               // End lifetime of yodd
    }

For the lifetime of yeven it is the only pointer used to access the array, and likewise for yodd.

Here is perhaps a more practical application of constraint #1. It shows how to rewrite our example to gain the benefit of restricted pointers in the implementation without making them part of the interface.

    void set_opposite( float y[],
    const float * a, int n ) {
    float * restrict yr = y;
    for( int i=0; i<n; i++ )
        yr[i] = 1.0f - *a;
    }

There are arguments in favor and against making restricted pointers parts of interfaces. An argument in favor is that the client of the interface knows not to pass aliased pointers to the interface. An argument against is that restrict is not part of standard C++, and you may be writing a non-standard C++ implementation of an interface written in standard C++. For example, the standard C++ library function memcpy does not specify restricted pointers as part of its interface. However, passing overlapping objects to it is disallowed by the standard. Thus implementations are allowed to declare is as:

    void memcpy( void * restrict dst, 
                 void * src,
                 size_t n );

We leave the choice on whether to use restricted pointers in interfaces up to the programmer.

The fact that restricted pointers can be scoped within local blocks lets us go further. Suppose that function set_opposite is supposed to be able to handle aliased arguments. Then we could write the "two version" loops below:

    void set_opposite( float y[], 
                       const float * a,
                       int n ) {
        if( y <= a && a<y+n ) {
            // *a aliased to member of y. Use unrestricted pointer.
            for( int i=0; i<n; i++ )
                y[i] = 1.0f - *a;
        } else {
            // Use restricted pointer so
            // compiler can optimize loop.
            float * restrict yr = y;
                             for( int i=0; i<n; i++ )
                                        yr[i] = 1.0f - *a;
        }
    }

Of course, in such a simple example, it would be easier to write a "one version" loop and hoist the invariant expression by hand. However, in real problems the invariant expression may be hidden by an inline function or template, making hand-optimization impractical. The example merely demonstrates the flexibility of the restrict qualifier. Once you understand the flexibility, you should be able to make it suit your own purposes.

Constraint #2 allows restricted pointers to be copied. The copy of the pointer may be restricted or unrestricted. Below is a variation on set_opposite that copies a restricted pointer to an unrestricted pointer.

    void set_opposite( float * restrict y,
                                                   const float * a, 
                       int n ) {
        float * z = y;
        for( int i=0; i<n; i++ )
            *z++ = 1.0f - *a;
    }

KAI C++ knows how to track the pointer assignments and consequently optimizes this version too.

Constraint #3 says that a pointer that is unused (or used for pointer arithmetic only) is harmless.

Constraint #4 allows different parts of an object to be accessed by different restricted pointers (or even some unrestricted pointers), as demonstrated by the code below.

    void set_opposite( float y[],
                       const float * a, 
                       int n ) {
        float * restrict ylo = y;
        // Get pointer to low half of y
        float * restrict yhi = y + n/2;
        // Get pointer to high half of y
        for( int i=0; i<n/2; i+=2 ) {
            ylo[i] = 1.0f - *a;
            yhi[i] = 1.0f - *a;
        }
        if( n/2+n/2 < n ) {
            // Was an odd element left out?
            y[n-1] = 1.0f - *a;
        }
    }

In this version, two restricted pointers are used at the same time, but they point to different parts of the array y. Furthermore, a third pointer (y itself!) is used during the lifetime of the restricted pointers. This use is okay because it accesses the part of array y not accessed by either of the restricted pointers.

The keyword restrict also applies to pointers that are targets of pointers, in the same manner that const and volatile keywords apply to targets of pointers. When more than one level of indirection is involved, it is usually best to put restrict at all levels. For example, a common method of representing matrices to keep an array of pointers to the rows of the matrix. Below is a routine that multiplies two such matrices a and b and stores the result in matrix c.

    void matmul( float * restrict * restrict c,
                 float ** a, 
                 float ** b, 
                 int n ) {
        for( int i=0; i<n; i++ )
            for( int j=0; j<n; j++ ) {
                float sum = 0.0f;
                for( int k=0; k<n; k++ )
                    sum += a[i][k]*b[k][j];
                c[i][j] = sum;
            }
    }

Omitting either restrict qualifier changes the meaning of the declaration. If the first restrict is omitted, the declaration becomes "float ** restrict c", which says that c points to an array of pointers, and the only access to that array is via c. The declaration allows some aliasing, because some or all of the pointers in the array might be identical, or might be the same as pointers in the arrays for a or b. If the second restrict is omitted, the declaration becomes "float * restrict * c", which says that c points to an array of pointers, and that the pointers in the array are the only access to their targets. The declaration allows some aliasing, because it permits pointers a and b to point to the same array of pointers.

Notice that in function matmul, only the pointer c for the output is restricted. The input pointers are plain unrestricted pointers. In general, it is best to mark output pointers (and not input pointers) as restricted, and (if applicable) to mark all levels of indirection as restrict. The reason is that aliasing of inputs does not affect optimization. It is only aliasing of an output with an input or another output that degrades optimization. Furthermore, it is sometimes desirable to alias inputs. The matrix-multiply written above can be used to square a matrix by passing in the same pointer for arguments a and b.

The restrict keyword can qualify references and array parameters. The reference syntax is similar to the pointer syntax, except that there is a & instead of a *. The array syntax is surprising -- the qualifier goes inside the brackets. Here is the earlier example rewritten with the array syntax.

    void set_opposite( float y[restrict],
                       const float * a, 
                       int n ) {
        for( int i=0; i<n; i++ )
            y[i] = 1.0f - *a;
    }

For syntactic consistency, when recognition of restrict is turned on, the const and volatile qualifiers can also appear inside the brackets.

The pointer this for methods can be qualified with restrict. Below is an example.

    class Foo {
    public:
        int sum;
        void add( int& a ) restrict {
            sum += a;
        }
    };

To qualify this, the keyword restrict is placed where the const and volatile qualifiers would be. Please note the subtle difference between restrict and the other qualifiers when used this way. The keyword restrict technically qualifies the pointer this, whereas the keywords const or volatile used this way qualify the target of this. Of course, qualifying a pointer as restrict says that the object has no aliases (for the lifetime of the method invocation).

The keyword restrict is designed so that a compiler can choose to ignore it. If you use restrict in your code, you can compile it with compilers that do not understand it simply by defining it away as shown below.

    #define restrict /**/

Unlike const and volatile, the restrict keyword is not used for overload resolution. Thus when used legally, removing it never causes a program to generate different results. (When restrict is used illegally to lie about aliasing, removing it may cause a program to generate different results. Of course, the same can be said about lying with const.)

To summarize, the keyword restrict is a way to specify that a pointer, reference, or array parameter has no aliases. It is for advanced users who understand the optimization implications of aliasing. Our rules of thumb for using restrict are listed below.

Use restrict sparingly.
Use restrict on output pointers, not input pointers
Apply restrict to all levels of indirection.
Use #define to remove restrict when the compiler does not support it.
Do not use restrict when aliasing is expected.

E-Mail KAI Technical Support E-Mail KAI Contact KAI

This file last updated on 4 June 1999.