|
||
---|---|---|
|
||
This chapter describes how you can help KAI C++ better optimize your program. Depending upon your code and optimization concerns, your effort can range from simply using the default settings of KAI C++ to adding extra keywords to performance-critical parts of your programs.
Poor optimizers of the past have lead C programmers to believe that clear, readable code leads to poor performance. The opposite is true with most modern optimizers, and especially true of KAI C++. The general rule is: The easier it is for you to understand, the easier it is for KAI C++ to optimize. In general, the following constructs confuse both people and the KAI C++ optimizer.
const
objects (by casting away constness).
Although these constructs may cause KAI C++ to do less optimization, or to produce different "undefined behavior", they never cause KAI C++ to generate incorrect code.
In general, declaring small functions as inline helps the KAI
C++ optimizer, because the optimizer can see exactly what will
happen when the function is evaluated at a particular call site.
When the KAI C++ optimizer encounters a call to a function that
is not inlined, it must make very conservative assumptions. Any
pointer or reference parameters are assumed to be "captured"
by the call and possibly used sometime later to inspect or modify
their targets. All global variables or variables accessible to
captured pointers are presumed to be both inspected and modified.
Pointers are also considered to be captured if they are assigned
to structures. Note that non-static member functions implicitly
pass the pointer this
.
If you want KAI C++ to do a good job of optimizing a local
lightweight object, you should avoid passing pointers or references
to it (or its members) as arguments to non-inline routines. The
surest way is to simply mark all functions as inline
that will be called on the object, or always pass the object (or
its members) by value. If some of the functions seem too large
to make inline
then in all probability the object
is a heavyweight anyway.
For example, if you have a class Complex for complex numbers, you probably want to make at least the constructor, destructor, and common arithmetic operations as inline functions. That way, KAI C++ can optimize common arithmetic on complex numbers without difficulty. On the other hand, a function like "hyperbolic arctangent" is probably so slow anyway for complex numbers that it is not worth making inline.
Compiler-generated functions are implicitly inline. So, for
instance, if no copy-constructor is defined for class Complex
,
then the compiler-generated definition works just fine.
Aliasing means using two different names to refer to the same storage location. Aliasing (or a conservative assumption that there might be aliasing) sometimes limits the way KAI C++ optimizes your program. Though programs make infrequent use of the sort of aliasing that inhibits optimization, the KAI C++ optimizer must presume the worst case to avoid generating incorrect results. The rest of this section explains aliasing issues in detail and how you can improve the way KAI C++ deals with aliases.
Although this is a tutorial on C++, it is good to discuss FORTRAN here, since FORTRAN and C++ differ dramatically in issues of aliasing. Many programmers believe that the FORTRAN language inherently delivers higher performance than C++. The FORTRAN mystique arises for many reasons. Programmers will cite advanced FORTRAN implementations, its rules for arithmetic evaluation, and rumor. Most of the cited reasons have minor or nonexistent impact on performance. In fact, in some respects the KAI C++ optimizer advances the state-of-the-art well past the best FORTRAN compilers.
Unfortunately, there remains one major reason why FORTRAN can deliver higher performance than C++.
No two names in a given FORTRAN subroutine may refer to the same storage location unless explicitly specified by an EQUIVALENCE statement.
In contrast, C and C++ are quite liberal about aliases. Addresses of variables can be captured as pointers and references (references are just another term for pointers), and pointers may be freely assigned to other pointers. Unless there is evidence to the contrary, KAI C++ must presume that just about any two pointers can point to the same location, and that any pointer can point to any variable that might have had its address captured.
Aliases greatly restrict potential optimizations.
For example, consider the following code that sets each element
of array y
to 1.0f-*a
.
void set_opposite( float y[], const float * a, int n ) { for( int i=0; i<n; i++ ) y[i] = 1.0f - *a; }
The author might expect that *a
will remain invariant
during execution of the loop. If this were always so, the KAI
C++ optimizer could remove n-1
subtractions by hoisting
the subtraction out of the loop, in effect rewriting the code
to look like this.
void faster_set_opposite( float y[], const float * a, int n ) { float temp = 1.0f - *a; for( int i=0; i<n; i++ ) y[i] = temp; }
However, in C and C++, it is legal to call function set_opposite
as follows:
float y[100]; y[50] = 1.0; set_opposite( y, &y[50], 100 );
What happens when set_opposite
is called here?
The first 50 iterations set a[0]...a[49]
to 0.0
The 51st iteration changes a[50]
from 1.0
to
0.0
. Since *a
is aliased to y[50]
the subsequent iterations do something different: they set a[51]...[99]
to 1.0
. This use of aliasing is tricky to understand,
and so is a poor coding practice. Nonetheless it is legal C++,
so the KAI C++ optimizer cannot hoist the subtraction out of the
loop. Situations like this will limit optimization when you use
pointer indirection and reference types unnecessarily.
It is not that pointers are "bad", in fact they are
a powerful tool for defining complex dynamic data structures.
The same structures are awkward (or even painful) to define in
FORTRAN. It is just that in return for the expressiveness of C++,
you may give up performance in some cases. By understanding aliasing
issues, you can often write your code in a way that makes clear
that no aliasing is present. In most cases, removing the potential
for aliasing clarifies your intended algorithm, speeds it up,
and makes it less likely to harbor obscure bugs. For instance,
writing the example function set_opposite
so that
a
is passed by value instead of via a pointer removes
any possibility of aliasing.
void set_opposite( float y[], const float a, int n ) { for( int i=0; i<n; i++ ) y[i] = 1.0f - a; }
A general rule is that scalar and small structure inputs
to a function should pass by value, not by reference or via
pointer. This rule is not only good for the KAI C++ optimizer,
it is usually cited as a good programming practice because it
makes it two things clear: 1) the argument is input-only (also
true of const float *a
) and 2) the argument is not
merely a path to a modifiable location.
Though the C++ standard permits many forms of aliasing, some of those forms are useless in real programs. KAI C++ allows you to promise not to use certain forms of aliasing, and in return it may optimize better.
The specifics are listed below.
Option | Default | Command-line |
---|---|---|
abstract float | on | --abstract_float |
abstract pointer | off | --abstract_pointer |
The default settings allow almost all C++ programs to run correctly. If the program you are compiling is truly machine-independent, you should turn on both abstract float and abstract pointer to obtain best optimization.
The rest of this section explains exactly what you are promising when you turn on the options. If your programs do not depend upon details of machine architecture, you can simply turn on both options and skip reading the rest of this section.
When you compile with abstract float on, you are promising that your program does not depend upon storing a floating-point number into memory and reading it back as non-floating type via an alias, or vice-versa. You are promising to treat floating-point as abstract data types wherever aliasing is involved.
When you compile with abstract pointer on, you are promising a similar thing for pointers. Specifically, you are promising that your program does not depend upon storing a pointer into memory and reading it back as a non-pointer type via an alias. In particular, you are promising not to depend upon aliasing of pointers and integers. Even with this option set, you are still allowed to store a pointer of one type and read it back as a pointer of another type.
Neither option prohibits the practice (indispensable to some programmers) of storing objects as one type and loading them back as another type. They simply prohibit doing so via aliases.
For example, the following code might not be safe to compile with abstract float.
float f; int * i = (int*)&f; f = 3.0; int ieee_exponent = (*i) >> 23 & 0xFF
The code stores the value 3.0
into f
but loads it via the alias *i
. Obviously, this code
may not be portable because it depends on the particular bit-pattern
of a floating-point number.
It is possible to inspect floating-point bit-patterns in a way that is compatible with abstract float. You just have to use unions instead of aliasing. For example, the following code is safe to compile with abstract float.
union { float f; int i; } convert; convert.f = 3.0; int ieee_exponent = convert.i >> 23 & 0xFF;
KAI C++ knows that union members overlap in memory, but does not consider them to be aliased. Consequently, the last example works correctly even with abstract float.
The abstract float option does not concern aliases when there is no attempt to "communicate" though the alias. For example, the following code employs aliasing to store and load both a floating-point value and an integer value in a memory location at different times.
float f; int *i = (int*) &f; f = 3.0; printf( "%g", f ); *i = 2; printf( "%d", *i );
The example is safe to compile with abstract float turned on since values are always loaded as the same type as they were written.
This section is for advanced programmers who understand the optimization implications of aliasing.
KAI C++ provides a keyword restrict
that allows
programmers to promise that a pointer or reference has no aliases.
By default, the keyword is rejected by KAI C++ because it is not
part of standard C and C++. (The keyword restrict
is also recognized by Cray and Convex compilers, and is being
proposed as part of the C-9x standard.) To enable recognition
of restrict
, add the following command-line option:
--restrict
For instance, the example function set_opposite
from section 3.3.1 might be rewritten with restrict
.
Here is one way to rewrite it.
void set_opposite( float * restrict y, const float * a, int n ) { for( int i=0; i<n; i++ ) y[i] = 1.0f - *a; }
Notice the position of the keyword restrict
.
const
or volatile
qualify
a type.
y
, not the target of
y
. In the example, the keyword restrict
says that there are no secret aliases for the target of y
.
The type qualified must be a pointer. For instance, do not declare
y
like this: restrict float * n /* WRONG! */
y
, not the pointer y
. KAI C++ produces
an error message for such illegal uses of restrict
.
In this way, it differs from const
and volatile
,
which can qualify any type. The reason that restrict
is different is that it is describing access to an object,
not properties of the object itself.
It is crucial to understand restrict
if you use
it, because it is essentially a contract between you and KAI C++.
The contract for pointer y
qualified as restrict
,
is:
For the lifetime of pointer y
, only y
or pointers copied directly or indirectly from y
will be used to reference the sub-object pointed to by y
.
There are four important constraints in the contract:
y
" --
This means that the restriction lasts only between the time the
pointer is created and destroyed. In the example above, the pointer
is a formal parameter, which means that the restriction lasts
only over the invocation of function set_opposite
.
y
(for the lifetime of y
).
y
" -- This
says that you can use multiple restricted pointers to point to
different parts of the same object. The restriction is only that
you cannot use aliases to refer to the same part of an object.
Constraint #1 allows nested scoping for restricted pointers. Restricted pointers need not be parameters. Thus constraint #1 permits our example to be written like this:
void set_opposite( float * restrict y, const float * a, int n ) { { float * restrict yeven = y; // Begin lifetime of yeven for( int i=0; i<n; i+=2 ) yeven[i] = 1.0f - *a; } // End lifetime of yeven { float * restrict yodd = y; // Begin lifetime of yodd for( int i=1; i<n; i+=2 ) yodd[i] = 1.0f - *a; } // End lifetime of yodd }
For the lifetime of yeven
it is the only pointer
used to access the array, and likewise for yodd
.
Here is perhaps a more practical application of constraint #1. It shows how to rewrite our example to gain the benefit of restricted pointers in the implementation without making them part of the interface.
void set_opposite( float y[], const float * a, int n ) { float * restrict yr = y; for( int i=0; i<n; i++ ) yr[i] = 1.0f - *a; }
There are arguments in favor and against making restricted
pointers parts of interfaces. An argument in favor is that the
client of the interface knows not to pass aliased pointers to
the interface. An argument against is that restrict
is not part of standard C++, and you may be writing a non-standard
C++ implementation of an interface written in standard C++. For
example, the standard C++ library function memcpy
does not specify restricted pointers as part of its interface.
However, passing overlapping objects to it is disallowed by the
standard. Thus implementations are allowed to declare is as:
void memcpy( void * restrict dst, void * src, size_t n );
We leave the choice on whether to use restricted pointers in interfaces up to the programmer.
The fact that restricted pointers can be scoped within local
blocks lets us go further. Suppose that function set_opposite
is supposed to be able to handle aliased arguments. Then
we could write the "two version" loops below:
void set_opposite( float y[], const float * a, int n ) { if( y <= a && a<y+n ) { // *a aliased to member of y. Use unrestricted pointer. for( int i=0; i<n; i++ ) y[i] = 1.0f - *a; } else { // Use restricted pointer so // compiler can optimize loop. float * restrict yr = y; for( int i=0; i<n; i++ ) yr[i] = 1.0f - *a; } }
Of course, in such a simple example, it would be easier to
write a "one version" loop and hoist the invariant expression
by hand. However, in real problems the invariant expression may
be hidden by an inline function or template, making hand-optimization
impractical. The example merely demonstrates the flexibility of
the restrict
qualifier. Once you understand the flexibility,
you should be able to make it suit your own purposes.
Constraint #2 allows restricted pointers to be copied. The
copy of the pointer may be restricted or unrestricted. Below is
a variation on set_opposite
that copies a restricted
pointer to an unrestricted pointer.
void set_opposite( float * restrict y, const float * a, int n ) { float * z = y; for( int i=0; i<n; i++ ) *z++ = 1.0f - *a; }
KAI C++ knows how to track the pointer assignments and consequently optimizes this version too.
Constraint #3 says that a pointer that is unused (or used for pointer arithmetic only) is harmless.
Constraint #4 allows different parts of an object to be accessed by different restricted pointers (or even some unrestricted pointers), as demonstrated by the code below.
void set_opposite( float y[], const float * a, int n ) { float * restrict ylo = y; // Get pointer to low half of y float * restrict yhi = y + n/2; // Get pointer to high half of y for( int i=0; i<n/2; i+=2 ) { ylo[i] = 1.0f - *a; yhi[i] = 1.0f - *a; } if( n/2+n/2 < n ) { // Was an odd element left out? y[n-1] = 1.0f - *a; } }
In this version, two restricted pointers are used at the same
time, but they point to different parts of the array y
.
Furthermore, a third pointer (y
itself!) is used
during the lifetime of the restricted pointers. This use is okay
because it accesses the part of array y
not accessed
by either of the restricted pointers.
The keyword restrict
also applies to pointers
that are targets of pointers, in the same manner that const
and volatile
keywords apply to targets of pointers.
When more than one level of indirection is involved, it is usually
best to put restrict
at all levels. For example,
a common method of representing matrices to keep an array of pointers
to the rows of the matrix. Below is a routine that multiplies
two such matrices a
and b
and stores
the result in matrix c
.
void matmul( float * restrict * restrict c, float ** a, float ** b, int n ) { for( int i=0; i<n; i++ ) for( int j=0; j<n; j++ ) { float sum = 0.0f; for( int k=0; k<n; k++ ) sum += a[i][k]*b[k][j]; c[i][j] = sum; } }
Omitting either restrict
qualifier changes the
meaning of the declaration. If the first restrict
is omitted, the declaration becomes "float ** restrict
c
", which says that c
points to an array
of pointers, and the only access to that array is via c
.
The declaration allows some aliasing, because some or all of the
pointers in the array might be identical, or might be the same
as pointers in the arrays for a
or b
.
If the second restrict
is omitted, the declaration
becomes "float * restrict * c
", which says
that c
points to an array of pointers, and that the
pointers in the array are the only access to their targets. The
declaration allows some aliasing, because it permits pointers
a
and b
to point to the same array of
pointers.
Notice that in function matmul
, only the pointer
c
for the output is restricted. The input pointers
are plain unrestricted pointers. In general, it is best to mark
output pointers (and not input pointers) as restricted, and (if
applicable) to mark all levels of indirection as restrict
.
The reason is that aliasing of inputs does not affect optimization.
It is only aliasing of an output with an input or another output
that degrades optimization. Furthermore, it is sometimes desirable
to alias inputs. The matrix-multiply written above can be used
to square a matrix by passing in the same pointer for arguments
a
and b
.
The restrict
keyword can qualify references and
array parameters. The reference syntax is similar to the pointer
syntax, except that there is a &
instead of a
*
. The array syntax is surprising -- the qualifier
goes inside the brackets. Here is the earlier example rewritten
with the array syntax.
void set_opposite( float y[restrict], const float * a, int n ) { for( int i=0; i<n; i++ ) y[i] = 1.0f - *a; }
For syntactic consistency, when recognition of restrict
is turned on, the const
and volatile
qualifiers can also appear inside the brackets.
The pointer this
for methods can be qualified
with restrict
. Below is an example.
class Foo { public: int sum; void add( int& a ) restrict { sum += a; } };
To qualify this
, the keyword restrict
is placed where the const
and volatile
qualifiers would be. Please note the subtle difference between
restrict and the other qualifiers when used this way. The keyword
restrict
technically qualifies the pointer this
,
whereas the keywords const
or volatile
used this way qualify the target of this
. Of course,
qualifying a pointer as restrict
says that the object
has no aliases (for the lifetime of the method invocation).
The keyword restrict
is designed so that a compiler
can choose to ignore it. If you use restrict
in your
code, you can compile it with compilers that do not understand
it simply by defining it away as shown below.
#define restrict /**/
Unlike const
and volatile
, the restrict
keyword is not used for overload resolution. Thus when
used legally, removing it never causes a program to generate different
results. (When restrict
is used illegally to lie
about aliasing, removing it may cause a program to generate different
results. Of course, the same can be said about lying with const
.)
To summarize, the keyword restrict
is a way to
specify that a pointer, reference, or array parameter has no aliases.
It is for advanced users who understand the optimization implications
of aliasing. Our rules of thumb for using restrict
are listed below.
restrict
sparingly.
restrict
on output pointers, not input pointers
restrict
to all levels of indirection.
#define
to remove restrict
when the compiler does not support it.
restrict
when aliasing is expected.