Glucas compiler options
To make Glucas as fast and reliable as possible, some flags and macro definitions have to be set at compile time. Some flags are specific to the compiler and others to Glucas or the YEAFFT library.
We cannot give advice on compiler-specific flags; read your compiler's documentation and select the flags that make the binary as fast as possible.
Several macros can be set through the macro definition facility that most
compilers provide. To do so, add
-DYOUR_OPTION1[=value1] [-DYOUR_OPTION2[=value2]] ...
to the compiler flags when invoking the compiler.
For Metrowerks CodeWarrior, the same effect can be achieved
by editing the parameters in the file macos-codewarrior-prefix.h
and including it in the Prefix File setting of the C/C++ Compiler Settings.
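As an illustration of how a -D definition interacts with the sources, the sketch below uses a common C pattern: a header supplies a default that the command line can override, and out-of-range values are rejected at compile time. It borrows Y_AVAL (default 3, alternatives 4 and 5, as described below) only as an example; the helper names are hypothetical and we do not claim Glucas's headers are written this way.

```c
#include <assert.h>
#include <stdio.h>

/* Default used when the compiler was not invoked with -DY_AVAL=... */
#ifndef Y_AVAL
#define Y_AVAL 3
#endif

/* Reject unsupported values at compile time rather than at run time. */
#if Y_AVAL < 3 || Y_AVAL > 5
#error "Y_AVAL must be 3, 4 or 5"
#endif

/* Hypothetical helper: the value this build was configured with. */
int configured_aval(void)
{
    return Y_AVAL;
}

/* Hypothetical helper: report the build option on standard output. */
void print_build_options(void)
{
    printf("Y_AVAL=%d\n", configured_aval());
}
```

Compiled with -DY_AVAL=4, print_build_options() reports Y_AVAL=4; without the flag it reports the default, 3. A value such as -DY_AVAL=7 stops the build with the #error message.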
Y_AVAL selects the set of radices YEAFFT uses in its FFT passes. Its default
value is 3. We can define 4 or 5. It is recommended to start with the default
and then compare the performance of the other choices.
Y_AVAL=3
YEAFFT uses radices 4, 5, 6, 7, 8 and 9 in the FFT passes.
This is the default and the best option on most systems.
Y_AVAL=4
YEAFFT uses radices 4, 5, 6, 7, 8 and 9 in the first FFT pass and
radices 8 and 16 in the other passes.
Y_AVAL=5
YEAFFT can also use radix-32 reduction in the middle passes.
Only a few processors gain some speed from it.
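The three Y_AVAL choices can be summarized as a table of allowed radices per pass. The function below is a hypothetical model written only from the descriptions above (the enum and function names are invented); it is not YEAFFT code and says nothing about how YEAFFT chooses among the allowed radices.

```c
#include <assert.h>
#include <stddef.h>

/* Pass kinds used by this simplified model (hypothetical names). */
enum pass_kind { FIRST_PASS, MIDDLE_PASS, LAST_PASS };

/* Writes the radices allowed for a pass into out[] and returns how many.
   Y_AVAL=3: 4..9 everywhere.
   Y_AVAL=4: 4..9 in the first pass, 8 and 16 elsewhere.
   Y_AVAL=5: as 4, plus radix 32 in the middle passes. */
size_t allowed_radices(int aval, enum pass_kind kind, int out[8])
{
    static const int small[] = {4, 5, 6, 7, 8, 9};
    size_t n = 0;

    if (aval == 3 || kind == FIRST_PASS) {
        for (size_t i = 0; i < sizeof small / sizeof small[0]; i++)
            out[n++] = small[i];
        return n;
    }
    out[n++] = 8;
    out[n++] = 16;
    if (aval == 5 && kind == MIDDLE_PASS)
        out[n++] = 32;
    return n;
}
```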
With Y_AVAL > 3, you can use a first-pass radix reduction from 10 to 16
(see Glucas internals). Because these routines use many local variables,
this is only an advantage when the processor has a lot of registers
(32 FPU registers or more). Indeed, we have not yet seen a machine that
gains from this feature.
The value should be a power of two.
Processor-dependent macros for the default target are in ygeneric.h.
This file is written with a generic processor in mind. This generic
processor is the default and is selected by defining Y_TARGET=0. Surely there
are many things one could write better for a specific processor. If you are brave
enough, do it. You can even write a collection of assembler macros. This is
an advanced feature, and we recommend that you do not touch it. You would have
to change some lines in the mccomp.h file and write your own my_proc.h file.
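A port for a new processor usually comes down to a preprocessor dispatch on Y_TARGET. The sketch below shows one way such a dispatch could look; only Y_TARGET=0 and the file names ygeneric.h and my_proc.h come from the text, while the Y_PROC_HEADER macro and the target number 99 are hypothetical placeholders (a real port must pick a number that does not clash with the existing targets).

```c
#include <assert.h>

#ifndef Y_TARGET
#define Y_TARGET 0                 /* generic processor, the default */
#endif

/* Hypothetical dispatch in the spirit of mccomp.h. */
#if Y_TARGET == 0
#define Y_PROC_HEADER "ygeneric.h"
#elif Y_TARGET == 99               /* placeholder number for your own port */
#define Y_PROC_HEADER "my_proc.h"
#else
#error "this sketch only knows Y_TARGET 0 and the placeholder 99"
#endif

/* Real sources would now write:  #include Y_PROC_HEADER  */

const char *proc_header_name(void) { return Y_PROC_HEADER; }

int using_generic_target(void) { return Y_TARGET == 0; }
```

C allows #include with a macro that expands to a quoted file name, so the dispatch only has to be edited in one place when a new target is added.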
Since release v.2.8a, prefetch hints have been introduced. They increase
performance considerably in some cases. To use this feature, you have to define
a Y_TARGET other than the generic one. Up to release 2.8b, this is the list of
Y_TARGET values. Options 16, 17, 41 and 51 are not recommended;
they are still experimental.
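The following is a minimal sketch of how a prefetch hint can be tied to the target choice: on the generic target the hint expands to nothing, otherwise it uses GCC/Clang's __builtin_prefetch(addr, rw, locality). The macro name Y_PREFETCH and the prefetch distance are assumptions for illustration, not Glucas's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#ifndef Y_TARGET
#define Y_TARGET 0
#endif

/* Hypothetical hint macro: real prefetch on non-generic targets when the
   compiler provides __builtin_prefetch, a no-op otherwise. */
#if Y_TARGET != 0 && (defined(__GNUC__) || defined(__clang__))
#define Y_PREFETCH(addr) __builtin_prefetch((addr), 0, 3)  /* read, keep cached */
#else
#define Y_PREFETCH(addr) ((void)(addr))
#endif

/* Assumed tuning value: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 16

double sum_with_prefetch(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)            /* stay inside the array */
            Y_PREFETCH(&x[i + PREFETCH_AHEAD]);
        s += x[i];
    }
    return s;
}
```

A prefetch is only a hint, so the result is identical with or without it; only the timing changes.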
0    Generic processor (ygeneric.h).
1    _asm_ extensions.
11   _asm_ extensions.
12   _asm_ extensions.
16   _asm_ extensions. Not recommended.
17   _asm_ extensions. Not recommended.
21   _asm_ extensions or Metrowerks Codewarrior intrinsics.
23   _asm_ extensions or Metrowerks Codewarrior intrinsics.
31   asm calls.
32   _asm_ extensions.
41   _asm_ extensions. Not recommended.
51   _asm_ extensions. Not recommended; use the Y_ITANIUM option for much
     better performance.
With a Y_TARGET other than generic, some
routines can be unrolled to avoid unnecessary calls to prefetch hints. This
can be useful when prefetch hints are expensive in performance terms. At the
moment, it is still an experimental feature.
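The idea behind unrolling here can be sketched as follows: in a rolled loop a hint is issued for every element, while unrolling by one cache line lets a single hint cover all the loads from that line. The sketch assumes 64-byte cache lines (8 doubles) and an array starting on a line boundary; the macro names and the two-line prefetch distance are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH(addr) ((void)(addr))
#endif

#define LINE_DOUBLES 8   /* assumed: 64-byte line / 8-byte double */
#define AHEAD_LINES 2    /* assumed: prefetch two lines ahead */

/* Sum of a*x[i], unrolled by one cache line so that one prefetch hint
   serves eight loads instead of one. */
double scale_sum_unrolled(const double *x, size_t n, double a)
{
    double s = 0.0;
    size_t i = 0;

    for (; i + LINE_DOUBLES <= n; i += LINE_DOUBLES) {
        if (i + AHEAD_LINES * LINE_DOUBLES < n)   /* stay inside the array */
            PREFETCH(&x[i + AHEAD_LINES * LINE_DOUBLES]);
        s += a * x[i]     + a * x[i + 1] + a * x[i + 2] + a * x[i + 3]
           + a * x[i + 4] + a * x[i + 5] + a * x[i + 6] + a * x[i + 7];
    }
    for (; i < n; i++)       /* remainder when n is not a multiple of 8 */
        s += a * x[i];
    return s;
}
```

The price is larger code and more live registers, which is one reason such unrolling only pays off on some processors.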
POSIX threads support can be enabled by running the configure script with
--enable-pthread=n. If you want to use n threads, you should then add
-D_PTHREADS=n to your compiler command-line options.
At the moment, n has to be a power of two. When using configure, the
option --enable-pthread is equivalent to --enable-pthread=2.
This option is not recommended for single-processor machines.
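A minimal sketch of how a build could enforce the power-of-two rule for -D_PTHREADS=n and divide n elements among the threads. Only _PTHREADS comes from the text; Y_THREADS and thread_range are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _PTHREADS
/* n & (n - 1) is zero exactly when n is a power of two. */
#if _PTHREADS < 1 || (_PTHREADS & (_PTHREADS - 1)) != 0
#error "_PTHREADS must be a power of two"
#endif
#define Y_THREADS _PTHREADS
#else
#define Y_THREADS 1          /* no threading requested */
#endif

/* Half-open range [*lo, *hi) of the n elements handled by thread t.
   The last thread also takes any remainder. */
void thread_range(int t, int nthreads, size_t n, size_t *lo, size_t *hi)
{
    assert(nthreads > 0 && (nthreads & (nthreads - 1)) == 0);
    assert(t >= 0 && t < nthreads);
    size_t chunk = n / (size_t)nthreads;
    *lo = (size_t)t * chunk;
    *hi = (t == nthreads - 1) ? n : *lo + chunk;
}
```

Each POSIX thread (pthread_create) would then process its own range, so no two threads write the same elements.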
When using OpenMP, the number of threads can be set with Y_NUM_THREADS.
Warning: if you want to use OpenMP multiprocessing, you have to enable it with
the proper compiler flag (usually -omp; with GCC it is -fopenmp). The compiler
then defines the _OPENMP macro itself, so YOU DON'T HAVE TO DEFINE IT
EXPLICITLY. This option is not recommended for single-processor machines.
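The sketch below shows the usual way code reacts to the compiler-defined _OPENMP macro: the parallel pragma is only emitted when OpenMP is enabled, and Y_NUM_THREADS, if defined, is passed through the standard num_threads clause. The function itself is a hypothetical example, not Glucas code; built without the OpenMP flag it runs serially and gives the same result.

```c
#include <assert.h>

/* Sum of squares, parallelized only when the compiler's OpenMP flag is on.
   _OPENMP is defined by the compiler, never with -D. */
double sum_squares(const double *x, long n)
{
    double s = 0.0;
#if defined(_OPENMP) && defined(Y_NUM_THREADS)
#pragma omp parallel for reduction(+:s) num_threads(Y_NUM_THREADS)
#elif defined(_OPENMP)
#pragma omp parallel for reduction(+:s)
#endif
    for (long i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}
```

The reduction clause gives each thread a private partial sum, so no locking is needed; note that the summation order, and hence the last bits of a non-integer result, may differ from the serial build.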
To use SunMP multiprocessing, you have to set PARALLEL=n in your shell
environment, and you also have to define Y_NUM_THREADS and use the specific
compiler flag -xexplicitpar.
This option is not recommended for single-processor machines.
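Since a SunMP build needs the PARALLEL environment variable and Y_NUM_THREADS to agree, a program could check them at start-up. The helper below is a hypothetical sketch of such a check (it is not part of Glucas); it would be called as parallel_env_matches(getenv("PARALLEL"), Y_NUM_THREADS).

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 when the PARALLEL value is a plain integer equal to
   num_threads, 0 when it is missing, malformed or different. */
int parallel_env_matches(const char *parallel_env, int num_threads)
{
    if (parallel_env == NULL)
        return 0;                       /* PARALLEL=n was not set */
    char *end;
    long n = strtol(parallel_env, &end, 10);
    if (end == parallel_env || *end != '\0')
        return 0;                       /* empty or trailing garbage */
    return n == num_threads;
}
```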
Y_NUM_THREADS sets the number of threads used by the OpenMP or SunMP
multithreaded options. You have to assign a value when using SunMP but not
when using OpenMP (where the best choice can be computed at runtime). Even so,
to avoid a number of threads that is not a power of two, it is better to
define this option whenever you use either _OPENMP or _SUNMP.
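The reason for the advice above can be made concrete: an OpenMP runtime left to itself typically picks one thread per available core, which on a six-core machine gives 6, not a power of two. The sketch below shows a hypothetical fallback that honours Y_NUM_THREADS when it is defined and otherwise rounds the runtime's suggestion (for example the value of omp_get_max_threads()) down to a power of two. The function names are invented for illustration.

```c
#include <assert.h>

/* Largest power of two not exceeding n (n >= 1). */
int floor_pow2(int n)
{
    assert(n >= 1);
    int p = 1;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Thread count to use: the explicit setting if any, else a power of two
   no larger than what the runtime suggests. */
int choose_threads(int runtime_max)
{
#ifdef Y_NUM_THREADS
    (void)runtime_max;
    return Y_NUM_THREADS;
#else
    return floor_pow2(runtime_max);
#endif
}
```

Rounding down rather than up never asks for more threads than the machine offers; the cost is leaving some cores idle, which is why setting Y_NUM_THREADS explicitly is the cleaner choice.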
The following macros manage some aspects of the Lucas-Lehmer tests, not the FFT routines.
You can also manage the roundoff check at run time by editing the ini file
(see Configure files).
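For readers unfamiliar with a roundoff check: after the inverse FFT of a squaring, every element should be very close to an integer, and the largest distance to the nearest integer measures how close the FFT is to failing. The sketch below is a generic version of that technique, not Glucas's implementation; the 0.40 limit is a hypothetical value, since a real program takes its threshold from its configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit; a real program would read it from its config. */
#define ROUNDOFF_LIMIT 0.40

/* Round to nearest, halves away from zero (values fit in long long). */
static double round_nearest(double v)
{
    return (double)(long long)(v + (v >= 0.0 ? 0.5 : -0.5));
}

/* Round every element in place; return the largest rounding distance. */
double round_and_measure(double *x, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = round_nearest(x[i]);
        double err = x[i] > r ? x[i] - r : r - x[i];
        if (err > worst)
            worst = err;
        x[i] = r;
    }
    return worst;
}

int roundoff_ok(double worst) { return worst < ROUNDOFF_LIMIT; }
```

An error near 0.5 means the nearest integer can no longer be trusted, so the iteration must be redone, typically with a larger FFT length.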
The Discrete Weighted Transform
(see Glucas internals) has some unpredictable branches, which can slow down
some processors. Glucas has alternative code that avoids all these branches
at the cost of some extra floating-point and integer instructions. Some
processors gain from activating this option.
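To illustrate the kind of branch involved: in the irrational-base DWT for the modulus 2^q - 1 with an N-word signal, word j holds ceil(q(j+1)/N) - ceil(qj/N) bits, so word sizes switch irregularly between floor(q/N) and floor(q/N)+1. The sketch below computes that size once with a data-dependent test and once with integer arithmetic only. It is a generic illustration of the trade-off, not Glucas's code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: decide "big" or "small" word with a data-dependent test
   that the hardware cannot predict well. */
int word_bits_branchy(uint64_t q, uint64_t n, uint64_t j)
{
    uint64_t r = (q * j) % n;   /* (q*j) mod n */
    uint64_t s = q % n;         /* q mod n */
    int bits = (int)(q / n);
    if (r == 0 ? s > 0 : r + s > n)
        bits++;
    return bits;
}

/* Ceiling of a/b for nonnegative integers. */
static uint64_t ceil_div(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

/* Branch-free form: same value from integer arithmetic alone. */
int word_bits_branchless(uint64_t q, uint64_t n, uint64_t j)
{
    return (int)(ceil_div(q * (j + 1), n) - ceil_div(q * j, n));
}
```

The branch-free version replaces one unpredictable branch with two integer divisions, the same kind of exchange of branches for extra arithmetic described above; which side wins depends on the processor.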
Discrete Weighted Transform.