Untitled

Node:Glucas internals, Next:Contributors, Previous:Invoking Glucas, Up:Top

To know more about Glucas, it is necessary to be familiar with some concepts

Mersenne number: What is a Mersenne Number
Lucas-Lehmer test: What a Lucas-Lehmer test does

As you can read about Lucas-Lehmer test (See Lucas-Lehmer test.), basically this test is a series of modular squares and subtractions. To have any chance to understand what follows you have to be familiar with modular arithmetic and Fast Fourier Transforms.

Glucas is an application to deal with Mersenne Numbers while YEAFFT is a collection of routines designed to perform general integer convolutions.

Glucas general structure

Here we only want to give a general view of Glucas's structure. Better than being verbose about algorithms, techniques and tricks used, it is better to directly look at the source.

The main routine of Glucas is located in the file glucas.c. It is the top level routine. Here the argument parsing control, the main Lucas-Lehmer test control, the calls to input/output, initializing,... can be found. The main header for YEAFFT is yeafft.h and for Glucas is glucas.h

The Input/output code is mainly located in the file gio.c. The initial work for YEAFFT is done in the file yeainit.c. lucasini.c initializes special arrays needed for Lucas-Lehmer test and DWT.

The signal management is done in the file gestup.c.

The builtin selftest code is located in selftest.c.

The POSIX threads are directed from gthreads.c file. OpenMP, _SunMP, or POSIX thread directives or special routines can be found in gthreads.h, glucas.c, ynorm_*.c files, lucasini.c, yealucas.c and yeafft1.c.

When rounding the float elements, a fast round trick is used. The round trick constants can be found in tricky.c

Prefetch data macros are located in the file prefetch.h. You can also find tons of assembler macros for x86 architecture in the *yx86*.h header files.

The passes in a L-L test and the calls to YEAFFT routines are in yeafft.c, yeafft1.c and yealucas.

The general passes in a Lucas-Lehmer iteration, after reading from a save file (ginput() and read_check_point() in gio.c file) are

Transform the data: The data is transformed from Mersenne Interchangeable Residue Format. (See Save file format.) to an array of double floats. This task is performed by the read_standar() routine in gstd.c file. The data is cut into small chunks of about 19 or 20 bits and balanced.
Multiply the data by DWT factors: Here the data is prepared to be used by YEAFFT. It is needed to multiply by weighed factors. This is done by the normalize_first() routine in the file ynorm.c.
First initial forward FFT pass: Already suited for FFT, a first forward FFT pass is performed with the aid of routines radix_*.c from the YEAFFT library. Once this step is done, the data is ready to enter the normal Lucas-Lehmer loop iteration cycle.
Optimized Lucas-Lehmer iteration: Starting from data after step 3, a normal optimized Lucas-Lehmer iteration includes the following steps. Note that a normal cycle begins in F. At the end of E is where it can jump to multi thread execution or abandon it.
1. Forward DIF FFT pass 1: It includes all forward FFT radices reduction steps while the needed data cannot be encapsuled into independent data arrays of Y_MEM_THRESHOLD complex size. (routines in files radix_4.c, radix_8.c, radix_16.c and radix_32.c.
2. Forward DIF FFT pass 2: The remaining forward steps except for the last one. Made by the same routines than prior item.
3. Last forward DIF FFT step, dyadic square and first backward DIT FFT step. To optimize memory accesses: All these three processes are made by the same routines (see difdit_4.c, difdit_8.c, difdit_16.c and dyadic_32.c).
4. Backward DIT FFT pass 2: Backward DIT FFT steps while moving between Y_MEM_THRESHOLD limits. Items B, C and D are made successively in a loop of blocks of independent data after A. Once finished, all blocks of memory are passed to the next item.
5. Backward DIT FFT pass 1: Backward DIT FFT steps. The last backward step is excluded. After this pass the type of the next iteration is analyzed. If the next L-L iteration is normal, then it continues in the next item and jumps to multithreaded (if required), if not jump to single thread and go to item 5.
6. Last backward DIT FFT step, mul by inverse DWT factors, normalize data, subtract two, mul by DWT factors, and first forward DIF FFT step: All these tasks are performed at once by the ynorm_*.c files. Once this is done, go again to A to the next L-L iteration.
When an interrupt signal is received, or it is required to write a save file, or the Lucas-Lehmer is finished, then F is substituted by
Last backward DIT FFT: It is completed the inverse FFT. (routines in radix_*.c files).
Subtract two, mul by inverse DWT factors, normalize: This is done by the routine last_normalize() in the file y.norm.c. Here can be checked if the residue is zero after a finished L-L test or extract the first 64 bits in the residue if it is required. (routines iszero() and res64() in file gio.c).
If it have to write a save file then next two items are
Convert float data to integer format: (gstd.c, compat.c files)
Write a save file: (gio.c)

References and further reading.

The best place to begin reading about Mersenne Primes, GIMPS and related stuff is Mersenne.org

There is a very nice site World of Mathematics, where you can consult almost every mathematical concept. I really recommend to consult this site. Is by far deep and better about mathematics than anything we can write here.

A good index about primes is Chris Caldwell's The Prime Pages. You can see a lot of theorems and proofs related to primality tests.

To know more about R. Crandall and B. Fagin Discrete Weighted Transform (DWT) see the original paper: "Discrete weighted transforms and large-integer arithmetic", Math Comp, 62 (1994),305-324.

To understand the special algorithm YEAFFT uses, it is necessary to read the paper "Integer convolution via split-radix fast Galois transform", by R. Crandall. You can download a free postscript copy from perfsci.com. Note that YEAFFT uses floating point trigonometric FFT, not Fast Galois Transform. Anyway the algorithm is easy to use in trig. FFT.