
Controlling the rounding mode






A floating point operation usually generates a round-off error: the exact mathematical result is generally not a floating point number, i.e. it cannot be represented exactly in memory. A floating point number therefore has to be chosen to approximate the exact result. The deterministic rule that replaces the mathematical value by a floating point number is called the rounding mode. IEEE floating point arithmetic includes four rounding modes: to the nearest (the default), to zero, to plus infinity and to minus infinity. Four functions are proposed below to change the rounding mode efficiently within a program:
rd_near (for rounding to the nearest),
rd_zero (for rounding to zero),
rd_minf (for rounding to minus infinity) and
rd_pinf (for rounding to plus infinity).
These functions are written in assembly language. They change the two bits of the FPU (Floating Point Unit) control word that select the rounding mode. The following C code shows how to use them. The same operations are executed under each of the four rounding modes and should produce four different results.

#include <stdio.h>

/* Rounding-mode functions defined in the assembler source */
void rd_near(void);
void rd_zero(void);
void rd_minf(void);
void rd_pinf(void);

int main(void)
{
    float x, y, z1, z2;
    x = 1.0;
    y = 1.0e-20;
    rd_near();   /* round to nearest */
    z1 = x - y; z2 = y - x; z1 = z1 - x; z2 = z2 + x;
    printf("near, z1 = %17.10e, z2 = %17.10e \n", z1, z2);
    rd_minf();   /* round to minus infinity */
    z1 = x - y; z2 = y - x; z1 = z1 - x; z2 = z2 + x;
    printf("minf, z1 = %17.10e, z2 = %17.10e \n", z1, z2);
    rd_pinf();   /* round to plus infinity */
    z1 = x - y; z2 = y - x; z1 = z1 - x; z2 = z2 + x;
    printf("pinf, z1 = %17.10e, z2 = %17.10e \n", z1, z2);
    rd_zero();   /* round to zero */
    z1 = x - y; z2 = y - x; z1 = z1 - x; z2 = z2 + x;
    printf("zero, z1 = %17.10e, z2 = %17.10e \n", z1, z2);
    return 0;
}

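If the four functions work correctly and the program is compiled without optimization (an optimizing compiler may evaluate the expressions at compile time, before the rounding mode is changed), the output should be: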

near, z1 = 0.0000000000e+00, z2 = 0.0000000000e+00
minf, z1 = -5.9604644775e-08, z2 = -0.0000000000e+00
pinf, z1 = 0.0000000000e+00, z2 = 5.9604644775e-08
zero, z1 = -5.9604644775e-08, z2 = 5.9604644775e-08
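
To make the "two bits" of the control word concrete for the PC case, here is a minimal sketch in plain C. It assumes the x87 layout, in which the rounding-control field occupies bits 10 and 11 of the 16-bit control word; other architectures use different registers and encodings. The names x87_rc, x87_with_rc and x87_get_rc are hypothetical and do not come from the assembler sources. The sketch only computes the new control word value and does not load it into the FPU.

```c
#include <stdint.h>

/* Encodings of the x87 rounding-control (RC) field, bits 10-11. */
enum x87_rc {
    RC_NEAREST = 0x0,  /* 00: round to nearest           */
    RC_DOWN    = 0x1,  /* 01: round toward minus infinity */
    RC_UP      = 0x2,  /* 10: round toward plus infinity  */
    RC_ZERO    = 0x3   /* 11: round toward zero           */
};

#define X87_RC_SHIFT 10
#define X87_RC_MASK  (0x3u << X87_RC_SHIFT)   /* 0x0C00 */

/* Return the control word cw with only its RC field replaced by rc.
   All other bits (exception masks, precision control) are preserved. */
uint16_t x87_with_rc(uint16_t cw, enum x87_rc rc)
{
    return (uint16_t)((cw & ~X87_RC_MASK) | ((unsigned)rc << X87_RC_SHIFT));
}

/* Extract the RC field from a control word. */
enum x87_rc x87_get_rc(uint16_t cw)
{
    return (enum x87_rc)((cw & X87_RC_MASK) >> X87_RC_SHIFT);
}
```

Starting from the usual x87 power-on control word 0x037F, x87_with_rc(0x037F, RC_DOWN) yields 0x077F. On real hardware the word would be read and written with the fnstcw and fldcw instructions, which is presumably what an assembly routine for this platform does.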

These functions are provided as assembler source code, which can be assembled on any Unix or Linux system with (for instance):

as rounding_pc.s -o rounding_pc.o

If rounding_test.c is the C source code presented above, the executable rounding_test is obtained with the following command:

gcc rounding_test.c rounding_pc.o -o rounding_test

In all the following assembler source codes, function labels have been written for the GCC compiler, which does not add underscores to the names found in C source code. On Unix or Linux systems, rounding_pc.s can of course be used with other compilers (for C, C++, Fortran or Ada programs, for instance). However, many compilers alter the original names by adding underscores before or after subroutine or function names; the NagWare F95 compiler, for instance, appends one underscore to each name. To keep using the assembler source codes with such a compiler, edit the source and add the required underscores to the labels before assembling it.

THE FOLLOWING CODES ARE PROVIDED FOR UNIX OR LINUX SYSTEMS ONLY AND WITHOUT ANY GUARANTEE.

To choose the rounding mode on DEC Alpha computers.
To choose the rounding mode on HP computers.
To choose the rounding mode on IBM computers.
To choose the rounding mode on PCs with Intel or compatible processors.
To choose the rounding mode on SGI computers.
To choose the rounding mode on SUN computers.
To get the C source code for testing the rounding functions.


More information can be requested from the CADNA team.
Thanks to Baptiste Mary for the CADNA logo