floating point errors

Hi,

We have code that computes a sum of squares using MKL cblas_sgemm and, separately, cblas_dgemm, and we found large differences between the two results. We have replicated the issue with a simple program that accumulates a sum of squares over the same data in both float and double, and we also reproduced the differences with the GCC compiler. We are running ICC on Linux on Intel Xeons.
The code is:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdarg.h>

int main(int argc, const char * argv[]) {

    float weight;
    double dsumwts = 0;
    float fsumwts = 0;
    double diffsum = 0;
    double dwt;
    int count = 0;

    FILE *datafile;

    if(argc<2){
        printf("Usage: %s datafile\n", argv[0]);
        exit(1);
    }

    if((datafile=fopen(argv[1],"r"))==NULL){
        printf("Cannot fopen file %s\n", argv[1]);
        exit(1);
    }

    float a =1.0;

    // Stop at EOF or at the first token that does not parse as a number
    while(fscanf(datafile,"%g",&weight)==1){
        //weight = ((float)rand()/(float)(RAND_MAX)) * a;
        dwt=weight;
        dwt=sqrt(dwt);
        weight=sqrtf(weight);
        dsumwts=dsumwts+(dwt*dwt);
        fsumwts=fsumwts+(weight*weight);
        diffsum=diffsum+(dwt*dwt-weight*weight);
        printf("Error Record %5d %12.7f %12.7f %12.7f %12.7f\n",count+1,dsumwts-fsumwts, dwt*dwt, weight*weight, (dwt*dwt-weight*weight));
        count++;
    }

    printf("Double Sum = %12.5f\n",dsumwts);
    printf("Float  Sum = %12.5f\n",fsumwts);
    printf("Diff   Sum = %12.5f\n",diffsum);

    fclose(datafile);
    return 0;
}

When we run the code we get:

tail float.txt
Error Record 43653   52.4184895    1.2222650    1.2222650    0.0000000
Error Record 43654   52.4220045    1.2222650    1.2222650    0.0000000
Error Record 43655   52.4255195    1.2222650    1.2222650    0.0000000
Error Record 43656   52.4290345    1.2222650    1.2222650    0.0000000
Error Record 43657   52.4325495    1.2222650    1.2222650    0.0000000
Error Record 43658   52.4325495    1.0000000    1.0000000    0.0000000
Error Record 43659   52.4325495    1.0000000    1.0000000    0.0000000
Double Sum =  91187.02630
Float  Sum =  91134.59375
Diff   Sum =      0.00000

The sum of the per-record differences between the double and float squares is zero, but the difference between the two running sums is approximately 52.
When we uncomment the random weight line and rerun, we get:

tail float.txt
Error Record 43653    0.0709642    0.6583248    0.6583249   -0.0000001
Error Record 43654    0.0716534    0.3483455    0.3483455   -0.0000000
Error Record 43655    0.0719731    0.9007103    0.9007103   -0.0000000
Error Record 43656    0.0717455    0.0622724    0.0622724    0.0000000
Error Record 43657    0.0718161    0.4180393    0.4180393   -0.0000000
Error Record 43658    0.0724126    0.8658309    0.8658308    0.0000001
Error Record 43659    0.0725415    0.1895820    0.1895820    0.0000000
Double Sum =  21757.27762
Float  Sum =  21757.20508
Diff   Sum =      0.00001

This is what we would expect. We have also randomly shuffled the original data and still get large differences. The data contains approximately 40,000 floating point numbers ranging from 0.5 to 19, with many repeats of one or two values. The data is attached.

Any thoughts on what could be causing these differences?

Thanks
Bevin

Attachment: wt.txt.zip (35.79 KB)
