Quantcast
Channel: Intel® Software - Intel® C++ Compiler
Viewing all articles
Browse latest Browse all 2797

Vectorized code fails on AVX512 but not on AVX2?

$
0
0

I have this code which I test on my AVX2 machine:

bool interpolate(const Mat &im, float ofsx, float ofsy, float a11, float a12, float a21, float a22, Mat &res)
{
   bool ret = false;
   // input size (-1 for the safe bilinear interpolation)
   const int width = im.cols-1;
   const int height = im.rows-1;
   // output size
   const int halfWidth  = res.cols >> 1;
   const int halfHeight = res.rows >> 1;
   float *out = res.ptr<float>(0);
   const float *imptr  = im.ptr<float>(0);
   for (int j=-halfHeight; j<=halfHeight; ++j)
   {
      const float rx = ofsx + j * a12;
      const float ry = ofsy + j * a22;
      #pragma omp simd
      for(int i=-halfWidth; i<=halfWidth; ++i, out++)
      {
         float wx = rx + i * a11;
         float wy = ry + i * a21;
         const int x = (int) floor(wx);
         const int y = (int) floor(wy);
         if (x >= 0 && y >= 0 && x < width && y < height)
         {
            // compute weights
            wx -= x; wy -= y;
            int rowOffset = y*im.cols;
            int rowOffset1 = (y+1)*im.cols;
            // bilinear interpolation
            *out =
                (1.0f - wy) * ((1.0f - wx) * imptr[rowOffset+x]   + wx * imptr[rowOffset+x+1]) +
                (       wy) * ((1.0f - wx) * imptr[rowOffset1+x] + wx * imptr[rowOffset1+x+1]);
         } else {
            *out = 0;
            ret =  true; // touching boundary of the input
         }
      }
   }
   return ret;
}

As suggested by Intel Advisor, I added #pragma omp simd to force vectorization since the compiler (icpc 2017 update 3) assumed an inexistent dependency. On my AVX2 machine this doesn't produce any error and actually improve perfomance.

However, on the AVX-512 machine (with same compiler and version) this generates a segmentation fault. Why this happens?

The compilation flags are the same, expect that one use -xCORE-AVX2' and the other one-xMIC-AVX512`. This is the complete set of compilation flags:

INTEL_OPT=-O3 -ipo -simd -xCORE-AVX2 -parallel -qopenmp -fargument-noalias -ansi-alias -no-prec-div -fp-model fast=2 -fma -align -finline-functions
INTEL_PROFILE=-g -qopt-report=5 -Bdynamic -shared-intel -debug inline-debug-info -qopenmp-link dynamic -parallel-source-info=2 -ldl

 


Viewing all articles
Browse latest Browse all 2797


<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>