Testing SIMD on KNL

April 10, 2017, 6:45 pm

Latest and popular articles on Intel Technologies

Hello All,

Hope I am asking in the right forum!!

I have a simple/naive question. I wrote a simple program to run on one thread of a KNL (68 cores, Flat-Quadrant, MCDRAM used). I ran my code twice with the following configurations:

1) #pragma simd reduction(...) at the top of the loop and compiler option -xMIC_AVX512.

2) #pragma novector and removed -xMIC_AVX512 and added -no-simd. The loop is not vectorized and no AVX instructions are used (checked the assembly file).

The first configuration reaches 1.5 GFLOPS and the second 0.8 GFLOPS, so the speedup is only about 2X. Can anyone please explain why I don't get a better speedup (closer to 8)?

long count = 10000000;
// The same loop is run once before this as a cold start
stime = dsecnd();

// 1) #pragma simd reduction(+:result)
// 2) #pragma novector
for (long i = 0; i < count; i++)
{
    result += (A[i] * B[i]);
}

etime = dsecnd();

double bestExTime = (etime - stime);
double gflops = (1.e-9 * 2.0 * count) / bestExTime;
printf("%f,%f\n", result, gflops);
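
For reference, the measurement above can be sketched as a self-contained fragment. This is only a sketch under assumptions: std::chrono stands in for MKL's dsecnd(), the standard `#pragma omp simd` stands in for the Intel-specific `#pragma simd`, and the names dot and best_gflops are hypothetical. With count = 10000000, the two double arrays occupy about 160 MB, so the loop streams from memory. A single thread may then be limited by memory bandwidth rather than by vector width, which could be one reason the speedup stays well below 8.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Dot product with an OpenMP SIMD reduction hint. The pragma is ignored
// unless the compiler is given an OpenMP or OpenMP-SIMD option.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    double result = 0.0;
    #pragma omp simd reduction(+:result)
    for (std::size_t i = 0; i < n; ++i) {
        result += a[i] * b[i];
    }
    return result;
}

// Runs dot() `repeats` times, stores the last result in *result_out and
// returns the best GFLOPS figure (2 floating-point operations per element).
// Returns 0.0 if the timer could not resolve the run.
double best_gflops(const std::vector<double>& a, const std::vector<double>& b,
                   int repeats, double* result_out) {
    double best_seconds = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        const double res = dot(a, b);
        const auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        if (best_seconds < 0.0 || s < best_seconds) best_seconds = s;
        *result_out = res;
    }
    if (best_seconds <= 0.0) return 0.0;
    return (1.e-9 * 2.0 * static_cast<double>(a.size())) / best_seconds;
}
```

Taking the best of several runs plays the role of the cold-start loop in the original: the first run absorbs page faults and cache warm-up.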

Thanks,

↧

Issue passing shared_ptr by copy to function in 32 bits

April 11, 2017, 8:11 am

Hello!

Here is my setup: ICC 16.0 update 3 on Visual Studio 2015. This issue is seen only in 32 bits, not in 64 bits. I observe strange behaviour that leads to a runtime error. Here is a snippet that reproduces the bug:

#include <iostream>
#include <memory>
using namespace std;

void func( shared_ptr<int> p )
{
    if ( p == nullptr )
    {
        cout << "Is nullptr"<< endl;
    }
    else
    {
        cout << "Is NOT nullptr"<< endl;
    }
}

int main( int argc, char* argv[] )
{
    shared_ptr<int> a = nullptr;

    cout << endl;

    func( nullptr );

    return 0;
}

The problem occurs when calling "func(nullptr)". The passed object is pure garbage, so it does not compare equal to nullptr (the "else" branch is taken) and it induces a crash on exiting the function because the destructor runs on an invalid object. By contrast, the shared_ptr "a" in main() is correctly initialized to nullptr. Moreover, if I remove the "cout << endl", the bug doesn't occur.

Note again that this code works correctly in 64 bits and also works correctly in both 32/64 bits when compiled with MSVC 14.0 (VS2015). The problem occurs in 32 bits Release and Debug configurations with ICC 16 update 3.

I tried a lot of combinations of compilation options but cannot find one that would solve the problem.

Is this a known bug? Has anyone else experienced it? Am I doing something wrong? Is there a workaround? Any help will be appreciated! Thank you!
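
One variation that could be tried while waiting for an answer is sketched below. It is untested against ICC 16.0 update 3, so whether it avoids the problem is purely an assumption. The idea is to build the empty shared_ptr as a named object and pass it by const reference, so that no temporary is constructed from nullptr at the call site. The names describe and call_with_empty are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Takes the pointer by const reference instead of by value, so no copy
// or temporary shared_ptr is created for the call.
std::string describe(const std::shared_ptr<int>& p) {
    return p ? "Is NOT nullptr" : "Is nullptr";
}

// Builds the empty pointer as a named object before the call.
std::string call_with_empty() {
    std::shared_ptr<int> empty;  // default-constructed, holds no object
    assert(!empty);
    return describe(empty);
}
```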

Zone:

Windows*

Thread Topic:

Bug Report

↧

Not vectorized loop with if statement

April 11, 2017, 11:24 am

This is the first time that I am trying to vectorize a loop. I'm trying to optimize this code.

In particular, according to Intel Advisor, this is the best candidate for vectorization:

   for (int j=-halfHeight; j<=halfHeight; ++j)
   {
      for(int i=-halfWidth; i<=halfWidth; ++i)
      {
         const float rx = ofsx + j * a12;
         const float ry = ofsy + j * a22;
         float wx = rx + i * a11;
         float wy = ry + i * a21;
         const int x = (int) floor(wx);
         const int y = (int) floor(wy);
         if (x >= 0 && y >= 0 && x < width && y < height)
         {
            // compute weights
            wx -= x; wy -= y;
            // bilinear interpolation
            *out++ =
               (1.0f - wy) * ((1.0f - wx) * im.at<float>(y,x)   + wx * im.at<float>(y,x+1)) +
               (       wy) * ((1.0f - wx) * im.at<float>(y+1,x) + wx * im.at<float>(y+1,x+1));
         } else {
            *out++ = 0;
         }
      }
   }

And this is the message in the optrpt file:

remark #15522: loop was not vectorized: loop control flow is too complex. Try using canonical loop form from OpenMP specification

How can I solve this?
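
A possible restructuring of the loop is sketched below, under several assumptions. The image is a plain row-major float buffer rather than a cv::Mat, the function name warp_bilinear is hypothetical, and it has not been verified that ICC vectorizes this form. The sketch hoists the row-invariant rx and ry out of the inner loop, writes the output by index instead of through `*out++` (a pointer that advances inside the loop may hinder vectorization), and replaces the if/else with a clamped load followed by a select. Unlike the original, it also requires x + 1 < width and y + 1 < height, because the interpolation reads the neighbours at x+1 and y+1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<float> warp_bilinear(const std::vector<float>& im, int width, int height,
                                 int halfWidth, int halfHeight,
                                 float a11, float a12, float a21, float a22,
                                 float ofsx, float ofsy) {
    assert(width >= 2 && height >= 2);
    assert(im.size() == static_cast<std::size_t>(width) * height);
    const int outW = 2 * halfWidth + 1;
    const int outH = 2 * halfHeight + 1;
    std::vector<float> out(static_cast<std::size_t>(outW) * outH, 0.0f);

    for (int j = -halfHeight; j <= halfHeight; ++j) {
        // These depend only on j, so compute them once per row.
        const float rx = ofsx + j * a12;
        const float ry = ofsy + j * a22;
        float* row = out.data() + static_cast<std::size_t>(j + halfHeight) * outW;

        #pragma omp simd
        for (int i = -halfWidth; i <= halfWidth; ++i) {
            const float wx = rx + i * a11;
            const float wy = ry + i * a21;
            const int x = static_cast<int>(std::floor(wx));
            const int y = static_cast<int>(std::floor(wy));
            const bool inside = x >= 0 && y >= 0 && x + 1 < width && y + 1 < height;
            // Clamp the load position so the reads are always in range;
            // the result is discarded below when the sample is outside.
            const int xs = inside ? x : 0;
            const int ys = inside ? y : 0;
            const float fx = wx - x;  // interpolation weights
            const float fy = wy - y;
            const float* p = im.data() + static_cast<std::size_t>(ys) * width + xs;
            const float v = (1.0f - fy) * ((1.0f - fx) * p[0] + fx * p[1])
                          + fy * ((1.0f - fx) * p[width] + fx * p[width + 1]);
            row[i + halfWidth] = inside ? v : 0.0f;
        }
    }
    return out;
}
```

The inner loop still gathers from scattered image locations, so even if the compiler vectorizes it, the gain depends on how well the target handles gather loads.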

↧

long double operators and mathimf overloaded functions

April 12, 2017, 2:39 am

Hey,

I'm having trouble converting my code to long double.

Some compiler flag in Visual Studio seems to reduce the precision whenever I use an operator (e.g. "*=" or "*").
It seems that I can fix this with the compiler argument "/Qpc80".
How does "/Qpc80" interact with the floating point model "Precise (/fp:precise)"?
The overloaded long double version of log() doesn't seem to be available, so the double version is used. I have to call logl() directly.
Isn't the overloaded version of log(long double) supposed to be provided by mathimf.h?
In a larger project, Microsoft's long double version of log() is used although I didn't include "math.h".
It seems that some other std headers include math.h if I don't include mathimf.h first in each c/cpp file. Either I get linker errors, or I can see in debug mode that the long double version from Microsoft's math.h is used, which calls the double version.
Should I include mathimf.h in each file before including other files?

I wrote a program to narrow down the problems. Basically, I'm using quad precision (which seems to work) to test the other data types:

	long double b = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ", "long double(1/3)");
	myDebugPrintDigits(b, 45);
	b *= 2;
	printf("\n%20s = ", "long double 2*(1/3)");
	myDebugPrintDigits(b, 45);
...
	b = log(2.0L);
	printf("\n%20s = ", "long double(log(2))");
	myDebugPrintDigits(b, 45);
	b = logl(2.0L);
	printf("\n%20s = ", "long double(logl(2))");
	myDebugPrintDigits(b, 45);

The result looks like this in Visual Studio 2015 (added compiler flags "/Qoption,cpp,--extended_float_type /Qlong-double"):

         double(1/3) = 3.33333333333333314829616256247390992939472198e-1
      double 2*(1/3) = 6.66666666666666629659232512494781985878944396e-1
    long double(1/3) = 3.33333333333333333342368351437379203616728773e-1
 long double 2*(1/3) = 6.66666666666666629659232512494781985878944396e-1
          _Quad(1/3) = 3.33333333333333333333333333333333307654267408e-1
       _Quad 2*(1/3) = 6.66666666666666666666666666666666615308534816e-1
      double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
 long double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
long double(logl(2)) = 6.93147180559945309428690474184975300886435434e-1
    _Quad(__logq(2)) = 6.93147180559945309417232121458176613602238496e-1
      correct log(2) = 6.93147180559945309417232121458176568075500134e-1

sizeof(double) = 8
sizeof(long double) = 16
sizeof(_Quad) = 16
__IMFLONGDOUBLE = 80

The list of compiler flags used by Visual Studio is "/GS /W3 /Zc:wchar_t /ZI /Od /Fd"x64\Debug\vc140.pdb" /D "_MBCS" /Zc:forScope /RTC1 /MDd /Fa"x64\Debug\" /EHsc /nologo /Fo"x64\Debug\" /Qprof-dir "x64\Debug\" /Fp"x64\Debug\Projekt1.pch" + "/Qoption,cpp,--extended_float_type /Qlong-double".
The long double precision is reduced to double precision when I multiply it with 2 (see "long double 2*(1/3)" compared to "long double(1/3)")!

If I compile the source directly with icl ("icl main.cpp /Qoption,cpp,--extended_float_type /Qlong-double"), then I get:

         double(1/3) = 3.33333333333333314829616256247390992939472198e-1
      double 2*(1/3) = 6.66666666666666629659232512494781985878944396e-1
    long double(1/3) = 3.33333333333333333342368351437379203616728773e-1
 long double 2*(1/3) = 6.66666666666666666684736702874758407233457546e-1
          _Quad(1/3) = 3.33333333333333333333333333333333307654267408e-1
       _Quad 2*(1/3) = 6.66666666666666666666666666666666615308534816e-1
      double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
 long double(log(2)) = 6.93147180559945286226763982995180413126945495e-1
long double(logl(2)) = 6.93147180559945309428690474184975300886435434e-1
    _Quad(__logq(2)) = 6.93147180559945309417232121458176613602238496e-1
      correct log(2) = 6.93147180559945309417232121458176568075500134e-1

sizeof(double) = 8
sizeof(long double) = 16
sizeof(_Quad) = 16
__IMFLONGDOUBLE = 80

Compiling manually with icl or adding "/Qpc80" in Visual Studio seems to solve the multiply precision issue, but the log(long double) function is still not using the logl() method. Is this intended behavior?
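
A small check along the following lines might help separate the two questions (overload selection versus precision). It is a sketch that uses only the standard <cmath> and <math.h> headers, not mathimf.h, so it shows what standard C++ specifies for the overload and says nothing about what mathimf.h does with /Qlong-double. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <math.h>
#include <type_traits>

// True when the log() overload chosen for a long double argument
// also returns long double.
bool log_overload_is_long_double() {
    return std::is_same<decltype(std::log(2.0L)), long double>::value;
}

// Absolute difference between the overloaded call and the explicit
// logl() call for the same argument.
long double log_overload_gap(long double x) {
    assert(x > 0.0L);
    const long double via_overload = std::log(x);
    const long double via_logl = logl(x);
    return fabsl(via_overload - via_logl);
}
```

If the first function returns false, or the gap is on the order of 1e-17 for x = 2 as in the output above, the double version is being called behind the overload.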

Thanks,
Christian

The full code is:

#include <mathimf.h>
#include <stdio.h>

typedef _Quad float128_type;

extern "C" {
	_Quad __ldexpq(_Quad, int);
	_Quad __frexpq(_Quad, int*);
	_Quad __fabsq(_Quad);
	_Quad __floorq(_Quad);
	_Quad __ceilq(_Quad);
	_Quad __sqrtq(_Quad);
	_Quad __truncq(_Quad);
	_Quad __expq(_Quad);
	_Quad __powq(_Quad, _Quad);
	_Quad __logq(_Quad);
	_Quad __log10q(_Quad);
	_Quad __sinq(_Quad);
	_Quad __cosq(_Quad);
	_Quad __tanq(_Quad);
	_Quad __asinq(_Quad);
	_Quad __acosq(_Quad);
	_Quad __atanq(_Quad);
	_Quad __sinhq(_Quad);
	_Quad __coshq(_Quad);
	_Quad __tanhq(_Quad);
	_Quad __fmodq(_Quad, _Quad);
	_Quad __atan2q(_Quad, _Quad);
}


void myDebugPrintDigits(_Quad q, int noOfDigits) {
	int i,j,k;
	j = 0;
	while (q < 1) {
		q *= 10;
		j--;
	}
	while (q > 10) {
		q /= 10;
		j++;
	}
	i = floor((double)q);
	k = 0;
	while (q > 0 && k<noOfDigits) {
		q -= i;
		printf("%d", i);
		q *= 10;
		i = __floorq(q);
		if (k == 0)
			printf(".");
		k++;
	}
	printf("e%d",j);
}


int main() {

	double a = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ","double(1/3)");
	myDebugPrintDigits(a, 45);
	a *= 2;
	printf("\n%20s = ", "double 2*(1/3)");
	myDebugPrintDigits(a, 45);

	long double b = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ", "long double(1/3)");
	myDebugPrintDigits(b, 45);
	b *= 2;
	printf("\n%20s = ", "long double 2*(1/3)");
	myDebugPrintDigits(b, 45);

	_Quad c = 0.333333333333333333333333333333333333333333333q;
	printf("\n%20s = ", "_Quad(1/3)");
	myDebugPrintDigits(c, 45);
	c *= 2;
	printf("\n%20s = ", "_Quad 2*(1/3)");
	myDebugPrintDigits(c, 45);

	a = log(2.0f);
	printf("\n%20s = ", "double(log(2))");
	myDebugPrintDigits(a, 45);
	b = log(2.0L);
	printf("\n%20s = ", "long double(log(2))");
	myDebugPrintDigits(b, 45);
	b = logl(2.0L);
	printf("\n%20s = ", "long double(logl(2))");
	myDebugPrintDigits(b, 45);
	c = __logq(2.0q);
	printf("\n%20s = ", "_Quad(__logq(2))");
	myDebugPrintDigits(c, 45);

	printf("\n%20s = %s", "correct log(2)","6.93147180559945309417232121458176568075500134e-1");

	printf("\n");

	printf("\nsizeof(double) = %d", (int)sizeof(double));
	printf("\nsizeof(long double) = %d", (int)sizeof(long double));
	printf("\nsizeof(_Quad) = %d", (int)sizeof(_Quad));
	printf("\n__IMFLONGDOUBLE = %d", __IMFLONGDOUBLE);
	return 0;
}

Zone:

Windows*

Thread Topic:

Question

↧

ICC fails to build, but GCC works

April 13, 2017, 7:09 am

I am trying to build a project with ICC and it fails on at least one of the files. I have attached the pre-processed file. Can you help, please?

Attachment	Size
Download Unified_cpp_js_src6.i	3.49 MB

Zone:

Server

↧

Join the Intel® Parallel Studio XE 2018 Beta program

April 13, 2017, 3:23 pm

We would like to invite you to participate in the Intel® Parallel Studio XE 2018 Beta program. In this beta test, you will gain early access to new features and analysis techniques. Try them out, tell us what you love and what to improve, so we can make our products better for you.

Registration is easy. Complete the pre-beta survey, register, and download the beta software:
Intel® Parallel Studio XE 2018 Pre-Beta survey

The 2018 version brings together exciting new technologies along with improvements to Intel’s existing software development tools:

Modernize Code for Performance, Portability and Scalability on the Latest Intel® Platforms

Use fast Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions on Intel® Xeon® and Intel® Xeon® Phi™ processors and coprocessors
Intel® Advisor - Roofline finds high-impact but under-optimized loops
Intel® Distribution for Python* - Faster Python* applications
Stay up-to-date with the latest standards and IDE:
- C++2017 draft parallelizes and vectorizes C++ easily using Parallel STL*
- Full Fortran* 2008, Fortran 2015 draft
- OpenMP* 5.0 draft, Microsoft Visual Studio* 2017
Accelerate MPI applications with Intel® Omni-Path Architecture

Flexibility for Your Needs

Application Snapshot - Quick answers: Does my hybrid code need optimization?
Intel® VTune™ Amplifier – Profile private clouds with Docker* and Mesos* containers, Java* daemons

And much more…
For more details about this beta program, a FAQ, and What’s New, visit: Intel® Parallel Studio XE 2018 Beta page.

As a highly-valued customer and beta tester, we welcome your feedback to our development teams via this program at our Online Service Center.

↧

Optimal Compiler Flags

April 16, 2017, 7:58 am

I see that -O2 is described as "Optimize for maximum speed". In addition, there are different arch flags for different instruction sets. Is -O2 the master flag that builds the most optimal code across all platforms? Essentially, I am looking to generate the most optimal code across different architectures (code size is not an issue).

↧

Installing older versions of a product

April 16, 2017, 4:28 pm

I have a non-commercial license for Parallel Studio XE Professional Edition for Fortran and C++ Linux. I have previously installed older versions of the product with earlier licenses, which have expired. Now I am moving to a new computer. I can successfully install the latest version of the product, but when I attempt to install earlier versions of the product I am told that:

"The serial number you provided is not valid for this product."

Since I test multiple versions of the product, I need to install earlier versions as well as the latest one. What can I do?

I also have a license for Parallel Studio XE Professional Edition for C++ Windows. Yet here I was able to install earlier versions of the product for my testing without any problems. So I do not understand why I cannot install earlier versions of my product using the latest license under Linux.

↧

Missing vectorization with large, constant, loop trip counts

April 15, 2017, 1:44 pm

Hello,

icc 17 does not vectorize the following simple code:

int main(int argc, char const *argv[])
{
    #define SIZE 1024*1024*1024ULL

    unsigned long i;
    float *A,*B,*C;

    A = malloc(sizeof(float) * SIZE);
    B = malloc(sizeof(float) * SIZE);
    C = malloc(sizeof(float) * SIZE);

    for (i=0; i<SIZE; i++)
        C[i] = A[i] + B[i];

    printf("done %llu %f\n", SIZE, C[i-1]); /* i == SIZE here, so C[i] would be out of bounds */

    return 0;
}

The vectorization report states:

LOOP BEGIN at main.c(12,5)
   remark #15523: loop was not vectorized: loop control variable i was found, but loop iteration count cannot be computed before executing the loop
   remark #25478: While Loop Unrolled by 2
LOOP END

This does not make sense and looks like a bug. Reducing the SIZE value slightly (e.g., to 2^31-1) or removing the U from the integer suffix at the end of the define (i.e., changing ULL to LL) works around it.

Is this a known problem in icc? Are there any better workarounds?

Best regards,

↧

Help with icpc and makefile values in win32.mak

April 16, 2017, 12:43 pm

I just integrated the Intel compiler into Visual Studio 2015. It works fine with all my projects, which are MFC C++, except one, which uses a C makefile.

I think I need to override the value cc = icpc in win32.mak, which I did, but then there are the default options. Can anyone help with this?

Thanks

Zone:

Windows*

Thread Topic:

How-To

↧

'identifier "L__FUNCTION__" is undefined' error for ICC build on Windows

April 17, 2017, 8:16 am

Hi, everyone,

A MESA build using ICC on Windows fails with this error:

CC="icl" CXX="icl" LD="xilink" scons.py -j1 build=debug verbose=yes machine=x86_64 platform=windows MSVC_VERSION=14.0 libgl-gdi
[snip]
icl /Fobuild\windows-x86_64-debug\compiler\glsl_types.obj /c src\compiler\glsl_types.cpp /TP /nologo /Od /Oi /Oy- /W3 /wd4018 /wd4056 /wd4244 /wd4267 /wd4305 /wd4351 /wd4756 /wd4800 /wd4996 /MTd -FIinttypes.h /LDd /D__STDC_CONSTANT_MACROS /D__STDC_FORMAT_MACROS /D__STDC_LIMIT_MACROS /DHAVE_NO_AUTOCONF /DDEBUG /DWIN32 /D_WINDOWS /D_WIN32_WINNT=0x0601 /DWINVER=0x0601 /DVC_EXTRALEAN /D_USE_MATH_DEFINES /D_CRT_SECURE_NO_WARNINGS /D_CRT_SECURE_NO_DEPRECATE /D_SCL_SECURE_NO_WARNINGS /D_SCL_SECURE_NO_DEPRECATE /D_ALLOW_KEYWORD_MACROS /D_HAS_EXCEPTIONS=0 /D_DEBUG /DPIPE_SUBSYSTEM_WINDOWS_USER /DPACKAGE_VERSION=\"17.1.0-devel\" /DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" /Ibuild\windows-x86_64-debug\compiler /Isrc\compiler /Ibuild\windows-x86_64-debug\compiler\nir /Isrc\compiler\nir /Ibuild\windows-x86_64-debug\compiler /Isrc\compiler /Ibuild\windows-x86_64-debug\compiler\glsl /Isrc\compiler\glsl /Iinclude /Isrc /Isrc\mapi /Isrc\mesa /Isrc\gallium\include /Isrc\gallium\auxiliary /Iinclude /Isrc\gallium\include /Isrc\gallium\auxiliary /Isrc\gallium\drivers /Isrc\gallium\winsys /Zi
icl: command line warning #10148: option '/Oy-' not supported
glsl_types.cpp
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\exception(126): error: identifier "L__FUNCTION__" is undefined
                _RAISE(*this);  // raise this exception
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\exception(155): error: identifier "L__FUNCTION__" is undefined
                _RAISE(*this);
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\exception(183): error: identifier "L__FUNCTION__" is undefined
                _RAISE(*this);
                ^

src\compiler\glsl_types.cpp(755): warning #1011: missing return statement at end of non-void function "glsl_type::get_sampler_instance"
  }
  ^

src\compiler\glsl_types.cpp(854): warning #1011: missing return statement at end of non-void function "glsl_type::get_image_instance"
  }
  ^

compilation aborted for src\compiler\glsl_types.cpp (code 2)
scons: *** [build\windows-x86_64-debug\compiler\glsl_types.obj] Error 2
scons: building terminated because of errors.

which relates to the code

#include <new>

in file

src/compiler/glsl/glsl_symbol_table.h

If this code is removed, the build continues until the next error:

CC="icl" CXX="icl" LD="xilink" scons.py -j1 build=debug verbose=yes machine=x86_64 platform=windows MSVC_VERSION=14.0 libgl-gdi
[snip]
icl /Fobuild\windows-x86_64-debug\mesa\state_tracker\st_glsl_to_tgsi.obj /c src\mesa\state_tracker\st_glsl_to_tgsi.cpp /TP /nologo /Od /Oi /Oy- /W3 /wd4018 /wd4056 /wd4244 /wd4267 /wd4305 /wd4351 /wd4756 /wd4800 /wd4996 /MTd -FIinttypes.h /LDd /D__STDC_CONSTANT_MACROS /D__STDC_FORMAT_MACROS /D__STDC_LIMIT_MACROS /DHAVE_NO_AUTOCONF /DDEBUG /DWIN32 /D_WINDOWS /D_WIN32_WINNT=0x0601 /DWINVER=0x0601 /DVC_EXTRALEAN /D_USE_MATH_DEFINES /D_CRT_SECURE_NO_WARNINGS /D_CRT_SECURE_NO_DEPRECATE /D_SCL_SECURE_NO_WARNINGS /D_SCL_SECURE_NO_DEPRECATE /D_ALLOW_KEYWORD_MACROS /D_HAS_EXCEPTIONS=0 /D_DEBUG /DPIPE_SUBSYSTEM_WINDOWS_USER /DPACKAGE_VERSION=\"17.1.0-devel\" /DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" /D_GDI32_ /DBUILD_GL32 /D_GLAPI_NO_EXPORTS /Iinclude /Isrc\gallium\include /Isrc\gallium\auxiliary /Isrc\gallium\drivers /Isrc\gallium\winsys /Ibuild\windows-x86_64-debug /Isrc /Ibuild\windows-x86_64-debug\compiler\nir /Isrc\compiler\nir /Ibuild\windows-x86_64-debug\compiler\glsl /Isrc\compiler\glsl /Isrc /Ibuild\windows-x86_64-debug\mapi /Isrc\mapi /Ibuild\windows-x86_64-debug\mesa /Isrc\mesa /Ibuild\windows-x86_64-debug\mesa\main /Isrc\mesa\main /Isrc\gallium\include /Isrc\gallium\auxiliary /Zi
icl: command line warning #10148: option '/Oy-' not supported
st_glsl_to_tgsi.cpp
../build/src/mesa/main/shaderobj.h(138): warning #1011: missing return statement at end of non-void function "_mesa_shader_enum_to_shader_stage"
  }
  ^

../build/src/mesa/main/shaderobj.h(181): warning #1011: missing return statement at end of non-void function "_mesa_shader_stage_from_subroutine_uniform"
  }
  ^

../build/src/mesa/main/shaderobj.h(201): warning #1011: missing return statement at end of non-void function "_mesa_shader_stage_from_subroutine"
  }
  ^

../build/src/mesa/main/shaderobj.h(221): warning #1011: missing return statement at end of non-void function "_mesa_shader_stage_to_subroutine"
  }
  ^

../build/src/mesa/main/shaderobj.h(241): warning #1011: missing return statement at end of non-void function "_mesa_shader_stage_to_subroutine_uniform"
  }
  ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\exception(126): error: identifier "L__FUNCTION__" is undefined
                _RAISE(*this);  // raise this exception
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\exception(155): error: identifier "L__FUNCTION__" is undefined
                _RAISE(*this);
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\exception(183): error: identifier "L__FUNCTION__" is undefined
                _RAISE(*this);
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(70): error: identifier "L__FUNCTION__" is undefined
                _SCL_SECURE_ALWAYS_VALIDATE(_Ptr_container != 0);
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(84): error: identifier "L__FUNCTION__" is undefined
                _SCL_SECURE_ALWAYS_VALIDATE(_Ptr != 0);
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(94): error: identifier "L__FUNCTION__" is undefined
        _SCL_SECURE_ALWAYS_VALIDATE(_Count <= (size_t)(-1) / _Sz);
        ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(99): error: identifier "L__FUNCTION__" is undefined
                _SCL_SECURE_ALWAYS_VALIDATE(
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(108): error: identifier "L__FUNCTION__" is undefined
                _SCL_SECURE_ALWAYS_VALIDATE(
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(114): error: identifier "L__FUNCTION__" is undefined
                _SCL_SECURE_ALWAYS_VALIDATE(_Ptr_container < _Ptr_user);
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(117): error: identifier "L__FUNCTION__" is undefined
                _SCL_SECURE_ALWAYS_VALIDATE(2 * sizeof(void *)
                ^

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\xmemory0(125): error: identifier "L__FUNCTION__" is undefined
                _SCL_SECURE_ALWAYS_VALIDATE(_Ptr_user - _Ptr_container
                ^

compilation aborted for src\mesa\state_tracker\st_glsl_to_tgsi.cpp (code 2)
scons: *** [build\windows-x86_64-debug\mesa\state_tracker\st_glsl_to_tgsi.obj] Error 2
scons: building terminated because of errors.

which relates to the code

#include <algorithm>

in file

src/mesa/state_tracker/st_glsl_to_tgsi.cpp

If this code is removed, the build finishes with the error:

CC="icl" CXX="icl" LD="xilink" scons.py -j1 build=debug verbose=yes machine=x86_64 platform=windows MSVC_VERSION=14.0 libgl-gdi
[snip]
icl /Fobuild\windows-x86_64-debug\mesa\state_tracker\st_glsl_to_tgsi.obj /c src\mesa\state_tracker\st_glsl_to_tgsi.cpp /TP /nologo /Od /Oi /Oy- /W3 /wd4018 /wd4056 /wd4244 /wd4267 /wd4305 /wd4351 /wd4756 /wd4800 /wd4996 /MTd -FIinttypes.h /LDd /D__STDC_CONSTANT_MACROS /D__STDC_FORMAT_MACROS /D__STDC_LIMIT_MACROS /DHAVE_NO_AUTOCONF /DDEBUG /DWIN32 /D_WINDOWS /D_WIN32_WINNT=0x0601 /DWINVER=0x0601 /DVC_EXTRALEAN /D_USE_MATH_DEFINES /D_CRT_SECURE_NO_WARNINGS /D_CRT_SECURE_NO_DEPRECATE /D_SCL_SECURE_NO_WARNINGS /D_SCL_SECURE_NO_DEPRECATE /D_ALLOW_KEYWORD_MACROS /D_HAS_EXCEPTIONS=0 /D_DEBUG /DPIPE_SUBSYSTEM_WINDOWS_USER /DPACKAGE_VERSION=\"17.1.0-devel\" /DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\" /D_GDI32_ /DBUILD_GL32 /D_GLAPI_NO_EXPORTS /Iinclude /Isrc\gallium\include /Isrc\gallium\auxiliary /Isrc\gallium\drivers /Isrc\gallium\winsys /Ibuild\windows-x86_64-debug /Isrc /Ibuild\windows-x86_64-debug\compiler\nir /Isrc\compiler\nir /Ibuild\windows-x86_64-debug\compiler\glsl /Isrc\compiler\glsl /Isrc /Ibuild\windows-x86_64-debug\mapi /Isrc\mapi /Ibuild\windows-x86_64-debug\mesa /Isrc\mesa /Ibuild\windows-x86_64-debug\mesa\main /Isrc\mesa\main /Isrc\gallium\include /Isrc\gallium\auxiliary /Zi
icl: command line warning #10148: option '/Oy-' not supported
st_glsl_to_tgsi.cpp
[snip]
src\mesa\state_tracker\st_glsl_to_tgsi.cpp(6138): error: namespace "std" has no member "sort"
     std::sort(decls, decls + count, sorter);
          ^

compilation aborted for src\mesa\state_tracker\st_glsl_to_tgsi.cpp (code 2)
scons: *** [build\windows-x86_64-debug\mesa\state_tracker\st_glsl_to_tgsi.obj] Error 2
scons: building terminated because of errors.

which, in general, is not surprising.

Environment:

Windows 10 x64,
IPSXE 2017 Update 2,
VS 2015 Update 3,
Python 2.7.13 + SCons 2.5.1,
MSYS2 20161025,
MESA 17.1.0-dev (git://anongit.freedesktop.org/git/mesa/mesa ).

Reproduced for ICC Debug builds only. Not reproduced for ICC Release builds or for MSVC Debug and Release builds.

The workaround for this issue is to replace the definitions 'DEBUG' and '_DEBUG' with 'NDEBUG' and '_NDEBUG', and the compiler key '/MTd' with '/MT' or '/MD' (patch attached).

The MESA developers think that this issue is not related to MESA. In that case, it lies somewhere between ICC and MSVC. Is it possible to register this error in the internal bug tracker so that it can be fixed in future ICC builds?

P.S. To reproduce the error, the incompatible compiler key '/GL' must be removed from the SCons MSVC toolchain (patch attached).
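
As background, one generic way an identifier such as L__FUNCTION__ can appear is token pasting: a "widen" macro that pastes an L prefix onto its argument. The sketch below shows the mechanism with hypothetical macro names (WIDEN, WIDEN_IMPL, DEMO_NAME). Whether this is the actual mechanism inside the MSVC headers in this build is an assumption, not something the logs above confirm.

```cpp
#include <cassert>
#include <string>

// Two-level macro so that the argument is expanded before pasting.
#define WIDEN_IMPL(s) L##s
#define WIDEN(s) WIDEN_IMPL(s)

// Hypothetical macro that expands to a narrow string literal.
#define DEMO_NAME "demo"

std::wstring widened_name() {
    // DEMO_NAME expands to "demo" first, then L ## "demo" gives L"demo".
    const std::wstring w = WIDEN(DEMO_NAME);
    assert(w.size() == 4);
    return w;
}
```

If the argument is not a macro that expands to a string literal but a name the preprocessor leaves untouched, the paste produces a new identifier (L followed by that name), and the compiler then has to recognize it specially or report it as undefined.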

Alexander

Attachment	Size
Download mesa_icc_patches.zip	938 bytes

Zone:

Windows*

Thread Topic:

Bug Report

↧

OpenMP collapsed for with non-const values

April 17, 2017, 12:38 pm

I have this code:

    std::vector<Wrapper> localWrappers;
    std::vector<float> pixelDistancesNew;
    std::vector<float> curSigmas;
    //fill the 3 vectors
    #pragma omp parallel for collapse(2) schedule(dynamic, 1)
    for(int i=0; i<localWrappers.size(); i++)
      for (int r = par.border; r < (localWrappers[i].cur.rows - par.border); r++)
        for (int c = par.border; c < (localWrappers[i].cur.cols - par.border); c++) {
          const float val = localWrappers[i].cur.at<float>(r,c);
          if ( (val > positiveThreshold && (isMax(val, localWrappers[i].cur, r, c) && isMax(val, localWrappers[i].low, r, c) && isMax(val, localWrappers[i].high, r, c))) ||
            (val < negativeThreshold && (isMin(val, localWrappers[i].cur, r, c) && isMin(val, localWrappers[i].low, r, c) && isMin(val, localWrappers[i].high, r, c))) )
            // either positive -> local max. or negative -> local min.
            localizeKeypoint(r, c, curSigmas[i], pixelDistancesNew[i], localWrappers[i]);
        }

And I get this error:

    error: parallel loops with collapse must be perfectly nested
          for(int i=0; i<localWrappers.size(); i++)
                  ^

Error: the OpenMP "single" pragma must not be enclosed by the "for" pragma

Reading [this](http://dev-archive.ambermd.org/201509/0004.html), I think the error is caused by the fact that I'm using `size()`. However, I don't know how I could make this `const` or implement any other solution to this problem.

Can someone help me with this?

This is the Wrapper definition:

struct Wrapper{

    Wrapper(const SIFTDescriptorParams &sp, const AffineShapeParams ap) :
        sift(sp),
        ap(ap),
        sp(sp),
        mask(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1),
        img(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1),
        fx(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1),
        fy(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1)
    {
        computeGaussMask(mask);
        patch = cv::Mat(ap.patchSize, ap.patchSize, CV_32FC1);
        fx = cv::Scalar(0);
        fy = cv::Scalar(0);
        descriptors.reserve(500);
    }

    AffineShapeParams ap;
    SIFTDescriptorParams sp;

    cv::Mat1f descriptors;
    cv::Mat patch;
    std::vector<Keypoint> keys;

    cv::Mat high;
    cv::Mat prevBlur;
    cv::Mat blur;
    cv::Mat low;
    cv::Mat cur;
    SIFTDescriptor sift;

    cv::Mat mask, img, fx, fy;

   std::vector<unsigned char> workspace;

};
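
In the loop nest above, the bounds of the r loop depend on i (through localWrappers[i].cur.rows). OpenMP versions before 5.0 require the loops covered by collapse to form a rectangular nest, so this dependence, rather than the call to size() alone, may be what the compiler rejects. One way around it, sketched below under assumptions, is to flatten the (i, r) pairs into a work list first and run a single parallel loop over that list. Grid is a hypothetical stand-in for the cv::Mat member, and the threshold count stands in for the keypoint test.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for Wrapper::cur: a row-major float image.
struct Grid {
    int rows;
    int cols;
    std::vector<float> data;  // rows * cols entries
};

// Counts interior cells above `threshold` across all grids.
long count_above(const std::vector<Grid>& grids, int border, float threshold) {
    // Build the list of (grid index, row) pairs up front, so the parallel
    // loop below is a single loop with a known trip count.
    std::vector<std::pair<int, int> > work;
    for (int i = 0; i < static_cast<int>(grids.size()); ++i) {
        assert(grids[i].data.size() ==
               static_cast<std::size_t>(grids[i].rows) * grids[i].cols);
        for (int r = border; r < grids[i].rows - border; ++r) {
            work.emplace_back(i, r);
        }
    }

    long count = 0;
    const long n = static_cast<long>(work.size());
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:count)
    for (long k = 0; k < n; ++k) {
        const Grid& g = grids[work[k].first];
        const int r = work[k].second;
        for (int c = border; c < g.cols - border; ++c) {
            if (g.data[static_cast<std::size_t>(r) * g.cols + c] > threshold) {
                ++count;
            }
        }
    }
    return count;
}
```

Each work item is one row of one image, which gives the same granularity as collapse(2) would. In the original code, localizeKeypoint would also need to be safe to call concurrently for the same wrapper.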

↧

Macro to indicate OpenMP SIMD support

April 17, 2017, 1:44 pm

I'm planning on adding OpenMP SIMD pragmas to a project of mine, but I don't really need the full OpenMP; the -openmp-simd option is perfect for me.

The problem is that _OPENMP is (correctly) only defined for full OpenMP; when only -openmp-simd is used there doesn't seem to be any way to detect that the compiler supports it. It would be nice if there were a macro (_OPENMP_SIMD seems logical) to tell if omp simd pragmas are supported. I'd love to be able to do something like

#if (defined(_OPENMP) && (_OPENMP >= 201307L)) || (defined(_OPENMP_SIMD) && (_OPENMP_SIMD >= 201307L))
#pragma omp simd ...
#endif

I'd probably wrap it up in a macro, but that's beside the point. As it stands the best I can do is pass something like -DOPENMP_SIMD from the build system, which isn't terrible, but it would be nice to have something a bit more standard so the code is easier to copy around across projects.

FWIW, gcc doesn't have anything for -fopenmp-simd either, but I figured I'd start with ICC since that's where the idea for -fopenmp-simd originated.
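
The build-system approach mentioned above could be wrapped roughly as follows. This is a minimal sketch: OPENMP_SIMD is the hypothetical macro passed with -DOPENMP_SIMD, and the PORTABLE_* macro names and the function sum are made up for illustration. It relies on the standard _Pragma operator, so the pragma can be produced by a macro.

```cpp
#include <cassert>
#include <cstddef>

// Enable the pragma when full OpenMP 4.0+ is active, or when the build
// system signals SIMD-only support through the hypothetical OPENMP_SIMD.
#if (defined(_OPENMP) && (_OPENMP >= 201307L)) || defined(OPENMP_SIMD)
#  define PORTABLE_PRAGMA(x) _Pragma(#x)
#  define PORTABLE_SIMD PORTABLE_PRAGMA(omp simd)
#  define PORTABLE_SIMD_REDUCTION_ADD(var) PORTABLE_PRAGMA(omp simd reduction(+:var))
#else
#  define PORTABLE_SIMD
#  define PORTABLE_SIMD_REDUCTION_ADD(var)
#endif

// Sums n floats; the loop carries the SIMD hint only when it is enabled.
float sum(const float* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    float total = 0.0f;
    PORTABLE_SIMD_REDUCTION_ADD(total)
    for (std::size_t i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}
```

When neither condition holds, the macros expand to nothing and the code compiles as a plain scalar loop.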

↧

icc 2016.1.150 optimization bug

April 18, 2017, 3:04 pm

I think I found a bug in the Intel C compiler. This is how to reproduce it:

First file (cstuff.c)

#include <stdio.h>
#include <string.h>

#include <netdb.h>
#include <sys/utsname.h>
#include <sys/socket.h>

void uninf(char sy[],long *l)
{
  struct hostent *h ;
  struct utsname inf ;
  char lsy[1000];
  int ll ;

  uname(&inf) ;
  if ( (h = gethostbyname(inf.nodename)) == NULL) {
    printf("gethostbyname>hostname problem: ");
    printf("check /etc/hosts or /etc/resolv.conf\n");
    sprintf(lsy,"%s-%s(%s)", inf.sysname, inf.release, inf.machine);
  } else {
    sprintf(lsy,"%s-%s(%s)@%s", inf.sysname, inf.release, inf.machine,
        h->h_name);
  }
  //ll = strlen(inf.sysname)+strlen(inf.release)+strlen(inf.machine)+strlen(h->h_name)+4;
  printf("%s\n", lsy);
  ll = (int) strlen(lsy) ;
  if (ll > 80) ll = 80;
  *l = ll ;
  //*l = ( *l > 44 ) ? 44 : *l ;
  //ll = *l ;
  strncpy(sy,lsy,ll);
}

Second file (teststr.c):

#include <stdio.h>
#include <string.h>
void uninf(char sy[], long *l);
int main(int argc, char **argv) {

  char s[120], s2[120];
  long i;
  uninf(s, &i);
  strncpy(s2, s, i);
  s2[i+1] = '\0';
  printf("%s\n", s2);

  return 0;

}

Compile both files:

icc -g -O3 -c cstuff.c

icc -g -O3 -c teststr.c

and link them:

icc -g cstuff.o teststr.o -o strtest

The program strtest fails with a SEGILL error on certain machines (compiled on Red Hat Enterprise Linux Server 6.8, kernel version 2.6.32-642.3.2.el6.x86_64; running on Red Hat Enterprise Linux Server 7.0 Maipo with kernel 3.10.0-123.el7.x86_64).

Although it does not segfault on all machines, running strtest in gdb shows the following:

(gdb) start
Temporary breakpoint 1 at 0x400af8: file teststr.c, line 8.
Starting program: /users/dzhang/tmp/strlenbug/fail.exe

Temporary breakpoint 1, main (argc=0, argv=0x3) at teststr.c:8
8	  uninf(s, &i);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-55.el7.x86_64 libgcc-4.8.2-16.el7.x86_64
(gdb) s
4	int main(int argc, char **argv) {
(gdb) s
8	  uninf(s, &i);
(gdb) s
uninf (sy=0x7fffffffe300 "\b\342\377\367\377\177", l=0x7fffffffe3f0)
    at cstuff.c:9
9	{
(gdb) s
15	  uname(&inf) ;
(gdb) s
16	  if ( (h = gethostbyname(inf.nodename)) == NULL) {
(gdb) s
21	    sprintf(lsy,"%s-%s(%s)@%s", inf.sysname, inf.release, inf.machine,
(gdb) p sy
$1 = 0x7ffff78c1e20 <resbuf.11716> "\305A`"
(gdb) s
25	  printf("%s\n", lsy);
(gdb) p sy
$2 = 0x0

The address of sy changes to 0x0 at the end. Compiling with gcc -g -O3 or icc -g -O0 does not show this behavior.
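Separately from the sy issue, the string handling in the reproducer deserves a note: strncpy does not add a terminating '\0' when the source is at least as long as the count, and teststr.c writes the terminator at s2[i+1], which leaves s2[i] unset. Below is a minimal sketch of a copy that always terminates. The helper name copy_terminated is hypothetical, and this is not offered as an explanation of the behavior reported above.

```c
#include <string.h>

/* Hypothetical helper: copy at most n bytes of src into dst and always
 * write a terminating '\0'. dst must have room for n + 1 bytes. */
static void copy_terminated(char *dst, const char *src, size_t n)
{
    /* Find the terminator within the first n bytes, if there is one. */
    const char *end = memchr(src, '\0', n);
    size_t len = end ? (size_t)(end - src) : n;

    memcpy(dst, src, len);
    dst[len] = '\0';
}
```

With a helper like this, the caller no longer needs to place the terminator by hand after the copy.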

↧

Is icc 2017 could change no simd code to simd code automatic

April 19, 2017, 5:22 am

≫ Next: C++ beta installation TBB mandatory....

≪ Previous: icc 2016.1.150 optimization bug

My company bought Parallel Studio XE 2017 Composer Edition for C++ last month. I built our image processing module with the new icc; we previously used an icc bought in 2011. On the same i3 CPU, I find that CPU occupancy is lower than before. With the new icc, the occupancy of cpu1 dropped by 24% and that of cpu2 by about 15%, while cpu0 and cpu3 are unchanged. We use only cpu1 and cpu2 for image processing.

Could the new icc be automatically converting non-SIMD code to SSE2 or AVX to speed up the image processing?
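To make the question concrete, here is a minimal sketch of the kind of plain C loop I mean: it contains no SIMD intrinsics, yet it is the sort of loop an optimizing compiler may be able to vectorize on its own. The function name brighten and its parameters are hypothetical, not taken from our module.

```c
#include <stddef.h>

/* Hypothetical example: add delta to every 8-bit pixel, saturating at 255.
 * There are no intrinsics or pragmas; any SIMD code is up to the compiler. */
static void brighten(unsigned char *dst, const unsigned char *src,
                     size_t n, unsigned char delta)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned int v = (unsigned int)src[i] + delta;
        dst[i] = (unsigned char)(v > 255u ? 255u : v);
    }
}
```

Whether a given compiler version actually emits SSE2 or AVX for such a loop depends on the options used; with icc, a vectorization report (for example -qopt-report together with -qopt-report-phase=vec) should show what was done.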

Thanks.

↧

C++ beta installation TBB mandatory....

April 19, 2017, 10:38 am

≫ Next: libifcore missing in the newest version of C++ compiler (2017.2.044)

≪ Previous: Is icc 2017 could change no simd code to simd code automatic

I lost several hours on undocumented aspects of the beta installation.

It is possible to select C++ while deselecting TBB. This will cause the C++ (and possibly Fortran et al.) to fail.

There is no support for MIC KNC in the beta (even for Linux), although KNL is supported.

On both Windows 8.1 and 10, Software Assistant failed at the end of installation, but this seems to be associated with missing documentation and doesn't affect usage otherwise.

↧

libifcore missing in the newest version of C++ compiler (2017.2.044)

April 20, 2017, 3:00 am

≫ Next: Compiler options

≪ Previous: C++ beta installation TBB mandatory....

Hello,

I am trying to compile C++ code and need the libifcore library. However, it is not included in my version (2017.2.044) of Intel System Studio. I discovered that slightly older versions (e.g. 2017.1.132) still included it.

Has it simply been renamed? If so, what is it called now? If not, where can I get it, since I do need that library file?

Hopefully you can help me with this!

Thank you,

Laura

Thread Topic: Question

↧

Compiler options

April 20, 2017, 7:43 am

≫ Next: Same compile works with cc = cl (microsoft) Fails with cc = icl

≪ Previous: libifcore missing in the newest version of C++ compiler (2017.2.044)

I am getting the following error. Is there an Intel compiler option that would give a better diagnostic, i.e. say where size_t was originally defined?

Thanks

C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\INCLUDE\crtdefs.h(494): error : invalid redeclaration of type name "size_t"

1> typedef unsigned __int64 size_t;

Zone: Windows*

Thread Topic: How-To

↧

Same compile works with cc = cl (microsoft) Fails with cc = icl

April 20, 2017, 11:25 am

≫ Next: Ilegal instruccion on Xeon E5472

≪ Previous: Compiler options

If there is something wrong with my includes regarding my redefinition of size_t, how come the compile succeeds with cc = cl (Visual Studio) but fails with Intel cc = icl? I didn't change any of the compiler options; should that be an issue?

Thanks

Zone: Windows*

Thread Topic: Help Me

↧

Ilegal instruccion on Xeon E5472

April 20, 2017, 2:04 pm

≫ Next: unroll_and_jam pragma ignored but no reason specified

≪ Previous: Same compile works with cc = cl (microsoft) Fails with cc = icl

Hi all,

I'm compiling a tool called XIOS for parallel I/O operations.

When I compile it with all its dependencies (HDF5, NetCDF) and run the test, I get the following error:

forrtl: severe (168): Program Exception - illegal instruction
Image              PC                Routine            Line        Source
xios_server.exe    0000000000A69351  Unknown               Unknown  Unknown
xios_server.exe    0000000000A6748B  Unknown               Unknown  Unknown
xios_server.exe    0000000000A145D4  Unknown               Unknown  Unknown
xios_server.exe    0000000000A143E6  Unknown               Unknown  Unknown
xios_server.exe    00000000009EDC69  Unknown               Unknown  Unknown
xios_server.exe    00000000009F14C4  Unknown               Unknown  Unknown
libpthread-2.12.s  000000370640F710  Unknown               Unknown  Unknown
xios_server.exe    0000000000A74F58  Unknown               Unknown  Unknown
xios_server.exe    0000000000830547  Unknown               Unknown  Unknown
xios_server.exe    00000000007A1B93  Unknown               Unknown  Unknown
xios_server.exe    00000000007A1D8A  Unknown               Unknown  Unknown
xios_server.exe    00000000007A892F  Unknown               Unknown  Unknown
xios_server.exe    000000000074BB6D  Unknown               Unknown  Unknown
xios_server.exe    0000000000700646  Unknown               Unknown  Unknown
xios_server.exe    000000000070152D  Unknown               Unknown  Unknown
xios_server.exe    000000000067011F  Unknown               Unknown  Unknown
xios_server.exe    0000000000640B95  Unknown               Unknown  Unknown
xios_server.exe    000000000063E464  _ZN4xios9CONetCDF          15  onetcdf4.cpp
xios_server.exe    00000000005A705C  _ZN4xios14CNc4Dat          31  nc4_data_output.cpp
xios_server.exe    00000000005A7001  _ZN4xios14CNc4Dat          28  nc4_data_output.cpp
xios_server.exe    0000000000527469  _ZN4xios5CFile12c         297  file.cpp
xios_server.exe    0000000000526A9D  _ZN4xios5CFile9ch         180  file.cpp
xios_server.exe    000000000051A266  _ZN4xios6CField10         192  field.cpp
xios_server.exe    000000000051A032  _ZN4xios6CField14         182  field.cpp
xios_server.exe    0000000000519449  _ZN4xios6CField14         139  field.cpp
xios_server.exe    0000000000519159  _ZN4xios6CField13          84  field.cpp
xios_server.exe    00000000004E94FF  _ZN4xios14CContex         196  context_server.cpp
xios_server.exe    00000000004E9277  _ZN4xios14CContex         155  context_server.cpp
xios_server.exe    00000000004E857F  _ZN4xios14CContex          50  context_server.cpp
xios_server.exe    00000000006433F9  _ZN4xios7CServer1         379  server.cpp
xios_server.exe    000000000064326B  _ZN4xios7CServer9         161  server.cpp
xios_server.exe    0000000000644DF7  _ZN4xios7CServer1         126  server.cpp
xios_server.exe    00000000004EB390  _ZN4xios5CXios14i          72  cxios.cpp
xios_server.exe    000000000040CA2C  MAIN__                      7  xios_server.f90
xios_server.exe    0000000000A1E7C2  Unknown               Unknown  Unknown
libc-2.12.so       0000003705C1ED5D  __libc_start_main     Unknown  Unknown
xios_server.exe    000000000040C929  Unknown               Unknown  Unknown

My operating system is CentOS 6.6. The login node on which I'm compiling has:

Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz

The compute nodes are:

Intel(R) Xeon(R) CPU           E5472  @ 3.00GHz
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi flexpriority

I'm compiling all the dependencies with the following options:

-O3 -xSSE4.1 -fp-model source

I'd like to know if there is anything wrong with my setup; I'd appreciate any help.

This code used to work with an older version of the Intel compilers.
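As a quick sanity check, the compute-node flags line above can be scanned for the instruction-set tokens of interest; it lists sse4_1 but not sse4_2 or avx. Below is a minimal sketch of such a check, assuming the flags text has already been read (for example from /proc/cpuinfo) into a string. The helper name has_cpu_flag is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return 1 if `flag` occurs in `flags` as a whole
 * whitespace-delimited token, so "sse4" does not match "sse4_1". */
static int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return 1;
        p++; /* keep searching after a partial match */
    }
    return 0;
}
```

Running the same check against the flags of the login node would show which extra instruction sets are available only where the code is built.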

FavioMJ

↧