Channel: Intel® Software - Intel® C++ Compiler
Viewing all 2797 articles
Browse latest View live

#pragma omp cancel for compilation error


The following code fails to compile with

error : cancel for must be closely nested in a for region.

I am at a loss here, as this code seems to follow the example in the OpenMP 4.0 examples PDF:

void causes_an_exception()
{
	throw std::logic_error("exception");
}

void example()
{
	std::exception *ex = NULL;
	bool cancelled = false, failed = false;
	int N = 100000;
#pragma omp parallel for
	for (int i = 0; i < N; i++) {
		// no 'if' that prevents compiler optimizations
		try {
			causes_an_exception();
		}
		catch (std::exception *e) {
			// still must remember exception for later handling
#pragma omp atomic write
			ex = e;
			// cancel worksharing construct
#pragma omp cancel for
		}
	}
}

 


Join the Intel® Parallel Studio XE 2019 Beta Program today


Join the Intel® Parallel Studio XE 2019 Beta Program today and—for a limited time—get early access to new features and get an open invitation to tell us what you really think.

We want YOU to tell us what to improve so we can create high-quality software tools that meet your development needs.

Sign Up Now >

Top New Features in Intel® Parallel Studio XE 2019 Beta

  • Scale and perform on the path to exascale. Enable greater scalability and improve latency with the latest Intel® MPI Library.
  • Get better answers with less overhead. Focus more fully on useful data, CPU utilization of physical cores, and more using new data-selection support from Intel® VTune Amplifier’s Application Performance Snapshot.
  • Visualize parallelism. Interactively build, validate, and visualize algorithms using Intel® Advisor’s Flow Graph Analyzer.
  • Stay up-to-date with the latest standards:
    • Expanded C++17 and Fortran 2018 support
    • Full OpenMP* 4.5 and expanded OpenMP* 5.0 [DRAFT] support
    • Python* 3.6 and 2.7

New Features in Intel® C++ Compiler

  • Inclusive scan [Intel extension to OpenMP* SIMD]
  • Support for new clauses dynamic_align and vectorlength for #pragma vector
  • Support for structured bindings, fold expressions and other C++17 features

To learn more, visit Intel® Parallel Studio XE 2019 Beta page.

Then sign up to get started.

Compatibility with GCC 7 + glibc 2.27


Hello,

I'm using the Intel Compiler 2018 Update 2 on Linux x86_64 (Arch Linux). Arch Linux's default compiler and glibc versions are GCC 7.3.1 and glibc 2.27 at the moment. In

https://software.intel.com/en-us/comment/1915553#

I reported an incompatibility of ICC 17 when used with GCC 7 + glibc 2.26. This incompatibility is now fixed, but similar problems have now arisen with glibc 2.27. glibc 2.27 adds a few function declarations for the _Float32, _Float32x, _Float64, _Float64x, ... types in case the corresponding __HAVE_FLOATXYZ macros are defined, for example:
 

#if __HAVE_FLOAT32 && __GLIBC_USE (IEC_60559_TYPES_EXT) 
extern _Float32 strtof32 (const char *__restrict __nptr, 
                         char **__restrict __endptr)
    __THROW __nonnull ((1));
#endif

This fails to compile with Intel Compiler 2018 Update 2, when using GCC 7. GCC 7 is needed because one of its headers will define the __HAVE_FLOATXYZ macros (see https://gcc.gnu.org/gcc-7/changes.html, search for Float32).

Another unrelated incompatibility with glibc 2.27 is this part in the Intel Compiler math.h:

#if (!defined(__linux__) || !defined(__USE_MISC)) && !defined(__NetBSD__) || defined (__PURE_INTEL_C99_HEADERS__)
typedef enum ___LIB_VERSIONIMF_TYPE {
     _IEEE_ = -1    /* IEEE-like behavior    */
    ,_SVID_         /* SysV, Rel. 4 behavior */
    ,_XOPEN_        /* Unix98                */
    ,_POSIX_        /* Posix                 */
    ,_ISOC_         /* ISO C9X               */
} _LIB_VERSIONIMF_TYPE;
#else
# define _LIB_VERSIONIMF_TYPE _LIB_VERSION_TYPE
#endif

_LIBIMF_EXTERN_C _LIB_VERSIONIMF_TYPE _LIBIMF_PUBVAR _LIB_VERSIONIMF;

This part of math.h causes a compilation failure because glibc 2.27 removed the macro '_LIB_VERSION_TYPE' (https://sourceware.org/git/?p=glibc.git;a=commit;h=813378e9fe17e029caf62...).

I realize that the current version of the compiler does not officially support glibc 2.27, but it would be nice to have this fixed in a future release, which is why I'm reporting these incompatibilities here.

Parallel STL: Controlling the number of threads?


Hello All,

So, I wanted to try out some of the new C++17 parallel algorithms and luckily I have access to Parallel Studio 2018. Compiled the following code:

#include <vector>
#include <iostream>
#include "pstl/execution"
#include "pstl/algorithm"
#include "pstl/numeric"
#include "pstl/memory"

int main() {
  
  // vector of 1,000,000 doubles, filled with 0.5
  std::vector<double> v(1'000'000);
  std::fill(std::execution::par_unseq, v.begin(), v.end(), 0.5);

  // Reduction
  auto result = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
  std::cout << "Result: "<< result << "\n";
  return 0;
}

If I run it through VTune, I see it uses the 12 threads on my machine. I did a little googling, but I don't see how to restrict the number of threads this code uses. I saw some ancient suggestions of TBB_NUM_THREADS, but that doesn't appear to change anything. How do I control the number of threads if I want to run PSTL code under my HPC scheduler?

Thanks.

Barry

 

FlexLM Floating License Activation over VPN


I have a question concerning the FLEX licensing manager.

A client of mine has recently purchased and installed a floating license for C++ Parallel Studio XE Composer 2018 for Windows. The license will activate fine on a computer operating inside the local LAN on the same subnet as the FlexLM server. However, activation consistently fails when trying to activate the license on a remote client PC that is connected to the LAN over a VPN (and consequently resides on a different subnet than the FlexLM server). Is there a whitepaper that discusses this activation scenario that I can refer them to?

Thank you.

Jason

Help with getting Intel C++ 11.0 or 13.0


Hi, I am trying to run the CPU2006 benchmark with Intel compilers to evaluate performance. I have searched but have not been able to find a link for these earlier versions. Could you kindly post a link, please?

icc with Intel® Parallel Studio XE for Linux


Hi, I have installed Intel® Parallel Studio XE for Linux, which includes the C++ compiler. I sourced the "compilervars" script, but when I run "icc file.c" I get this error message: -bash: icc: command not found. Indeed, "which icc" indicates that there is no icc, but I don't understand why, or what I can do to correct this and set up my C++ compiler.

Eigen robust Cholesky broken with Intel 18.0.1


Hi,
I have just noticed that the LDLT decomposition is broken with the latest Intel compiler.
The installation I have access to reports for icpc -v:
icpc version 18.0.1 (gcc version 6.4.0 compatibility)

Comparing the solutions for a random SPD matrix obtained with FullPivLU, PartialPivLU, LLT and LDLT shows that the latter is broken. You can find the code for this test on GitHub: https://github.com/robertodr/eigen-cholesky

Intel 18.0.1 results
Linear system solvers comparison
  Relative error |Ax - b| / |b| 
---------------------------------
FullPivLU    2.74599e-11
PartialPivLU 1.37848e-11
LLT          1.70863e-11
LDLT         7.02994

  Error wrt FullPivLU |x - x_flu| / |x_flu| 
---------------------------------------------
PartialPivLU 1.91271e-11
LLT          8.91642e-11
LDLT         1.00003
 

GCC 6.4.0 results
Linear system solvers comparison
  Relative error |Ax - b| / |b| 
---------------------------------
FullPivLU    2.74599e-11
PartialPivLU 1.37848e-11
LLT          1.54872e-11
LDLT         2.0482e-11

  Error wrt FullPivLU |x - x_flu| / |x_flu| 
---------------------------------------------
PartialPivLU 1.91271e-11
LLT          3.25784e-11
LDLT         1.47468e-10

 

The sample works with Intel 17 and 18.0.0. Compiling in debug mode also fixes the issue. The Eigen developers suggested setting -fp-model precise, but that didn't help.

Thanks in advance for any suggestions.

  Roberto


How to control the OMP thread number between libraries


Dear all,

I've run into a problem when using the Intel OpenMP runtime (iomp).

I have a dynamic library A that exports some functions, and these functions may use OpenMP for parallel computation.

I then implemented an executable application that calls some of the functions from library A. This executable is in turn invoked by other code from multiple threads (also using iomp), so when it runs, the total number of threads becomes far too large.

My question is: how can I control the number of OpenMP threads used by the functions in A, without modifying library A?
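One external knob, assuming A uses a standard OpenMP runtime such as iomp, is the environment of the process that loads it, so no relink of A is required (my_app is a placeholder name):

```shell
# Cap the OpenMP threads used inside library A from the outside,
# before launching the application:
export OMP_NUM_THREADS=4     # threads per parallel region in A
export OMP_THREAD_LIMIT=8    # hard cap for the whole contention group
export OMP_DYNAMIC=true      # allow the runtime to shrink teams under load
./my_app
```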

 

Thanks.

Alternatives to _mm_malloc.


Hi,

I am working on a project which has its own memory allocator: it gets 400 MB from the system (via one malloc call) and then hands out memory from that pool. It's written in C (so no placement new). I can control the way memory is allocated; that is, I can align the memory given to different structures to a certain boundary, say 64 bytes. But I cannot use _mm_malloc. I can use __assume_aligned.

So my question is: how do I make the icc compiler see that I have aligned the memory, so that it vectorizes the for loops with alignment on? Is there an intrinsic I can call to tell the compiler about the alignment of a pointer? Is there a flag I need to set to let the compiler see that the memory is aligned? Or is there any other way to do so? Any pointers will be helpful.

Regards,
Gaurav 

Automate Parallel XE 2018 installation for Windows


Hello,

I'm looking for information on how to automate the Parallel XE 2018 installation for Windows clients.

So far, I was able to create the offline installation folder using this thread: https://software.intel.com/en-us/forums/intel-c-compiler/topic/594797

I was also able to create the "Silent.cfg" file, but for some reason when I perform the installation, I can see that components I didn't select are included.

I'm probably missing something in the installation command; does someone have any experience with it?

Thanks!

OpenMP + lambda capture = icc internal error


C++ compiler 18.0.2 is crashing when using OpenMP and C++11 lambda capture.

Here is a minimal crashing example:

template < typename Function >
void call_1( const Function& f )
{
  f();
}

template < typename Function >
void call_2( const Function& f )
{
  f();
}

int main(int argc, char **argv)
{
    double count = 0;
    call_1([&count](){
        #pragma omp parallel
        call_2([&count](){});
      });
}

And here is the output:

$ icc --version
icc (ICC) 18.0.2 20180210
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.
$ icc -qopenmp main.cpp
": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.

icc: error #10105: /opt/intel/compilers_and_libraries_2018.2.199/linux/bin/intel64/mcpcom: core dumped
icc: warning #10102: unknown signal(36194064)
icc: error #10106: Fatal error in /opt/intel/compilers_and_libraries_2018.2.199/linux/bin/intel64/mcpcom, terminated by unknown
compilation aborted for main.cpp (code 1)

Is this a known issue? Should I avoid using lambda captures with OpenMP?

icpc + __int128_t + templates = internal error: 04010002_1504


Hi,

OS: Ubuntu 18.04 x86_64
Compiler version: icpc (ICC) 18.0.2 20180210

I have run into an icpc compiler bug. Intel's C++ compiler fails to compile template functions which use the non-standard __int128_t or __uint128_t types. I ran into this bug when I tried to compile my primecount program (https://github.com/kimwalisch/primecount) using icpc. I have created a minimal C++ program (see attached bug.cpp) which reproduces the bug.

Steps to reproduce the bug:

$ icpc bug.cpp
bug.cpp(28) (col. 27): internal error: 04010002_1504
compilation aborted for bug.cpp (code 4)

Thanks for your help!

Attachment: bug.cpp (485 bytes)

Expected support for Ubuntu 18.04LTS


Dear all:

Is there an estimate of when Ubuntu 18.04 LTS will be supported? Our systems have been updated, and compilation with icc now fails due to problems with the math.h header:

(1230): error: identifier "_LIB_VERSION_TYPE" is undefined
  _LIBIMF_EXTERN_C _LIB_VERSIONIMF_TYPE _LIBIMF_PUBVAR _LIB_VERSIONIMF;

I tried using the -D__PURE_INTEL_C99_HEADERS__ flag, but that fails because other libraries use the system math.h:

In file included from /usr/include/boost/math/policies/policy.hpp(29),

                 from /usr/include/boost/math/special_functions/math_fwd.hpp(29),
                 from /usr/include/boost/math/special_functions/sign.hpp(17),
                 from /usr/include/boost/lexical_cast/detail/inf_nan.hpp(34),
                 from /usr/include/boost/lexical_cast/detail/converter_lexical_streams.hpp(63),
                 from /usr/include/boost/lexical_cast/detail/converter_lexical.hpp(54),
                 from /usr/include/boost/lexical_cast/try_lexical_convert.hpp(42),
                 from /usr/include/boost/lexical_cast.hpp(32),
                 from /usr/include/boost/program_options/value_semantic.hpp(14),
                 from /usr/include/boost/program_options/options_description.hpp(13),
                 from /usr/include/boost/program_options.hpp(15),
                 from /home/bbaumeie/votca/src/tools/include/votca/tools/application.h(21),
                 from /home/bbaumeie/votca/src/tools/src/libtools/application.cc(19):
/opt/intel/compilers_and_libraries_2018.2.199/linux/compiler/include/math.h(267): error: invalid redeclaration of type name "float_t" (declared at line 155 of "/usr/include/math.h")
  typedef float   float_t;

 

mpicxx using g++ instead of icc


Hello,

I've just installed the latest Intel distribution for Fortran, C++, and MPI (2018 Update 2). However, when I typed "mpicxx" in the terminal to check, it gave the following message:

This script invokes an appropriate specialized C++ MPI compiler driver.
The following ways (priority order) can be used for changing default
compiler name (g++):
   1. Command line option:  -cxx=<compiler_name>
   2. Environment variable: I_MPI_CXX  (current value '')
   3. Environment variable: MPICH_CXX  (current value '')

Shouldn't it use the Intel compiler by default? How do I make it do that?
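For reference, Intel MPI also ships dedicated wrappers that invoke the Intel compilers directly, alongside the I_MPI_CXX route the help text above mentions (hello.c / hello.cpp / hello.f90 are placeholder file names):

```shell
# Dedicated Intel MPI wrappers for the Intel compilers:
mpiicc   hello.c      # C, uses icc
mpiicpc  hello.cpp    # C++, uses icpc
mpiifort hello.f90    # Fortran, uses ifort

# Or redirect the generic wrapper, as its own message describes:
export I_MPI_CXX=icpc
mpicxx hello.cpp
```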

Thank you.


Creating a universal binary using Intel compiler


In MacOSX, gcc command line accepts multiarchitecture options:

gcc -arch i386 -arch x86_64 etc.

... and creates a universal binary by compiling and linking for both archs and running lipo for gluing them together.

However, using this command line with the Intel compiler produces a warning:

command line warning #10121: overriding '-arch i386' with '-arch x86_64'

The other option is to perform all three steps on my own; however, since I'm using CMake, that forces an awkward workaround.
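For reference, the manual three-step version would look roughly like this (file names are placeholders, and I'm assuming icc accepts a single -arch per invocation, as the warning suggests):

```shell
# Compile and link once per architecture, then merge with lipo:
icc -arch i386   main.c -o app_i386
icc -arch x86_64 main.c -o app_x86_64
lipo -create app_i386 app_x86_64 -output app
```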

Am I doing something wrong or is this a compiler limitation?

No error for declspec(dllexport) in Linux


Hi,

I have the code below on Linux, and it builds fine without any error. Since __declspec(dllexport) and __declspec(dllimport) are Windows-specific, why is the compiler not complaining about them?

The code in the .hpp is as below:

#ifdef EXPORT_MY_DLL
#define MYEXPORTER __declspec(dllexport)
#else
#define MYEXPORTER __declspec(dllimport)
#endif

MYEXPORTER bool isInitialised();

and the code in the .cpp is as below:

MYEXPORTER bool isInitialised() {
    return true;
}

Please help me understand this; I have mostly worked on Windows, so I'm keen to understand why there is no warning or error.

Thanks.

 

DF98 Directory, dfor.lib and Fortran Compiler


Hello,

I was working on a Visual Studio 2013 project using C++ that calls Fortran routines. However, I keep getting the error:

Error LNK1104: cannot open file 'DFOR.lib'

However, I do not even have the DF98 directory where DFOR.lib should reside. I tried to obtain it, but I think I am missing a Fortran compiler and its libraries.

Do I need to download a compiler, or are there libraries I can simply get to fix the issue?

 

Thank you so much,

Gavin

Why is the same loop reported as both not vectorized and vectorized inside itself?


I would like to understand why the report for this loop starts with the message "... was not vectorized" while, right inside it, there's another loop with the message "loop was vectorized". As I understand it, the inner loop is the loop itself... Does anyone have a clue?

Did the compiler really nest the loop at (530,6) inside itself?

...

         LOOP BEGIN at suktmig2d_OpenMP.c(530,6) inlined into suktmig2d_OpenMP.c(302,3)
            remark #25399: memcopy generated
            remark #15542: loop was not vectorized: inner loop was already vectorized
            remark #25015: Estimate of max trip count of loop=8

            LOOP BEGIN at suktmig2d_OpenMP.c(530,6) inlined into suktmig2d_OpenMP.c(302,3)
               remark #15389: vectorization support: reference datalo[k-?] has unaligned access   [ suktmig2d_OpenMP.c(531,7) ]
               remark #15389: vectorization support: reference *(*(lowpass+nc*8)+(k+?-1)*4) has unaligned access   [ suktmig2d_OpenMP.c(531,21) ]
               remark #15381: vectorization support: unaligned access used inside loop body
               remark #15305: vectorization support: vector length 8
               remark #15309: vectorization support: normalized vectorization overhead 1.000
               remark #15300: LOOP WAS VECTORIZED
               remark #15450: unmasked unaligned unit stride loads: 1 
               remark #15451: unmasked unaligned unit stride stores: 1 
               remark #15475: --- begin vector cost summary ---
               remark #15476: scalar cost: 4 
               remark #15477: vector cost: 0.750 
               remark #15478: estimated potential speedup: 4.000 
               remark #15488: --- end vector cost summary ---
               remark #25015: Estimate of max trip count of loop=3
            LOOP END

            LOOP BEGIN at suktmig2d_OpenMP.c(530,6) inlined into suktmig2d_OpenMP.c(302,3)
            <Remainder loop for vectorization>
               remark #25015: Estimate of max trip count of loop=24
            LOOP END
         LOOP END

...

 

Segfault when using 'omp parallel for simd' with std::vector size and no optimization

#include <vector>
#include "omp.h"

int main() {

  int const N = 1000;
  std::vector<double> x(N);
  x.assign(N, 2.0);

  #pragma omp parallel for simd
  for(int i=0; i<x.size(); ++i)  x[i] *= 2.0;

  return 0;
}

Compiling the above code with icpc -g -O0 -qopenmp (or icpc -debug -qopenmp) produces a segmentation fault (the value of i in the for loop for at least one of the threads is a very large size_t).  However, compiling with O3 or O2 (even with symbols) runs just fine.

Is this expected behavior for "parallel for simd"?  If I remove the simd and just use "#pragma omp parallel for", all levels of optimization work just fine.  Additionally, if I use the integer N in the for loop conditional (instead of x.size()), all levels of optimization work just fine.  It's only when I use O0 and the size method in the for loop.  GCC 5.3 produces correct results for all levels of optimization.

I also tested setting the size at runtime (i.e. pass it into the program) but still use the integer N in the for loop.  For that case all levels of optimization work fine.  The error only seems to occur when I use the .size method and no optimization.  I am using Intel 18.0.1.


