Channel: Intel® Software - Intel® C++ Compiler
Viewing all 2797 articles

Webinar on Accelerate Application Performance with OpenMP* and SIMD Parallelism

You are all welcome to join the webinar tomorrow morning at 9 am PST, where Martyn Corden will show how to tune a SAD kernel using Intel Software Tools. He will walk through how to look for tuning opportunities and optimize the kernel incrementally, showing the performance improvement after each step. Please register at https://register.gotowebinar.com/register/5605160025455344642 to attend this webinar.

Thanks and Regards
Anoop


Authorization for MPI compiled program

If I use the Cluster Edition to compile my program with MPI functions, can my users also run the program with MPI functions on a cluster without installing mpich2 or openmpi compiled with Intel Fortran?

Intel C++ compiler on HP-UX 11i operating systems

*** Intel C++ compiler on HP-UX 11i operating systems ***

error TRK0005: Failed to locate: "icl.exe". The system cannot find the file specified.

I just had my machine rebuilt because I could not update the Intel software tools.

I installed Parallel Studio XE 2017 (basic).

I am using 64-bit Windows 7 with VS2010, VS2013 and VS2015.

I received the error:

"error TRK0005: Failed to locate: "icl.exe". The system cannot find the file specified."

in VS2015 (debug, 64bit)

After reading on the web about possible path shortening (a well-known Windows artifact that, I presume, could not have been anticipated or worked around), I decided to start devenv.exe from the PSXE 2017 command prompt, which worked fine!

Now I would like to avoid having to go through this every time I start Visual Studio.

icl.exe resides in several locations:

C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.109\windows\bin\intel64_ia32

C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.109\windows\bin\intel64

and:

C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.109\windows\bin\ia32

Which one should I use? It seems (or so I hope) that the first is a wrapper for the other two. Is this correct?

TIA,

Petros

KMP_AFFINITY

Hi,

Does anyone know whether I can change KMP_AFFINITY, from inside my program, for a sub-process that it invokes? My experiments show that this does not work when the program is built with the Intel compiler, but it does work with gcc.

Here is the example:

Let's say I have KMP_AFFINITY=scatter for my main process. Inside the main process, before invoking another executable as a sub-process, putenv is used to set KMP_AFFINITY=none for the sub-process.

Is this supposed to work? My runs show that KMP_AFFINITY=none is not applied to the sub-process if the Intel compiler is used to compile and link my main program, but it is applied with gcc.

When I double-check the environment in the sub-process, there is one extra environment variable when my executable is built with the Intel compiler:

__KMP_REGISTERED_LIB_23907=0xacfa1d0-cafe8af0-libiomp5.a

What does this variable do, and how can the difference be explained? Thank you.

Hongwei
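
For reference, here is a minimal POSIX sketch of the general mechanism the question relies on: a variable changed with setenv (or putenv) before a child is spawned is inherited by that child. The helper name value_seen_by_child is hypothetical. The sketch only shows what the child's environment contains. It says nothing about how the Intel OpenMP runtime interprets KMP_AFFINITY or the __KMP_REGISTERED_LIB_* variable.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of any Intel or OpenMP API):
// set NAME=VALUE in the current process, spawn a child through the shell,
// and return the value of NAME as the child sees it.
std::string value_seen_by_child(const std::string& name, const std::string& value) {
    // Overwrite the variable in the parent's environment (POSIX setenv).
    setenv(name.c_str(), value.c_str(), 1);

    // popen starts a child process that inherits the parent's environment.
    const std::string command = "printenv " + name;
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) return "";

    std::string output;
    char buffer[256];
    while (std::fgets(buffer, sizeof buffer, pipe)) output += buffer;
    pclose(pipe);

    // Strip the trailing newline printed by printenv.
    while (!output.empty() && (output.back() == '\n' || output.back() == '\r'))
        output.pop_back();
    return output;
}
```

Calling value_seen_by_child("KMP_AFFINITY", "none") from a process started with KMP_AFFINITY=scatter is one way to check that the child really receives the new value, independently of what the OpenMP runtime then does with it.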

 

pragma omp task priority(n) - when?

I would like to specify task priority in OpenMP. This is not available in V17.0.1 (nor mentioned in release notes for V17.0.4).

My intended use is for an MPI with OpenMP application where on rank 0, a spawned task (or master thread prior to first task) runs at elevated task priority. This task manages a work queue for tasks issued to rank 0, as well as issuing task requests to other ranks via MPI messaging.

What I want to accomplish is to have the task manager task .NOT. participate in tasks that it enqueues. If it does (which I expect it does now), it will introduce an undesired latency in servicing the tasks to be issued to the additional MPI ranks (as well as to itself).

Ideally it would be nice to have

   #pragma omp task deferred

where the task is enqueued but not run by the enqueuing task, except when the enqueuing task issues a taskwait.

This feature would not require implementation of task priority.

BTW, my code restricts the number of pending tasks, so there would not be too many pending deferred tasks.

Jim Dempsey
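
For readers unfamiliar with the clause being asked about, here is a small sketch of the OpenMP 4.5 syntax, assuming a compiler that accepts priority on task. The function name and workload are hypothetical. The priority value is only a scheduling hint, and the runtime may ignore it unless OMP_MAX_TASK_PRIORITY is set high enough. The proposed "deferred" clause is not shown because it is a suggestion in this post, not an existing OpenMP feature.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workload: square every input in its own task and add the results.
// Without OpenMP enabled, the pragmas are ignored and the loop runs serially.
int sum_of_squares_with_tasks(const std::vector<int>& inputs) {
    std::vector<int> results(inputs.size(), 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            // priority(1) is a hint: higher values are recommended to run
            // earlier, but the runtime is free to ignore the hint.
            #pragma omp task priority(1) firstprivate(i)
            results[i] = inputs[i] * inputs[i];
        }
        // The enqueuing thread waits here, and may execute tasks itself
        // while it waits.
        #pragma omp taskwait
    }

    int total = 0;
    for (int value : results) total += value;
    return total;
}
```

Note that this does not give the behaviour requested above: the thread that creates the tasks can still pick them up at the taskwait or at the end of the single region.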

Thread Topic: Question

[OMP, C++] Threadprivate with Intel Compiler decreases performance

Hi, 
I have parallelized my program with OpenMP. I have many of these "#pragma omp threadprivate" directives in my code:

static float cost_penalty;
static float bc_penalty;
static float explicit_cost;
static float time_penal_piece_link_below;
static float time_penal_piece_link_above;
static float time_penal_piece_below;
static float time_penal_piece_above;
static float time_penal_paid_below;
static float time_penal_paid_above;
static float time_penal_work_below;
static float time_penal_work_above;
static float br_cost_penalty;
static float br_bc_penalty;
static int   is_tripper_pre = 0;
static int   is_tripper_h_res = 0;
static int   is_tripper = 0;
static int   have_break_rule = 0;
static int   have_break_intervals = SCHED_FALSE;

#ifdef _OPENMP
#pragma omp threadprivate ( cost_penalty,                \
                            bc_penalty,                  \
                            br_cost_penalty,             \
                            br_bc_penalty,               \
                            is_tripper_pre,              \
                            is_tripper_h_res,            \
                            is_tripper,                  \
                            have_break_rule,             \
                            explicit_cost,               \
                            time_penal_piece_link_below, \
                            time_penal_piece_link_above, \
                            time_penal_piece_below,      \
                            time_penal_piece_above,      \
                            time_penal_paid_below,       \
                            time_penal_paid_above,       \
                            time_penal_work_below,       \
                            time_penal_work_above )
#endif

With the MSVC 2017 Professional compiler, I get the following running times:

  • Without OpenMP:                   15 seconds
  • With OpenMP and one thread: 15 seconds

Now I have built my program with the free trial version of Intel Parallel Studio XE 2017. When I switch to the Intel compiler, I get these running times:

  • Without OpenMP:                                                  14 seconds
  • With OpenMP and one thread:                                25 seconds
  • With OpenMP and one thread without threadprivate: 15 seconds

Why is the Intel compiler so slow in comparison to the MSVC 2017 compiler? Do I have to set a compiler flag to fix this problem?

Thank you for your help,

Christof
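
One experiment that might be worth timing, offered here only as an assumption and not as a confirmed explanation of the numbers above: if each threadprivate variable is reached through its own per-variable lookup, grouping the variables into one struct leaves a single threadprivate object to look up. The struct and function names below are hypothetical, and only a few of the variables from the post are shown.

```cpp
// Hypothetical grouping of some of the per-thread variables from the post.
struct PenaltyState {
    float cost_penalty;
    float bc_penalty;
    float explicit_cost;
    int   is_tripper;
};

// One static object instead of many separate static variables.
static PenaltyState penalty_state = {0.0f, 0.0f, 0.0f, 0};

#ifdef _OPENMP
// A single threadprivate object: each thread gets its own copy of the struct.
#pragma omp threadprivate(penalty_state)
#endif

// Example access pattern: every field is reached through the same object.
float total_penalty(float cost, float bc, float explicit_cost) {
    penalty_state.cost_penalty  = cost;
    penalty_state.bc_penalty    = bc;
    penalty_state.explicit_cost = explicit_cost;
    return penalty_state.cost_penalty
         + penalty_state.bc_penalty
         + penalty_state.explicit_cost;
}
```

Whether this changes the 25-second figure would have to be measured. The sketch only shows that the directive can be applied to one aggregate instead of a long list.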

 

Thread Topic: Question

C Language Reference, and identifier lengths

Hey folks, new to the forum, if not the language or compiler ;). I have a couple of related questions.

1. Is there any C language reference (other than referring to ANSI standards and gnu and Microsoft compatibility)?

2. Are the limits on significant initial characters in an internal identifier or a macro name, or in an external name, the same as ANSI? Or are they relaxed? Microsoft (Visual Studio) has long supported 247 for internal and external names. gcc treats all characters as significant.

Thanks!
Kevin
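
In the absence of a documented limit, one way to probe a compiler empirically is to declare two identifiers that share a long prefix and differ only in the last character. The sketch below uses hypothetical names whose common prefix is longer than 63 characters. If a compiler treated only the first 63 characters as significant, the two declarations would collide and the file would not compile. The declarations are written so that they are also valid C, which is the language the question is about.

```cpp
/* Two internal identifiers that share a prefix of more than 63 characters
   and differ only in their final character (hypothetical names). */
static int probe_identifier_with_a_shared_prefix_that_is_well_over_sixty_three_characters_long_a = 1;
static int probe_identifier_with_a_shared_prefix_that_is_well_over_sixty_three_characters_long_b = 2;

/* If both names are distinct to the compiler, this returns 1 + 2. */
int sum_of_long_identifiers(void) {
    return probe_identifier_with_a_shared_prefix_that_is_well_over_sixty_three_characters_long_a
         + probe_identifier_with_a_shared_prefix_that_is_well_over_sixty_three_characters_long_b;
}
```

Removing static from the two variables turns the same test into a probe of external names, where the linker also takes part.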

Thread Topic: Question

Intel Compiler 17.0u3 not calling move constructor

On Windows, I have some fairly straightforward move constructors defined for some matrix classes.

One would expect the move constructor to be called when the object is returned from the example function below, and it is when compiled with Visual C++ 2015 Update 3. When compiled with Intel 17u3, the behaviour is bizarre: neither the move constructor nor the copy constructor is called. Everything is compiled in DEBUG mode, and IPO is off. I can understand that the compiler might be clever when all optimizations are on, but this is with all optimizations off.

 

Visual C++ 2015 output

constructor 1:FloatSymMat
move constructor:FloatSymMat
destructor:~FloatSymMat
about to call std::move
move constructor:FloatSymMat
destructor:~FloatSymMat
destructor:~FloatSymMat

Intel C++ output

constructor 1:FloatSymMat
about to call std::move
move constructor:FloatSymMat
destructor:~FloatSymMat
destructor:~FloatSymMat

If I EXPLICITLY call

FloatSymMat tmp_mat = std::move(symmat);

Then the move constructor is called as expected.

// testmoveconstructor.cpp : Defines the entry point for the console application.
//

#include <vector>
#include <iostream>
using namespace std;

inline unsigned SymMatSize(unsigned n) { return n*(n + 1) / 2; }

class  FloatSymMat {
private:

	std::vector<float> vec;
	unsigned n;
	// The data which define the matrix


public:
	FloatSymMat();
	FloatSymMat(const FloatSymMat&);
	FloatSymMat(FloatSymMat&&)noexcept;
	FloatSymMat(unsigned n, unsigned nAgain);
	FloatSymMat(unsigned n, unsigned nAgain, float initval);
	~FloatSymMat();
};



FloatSymMat::FloatSymMat(unsigned M, unsigned N)
	: vec(SymMatSize(N), 0)
{
	cout << "constructor 1:"<< __func__ << endl;
	n = N;
}

FloatSymMat::FloatSymMat(unsigned M, unsigned N, float initval)
	: vec(SymMatSize(N), initval)
{
	cout << "constructor 2:"<< __func__ << endl;
	n = N;
}

inline FloatSymMat::~FloatSymMat()
{
	cout << "destructor:"<< __func__ << endl;
}

inline FloatSymMat::FloatSymMat()
	: vec()
{
	cout << "default constructor:"<< __func__ << endl;
	n = 0;
}

inline FloatSymMat::FloatSymMat(const FloatSymMat& A)
	: vec(A.vec)
{
	cout << "copy constructor:"<< __func__ << endl;
	n = A.n;
}



inline FloatSymMat::FloatSymMat(FloatSymMat&& A)noexcept
	: vec(std::move(A.vec))
{
	cout << "move constructor:"<< __func__ << endl;
	n = A.n;
	A.n = 0;
}


FloatSymMat testReturn()
{
	FloatSymMat symmat(3, 3);
	return symmat;
}


int main()
{
	FloatSymMat tmp = testReturn();
	cout << "about to call std::move"<< endl;
	FloatSymMat tmp_mat = std::move(tmp);
	return 0;
}
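
As background for the output above: the C++ standard permits a compiler to construct a returned local object directly in the caller's storage (copy elision, often called NRVO), and this permission does not depend on the optimization level. The following sketch, with a hypothetical Tracker class, counts constructor calls to show the difference between returning a named local and forcing a move with std::move. It is an illustration of the language rule, not an analysis of either compiler.

```cpp
#include <utility>

// Hypothetical class that counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Returning a named local: the compiler may elide the move entirely,
// so zero or one move constructor calls are both conforming.
Tracker make_named() {
    Tracker t;
    return t;
}

// std::move turns the operand into an xvalue, which disables elision of
// this return and forces at least one move constructor call.
Tracker make_forced() {
    Tracker t;
    return std::move(t);
}

// Number of moves observed when the move is forced.
int moves_when_forced() {
    Tracker::moves = 0;
    Tracker t = make_forced();
    (void)t;
    return Tracker::moves;
}

// Number of copies observed when a named local is returned.
int copies_when_named() {
    Tracker::copies = 0;
    Tracker t = make_named();
    (void)t;
    return Tracker::copies;
}
```

Code that relies on the move constructor's side effects (such as the trace output in the post) can therefore see different call counts on different conforming compilers.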

ICC faster on Windows than on Linux

I have a dual-boot setup on my computer since I prefer to code on Linux. On Windows, I use Intel Composer XE 2013, and to compare I installed the same version on Linux (at first I used the Intel compiler 17 on Linux, but I went back to 2013 in order to compare properly with Windows).

I have a program involving mostly computation (a fixed point algorithm). I compile it on Windows with similar flags (-openmp -fast and linking to the Boost library)... but the Windows executable ends up being about 1.6 times faster.

More detail about the program: at the core of it I'm using the Boost library, with this kind of computation (which I'm calling millions of times):

boost::math::lognormal distrib(mu, sigma);
value = exp(boost::math::lgamma(N))*pow(cdf(distrib, x), 5);

I already noticed on Linux (and I don't know why) that icpc tends to be faster than g++ for any call to a Boost function (like lgamma). However, it is even faster overall on Windows than on Linux, and I don't know why (and this is very probably the reason why Windows is faster than Linux).
(I also use OpenMP to parallelize loops, but even without it the difference between the two OSes remains.)

What is the explanation? Maybe some built-in differences in compilation flags between the two platforms? Or is Windows simply faster with ICC? That would be weird, because I have never heard of such a thing (and gcc, for example, tends to always be faster on Linux on the same computer...).

I'm not sure it's relevant, but on Windows I compile with the following .bat file:

@call "C:\Program Files (x86)\Intel\Composer XE 2013\bin\iclvars.bat" ia32
icl /openmp /I "C:\boost_1_61_0" /fast program.cpp

and when I launch it, it says:

Intel(R) composer XE 2013 (package 089)
Setting environment for using Microsoft Visual Studio 2010 x86 tools.
Intel(R) C++ Compiler XE for applications running on IA-32. Version 13.0.0.089 Build 20120731
...
Microsoft (R) Incremental Linker Version 10.00.30319.01
-

Notice that I use ia32 even though I am on an intel64 (x86_64) system. As a consequence, on Linux I'm also using ia32, with:

source /media/usr/intel2013/bin/iccvars.sh intel64
icpc -o program program.cpp -std=c++11 -openmp -fast

Notice that on Linux I have to pass -std=c++11 for it to work (while on Windows I don't think it actually uses C++11 when compiling).

I would like to switch to Linux for good, but I need my program to run at least as fast as on Windows. I have spent about 5 days looking for a solution to this, tweaking flags and so on, but I'm pretty new to compiler performance optimization and I was unable to find the reason for the difference.
Any help would be greatly appreciated.

Edit: as for my hardware: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz, 8 CPU, 4 cores, 2 threads per core, architecture x86_64. Windows is 8.1 and Linux 4.8.0-54-generic #57~16.04.1-Ubuntu
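
To compare the two platforms on the hot computation alone, a small stand-alone timing harness can help. The sketch below is an assumption-laden stand-in for the Boost snippet above: it replaces boost::math::lgamma and the Boost lognormal CDF with the standard-library std::lgamma and a closed-form CDF built on std::erfc, so it measures similar math but not the same library code. All function names are hypothetical.

```cpp
#include <chrono>
#include <cmath>

// CDF of a lognormal distribution with parameters mu and sigma,
// written with std::erfc (stand-in for boost::math::cdf).
double lognormal_cdf(double x, double mu, double sigma) {
    return 0.5 * std::erfc(-(std::log(x) - mu) / (sigma * std::sqrt(2.0)));
}

// Same shape as the expression in the post:
// exp(lgamma(N)) * pow(cdf(distrib, x), 5)
double kernel_value(double n, double mu, double sigma, double x) {
    return std::exp(std::lgamma(n)) * std::pow(lognormal_cdf(x, mu, sigma), 5);
}

// Run the kernel `iterations` times and return the elapsed wall time in seconds.
// The accumulated result is written to *sink so the loop cannot be optimized away.
double time_kernel_seconds(long iterations, double* sink) {
    const auto start = std::chrono::steady_clock::now();
    double accumulated = 0.0;
    for (long i = 0; i < iterations; ++i) {
        // Vary x slightly so every iteration does real work.
        accumulated += kernel_value(5.0, 0.0, 1.0, 1.0 + 1e-9 * static_cast<double>(i));
    }
    const auto stop = std::chrono::steady_clock::now();
    *sink = accumulated;
    return std::chrono::duration<double>(stop - start).count();
}
```

Building this file with the same target architecture and the same optimization flags on both systems, and then again with the settings actually used for the real program, is one way to separate compiler-flag effects from math-library effects.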

Thread Topic: Help Me

C Language Reference, and identifier lengths (follow-up)

Followup to: https://software.intel.com/en-us/forums/intel-c-compiler/topic/736065

This question was answered by providing a link to the compiler reference, thanks.

But actually, I am looking for a language reference, not a compiler reference, and I originally posted because I haven't been able to find a language reference in the compiler reference. Nor have I been able to find the identifier length limits.

Thanks,
Kevin

Visual Studio v120 vs v140 performance, Intel C++ 17

I switched over from Visual Studio 2013 with Intel C++ 15 to Visual Studio 2015 with Intel C++ 17 (rev 4). Since the old compiler is still installed on my system, I can select to build with C++ 17 with "Base Platform Toolset" v120 in Visual Studio 2015.

What I'm seeing is that Visual Studio 2015 + C++ 17 + v140_xp gives me nearly exactly the same performance as Visual Studio 2013 + C++ 15 + v120_xp. Oddly however, if I replace v140_xp by v120_xp, the performance improves by 2-3%. I've repeated this test multiple times and each time I'm finding improvement numbers between 1.8 and 3.7%.

I can imagine that certain Windows calls could have gotten more or less expensive, but the code that's affected by this is calculation-heavy code. Now, I could just choose to compile with v120_xp, but that has several drawbacks. I upgraded to the new Visual Studio because of a bug fix in the std::chrono library, which of course isn't available in v120. And some of my projects have link errors when using v120_xp (which I can fix by linking to some v140 libraries, but that feels extremely scary).

2-3% isn't much, but it's a shame to lose performance for no good reason. So I was hoping that someone here could give some insight into how the Base Platform Toolset can affect the performance of calculations.

(Note: This is a project of 7 MB, so I really can't post an example file).

Thread Topic: Question

segmentation fault with icpc

Hi,

 

A segmentation fault sometimes occurs in a program when icpc is used.

The version of icpc I am using is 17.0.2 20170213, and the version of g++ is 5.2.0.

It does not happen when g++ is used.

This issue was originally found in BOUT++ code. (https://github.com/boutproject/BOUT-dev)

Please have a look at the attached reproducer.

It was written based on BOUT++ code.

If g++ is used:

$ ./a.out
0
0
1
1
1
2
2
2

 

But if icpc is used:

$ ./a.out.intel
0
0
...
Segmentation fault (core dumped)

 

Please note that sometimes the segfault does not occur, but the output is always corrupted (0 0 0 ...).

The problem does not occur, even with icpc, when:

1) a copy constructor is defined (a.out.intel.copy)

2) reference variables are not used (a.out.intel.noref)

Although this issue has been fixed in the BOUT++ code by doing 2) (https://github.com/boutproject/BOUT-dev/pull/468), I still suspect this is a bug in the Intel compiler because the problem does not happen with g++.

Is this a bug in icpc?

 

Best regards,

Daichi

Attachment: reproducer.tgz (application/x-gtar, 121.4 KB)

Thread Topic: Question

ICInstallDir

In my installation of PSXE 2017 (basic, i.e. no update) (win7, 64bit on VS2010, 2013, 2015) one sees 2 directories, namely:

a)C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017\windows

and

b) C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.109\windows

a) is effectively empty: it contains a redist folder, which in turn contains 2 empty folders (intel64_win and intel32_win).

b) contains a bin folder, containing a bunch of batch files and a number of architecture-specific folders, each of which contains the icl and xilink applications. Also, in b) there are folders for compiler, daal, ipp, mkl, redist, tbb.

When I select the Intel C++ 17.0 toolset in VS2015, I see ICInstallDir pointing to a). As a result, I cannot use it!

Can you please help me resolve this?

TIA,

Petros

PS1: Interestingly enough, when I start the "Intel Compiler 17.0 Intel(R) 64 Visual Studio 2015" command prompt, I see:

C:\Program Files (x86)\IntelSWTools>echo %ICInstallDir%
C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.0.109\windows

(i.e. the correct one !)

PS2: One possible(?) remedy would be to somehow override the ICInstallDir project macro. However, looking at some Intel props files, it is not to be found where (naively, at least) these files look in the registry, provided I am looking in the right places ;-)

Choosing between multiple versions of MSVC

I have the Intel C++ compiler installed onto a Windows machine, which has been using MSVC 10 (2010). I recently installed MSVC 12 (2013) alongside the MSVC 10 compiler, and now icpc seems to be defaulting to use that version of MSVC. Is there a way to configure the Intel compiler to use the MSVC 10 version instead?

Thread Topic: Question

offload to Iris Pro Graphics P580

Sorry if this is slightly off-topic but I didn't see a more applicable forum. I am trying to use Cilk offload for GPGPU applications and I can't find a driver that works with the E3-1585v5 / Iris Pro P580 in Windows 10. I have tried several drivers, both using the installer and manual install. It appears that the offload driver is not getting installed on the system:

[snip]
C:\>gfx_sys_check -v

Checking CPU
family:6 model:5e type:0 stepping:3 (signature:506e3)

Checking OS
Windows 8 x64
graphics timeout set to default (2 seconds)

Checking display
device:
Provider: Intel Corporation
Description: Intel(R) Iris(TM) Pro Graphics P580
Version: 21.20.16.4678
device:
Provider: ASPEED
Description: ASPEED Graphics Family(WDDM)
Version: 9.0.10.102

Checking Intel HD Graphics Driver
failed to load library igfx11cmrt64.dll (error#126)
[end snip]

Running "dir /s igfx11cmrt64.dll" from C:\ shows that the offload driver files are not in the C:\Windows\Sys* folders. If I navigate to the folder I extracted the driver into, gfx_sys_check does recognize the device:

[snip2]
Checking Intel HD Graphics Driver
DX11 version will be used
cm_version=600
RT Dll version: (6.0.0.1189)
JIT Dll version: (6.0.0.1189)
CAP_KERNEL_COUNT_PER_TASK=16
CAP_KERNEL_BINARY_SIZE=65536
CAP_SAMPLER_COUNT=64
CAP_SAMPLER_COUNT_PER_KERNEL=16
CAP_BUFFER_COUNT=256
CAP_SURFACE2D_COUNT=256
CAP_SURFACE3D_COUNT=64
CAP_SURFACE_COUNT_PER_KERNEL=255
CAP_ARG_COUNT_PER_KERNEL=255
CAP_ARG_SIZE_PER_KERNEL=2016
CAP_USER_DEFINED_THREAD_COUNT_PER_TASK=262144
CAP_HW_THREAD_COUNT=504
CAP_SURFACE2D_FORMAT_COUNT=26
CAP_SURFACE3D_FORMAT_COUNT=2
CAP_VME_STATE_G6_COUNT=8
CAP_GPU_PLATFORM=Skylake
CAP_GT_PLATFORM=GT4
CAP_MIN_FREQUENCY=350
CAP_MAX_FREQUENCY=1150
CAP_GPU_CURRENT_FREQUENCY=0
JIT version: 3.3
executing visa3.0 CmRT reported execution time 25999 nanosec
executing visa3.1 CmRT reported execution time 9249 nanosec
executing visa3.2 CmRT reported execution time 9083 nanosec
GPU architecture: skylake
vISA support: visa3.2
[end snip2]

Visa3.x CmRT seems to work but it isn't clear to me if that is offload or something else like DX11, OpenCL, etc.

I tried running some of the samples and my code--I thought they may run if I copy the exe into that folder; however, the code ran but from the execution time it was obviously running on a single CPU core and not offloaded. The code and samples both offload just fine on my other machine (i7-4810MQ w/HDG4600), so the driver issue notwithstanding, I can't rule out a compiler issue. I am using MSVS 2015 update 3 and Intel 2017 update 2 on both machines.

Thanks,
Randy

Thread Topic: Help Me

Generating and modifying constexpr tuples

Intel compiler: icpc (ICC) 17.0.2 20170213
g++: g++ (GCC) 5.4.0
clang++: clang version 3.5.2

The following code compiles with g++ and clang++ but not with the intel compiler (using -std=c++14):

#include <tuple>

using tuple_type = decltype(std::make_tuple(3));

constexpr tuple_type const_tuple0(std::make_tuple(3));
constexpr tuple_type const_tuple1 {std::make_tuple(3)};
constexpr tuple_type const_tuple2 = std::make_tuple(3);

With c++14 we have the possibility to generate and modify compile-time tuples. Unfortunately the intel compiler doesn't seem to like this and complains with

error: expression must have a constant value

for all three statements. Further investigation showed that any constexpr function returning a tuple won't compile. Returning user-defined constexpr classes from constexpr functions works. So there seems to be a problem in returning constexpr tuples.

Thread Topic: Bug Report

How to renew the Parallel Studio License for Linux (Open source developers)

I installed Parallel Studio for Linux about a year ago using an open source developer license. The license has now expired, but I can't seem to find a way to renew it. When I check my profile on the Intel website, it simply states that the license is now expired.

How should I proceed?

Thanks!

Visual Studio 2017 integration

Hi,

On my machine, I have installed VS Community 2015, then Parallel Studio XE 2017 update 4, and then VS Community 2017.

I have the integration of Parallel Studio in VS 2015, but not in VS 2017. I have tried to run the installer again and repair, but I had no luck. I have tried to install new components, but I can't see anything for VS 2017.

How can I install support for VS 2017?

Best regards

Intel 17.0 Update 4 and VS2017 /FR (Browse Information) Fails

The /FR (browse information) compiler option does not work correctly with Intel 17.0 Update 4 and VS2017. BSCMAKE generates an error about being unable to access the .sbr file.

This works correctly under VS2013 and VS2015.
