The basic idea is that you convert your main C++ for loop from something like this:

// Normal for loop uses only 1 thread.
for (int i = 0; i < 1000000; ++i)
{
    processMyData(i);
}

...to something like this:

// Parallel for loop automatically uses 1 thread per processor.
QMP_PARALLEL_FOR(i, 0, 1000000)
    processMyData(i);
QMP_END_PARALLEL_FOR