The Influence of Architecture on Parallel Programming

To repeat the context, I quoted Nir Shavit's CACM article entitled “Data Structures in the Multicore Age” as follows:

Unlike on uniprocessors, the choice of algorithm will continue, for years to come, to be greatly influenced by the underlying machine's architecture.

Is Dr. Shavit's statement true? Why or why not?

Dr. Shavit's assertion that multiprocessor algorithm choice will be greatly influenced by the underlying machine's architecture does have some noteworthy counter-examples, as noted in an earlier post.

However, these counter-examples are high-level programming environments. In contrast, Dr. Shavit's article focuses squarely on low-level data structures such as stacks. As noted earlier, this low-level focus can be problematic: partitioning a program at a higher level usually produces a much simpler and faster result.
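To make that point concrete, here is a minimal sketch (mine, not Dr. Shavit's, with the thread count, array size, and slice layout chosen purely for illustration) that sums a large array by handing each thread its own contiguous slice. There is no shared stack, no lock in the hot path, and the only synchronization is joining the workers, so very little of it depends on the finer points of the machine's memory system.

	/*
	 * Illustrative sketch only: each thread sums its own slice of a
	 * shared, read-only array, so the only "communication" is
	 * collecting the per-thread partial sums at the end.  No shared
	 * stack, no locking in the hot path.
	 */
	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>

	#define NTHREADS 4
	#define N 1000000

	static long data[N];

	struct slice {
		long *start;
		long n;
		long partial;	/* result written back by the worker */
	};

	static void *sum_slice(void *arg)
	{
		struct slice *s = arg;
		long sum = 0;

		for (long i = 0; i < s->n; i++)
			sum += s->start[i];
		s->partial = sum;
		return NULL;
	}

	int main(void)
	{
		pthread_t tid[NTHREADS];
		struct slice sl[NTHREADS];
		long total = 0;
		long chunk = N / NTHREADS;

		for (long i = 0; i < N; i++)
			data[i] = i % 7;

		/* Partition at a high level: one contiguous slice per thread. */
		for (int t = 0; t < NTHREADS; t++) {
			sl[t].start = data + t * chunk;
			sl[t].n = (t == NTHREADS - 1) ? N - t * chunk : chunk;
			if (pthread_create(&tid[t], NULL, sum_slice, &sl[t]) != 0)
				abort();
		}

		/* The only synchronization is joining the workers. */
		for (int t = 0; t < NTHREADS; t++) {
			pthread_join(tid[t], NULL);
			total += sl[t].partial;
		}
		printf("total = %ld\n", total);
		return 0;
	}

Real programs are of course less regular than this, but the principle carries over: the higher the level at which the work is partitioned, the less the threads need to interact, and the less room there is for the architecture to intrude.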

But the fact is that some uniprocessor programs really are tightly bound to the underlying machine's architecture. A frequent reason for this tight binding is a need for high performance. But if you don't need high performance, or at least higher performance than a single CPU can give you, there is no need for you to produce a parallel implementation.

Therefore, because software that takes advantage of multicore hardware is more likely to need high performance, it is also more likely to be tightly bound to the underlying machine's architecture than is single-threaded software, which is typically less performance-sensitive.

However, as parallel software becomes more pervasive and as multicore hardware continues to decrease in price, we can expect to see more parallel software that, like SQL and “make -j”, need not be tightly bound to the underlying machine's architecture. This is especially true of embarrassingly parallel software. In fact, the greater the computation-to-communication ratio (where locking is a form of communication), the less the programmer need concern himself or herself with the details of the underlying machine.
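For illustration, here is an equally minimal “make -j”-style sketch (again mine, and the process_task() function is a purely hypothetical stand-in for an independent job such as a single compilation). Each child runs in complete isolation, so the only communication is process creation and exit-status collection, the computation-to-communication ratio is about as high as it gets, and nothing in the code cares what the underlying machine looks like.

	/*
	 * Illustrative sketch only, assuming POSIX fork()/waitpid().
	 * Each child runs an independent job to completion without
	 * sharing any data with its siblings.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <unistd.h>

	#define NJOBS 8

	/* Hypothetical stand-in for an independent task, e.g., one compilation. */
	static int process_task(int jobno)
	{
		long x = 0;

		for (long i = 0; i < 10 * 1000 * 1000; i++)
			x += i % (jobno + 2);
		printf("job %d done (%ld)\n", jobno, x);
		return 0;
	}

	int main(void)
	{
		pid_t pid[NJOBS];
		int status;

		/* Launch the independent jobs, "make -j" style. */
		for (int j = 0; j < NJOBS; j++) {
			pid[j] = fork();
			if (pid[j] < 0)
				abort();
			if (pid[j] == 0)
				exit(process_task(j));
		}

		/* The only synchronization: wait for every child to finish. */
		for (int j = 0; j < NJOBS; j++)
			waitpid(pid[j], &status, 0);
		return 0;
	}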

And this is yet another good reason to partition the problem at as high a level as you reasonably can. The resulting program will not only be simpler and faster, but will also be less dependent on the underlying hardware: in other words, it will be more portable.