Evaluation of Fibonacci-Number Benchmark

Use of a parallel doubly recursive Fibonacci-number benchmark is said to be a good way to compare the thread-creation and synchronization overheads of different parallel work-stealing schedulers. So what is the problem, anyway?

Disposition of generally useful batching optimizations. If you are going to incur the overhead required to steal a work item, why not steal a sufficiently large batch to make the theft worthwhile? But this is a two-edged sword. On the one hand, if such batching is permitted, the benchmark cannot be said to be fairly evaluating task-creation and synchronization overhead. On the other hand, if such batching is prohibited, the benchmark is suppressing a critically important optimization.
Handling of synchronization-elision optimizations. It does not take too much imagination to conceive of a scheduler that optimizes away task-creation operations, substituting much-cheaper function-call operations. This is also a two-edged sword: aggressive application of this optimization yields hundred-million-fold speedups, but totally eliminates the synchronization operations that are ostensibly being measured.
Obscuring generally useful higher-level optimizations. If there really was some real-world reason to parallelize Fibonacci-number generation, such parallelization should start with an efficient algorithm, whose computational complexity is O(log(n)). For large n, the focus should then be parallelizing the large-integer multiplies — but this generally useful optimization is completely obscured by this benchmark.
Any developers who foolishly attempt to learn parallel programming from the code implementing this benchmark would be sadly misled. The road to high performance and excellent scalability is through increasing the granularity of parallelism, not by minimizing it.

Let's evaluate this benchmark in each of the roles benchmarks play:

Providing a fair framework for comparing competing implementations. Given the potential for “gaming” this benchmark via any of the optimizations called out above, the Fibonacci-number benchmark cannot be said to play this role.
Focusing competitive energy on improving implementations in ways that matter to users. Given the number of useful optimizations that this benchmark suppresses, it cannot be said to play this role particularly well.
Serving as example uses of the implementations being benchmarked. This benchmark can at best be held out as an example use to avoid like the plague.

In short, my advice to you is to ignore any comparisons based on the doubly recursive Fibonacci-number benchmark, and instead run your own benchmark using code sequences that are important to your application. Or, better yet, implement and popularize a set of good benchmarks for these schedulers.