Abstract
The complex memory hierarchies of today's machines make it difficult to estimate task execution times: depending on where data is placed in memory, tasks of the same type may exhibit different performance. Multiple scheduling heuristics have improved performance by taking memory-related properties such as data locality and cache sharing into account. However, certain applications, or phases within applications, take little or no advantage of these optimizations. Without understanding when such optimizations are effective, we may incur unnecessary overhead at the runtime level. In previous work, we introduced TaskInsight, a technique to characterize how an application's memory behavior is affected by different task schedulers through the analysis of data reuse across tasks. We now use this tool to dynamically trace the scheduling decisions of multithreaded applications throughout their execution and analyze how memory reuse can reveal when and why locality-aware optimizations are effective and how they impact performance. By applying TaskInsight to several of the Montblanc benchmarks, we demonstrate how we can detect particular scheduling decisions that produced a variation in performance, as well as the underlying reasons. This flexible insight is key for both the programmer and the runtime to assign the optimal scheduling policy to particular executions or phases.