Abstract
Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) to perform state estimation for non-linear, non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows accordingly. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize with textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂ N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂ N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log₂ N) time complexity. We also present empirical results that indicate that our novel approach outperforms the O((log₂ N)²) approach.
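To make the redistribution step concrete: after resampling assigns each particle a number of offspring, redistribution copies each surviving particle that many times so the population size stays at N. The sketch below is a minimal sequential reference (it assumes NumPy-array inputs and is not the parallel DM algorithm proposed in the paper); the function name `redistribute` and the toy values are illustrative only.

```python
import numpy as np

def redistribute(particles, ncopies):
    # Sequential reference redistribution: particle i is duplicated
    # ncopies[i] times; particles with zero copies are discarded.
    # The offspring counts must sum to N to preserve the population size.
    assert ncopies.sum() == len(particles)
    return np.repeat(particles, ncopies, axis=0)

# Toy example: 4 particles; the first is duplicated, the second discarded.
particles = np.array([10.0, 20.0, 30.0, 40.0])
ncopies = np.array([2, 0, 1, 1])
print(redistribute(particles, ncopies))  # [10. 10. 30. 40.]
```

The O(log₂ N) versus O((log₂ N)²) costs discussed above refer to performing exactly this copy pattern in parallel, where the difficulty is balancing the variable-size copy workload across processing elements.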