We present an algorithm for a permutation exchange operation on a coarse-grained parallel computer. Our algorithm is more efficient than a previously published solution to this problem, and it enables us to derive an efficient algorithm for matrix multiplication.