A time-optimal procedure to transpose in situ a matrix stored over a distributed memory 2-dimensional mesh-connected parallel computer is shown. The matrix need not be square. Only nearest-neighbor blocking communications is used, and a small bounded buffer space is required.