Mobile-based sign language recognition (SLR) is challenging in real time because camera shake and signer movement degrade the continuous video captured for recognition. Although many state-of-the-art SLR methods exist, they largely ignore view sensitivity and its effect on system accuracy. This work proposes a novel multi-view deep metric feature learning (MVslDML) model that brings view sensitivity, which has been studied extensively in human action recognition, into SLR. The proposed MVslDMLNet is an end-to-end trainable convolutional neural network in which features extracted from multiple views are learned through metric learning, based on the shareable and unshareable latent features of within-class multi-view data. Experiments on our multi-view sign language dataset and four benchmark action video datasets show that the proposed framework achieves higher accuracy.
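
As a rough illustration of the multi-view metric-learning idea described above (not the authors' MVslDMLNet), the sketch below shows a two-view setup in which each view's embedding is split into a shared part and a view-specific part, and a contrastive-style loss pulls the shared embeddings of same-class samples across views together while pushing different classes apart. All module names, dimensions, and the exact loss form are illustrative assumptions.

```python
# Illustrative sketch only; names and loss form are assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewEncoder(nn.Module):
    """Small CNN backbone whose embedding is split into shared / view-specific halves."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.half = embed_dim // 2

    def forward(self, x):
        z = F.normalize(self.backbone(x), dim=1)
        return z[:, :self.half], z[:, self.half:]   # (shared, view-specific)

def multiview_metric_loss(shared_a, shared_b, labels, margin=0.5):
    """Contrastive loss on shared embeddings across two views:
    same-class pairs are pulled together, different-class pairs pushed
    apart by at least `margin`."""
    d = torch.cdist(shared_a, shared_b)                       # pairwise distances
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos = same * d.pow(2)
    neg = (1 - same) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()

if __name__ == "__main__":
    enc_front, enc_side = ViewEncoder(), ViewEncoder()        # one encoder per camera view
    x_front = torch.randn(8, 3, 112, 112)                     # frames from view 1
    x_side = torch.randn(8, 3, 112, 112)                      # frames from view 2
    y = torch.randint(0, 4, (8,))                             # sign-class labels
    s_front, _ = enc_front(x_front)
    s_side, _ = enc_side(x_side)
    loss = multiview_metric_loss(s_front, s_side, y)
    loss.backward()
    print(float(loss))
```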