Visual sensors, including 3D light detection and ranging (LiDAR), neuromorphic dynamic vision sensors (DVS), and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. However, their data are heterogeneous, which complicates system development. Moreover, conventional digital hardware is constrained by the von Neumann bottleneck and the physical limits of transistor scaling. The computational demands of training ever-growing models further exacerbate these challenges. We propose a hardware-software co-designed random resistive memory-based deep extreme point learning machine. Data-wise, the multi-sensory data are unified as point sets and processed universally. Software-wise, most weights are exempted from training. Hardware-wise, nanoscale resistive memory enables collocation of memory and processing, and its inherent programming stochasticity is leveraged to generate random weights. The co-designed system is validated on 3D segmentation (ShapeNet), event recognition (DVS128 Gesture), and image classification (Fashion-MNIST) tasks, achieving accuracy comparable to conventional systems while delivering 6.78×/21.04×/15.79× energy efficiency improvements and 70.12%/89.46%/85.61% training cost reductions, respectively.

Processing heterogeneous visual data in edge-side intelligent machines is complex and inefficient. Here, the authors propose a hardware-software co-designed system using random resistive memory, achieving significant energy efficiency and training cost reductions.
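The idea of exempting most weights from training can be illustrated with a generic extreme-learning-machine sketch on toy point sets. This is not the authors' architecture: the layer sizes, the toy data, and all function names (`random_point_features`, `fit_readout`, `make_toy_point_sets`, `run_demo`) are hypothetical, and the random weights are drawn in software with NumPy as a stand-in for the programming stochasticity of resistive memory. Only the final linear readout is fitted.

```python
import numpy as np

def random_point_features(points, w_random, b_random):
    """Embed one point set of shape (n_points, dim) with FIXED random weights."""
    hidden = np.maximum(points @ w_random + b_random, 0.0)  # per-point ReLU
    return hidden.max(axis=0)  # max-pooling makes the feature order-invariant

def fit_readout(features, targets_onehot, ridge=1e-2):
    """Fit the only trainable weights with closed-form ridge regression."""
    dim = features.shape[1]
    gram = features.T @ features + ridge * np.eye(dim)
    return np.linalg.solve(gram, features.T @ targets_onehot)

def make_toy_point_sets(rng, n_sets=200, n_points=32):
    """Assumed toy data: 3D clouds centered at (-1,-1,-1) or (+1,+1,+1)."""
    labels = rng.integers(0, 2, size=n_sets)
    centers = np.where(labels == 0, -1.0, 1.0)[:, None, None]
    clouds = centers + 0.3 * rng.standard_normal((n_sets, n_points, 3))
    return clouds, labels

def run_demo(seed=0, hidden_dim=64, n_train=150):
    rng = np.random.default_rng(seed)
    clouds, labels = make_toy_point_sets(rng)

    # Random weights are sampled once and never updated (software stand-in
    # for weights obtained from stochastic resistive memory programming).
    w_random = rng.standard_normal((3, hidden_dim))
    b_random = rng.standard_normal(hidden_dim)

    feats = np.stack([random_point_features(c, w_random, b_random)
                      for c in clouds])
    onehot = np.eye(2)[labels]

    # Train only the readout on the first n_train sets, test on the rest.
    w_out = fit_readout(feats[:n_train], onehot[:n_train])
    pred = (feats[n_train:] @ w_out).argmax(axis=1)
    accuracy = float((pred == labels[n_train:]).mean())

    n_fixed = w_random.size + b_random.size
    trainable_fraction = w_out.size / (w_out.size + n_fixed)
    return accuracy, trainable_fraction

if __name__ == "__main__":
    acc, frac = run_demo()
    print(f"held-out accuracy: {acc:.3f}, trainable fraction: {frac:.3f}")
```

In this toy setting the fixed random layer is shallow, so the trained readout is still a sizable share of the parameters; with deeper stacks of fixed random layers, the trainable share shrinks further, which is the property the abstract relies on for reducing training cost.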