Robot Mapping of Human-Populated Scenes

Stuart Golodetz

Unmanned aerial vehicles (UAVs) have been used for numerous applications in recent years, ranging from urban search and rescue, to agricultural surveying, to the autonomous exploration of underground mines. However, deploying UAVs safely in tight indoor spaces, especially in close proximity to humans, remains a challenge, which limits their potential use in settings such as indoor workplaces. One possible solution, at least for applications with limited payload requirements, is simply to use smaller UAVs, which pose less of a risk to humans and generally also cost less to replace in the event of a crash. However, small UAVs can generally carry only a limited suite of sensors, e.g. a monocular camera rather than a stereo pair or LiDAR. This can make it significantly harder to perform tasks such as dense mapping and markerless multi-person 3D human pose estimation from the vehicle, even though these are exactly the tasks needed to operate safely in tight environments around people. Monocular approaches to such tasks do exist, and various dense monocular mapping approaches have indeed been successfully deployed for UAV applications. However, despite numerous recent works on both marker-based and markerless multi-UAV single-person motion capture, markerless single-camera multi-person 3D human pose estimation is at a much earlier stage of development, and we are not aware of any existing attempts to deploy it in an aerial context. To our knowledge, the system we have built for this project is thus the first to perform joint mapping and multi-person 3D human pose estimation from a monocular camera mounted on a single UAV.