Human Activity Recognition

Yang Xing

Although Human Activity Recognition (HAR) has advanced significantly in recent years, modelling human activities that unfold over multiple time scales remains an open problem. Multi-scale human behaviour is ubiquitous in daily life: people commonly perform a series of sub-activities to achieve a final objective, and the combination of these sub-activities forms a specific long-term activity. Recognising long-term activities and the objectives behind them is as important to human behavioural reasoning as recognising the sub-activities themselves. Multi-scale human activity recognition and anticipation (MS-HARA) can therefore support more advanced human-machine collaboration and better human-machine systems. With these challenges and advantages in mind, this study focuses on three tasks: 1) construction of a multi-scale human activity recognition and anticipation framework; 2) development of an efficient fusion framework for the two-stream network to support precise multi-scale reasoning; and 3) efficient learning of a multi-scale temporal representation, and inference from untrimmed data, for both recognition and anticipation. The MS-HARA network is built from CNN and RNN models combined through an efficient fusion mechanism. The contributions of this study are as follows. First, an MS-HARA network is designed as an end-to-end process. The network jointly models activity recognition and anticipation, and contributes to the analysis of the relationship between these two tasks. Second, a fusion-based two-stream network is designed using an efficient TCA method and various late fusion operators. The TCA module also bridges the 3D and 2D ConvNets. Last, experiments are designed to evaluate mid-term activity recognition, MS-HARA, and fusion, and to explore recognition and prediction.
Experimental results on datasets from multiple domains (Brain4Cars, FineGym, GTEA) show that MS-HARA can effectively model the complex temporal patterns of multi-scale human behaviour and contribute to accurate activity recognition and anticipation.
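
To make the idea of late fusion in a two-stream network concrete, the following is a minimal sketch in plain Python. It is not the MS-HARA implementation: the function names (`softmax`, `late_fuse`, `predict`), the particular set of operators, and the example scores are all assumptions chosen for illustration, and the TCA module is not modelled. The sketch only shows how class scores from two streams might be combined after each stream has produced its own prediction.

```python
import math


def softmax(logits):
    """Convert raw per-class scores into probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fuse(scores_a, scores_b, operator="average", weight=0.5):
    """Fuse per-class probabilities from two streams (hypothetical helper).

    operator: "average", "max", "product", or "weighted".
    weight:   share given to stream A when operator is "weighted".
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both streams must score the same set of classes")

    pairs = list(zip(scores_a, scores_b))
    if operator == "average":
        fused = [(a + b) / 2 for a, b in pairs]
    elif operator == "max":
        fused = [max(a, b) for a, b in pairs]
    elif operator == "product":
        fused = [a * b for a, b in pairs]
    elif operator == "weighted":
        fused = [weight * a + (1 - weight) * b for a, b in pairs]
    else:
        raise ValueError(f"unknown fusion operator: {operator}")

    # Renormalise so the fused scores sum to one again.
    total = sum(fused)
    return [f / total for f in fused]


def predict(fused_scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)


if __name__ == "__main__":
    # Made-up logits for three activity classes from two streams.
    stream_a = softmax([2.0, 1.0, 0.1])
    stream_b = softmax([0.5, 2.5, 0.3])
    for op in ("average", "max", "product", "weighted"):
        fused = late_fuse(stream_a, stream_b, operator=op, weight=0.9)
        print(op, predict(fused), [round(f, 3) for f in fused])
```

In this sketch the two streams disagree on the most likely class, so the choice of operator (and of `weight` for the weighted variant) decides which stream dominates the fused prediction. That is one reason a study might compare several late fusion operators rather than fixing one in advance.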

 

MS-HARA overall architecture
https://www.youtube.com/embed/KAfZ-8BRt68