LSMB19: A Large-Scale Motion Benchmark for Searching and Annotating in Motion Data Streams
LSMB19 Benchmark
The LSMB19 dataset is built by interconnecting 3D skeletal data of non-interactive actions from the NTU RGB+D Action Recognition dataset into two very long continuous sequences, one for the cross-subject modality and one for the cross-view modality.
| | Cross-subject sequence | Cross-view sequence |
| --- | --- | --- |
| # of subjects | 20 | 40 |
| # of actions | 32,772 | 30,757 |
| # of action classes | 49 | 49 |
| Total length | 3,006,996 frames (26.8 h) | 2,883,295 frames (26.7 h) |
| Total actions length | 26.1 h | 25 h |
| Total transitions length | 1.7 h | 1.7 h |
- Test data: Two very long continuous sequences, one for the cross-subject and one for the cross-view modality.
- Training data: Two batches of training action samples, one for the cross-subject and one for the cross-view modality.
- Ground truth: Class labels for training data and annotations (i.e., beginnings and endings of actions along with their class labels) for test data.
- Search queries: 98 hand-picked queries for evaluation of the search task.
- Evaluation metrics: Unified effectiveness and efficiency evaluation criteria.
The training data can be used for training purposes, i.e., metadata such as action class, actor ID, or camera/setup settings can be used without limitations. The test data should only be used for evaluation purposes, unless unsupervised methods are used. The search queries must be used only for evaluation purposes in all cases.
Possible Evaluation Scenarios
- Subsequence searching
- Semantic segmentation
- Offline sequence annotation
- Online (early) action detection
- Action prediction
- Pattern mining
Dataset in Numbers
- 2 long sequences
- 2 modalities (cross-subject and cross-view)
- 53.5 hours in total
- 63,527 annotated actions
- 49 classes
- 30 frames per second
- 25 tracked joints
Data Quality
- Original data captured by Kinect v2
- Imprecise capturing
- Joint occlusions
- Noise
- Recognizing actions is very challenging
1. Annotations (Ground Truth) Format
Files: cs_sequence_annotations.txt (for the cross-subject modality), cv_sequence_annotations.txt (for the cross-view modality)
These files contain ground truth metadata for evaluating the searching and annotation scenarios. These data must not, under any circumstances, be used during the training process. Each row in these files denotes one action annotation using the following notation (columns are separated by tabs):
SsssCcccPpppRrrrAaaa START END
Persons, setups, and camera views follow the specification of the NTU RGB+D Action Recognition dataset. Action classes also follow it, except that the 11 interaction classes (i.e., action IDs A050 to A060) are excluded.
Ssss – setup [S001, ..., S017]
Cccc – camera view [C001, ..., C003]
Pppp – person [P001, ..., P040]
Rrrr – repetition [R001, ..., R003]
Aaaa – action class [A001, ..., A049]
START – frame number, where the action starts (inclusive)
END – frame number, where the action finishes (exclusive)
NOTE: The SsssCcccPpppRrrrAaaa identifier is the original NTU-RGBD dataset file identifier.
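The annotation rows described above can be parsed with a few lines of code. The following is a minimal sketch in Python; the function name `parse_annotation` is illustrative, not part of the benchmark:

```python
import re

def parse_annotation(row):
    """Split one 'SsssCcccPpppRrrrAaaa\tSTART\tEND' annotation row
    into its numeric components (columns are tab-separated)."""
    ident, start, end = row.strip().split("\t")
    m = re.fullmatch(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})", ident)
    if m is None:
        raise ValueError(f"unexpected identifier: {ident}")
    setup, camera, person, repetition, action = map(int, m.groups())
    return {
        "setup": setup, "camera": camera, "person": person,
        "repetition": repetition, "action": action,
        "start": int(start),  # first frame of the action (inclusive)
        "end": int(end),      # frame after the last one (exclusive)
    }

# Example with a made-up row:
parse_annotation("S001C002P003R001A042\t100\t250")
```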
2. Training and Test Data Format ~ 3D Skeletal Data
Files: cs_sequence.txt, cv_sequence.txt, cs_training_samples.txt, cv_training_samples.txt, queries.txt
These files contain 3D skeletal data, i.e., the 3D joint coordinates of tracked joints in a frame-by-frame manner. The test data files (cs_sequence.txt and cv_sequence.txt) each contain a single long sequence, while the training data files (cs_training_samples.txt and cv_training_samples.txt) consist of many action samples, with respect to the cross-subject (cs) or cross-view (cv) modality. The query file (queries.txt) contains 98 preselected samples for the subsequence search task. Each sequence/action follows the same data format, in which each file row corresponds to the 3D joint coordinates of a single frame:
joint1[X], joint1[Y], joint1[Z]; ...; joint25[X], joint25[Y], joint25[Z]
Individual joints are separated by semicolons, while their x/y/z coordinates are separated by commas. The order of joints follows that of the Kinect v2 device.
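A frame row in this format can be turned into a list of (x, y, z) tuples as sketched below; `parse_frame` is an illustrative name, not part of the benchmark:

```python
def parse_frame(row):
    """Parse one frame row into a list of (x, y, z) joint tuples.
    Joints are separated by semicolons, coordinates by commas."""
    return [tuple(float(c) for c in joint.split(","))
            for joint in row.strip().split(";")]

# Synthetic two-joint example (real rows contain 25 joints):
parse_frame("0.1, 0.2, 0.3; 1.0, 1.1, 1.2")
```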
Since query and training data files contain multiple action samples, individual actions are separated by a single row carrying the action identifier. This identifier has the following format:
SsssCcccPpppRrrrAaaa_Length
Length – number of frames of the current action/sequence
Joint order
1. root, 2. left hip, 3. left femur, 4. left tibia, 5. left foot, 6. right hip, 7. right femur, 8. right tibia, 9. right foot, 10. lower back, 11. lower neck, 12. upper neck, 14. left clavicle, 15. left humerus, 16. left wrist, 17. left hand, 18. left fingers, 19. left thumb, 20. right clavicle, 21. right humerus, 22. right wrist, 23. right hand, 24. right fingers, 25. right thumb
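Putting the two formats together, a training or query file can be split into individual samples. The sketch below uses a simple heuristic (frame rows contain semicolons, identifier rows do not) and assumes each identifier row precedes the frame rows of its sample; `read_samples` is an illustrative name:

```python
def read_samples(lines):
    """Group the rows of a training/query file into (identifier, frames)
    pairs. Rows without a semicolon are treated as identifier rows
    (SsssCcccPpppRrrrAaaa_Length) starting a new sample; all other rows
    are frames of semicolon-separated x,y,z joint triples."""
    ident, frames = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ";" not in line:           # identifier row starts a new sample
            if ident is not None:
                yield ident, frames
            ident, frames = line, []
        else:                         # frame row: joints as x,y,z triples
            frames.append([tuple(float(c) for c in joint.split(","))
                           for joint in line.split(";")])
    if ident is not None:
        yield ident, frames
```

Because the function takes any iterable of lines, it can be applied directly to an open file handle, e.g. `list(read_samples(open("queries.txt")))`.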
The LSMB19 dataset is derived from the NTU RGB+D Action Recognition dataset and is released for academic research purposes only. By downloading and using the dataset, you consent to the NTU Dataset Release Agreement.
Download the LSMB19 dataset (.zip) (2.8 GB)
Individual Dataset Files for Download
- Long Sequences (i.e., Test Data)
- Cross-view sequence (.zip, 1 GB)
- Cross-subject sequence (.zip, 971 MB)
- Train Samples
- Cross-view train data (.zip, 453 MB)
- Cross-subject train data (.zip, 413 MB)
- Annotations (i.e., Ground Truth)
- Cross-view sequence annotations (.txt, 1 MB)
- Cross-subject sequence annotations (.txt, 1 MB)
- Search Queries
- 98 queries for both sequences (.zip, 3 MB)
When working with the LSMB19 dataset, cite the following references:
- A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 [PDF] [bibtex]
- J. Sedmidubsky, P. Elias, and P. Zezula, "Benchmarking Search and Annotation in Continuous Human Skeleton Sequences", in International Conference on Multimedia Retrieval (ICMR), 2019 [ACM DL]