LSMB19: A Large-Scale Motion Benchmark for Searching and Annotating in Motion Data Streams
LSMB19 Benchmark
The LSMB19 dataset is built by interconnecting 3D skeletal data of non-interactive actions from the NTU RGB+D Action Recognition dataset into two very long continuous sequences, one for the cross-subject modality and one for the cross-view modality.
| | Cross-subject sequence | Cross-view sequence |
| --- | --- | --- |
| # of subjects | 20 | 40 |
| # of actions | 32,772 | 30,757 |
| # of action classes | 49 | 49 |
| Total length | 3,006,996 frames (26.8 h) | 2,883,295 frames (26.7 h) |
| Total actions length | 26.1 h | 25 h |
| Total transitions length | 1.7 h | 1.7 h |
- Test data: Two very long continuous sequences, one for the cross-subject and one for the cross-view modality.
- Training data: Two batches of training action samples, one for the cross-subject and one for the cross-view modality.
- Ground truth: Class labels for training data and annotations (i.e., beginnings and endings of actions along with their class labels) for test data.
- Search queries: 98 hand-picked queries for evaluation of the search task.
- Evaluation metrics: Unified effectiveness and efficiency evaluation criteria.
The training data can be used for training purposes, i.e., metadata such as action class, actor ID, or camera/setup settings can be used without limitations. The test data should only be used for evaluation purposes, unless unsupervised methods are used. The search queries must be used only for evaluation purposes in all cases.
Possible Evaluation Scenarios
- Subsequence searching
- Semantic segmentation
- Offline sequence annotation
- Online (early) action detection
- Action prediction
- Pattern mining
Dataset in Numbers
- 2 long sequences
- 2 modalities (cross-subject and cross-view)
- 53.5 hours in total
- 63,527 annotated actions
- 49 classes
- 30 frames per second
- 25 tracked joints
Data Quality
- Original data captured by Kinect v2
- Imprecise capturing
- Joint occlusions
- Noise
- Recognizing actions is very challenging
1. Annotations (Ground Truth) Format
Files: cs_sequence_annotations.txt (for the cross-subject modality), cv_sequence_annotations.txt (for the cross-view modality)
These files contain ground truth metadata for evaluating the searching and annotation scenarios. These data must not, under any circumstances, be used during the training process. Each row in these files denotes one action annotation using the following notation (columns are separated by tabs):
SsssCcccPpppRrrrAaaa START END
Persons, setups, and camera views follow the specification of the NTU RGB+D Action Recognition dataset. Action classes also follow it, except that the 11 interaction classes (i.e., action IDs A050 to A060) are excluded.
Ssss – setup [S001, ..., S017]
Cccc – camera view [C001, ..., C003]
Pppp – person [P001, ..., P040]
Rrrr – repetition [R001, ..., R003]
Aaaa – action class [A001, ..., A049]
START – frame number, where the action starts (inclusive)
END – frame number, where the action finishes (exclusive)
NOTE: The SsssCcccPpppRrrrAaaa identifier is the original NTU-RGBD dataset file identifier.
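The annotation rows described above can be parsed with a few lines of code. The following is a minimal sketch in Python; the function name `parse_annotation` is illustrative, not part of the benchmark:

```python
import re

def parse_annotation(row):
    """Split one 'SsssCcccPpppRrrrAaaa\tSTART\tEND' annotation row
    into its numeric components (columns are tab-separated)."""
    ident, start, end = row.strip().split("\t")
    m = re.fullmatch(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})", ident)
    if m is None:
        raise ValueError(f"unexpected identifier: {ident}")
    setup, camera, person, repetition, action = map(int, m.groups())
    return {
        "setup": setup, "camera": camera, "person": person,
        "repetition": repetition, "action": action,
        "start": int(start),  # first frame of the action (inclusive)
        "end": int(end),      # frame after the last one (exclusive)
    }

# Example with a made-up row:
parse_annotation("S001C002P003R001A042\t100\t250")
```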
2. Training and Test Data Format ~ 3D Skeletal Data
Files: cs_sequence.txt, cv_sequence.txt, cs_training_samples.txt, cv_training_samples.txt, queries.txt
These files contain 3D skeletal data, i.e., the 3D joint coordinates of tracked joints in a frame-by-frame manner. The test data files (cs_sequence.txt and cv_sequence.txt) each contain a single long sequence, while the training data files (cs_training_samples.txt and cv_training_samples.txt) consist of many action samples, with respect to the cross-subject (cs) or cross-view (cv) modality. The query file (queries.txt) contains 98 preselected samples for the subsequence search task. Each sequence/action follows the same data format, in which each file row corresponds to the 3D joint coordinates of a single frame:
joint1[X], joint1[Y], joint1[Z]; ...; joint25[X], joint25[Y], joint25[Z]
Individual joints are separated by semicolons, while their x/y/z coordinates are separated by commas. The order of joints follows that of the Kinect v2 device.
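A frame row in this format can be turned into a list of (x, y, z) tuples as sketched below; `parse_frame` is an illustrative name, not part of the benchmark:

```python
def parse_frame(row):
    """Parse one frame row into a list of (x, y, z) joint tuples.
    Joints are separated by semicolons, coordinates by commas."""
    return [tuple(float(c) for c in joint.split(","))
            for joint in row.strip().split(";")]

# Synthetic two-joint example (real rows contain 25 joints):
parse_frame("0.1, 0.2, 0.3; 1.0, 1.1, 1.2")
```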
Since query and training data files contain multiple action samples, individual actions are separated by a single row carrying the action identifier. This identifier has the following format:
SsssCcccPpppRrrrAaaa_Length
Length – number of frames of the current action/sequence
Joint order
1. root, 2. left hip, 3. left femur, 4. left tibia, 5. left foot, 6. right hip, 7. right femur, 8. right tibia, 9. right foot, 10. lower back, 11. lower neck, 12. upper neck, 14. left clavicle, 15. left humerus, 16. left wrist, 17. left hand, 18. left fingers, 19. left thumb, 20. right clavicle, 21. right humerus, 22. right wrist, 23. right hand, 24. right fingers, 25. right thumb
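Putting the two formats together, a training or query file can be split into individual samples. The sketch below uses a simple heuristic (frame rows contain semicolons, identifier rows do not) and assumes each identifier row precedes the frame rows of its sample; `read_samples` is an illustrative name:

```python
def read_samples(lines):
    """Group the rows of a training/query file into (identifier, frames)
    pairs. Rows without a semicolon are treated as identifier rows
    (SsssCcccPpppRrrrAaaa_Length) starting a new sample; all other rows
    are frames of semicolon-separated x,y,z joint triples."""
    ident, frames = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ";" not in line:           # identifier row starts a new sample
            if ident is not None:
                yield ident, frames
            ident, frames = line, []
        else:                         # frame row: joints as x,y,z triples
            frames.append([tuple(float(c) for c in joint.split(","))
                           for joint in line.split(";")])
    if ident is not None:
        yield ident, frames
```

Because the function takes any iterable of lines, it can be applied directly to an open file handle, e.g. `list(read_samples(open("queries.txt")))`.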
The LSMB19 dataset is derived from the NTU RGB+D Action Recognition dataset and is released for academic research purposes only. By downloading and using the dataset, you consent to the NTU Dataset Release Agreement.
Download the LSMB19 dataset (.zip) (2.8 GB)
Individual Dataset Files for Download
- Long Sequences (i.e., Test Data)
- Cross-view sequence (.zip, 1 GB)
- Cross-subject sequence (.zip, 971 MB)
- Train Samples
- Cross-view train data (.zip, 453 MB)
- Cross-subject train data (.zip, 413 MB)
- Annotations (i.e., Ground Truth)
- Cross-view sequence annotations (.txt, 1 MB)
- Cross-subject sequence annotations (.txt, 1 MB)
- Search Queries
- 98 queries for both sequences (.zip, 3 MB)
When working with the LSMB19 dataset, cite the following references:
- A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 [PDF] [bibtex]
- J. Sedmidubsky, P. Elias, and P. Zezula, "Benchmarking Search and Annotation in Continuous Human Skeleton Sequences", in International Conference on Multimedia Retrieval (ICMR), 2019 [ACM DL]