LSMB19: A Large-Scale Motion Benchmark for Searching and Annotating in Motion Data Streams

Read the Full Paper

Dataset Description

LSMB19 Benchmark

The LSMB19 dataset is built by interconnecting 3D skeletal data of non-interactive actions from the NTU RGB+D Action Recognition dataset into two very long continuous sequences, one for the cross-subject and one for the cross-view modality.

                          Cross-subject sequence        Cross-view sequence
# of subjects             20                            40
# of actions              32,772                        30,757
# of action classes       49                            49
Total length              3,006,996 frames (26.8 h)     2,883,295 frames (26.7 h)
Total actions length      26.1 h                        25.0 h
Total transitions length  1.7 h                         1.7 h
  1. Test data
    Two very long continuous sequences, one for the cross-subject and one for the cross-view modality.
  2. Training data
    Two batches of training action samples, one for the cross-subject and one for the cross-view modality.
  3. Ground truth
    Class labels for training data and annotations (i.e., beginnings and endings of actions along with their class labels) for test data.
  4. Search queries
    98 hand-picked queries for evaluation of the search task.
  5. Evaluation metrics
    Unified effectiveness and efficiency evaluation criteria.

The training data can be used for training purposes without restrictions, i.e., metadata such as the action class, actor ID, or camera/setup settings may be used freely. The test data should be used only for evaluation purposes, unless unsupervised methods are applied. The search queries must be used only for evaluation purposes in all cases.

Short Visualization


Possible Evaluation Scenarios

  • Subsequence searching
  • Semantic segmentation
    • Offline sequence annotation
    • Online (early) action detection
  • Action prediction
  • Pattern mining

Dataset in Numbers

  • 2 long sequences
  • 2 modalities (cross-subject and cross-view)
  • 53.5 hours in total
  • 63,527 annotated actions
  • 49 classes
  • 30 frames per second
  • 25 tracked joints

Data Quality

  • Original data captured by Kinect v2
  • Imprecise capturing
    • Joint occlusions
    • Noise
  • Recognizing actions is very challenging

File Format

1. Annotations (Ground Truth) Format

Files: cs_sequence_annotations.txt (for the cross-subject modality), cv_sequence_annotations.txt (for the cross-view modality)

These files contain ground truth metadata for evaluating the searching and annotation scenarios. These data must not be used in any way during the training process. Each row in these files denotes one action annotation using the following notation (columns are separated by tabs):

SsssCcccPpppRrrrAaaa        START        END

Persons, setups, and camera views follow the specification of the NTU RGB+D Action Recognition dataset. Action classes do too, except that the 11 interaction classes (i.e., action IDs A050 to A060) are excluded.

Ssss – setup [S001, ..., S017]
Cccc – camera view [C001, ..., C003]
Pppp – person [P001, ..., P040]
Rrrr – repetition [R001, ..., R003]
Aaaa – action class [A001, ..., A049]
START – frame number where the action starts (inclusive)
END – frame number where the action finishes (exclusive)

NOTE: The SsssCcccPpppRrrrAaaa identifier is the original NTU-RGBD dataset file identifier.
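The annotation rows described above can be parsed mechanically. A minimal sketch in Python (the function name `parse_annotation` and the returned dictionary keys are illustrative, not part of the benchmark):

```python
import re

def parse_annotation(row: str) -> dict:
    """Parse one tab-separated annotation row of the form
    'SsssCcccPpppRrrrAaaa<TAB>START<TAB>END'."""
    identifier, start, end = row.rstrip("\n").split("\t")
    m = re.fullmatch(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})", identifier)
    if m is None:
        raise ValueError(f"unexpected identifier: {identifier!r}")
    setup, camera, person, repetition, action = (int(g) for g in m.groups())
    return {
        "setup": setup, "camera": camera, "person": person,
        "repetition": repetition, "action_class": action,
        "start": int(start),   # inclusive, per the format above
        "end": int(end),       # exclusive, per the format above
    }

# Example with a made-up annotation row:
print(parse_annotation("S001C002P003R002A027\t150\t310"))
```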

2. Training and Test Data Format ~ 3D Skeletal Data

Files: cs_sequence.txt, cv_sequence.txt, cs_training_samples.txt, cv_training_samples.txt, queries.txt

These files contain 3D skeletal data, i.e., the 3D joint coordinates of tracked joints in a frame-by-frame manner. The test data files (cs_sequence.txt and cv_sequence.txt) each contain a single long sequence, while the training data files (cs_training_samples.txt and cv_training_samples.txt) consist of many action samples, with respect to the cross-subject (cs) or cross-view (cv) modality. The query file (queries.txt) contains 98 preselected samples for the subsequence search task. Each sequence/action follows the same data format, in which each file row corresponds to the 3D joint coordinates of a single frame:

joint1[X], joint1[Y], joint1[Z]; ...; joint25[X], joint25[Y], joint25[Z]

Individual joints are separated by semicolons, while their x/y/z coordinates are separated by commas. The joint order follows that of the Kinect v2 device.
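A frame row in this format can be decoded into a list of 25 (x, y, z) tuples. A minimal sketch (the function name `parse_frame` is illustrative; it tolerates optional whitespace and a trailing semicolon, which the source does not specify either way):

```python
def parse_frame(row: str) -> list:
    """Parse one frame row: 25 joints separated by semicolons, each
    joint's x/y/z coordinates separated by commas."""
    joints = [tuple(float(c) for c in part.split(","))
              for part in row.strip().split(";") if part.strip()]
    if len(joints) != 25 or any(len(j) != 3 for j in joints):
        raise ValueError("expected 25 joints with 3 coordinates each")
    return joints
```

For example, `parse_frame("0.1, 0.2, 0.3; ...")[0]` yields the root joint's coordinates as a tuple.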

Since the query and training data files contain multiple action samples, individual actions are separated by a single row corresponding to the action identifier. This identifier has the following format:

Length – number of frames of the current action/sequence

Joint order
1. root, 2. left hip, 3. left femur, 4. left tibia, 5. left foot, 6. right hip, 7. right femur, 8. right tibia, 9. right foot, 10. lower back, 11. lower neck, 12. upper neck, 14. left clavicle, 15. left humerus, 16. left wrist, 17. left hand, 18. left fingers, 19. left thumb, 20. right clavicle, 21. right humerus, 22. right wrist, 23. right hand, 24. right fingers, 25. right thumb


The LSMB19 dataset is derived from the NTU RGB+D Action Recognition dataset and is released for academic research purposes only. By downloading and using the dataset, you agree to the NTU Dataset Release Agreement.

Download the LSMB19 dataset (.zip) (2.8 GB)

Individual Dataset Files for Download

When working with the LSMB19 dataset, cite the following references:

  • A Shahroudy, J Liu, T-T Ng, and G Wang, "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 [PDF] [bibtex].

  • J Sedmidubsky, P Elias, and P Zezula, "Benchmarking Search and Annotation in Continuous Human Skeleton Sequences", in International Conference on Multimedia Retrieval (ICMR), 2019 [ACM DL].
