  • Publication
    Leveraging information from imperfect examples: Common action sequence mining from a mix of incorrect performances
    (2018-12-18)
    Jain, Hiteshi ;
    Good action videos are as important for learning segmentation techniques as good representations and theory are for explaining human actions. Accurately modelling complex actions such as diving, figure skating, and yoga practices requires videos of expert performers, and the scarcity of experts in a domain reduces the number of available videos and hence the quality of learning. In this work we utilize imperfect amateur performances to obtain more confident representations of human action sequences. We introduce a novel community-detection-based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representations. Human actions are composed of distinguishable key poses, which form dense communities in graph structures. Anomalous poses held for a long duration can also form such dense communities, but they occur rarely across action videos and can therefore be identified and rejected. Further, we propose a technique to learn the temporal order of these key poses from the imperfect videos, where inter-community links help reduce the search space of possible pose sequences. Our framework improves the segmentation of complex human actions with the help of imperfect performances. We illustrate the efficacy of the approach on two complex action datasets, Sun Salutation and Warm-up exercise, developed from random executions by amateur performers.
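A minimal sketch of the key-pose mining idea summarized in the abstract above, not the authors' implementation: per-frame pose descriptors from all videos become graph nodes, visually similar poses are linked, dense communities are treated as candidate key poses, and communities that appear in too few videos are rejected as anomalous. The descriptor format, similarity threshold, and video-coverage threshold are illustrative assumptions.

import itertools
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def mine_key_poses(videos, sim_thresh=0.9, min_video_fraction=0.5):
    """videos: list of arrays, each of shape (n_frames, d) of per-frame pose descriptors."""
    G = nx.Graph()
    # Nodes are (video_id, frame_id); edges connect visually similar poses.
    for vid, frames in enumerate(videos):
        for fid, f in enumerate(frames):
            G.add_node((vid, fid), feat=f)
    for a, b in itertools.combinations(list(G.nodes), 2):
        fa, fb = G.nodes[a]["feat"], G.nodes[b]["feat"]
        sim = np.dot(fa, fb) / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8)
        if sim > sim_thresh:
            G.add_edge(a, b)
    # Dense communities correspond to candidate key poses.
    communities = greedy_modularity_communities(G)
    key_pose_communities = []
    for c in communities:
        covered = {vid for (vid, _) in c}
        # Reject communities that appear in too few videos: likely anomalous poses
        # held for a long time by a single performer.
        if len(covered) / len(videos) >= min_video_fraction:
            key_pose_communities.append(c)
    return key_pose_communities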
  • Publication
    A framework to assess sun salutation videos
    (2016-12-18)
    Jain, Hiteshi ;
    There are many exercises which are repetitive in nature and must be performed with perfection to derive maximum benefit. Sun Salutation, or Surya Namaskar, is one of the oldest known yoga practices. It is a sequence of ten actions, or 'asanas', in which the actions are synchronized with breathing and each action and its transition should be performed with minimal jerks. It is therefore important that this practice be performed with Grace and Consistency. In this context, Grace is the ability of a person to perform an exercise smoothly, i.e., without sudden movements or jerks during posture transitions, and Consistency measures the repeatability of the exercise in every cycle. We propose an algorithm that assesses how well a person practices Sun Salutation in terms of grace and consistency. Our approach trains an individual HMM for each asana using STIP features [11], followed by automatic segmentation and labeling of the entire Sun Salutation sequence using a concatenated-HMM. The metrics of grace and consistency are then defined in terms of posture transition times. The assessments made by our system are compared with those of a yoga trainer to derive the accuracy of the system. We introduce a Sun Salutation video dataset comprising 30 sequences of perfect Sun Salutation performed by seven experts and use this dataset to train our system. While Sun Salutation can be judged on multiple parameters, we focus mainly on judging Grace and Consistency.
    Scopus© Citations 7
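A rough Python sketch of the assessment pipeline described in the abstract above, not the paper's code: one GaussianHMM per asana is trained on frame features (STIP descriptors in the paper), a greedy windowed labelling stands in for the concatenated-HMM decoding, and consistency is derived from the spread of per-asana durations across cycles; grace would similarly be derived from smoothness around the detected transitions. Window size, state count, and feature format are assumptions.

import numpy as np
from hmmlearn import hmm

def train_asana_models(features_per_asana, n_states=4):
    """features_per_asana: dict asana_name -> (n_frames, d) array of training frames."""
    models = {}
    for name, X in features_per_asana.items():
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X)
        models[name] = m
    return models

def label_sequence(models, X, win=15):
    # Assign each window to the asana whose HMM gives the highest log-likelihood.
    labels = []
    for t in range(0, len(X) - win, win):
        window = X[t:t + win]
        labels.append(max(models, key=lambda k: models[k].score(window)))
    return labels

def consistency(durations_per_cycle):
    """durations_per_cycle: list of cycles, each a list of per-asana durations (frames).
    A lower spread of per-asana durations across cycles means a more consistent performance."""
    return float(np.mean([np.std(d) for d in zip(*durations_per_cycle)]))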
  • Publication
    Action Quality Assessment Using Siamese Network-Based Deep Metric Learning
    (2021-06-01)
    Jain, Hiteshi ; ; Sharma, Avinash
    Automated vision-based score estimation models can provide an alternate opinion to avoid judgment bias. Existing works learn score estimation models by regressing a video representation onto the ground-truth score provided by judges. However, such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One solution to make the scores more explicable is to compare the given action video with a reference video, capturing the temporal variations vis-à-vis the reference video and mapping those variations to the final score. In this work, we propose a new action scoring system termed Reference Guided Regression (RGR), which comprises (1) a Deep Metric Learning Module that learns the similarity between any two action videos based on the ground-truth scores given by the judges, and (2) a Score Estimation Module that uses the first module to find the resemblance of a video to a reference video and give the assessment score. The proposed scoring model is tested on Olympics Diving and Gymnastic vaults and outperforms the existing state-of-the-art scoring models.
    Scopus© Citations 27
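A minimal PyTorch sketch of the two-branch scoring idea described in the abstract above, not the released RGR model: a shared encoder embeds the test and reference videos' clip features, and a small head regresses their score difference, which can be added to the known score of the reference video at test time. Feature dimensions and the pooling scheme are illustrative assumptions.

import torch
import torch.nn as nn

class SiameseScorer(nn.Module):
    def __init__(self, feat_dim=1024, embed_dim=256):
        super().__init__()
        # Shared encoder applied to both the test video and the reference video.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )
        # Regresses the score difference of the pair from the concatenated embeddings.
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, clip_feats_a, clip_feats_b):
        # clip_feats_*: (batch, n_clips, feat_dim) per-clip video features,
        # mean-pooled over time before embedding.
        za = self.encoder(clip_feats_a.mean(dim=1))
        zb = self.encoder(clip_feats_b.mean(dim=1))
        return self.head(torch.cat([za, zb], dim=1)).squeeze(-1)

# Training target: the difference of the judges' scores for the two videos; at test
# time the predicted difference is added to the known score of a reference video.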
  • Publication
    Unsupervised Temporal Segmentation of Human Action Using Community Detection
    (2018-08-29)
    Jain, Hiteshi ;
    Temporal segmentation of complex human action videos into action primitives plays a pivotal role in building models for human action understanding. Past studies have introduced unsupervised frameworks for deriving a known number of motion primitives from action videos. Our work focuses on answering the question: given a set of videos of humans performing an activity, can the action primitives be derived without any prior knowledge of the number of constituent sub-action categories? To this end, we present a novel community-detection-based human action segmentation algorithm. Our work establishes the existence of community structures in human action videos, where consecutive frames around key poses group together to form communities, similar to social networks. We test the proposed technique on the stitched Weizmann dataset and the MHADI01-s motion capture dataset, and it outperforms state-of-the-art complex action segmentation techniques without the number of actions being pre-specified.
    Scopus© Citations 6
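A toy sketch of the frame-graph idea described in the abstract above, not the authors' code: frames of a single video are nodes, edges link temporally close and visually similar frames, modularity-based community detection groups frames around key poses, and segment boundaries are read off where the community label changes, with no segment count supplied. The similarity threshold and temporal-window heuristic are assumptions.

import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def segment_video(frame_feats, sim_thresh=0.9, max_gap=30):
    """frame_feats: (n_frames, d) array of per-frame descriptors for one video."""
    n = len(frame_feats)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, min(i + max_gap, n)):
            a, b = frame_feats[i], frame_feats[j]
            sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
            if sim > sim_thresh:
                G.add_edge(i, j)
    # Communities are found without specifying how many sub-actions exist.
    communities = greedy_modularity_communities(G)
    label = np.empty(n, dtype=int)
    for cid, members in enumerate(communities):
        for f in members:
            label[f] = cid
    # A boundary is placed wherever consecutive frames fall in different communities.
    boundaries = [t for t in range(1, n) if label[t] != label[t - 1]]
    return label, boundaries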
  • Publication
    DocDescribor: Digits + Alphabets + Math Symbols - A Complete OCR for Handwritten Documents
    (2020-01-01)
    Aggarwal, Ridhi ; Jain, Hiteshi ; ;
    This paper presents an Optical Character Recognition (OCR) system for documents containing English text and mathematical expressions. Neural network architectures using CNN and/or dense layers achieve high accuracy in character recognition tasks. However, these models require a large amount of training data, with a balanced number of samples per class. Recognition of mathematical symbols is challenging because the available training data are scarce and imbalanced. To address this issue, we pose character recognition as a Distance Metric Learning problem. We propose a Siamese-CNN network that learns discriminative features to identify whether the two images in a pair contain the same or different characters. The network is then used to recognize characters by matching: test images are compared with sample images of any target class, which may or may not have been seen during training. Thus our model scales easily to new symbols. The proposed approach is invariant to the writer's handwriting. Our model has been tested on images extracted from a dataset of scanned answer scripts that we collected. Our approach achieves performance comparable to other architectures using convolutional or dense layers while using less training data.
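A hedged PyTorch sketch of the exemplar-matching recognition step described in the abstract above, not the paper's exact architecture: a small shared CNN embeds character crops (training would use a pair-similarity objective such as contrastive loss), and at test time a character is assigned the class of its nearest exemplar embedding, so new symbol classes only require a few exemplar images. The input size and layer sizes are assumptions.

import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, embed_dim),   # assumes 32x32 grayscale crops
        )

    def forward(self, x):
        return self.net(x)

def classify_by_exemplars(encoder, test_img, exemplars):
    """exemplars: dict class_name -> tensor of shape (n_exemplars, 1, 32, 32)."""
    with torch.no_grad():
        z = encoder(test_img.unsqueeze(0))
        best, best_class = None, None
        for cls, imgs in exemplars.items():
            # Distance to the closest exemplar of this class.
            d = torch.cdist(z, encoder(imgs)).min()
            if best is None or d < best:
                best, best_class = d, cls
    return best_class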
  • Publication
    An unsupervised sequence-to-sequence autoencoder based human action scoring model
    (2019-11-01)
    Jain, Hiteshi ;
    Developing a model for assessing the quality of human actions is a key research area in computer vision. The quality assessment task has been posed as a supervised regression problem, where models are trained to predict a score given action representation features. However, human proficiency levels can vary widely, and so do the corresponding scores. Providing all such performance variations and their respective scores is expensive, as it requires a domain expert to annotate many videos. This raises the question: can we exploit the deviations of a performance from that of an expert and map these deviations to scores? To this end, we introduce a novel sequence-to-sequence autoencoder-based scoring model which learns a representation from expert performances only and judges an unknown performance by how well it can be regenerated by the learned model. We evaluate our model on predicting scores for the complex Sun Salutation action sequence and demonstrate that it gives considerably better prediction accuracy than the baselines.
    Scopus© Citations 4
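A minimal PyTorch sketch of the reconstruction-based scoring idea described in the abstract above, not the authors' model: an LSTM encoder-decoder is trained only on expert feature sequences, and a test performance is scored by how well the trained model regenerates it (larger reconstruction error implying a lower score). The dimensions and the error-to-score mapping are assumptions.

import torch
import torch.nn as nn

class Seq2SeqAE(nn.Module):
    def __init__(self, feat_dim=50, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, x):                      # x: (batch, T, feat_dim)
        _, state = self.encoder(x)             # sequence summarised in (h, c)
        # Decode conditioned on the encoder state; teacher forcing with the
        # shifted input keeps the sketch short.
        dec_in = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        dec_out, _ = self.decoder(dec_in, state)
        return self.out(dec_out)

def score(model, x):
    # Performances the expert-trained model reconstructs poorly get low scores.
    with torch.no_grad():
        err = torch.mean((model(x) - x) ** 2).item()
    return 1.0 / (1.0 + err)   # illustrative mapping from error to a bounded score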
  • Publication
    Detecting missed and anomalous action segments using approximate string matching algorithm
    (2018-01-01)
    Jain, Hiteshi ;
    As amateur performers, we forget action steps and perform unwanted movements during our daily exercise routines, dance performances, etc. To improve our proficiency, it is important to get feedback on where we went wrong. In this paper, we propose a framework for analyzing performances and reporting action segments that were missed or performed anomalously. This involves comparing the performed sequence with the standard action sequence and notifying when misalignments occur. We propose an exemplar-based Approximate String Matching (ASM) technique for detecting such anomalous and missing segments in action sequences. We compare the results with those obtained from the conventional Dynamic Time Warping (DTW) algorithm for sequence alignment. The alignment produced by conventional DTW fails in the presence of missed and anomalous segments because of its boundary-condition constraints. The performance of the two techniques has been tested on a complex aperiodic human action dataset of Warm-up exercise sequences that we developed from correct and incorrect executions by multiple people. The proposed ASM technique shows promising alignment and missed/anomalous-segment notification results on this dataset.
    Scopus© Citations 7
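A small self-contained sketch of the alignment step described in the abstract above, not the paper's exemplar-based ASM: both the reference and the performed sequence are written as strings of segment labels, a standard edit-distance table is filled, and the traceback reports reference labels with no counterpart (missed) and performed labels with no counterpart (anomalous).

def align_and_report(reference, performed):
    m, n = len(reference), len(performed)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == performed[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # reference segment missed
                          D[i][j - 1] + 1,         # extra (anomalous) segment performed
                          D[i - 1][j - 1] + cost)  # match / substitution
    missed, anomalous = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (reference[i - 1] != performed[j - 1]):
            if reference[i - 1] != performed[j - 1]:
                missed.append(reference[i - 1]); anomalous.append(performed[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            missed.append(reference[i - 1]); i -= 1
        else:
            anomalous.append(performed[j - 1]); j -= 1
    return missed[::-1], anomalous[::-1]

# Example: reference "ABCDE" vs. performed "ABXDE" flags C as missed and X as anomalous.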