Publication

Preface

2019-01-01, Sundaram, Suresh, Harit, Gaurav

Publication

A framework to assess sun salutation videos

2016-12-18, Jain, Hiteshi, Harit, Gaurav

Many exercises are repetitive in nature and must be performed perfectly to derive maximum benefit. Sun Salutation, or Surya Namaskar, is one of the oldest known yoga practices. It is a sequence of ten actions or 'asanas' in which the actions are synchronized with breathing, and each action and its transition should be performed with minimal jerks. Essentially, it is important that this yoga practice be performed with Grace and Consistency. In this context, Grace is the ability of a person to perform an exercise smoothly, i.e. without sudden movements or jerks during posture transitions, and Consistency measures the repeatability of an exercise in every cycle. We propose an algorithm that assesses how well a person practices Sun Salutation in terms of grace and consistency. Our approach works by training individual HMMs for each asana using STIP features [11], followed by automatic segmentation and labeling of the entire Sun Salutation sequence using a concatenated HMM. The metrics of grace and consistency are then defined in terms of posture transition times. The assessments made by our system are compared with those of a yoga trainer to derive the accuracy of the system. We introduce a dataset of Sun Salutation videos comprising 30 sequences of perfect Sun Salutation performed by seven experts and use this dataset to train our system. While Sun Salutation can be judged on multiple parameters, we focus mainly on judging Grace and Consistency.

Publication

Word Spotting in Cluttered Environment

2020-01-01, Srivastava, Divya, Harit, Gaurav

In this paper, we present a novel problem of handwritten word spotting in a cluttered environment, where a word is cluttered by a strike-through with a line stroke. These line strokes can be straight, slanted, broken, continuous, or wavy. The Vertical Projection Profile (VPP) feature and its modified version, the combinatorics Vertical Projection Profile (cVPP) feature, are extracted and aligned by a modified Dynamic Time Warping (DTW) algorithm. Since no dataset is available for the proposed problem, we prepared our own. We compare our method with Rath and Manmatha [6] and PHOCNET [17] for handwritten word spotting in the presence of strike-through, and achieve better results.
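
As a rough illustration of the ingredients named in this abstract, the sketch below computes a plain vertical projection profile and aligns two profiles with textbook DTW. It is not the paper's cVPP feature or its modified DTW, whose details the abstract does not give. The function names and the toy binary images are assumptions made for this example.

```python
def vertical_projection_profile(image):
    """Count foreground (1) pixels in each column of a binary image.

    `image` is a list of equal-length rows of 0/1 values (assumed layout).
    """
    return [sum(column) for column in zip(*image)]


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences (absolute difference)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b
                cost[i][j - 1],      # b[j-1] repeats against a
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


# Toy word images: the candidate is a horizontally stretched query.
query = [[0, 1, 1, 0],
         [0, 1, 0, 0]]
candidate = [[0, 1, 1, 1, 0],
             [0, 1, 1, 0, 0]]

distance = dtw_distance(
    vertical_projection_profile(query),
    vertical_projection_profile(candidate),
)
```

Because DTW lets one column of the query match several columns of the candidate, the stretched word still aligns at low cost. A strike-through stroke adds pixels to many columns, which is the distortion the paper's modified feature and alignment are said to handle.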

Publication

Topological features for recognizing printed and handwritten Bangla characters

2011-10-13, Bag, Soumen, Harit, Gaurav, Bhowmick, Partha

In this paper, we present novel topological features based on the structural shape of a character. We detect the convex-shaped segments formed by the various strokes. The convex segments are then represented with shape primitives from a repertoire. The character is represented as a spatial layout of convex segments. We formulate feature templates for Bangla characters. A given character is assigned the label of the best matching feature template. We have tested the method on a benchmark dataset of printed and handwritten Bangla basic and compound character images. Our results demonstrate the efficacy of our approach. Copyright © 2011 ACM.

Publication

Text detection on camera acquired document images using supervised classification of connected components in wavelet domain

2012-12-01, Roy, Udit, Harit, Gaurav

In this paper we present an algorithm to detect text on video frames consisting of lecture slides. We begin by performing a multi-channel wavelet transform and then merge the channel components for the high-frequency sub-bands to obtain a composite energy map. Thresholding the energy map results in an edge map consisting of candidate text pixels - some of these correspond to actual text and others correspond to graphics, logos, tables, etc. The connected components in the edge map are then filtered to reject some of the false positives using a trained classifier. Rectangular text blocks compactly surrounding the text regions are then identified using a process of selective dilation and recursive splitting. False positive text blocks still remaining are then rejected using heuristics. Experiments conducted on 890 images show that our scheme has a lower false positive rate and misdetection rate when compared with two existing scene text detection methods. © 2012 ICPR Org Committee.
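
The first two steps of this pipeline (high-frequency wavelet energy merged across channels, then thresholding) can be sketched as follows. The abstract does not name the wavelet, the merging rule or the threshold, so this example assumes a one-level Haar transform, a plain sum over channels and an arbitrary threshold. All function names are hypothetical.

```python
def haar_highfreq_energy(channel):
    """Energy of the three one-level Haar detail coefficients per 2x2 block.

    `channel` is a list of rows of numbers with even height and width.
    """
    energy = []
    for r in range(0, len(channel) - 1, 2):
        row_energy = []
        for c in range(0, len(channel[0]) - 1, 2):
            a, b = channel[r][c], channel[r][c + 1]
            d, e = channel[r + 1][c], channel[r + 1][c + 1]
            vertical = (a + b - d - e) / 2.0    # top row minus bottom row
            horizontal = (a - b + d - e) / 2.0  # left column minus right column
            diagonal = (a - b - d + e) / 2.0    # diagonal difference
            row_energy.append(vertical ** 2 + horizontal ** 2 + diagonal ** 2)
        energy.append(row_energy)
    return energy


def composite_energy_map(channels):
    """Merge per-channel energies by summation (an assumed merging rule)."""
    maps = [haar_highfreq_energy(ch) for ch in channels]
    return [
        [sum(values) for values in zip(*rows)]
        for rows in zip(*maps)
    ]


def threshold_map(energy, threshold):
    """Mark blocks whose composite energy exceeds the threshold."""
    return [[value > threshold for value in row] for row in energy]


# Toy 4x4 image with a vertical edge between columns 0 and 1,
# repeated in three identical colour channels.
channel = [[0, 10, 10, 10] for _ in range(4)]
energy = composite_energy_map([channel, channel, channel])
edge_map = threshold_map(energy, threshold=50.0)
```

Only the blocks that contain the intensity edge exceed the threshold. Flat regions produce zero detail energy and are dropped before the later connected-component and classification stages described in the abstract.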

Publication

Gradient sensitive kernel for image denoising, using Gaussian Process Regression

2016-06-10, Dey, Arka Ujjal, Harit, Gaurav

We target the problem of image denoising using Gaussian Process Regression (GPR). Being a non-parametric regression technique, GPR has received much attention in the recent past, and here we further explore its versatility by applying it to a denoising problem. The focus is primarily on the design of a local gradient-sensitive kernel that captures pixel similarity in the context of image denoising. This novel kernel formulation is used to shape the smoothness of the joint GP prior. We apply the GPR denoising technique to small patches and then stitch these patches back together; this keeps the priors local and relevant and also helps in dealing with the complexity of GPR. We demonstrate that our GPR-based technique gives better PSNR values than existing popular denoising techniques.

Publication

Nearest neighbour classification of Indian sign language gestures using kinect camera

2016-02-01, Ansari, Zafar Ahmed, Harit, Gaurav

People with speech disabilities communicate in sign language and therefore have trouble mingling with the able-bodied. There is a need for an interpretation system which could act as a bridge between them and those who do not know their sign language. A functional, unobtrusive Indian sign language recognition system was implemented and tested on real-world data. A vocabulary of 140 symbols was collected using 18 subjects, totalling 5041 images. The vocabulary consisted mostly of two-handed signs which were drawn from a wide repertoire of words of technical and daily-use origins. The system was implemented using Microsoft Kinect, which makes the effect of surrounding light conditions and object colour on the efficiency of the system negligible. We propose a novel, low-cost and easy-to-use application for Indian Sign Language recognition using the Microsoft Kinect camera. In the fingerspelling category of our dataset, we achieved above 90% recognition rates for 13 signs and 100% recognition for 3 signs, with 16 distinct letters overall (A, B, D, E, F, G, H, K, P, R, T, U, W, X, Y, Z) recognised with an average accuracy rate of 90.68%.
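
The classifier named in the title, nearest neighbour, can be sketched in a few lines. The abstract does not describe the features extracted from the Kinect data, so the two-dimensional feature vectors and the labels below are placeholders, and Euclidean distance is an assumed metric.

```python
import math


def nearest_neighbour(training_set, query):
    """Return the label of the training example closest to `query`.

    `training_set` is a list of (feature_vector, label) pairs; distance is
    Euclidean (an assumption, the abstract does not name the metric).
    """
    best_label, best_distance = None, math.inf
    for features, label in training_set:
        distance = math.dist(features, query)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Placeholder feature vectors for two fingerspelled letters.
training_set = [
    ((0.0, 0.0), "A"),
    ((0.1, 0.2), "A"),
    ((1.0, 1.0), "B"),
    ((0.9, 1.1), "B"),
]
predicted = nearest_neighbour(training_set, (0.8, 0.9))
```

Nearest-neighbour classification needs no training phase beyond storing labelled examples, which suits a vocabulary collected from a small number of subjects.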

Publication

Leveraging information from imperfect examples: Common action sequence mining from a mix of incorrect performances

2018-12-18, Jain, Hiteshi, Harit, Gaurav

As much as good representation and theory are needed to explain human actions, so are the action videos used for learning good segmentation techniques. To accurately model complex actions such as diving, figure skating, and yoga practices, videos depicting the actions performed by human experts are required. A lack of experts in any domain leads to a reduced number of videos and hence to improper learning. In this work we attempt to utilize imperfect amateur performances to obtain more confident representations of human action sequences. We introduce a novel Community Detection based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representations. Human actions are composed of distinguishable key poses which form dense communities in graph structures. Anomalous poses performed for a longer duration can also form such dense communities, but they can be identified by their rare occurrence across action videos and rejected. Further, we propose a technique to learn the temporal order of these key poses from the imperfect videos, where the inter-community links help reduce the search space of possible pose sequences. Our framework is seen to improve the segmentation performance of complex human actions with the help of some imperfect performances. The efficacy of our approach has been illustrated on two complex action datasets, Sun Salutation and Warm-up exercise, that have been developed using random executions from amateur performers.

Publication

An improved contour-based thinning method for character images

2011-10-15, Bag, Soumen, Harit, Gaurav

The digital skeleton of a character image, generated by a thinning method, has a wide range of applications in shape analysis and classification. However, thinning of character images is challenging, and removing spurious strokes or deformities during thinning is a difficult problem. In this paper, we propose a contour-based thinning method for skeletonization of printed, noisy, isolated character images. In this method, we use shape characteristics of text to obtain a skeleton that is nearly the same as the true character shape. This approach helps to preserve the local features and true shapes of the character images. As a by-product of our thinning approach, the skeleton is also segmented into strokes in vector form, so further stroke segmentation is not required. Experiments on printed English, Bengali, Hindi, and Tamil characters show much better results than other thinning methods, without any post-processing. © 2011 Elsevier B.V. All rights reserved.

Publication

Table extraction from document images using fixed point model

2014-12-14, Bansal, Anukriti, Harit, Gaurav, Dutta Roy, Sumantra

The paper presents a novel learning-based framework to identify tables in scanned document images. The approach is designed as a structured labeling problem, which learns the layout of the document and labels its various entities as table header, table trailer, table cell and non-table region. We develop features which encode the foreground block characteristics and the contextual information. These features are provided to a fixed point model which learns the inter-relationship between the blocks. The fixed point model attains a contraction mapping and provides a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed point model captures the context information in terms of the neighbourhood layout more efficiently. Experiments on images picked from the UW-III (University of Washington) dataset, the UNLV dataset and our own dataset of document images with multi-column page layouts show the applicability of our algorithm in layout analysis and table detection.