Harit, Gaurav
Preferred name: Harit, Gaurav
Alternative names: Harit, G.; HARIT GAURAV; Harit G.
34 results
Now showing 1 - 10 of 34
- Publication: Text detection on camera acquired document images using supervised classification of connected components in wavelet domain (2012-12-01)
  Roy, Udit
  In this paper we present an algorithm to detect text in video frames consisting of lecture slides. We begin by performing a multi-channel wavelet transform and then merge the channel components of the high-frequency sub-bands to obtain a composite energy map. Thresholding the energy map yields an edge map of candidate text pixels; some correspond to actual text, while others correspond to graphics, logos, tables, etc. The connected components in the edge map are then filtered with a trained classifier to reject some of the false positives. Rectangular text blocks compactly surrounding the text regions are then identified through selective dilation and recursive splitting. Remaining false-positive text blocks are rejected using heuristics. Experiments on 890 images show that our scheme has a lower false positive rate and misdetection rate than two existing scene text detection methods. © 2012 ICPR Org Committee.
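A minimal sketch of the composite-energy-map step described above, assuming a single grayscale frame as a NumPy array. The Haar wavelet, the fixed threshold, and the area-based component filter are illustrative stand-ins for the multi-channel transform, the trained component classifier, and the block post-processing of the paper.

```python
import numpy as np
import pywt                     # PyWavelets
from scipy import ndimage

def candidate_text_components(gray, threshold=0.2, min_area=20):
    """Label connected components of candidate text pixels in one frame."""
    # One-level 2-D wavelet transform: approximation + three high-frequency sub-bands.
    _, (cH, cV, cD) = pywt.dwt2(gray.astype(float), "haar")
    # Merge the high-frequency sub-bands into a composite energy map.
    energy = np.sqrt(cH ** 2 + cV ** 2 + cD ** 2)
    energy /= energy.max() + 1e-9
    # Thresholding the energy map gives an edge map of candidate text pixels.
    edge_map = energy > threshold
    # Connected components; a simple area filter stands in for the trained classifier.
    labels, n = ndimage.label(edge_map)
    areas = ndimage.sum(edge_map, labels, index=range(1, n + 1))
    kept = {i + 1 for i, a in enumerate(areas) if a >= min_area}
    return labels, kept
```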
- Publication: Leveraging information from imperfect examples: Common action sequence mining from a mix of incorrect performances (2018-12-18)
  Jain, Hiteshi
  Just as good representations and theory are needed to explain human actions, good action videos are needed to learn good segmentation techniques. To accurately model complex actions such as diving, figure skating, and yoga practices, videos of the actions performed by human experts are required. A lack of experts in a domain reduces the number of available videos and hence degrades learning. In this work we attempt to utilize imperfect amateur performances to obtain more confident representations of human action sequences. We introduce a novel community-detection-based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representations. Human actions are composed of distinguishable key poses which form dense communities in graph structures. Anomalous poses performed for a longer duration can also form such dense communities, but they can be identified by their rare occurrence across action videos and rejected. Further, we propose a technique to learn the temporal order of these key poses from the imperfect videos, where the inter-community links help reduce the search space of possible pose sequences. Our framework is shown to improve the segmentation performance of complex human actions with the help of imperfect performances. The efficacy of our approach is illustrated on two complex action datasets, Sun Salutation and Warm-up exercise, which were developed using random executions from amateur performers.
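A rough sketch of the graph-and-community idea, assuming pose descriptors have already been extracted per frame. The cosine-similarity threshold, the greedy modularity algorithm, and the "distinct videos" rule for rejecting anomalous communities are illustrative choices, not the authors' exact implementation.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def key_pose_communities(pose_feats, video_ids, sim_thresh=0.9, min_videos=3):
    """pose_feats: (n, d) pose descriptors; video_ids: source video of each pose."""
    # Cosine similarity between every pair of pose descriptors.
    normed = pose_feats / (np.linalg.norm(pose_feats, axis=1, keepdims=True) + 1e-9)
    sim = normed @ normed.T
    # Graph whose edges connect highly similar poses; key poses form dense communities.
    g = nx.Graph()
    g.add_nodes_from(range(len(pose_feats)))
    for i in range(len(pose_feats)):
        for j in range(i + 1, len(pose_feats)):
            if sim[i, j] > sim_thresh:
                g.add_edge(i, j)
    communities = greedy_modularity_communities(g)
    # Reject dense but anomalous communities that occur in too few distinct videos.
    return [c for c in communities
            if len({video_ids[i] for i in c}) >= min_videos]
```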
- Publication: A framework to assess sun salutation videos (2016-12-18)
  Jain, Hiteshi
  Many exercises are repetitive in nature and must be performed correctly to derive maximum benefit. Sun Salutation, or Surya Namaskar, is one of the oldest known yoga practices. It is a sequence of ten actions or 'asanas' in which the actions are synchronized with breathing, and each action and its transition should be performed with minimal jerk. It is therefore important that this practice be performed with Grace and Consistency. In this context, Grace is the ability of a person to perform an exercise smoothly, i.e. without sudden movements or jerks during posture transitions, and Consistency measures the repeatability of the exercise across cycles. We propose an algorithm that assesses how well a person practices Sun Salutation in terms of Grace and Consistency. Our approach trains an individual HMM for each asana using STIP features [11], followed by automatic segmentation and labeling of the entire Sun Salutation sequence using a concatenated HMM. The metrics of Grace and Consistency are then defined in terms of posture transition times. The assessments made by our system are compared with those of a yoga trainer to determine the accuracy of the system. We introduce a dataset of Sun Salutation videos comprising 30 sequences of perfect Sun Salutation performed by seven experts, and use this dataset to train our system. While Sun Salutation can be judged on multiple parameters, we focus mainly on judging Grace and Consistency.
  Scopus© Citations: 7
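The two assessment metrics can be illustrated once segmentation has produced posture transition times. The sketch below is an assumption-laden reading of "Grace" as low jerk and "Consistency" as low spread of transition times across cycles; the paper's exact formulas are not reproduced here.

```python
import numpy as np

def consistency_score(transition_times):
    """transition_times: (cycles, transitions) array of durations in seconds.
    Smaller variation of each transition across cycles -> score closer to 1."""
    t = np.asarray(transition_times, dtype=float)
    cv = t.std(axis=0) / (t.mean(axis=0) + 1e-9)   # per-transition coefficient of variation
    return float(1.0 / (1.0 + cv.mean()))

def grace_score(joint_positions, fps=30.0):
    """joint_positions: (frames, joints, dims) trajectories from a tracker.
    Penalise jerky motion via the third derivative (jerk) of the trajectories."""
    v = np.diff(joint_positions, axis=0) * fps      # velocity
    a = np.diff(v, axis=0) * fps                    # acceleration
    jerk = np.diff(a, axis=0) * fps
    return float(1.0 / (1.0 + np.abs(jerk).mean()))
```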
- Publication: Gradient sensitive kernel for image denoising, using Gaussian Process Regression (2016-06-10)
  Dey, Arka Ujjal
  We target the problem of image denoising using Gaussian Process Regression (GPR). Being a non-parametric regression technique, GPR has received much attention in the recent past, and here we further explore its versatility by applying it to a denoising problem. The focus is primarily on the design of a local gradient-sensitive kernel that captures pixel similarity in the context of image denoising. This novel kernel formulation is used to shape the smoothness of the joint GP prior. We apply the GPR denoising technique to small patches and then stitch these patches back together; this keeps the priors local and relevant, and also helps us deal with GPR's computational complexity. We demonstrate that our GPR-based technique gives better PSNR values than existing popular denoising techniques.
  Scopus© Citations: 1
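A compact sketch of patch-wise GP regression with a kernel that depends on both pixel position and local gradient, in the spirit described above. The kernel form, length scales, and noise variance are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def denoise_patch(patch, len_xy=2.0, len_grad=0.3, noise_var=0.05):
    """Return the GP posterior mean over one small noisy patch (2-D array)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    gy, gx = np.gradient(patch.astype(float))
    grads = np.stack([gy.ravel(), gx.ravel()], axis=1)
    vals = patch.ravel().astype(float)

    # Squared-exponential kernel over pixel coordinates, modulated by a
    # gradient-similarity term so that edges are not smoothed across.
    d_xy = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    d_gr = ((grads[:, None, :] - grads[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d_xy / (2 * len_xy ** 2) - d_gr / (2 * len_grad ** 2))

    # GP posterior mean at the observed pixels, with the noisy values as targets.
    alpha = np.linalg.solve(K + noise_var * np.eye(len(vals)), vals)
    return (K @ alpha).reshape(h, w)
```

In a full pipeline, overlapping patches would be denoised independently and stitched back, e.g. by averaging the overlaps, as the abstract describes.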
- Publication: An improved contour-based thinning method for character images (2011-10-15)
  Bag, Soumen
  The digital skeleton of a character image, generated by a thinning method, has a wide range of applications in shape analysis and classification. But thinning of character images is a big challenge, and removing spurious strokes or deformities introduced by thinning is a difficult problem. In this paper, we propose a contour-based thinning method for skeletonization of printed, noisy, isolated character images. In this method, we use shape characteristics of the text to obtain a skeleton nearly identical to the true character shape. This approach helps preserve the local features and true shapes of the character images. As a by-product of our thinning approach, the skeleton also gets segmented into strokes in vector form, so further stroke segmentation is not required. Experiments on printed English, Bengali, Hindi, and Tamil characters yield much better results than other thinning methods, without any post-processing. © 2011 Elsevier B.V. All rights reserved.
  Scopus© Citations: 22
- Publication: Nearest neighbour classification of Indian sign language gestures using Kinect camera (2016-02-01)
  Ansari, Zafar Ahmed
  People with speech disabilities communicate in sign language and therefore have trouble mingling with the able-bodied. There is a need for an interpretation system which could act as a bridge between them and those who do not know their sign language. A functional, unobtrusive Indian sign language recognition system was implemented and tested on real-world data. A vocabulary of 140 symbols was collected from 18 subjects, totalling 5041 images. The vocabulary consisted mostly of two-handed signs drawn from a wide repertoire of words of technical and daily-use origins. The system was implemented using the Microsoft Kinect, which makes surrounding light conditions and object colour have negligible effect on its performance. The system demonstrates a novel, low-cost, and easy-to-use application for Indian sign language recognition using the Microsoft Kinect camera. In the fingerspelling category of our dataset, we achieved above 90% recognition rates for 13 signs and 100% recognition for 3 signs, with 16 distinct alphabets (A, B, D, E, F, G, H, K, P, R, T, U, W, X, Y, Z) recognised at an average accuracy of 90.68%.
  Scopus© Citations: 37
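The nearest-neighbour classification step itself is simple to illustrate. A minimal sketch, assuming each sign has already been reduced to a fixed-length feature vector from the Kinect depth/skeleton data; the feature extraction and the distance metric used in the paper are not reproduced.

```python
import numpy as np

def nearest_neighbour_predict(train_feats, train_labels, query_feats):
    """1-NN classification: each query gets the label of its closest training sample."""
    # Pairwise Euclidean distances between query and training feature vectors.
    d = np.linalg.norm(query_feats[:, None, :] - train_feats[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return np.asarray(train_labels)[nearest]

# Hypothetical usage with already-extracted Kinect feature vectors:
# pred = nearest_neighbour_predict(X_train, y_train, X_test)
```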
- Publication: Table extraction from document images using fixed point model (2014-12-14)
  Bansal, Anukriti; Dutta Roy, Sumantra
  The paper presents a novel learning-based framework to identify tables in scanned document images. The approach is formulated as a structured labeling problem, which learns the layout of the document and labels its various entities as table header, table trailer, table cell, or non-table region. We develop features which encode the foreground block characteristics and the contextual information. These features are provided to a fixed point model which learns the inter-relationships between the blocks. The fixed point model attains a contraction mapping and provides a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed point model captures the context information in terms of the neighbourhood layout more efficiently. Experiments on images from the UW-III (University of Washington) dataset, the UNLV dataset, and our own dataset of document images with multi-column page layouts show the applicability of our algorithm to layout analysis and table detection.
  Scopus© Citations: 8
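The fixed point idea can be sketched as an iteration that repeatedly re-labels each block from its own features plus the current labels of its neighbours, until the labeling stops changing (a fixed point of the contraction mapping). The classifier interface, neighbourhood context, and convergence test below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fixed_point_labeling(block_feats, neighbours, classify, n_labels, max_iter=50):
    """block_feats: (n, d) features; neighbours: list of neighbour index lists;
    classify(feat, context) -> label id, where context is the histogram of
    neighbouring labels from the previous iteration."""
    n = len(block_feats)
    labels = np.zeros(n, dtype=int)          # initial labeling, e.g. all non-table
    for _ in range(max_iter):
        new_labels = np.empty_like(labels)
        for i in range(n):
            # Contextual feature: distribution of current labels in the neighbourhood.
            context = np.bincount(labels[neighbours[i]], minlength=n_labels)
            new_labels[i] = classify(block_feats[i], context)
        if np.array_equal(new_labels, labels):   # reached a fixed point
            break
        labels = new_labels
    return labels
```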
- Publication: Topological features for recognizing printed and handwritten Bangla characters (2011-10-13)
  Bag, Soumen; Bhowmick, Partha
  In this paper, we present novel topological features based on the structural shape of a character. We detect the convex-shaped segments formed by the various strokes. The convex segments are then represented with shape primitives from a repertoire, and the character is represented as a spatial layout of convex segments. We formulate feature templates for Bangla characters, and a given character is assigned the label of the best matching feature template. We have tested the method on benchmark datasets of printed and handwritten Bangla basic and compound character images. Our results demonstrate the efficacy of our approach. Copyright © 2011 ACM.
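The final step, assigning a character the label of its best matching feature template, can be sketched as follows. Representing templates as sequences of shape-primitive symbols and scoring them by edit distance is an illustrative assumption, not the paper's exact matching scheme.

```python
def edit_distance(a, b):
    """Levenshtein distance between two primitive sequences."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def best_matching_template(primitives, templates):
    """templates: {label: primitive sequence}; return the label of the closest template."""
    return min(templates, key=lambda lbl: edit_distance(primitives, templates[lbl]))
```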
- Publication: Document retrieval with unlimited vocabulary (2015-02-19)
  Ranjan, Viresh; Jawahar, C. V.
  In this paper, we describe a classifier-based retrieval scheme for efficiently and accurately retrieving relevant documents. We use SVM classifiers for word retrieval and argue that classifier-based solutions can be superior to OCR-based solutions in many practical situations. We overcome the practical limitations of the classifier-based solution in terms of limited vocabulary support and availability of training data. To overcome these limitations, we design a one-shot learning scheme for dynamically synthesizing classifiers: given a set of SVM classifiers, we appropriately join them to create novel classifiers. This extends the classifier-based retrieval paradigm to an unlimited number of classes (words) in a language. We validate our method on multiple datasets and compare it with popular alternatives like OCR and word spotting. Even for a language like English, for which OCRs are fairly advanced, our method yields comparable or even superior results. These results are significant since we do not use any language-specific post-processing to obtain this performance. For better accuracy of the retrieved list, we use query expansion. This also allows us to seamlessly adapt our solution to new fonts, styles, and collections.
  Scopus© Citations: 5
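A minimal sketch of the classifier-based retrieval idea: score every candidate word image with a classifier for the query word and rank by decision score. How existing classifiers are "joined" to synthesize one for an unseen word is only hinted at here by averaging the decision scores of hypothetical part classifiers; that combination rule is an assumption, not the paper's scheme.

```python
import numpy as np

def rank_by_classifier(word_feats, decision_function):
    """Rank word-image features by an SVM decision score (higher = more relevant)."""
    scores = decision_function(word_feats)
    order = np.argsort(-scores)
    return order, scores[order]

def join_classifiers(decision_functions):
    """Hypothetical 'join': synthesize a scorer for a new word by averaging the
    decision scores of classifiers trained for its constituent parts."""
    def joined(word_feats):
        return np.mean([f(word_feats) for f in decision_functions], axis=0)
    return joined
```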