  • Publication
    Text detection on camera acquired document images using supervised classification of connected components in wavelet domain
    (2012-12-01)
    Roy, Udit
    In this paper we present an algorithm to detect text in video frames consisting of lecture slides. We begin by performing a multi-channel wavelet transform and then merge the channel components of the high-frequency sub-bands to obtain a composite energy map. Thresholding the energy map yields an edge map of candidate text pixels; some of these correspond to actual text, while others correspond to graphics, logos, tables, etc. The connected components in the edge map are then filtered with a trained classifier to reject some of the false positives. Rectangular text blocks compactly surrounding the text regions are then identified using a process of selective dilation and recursive splitting, and the remaining false positive text blocks are rejected using heuristics. Experiments conducted on 890 images show that our scheme has a lower false positive rate and misdetection rate than two existing scene text detection methods. © 2012 ICPR Org Committee.
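    A minimal Python sketch of the candidate-text-pixel stage described above, assuming a single grayscale frame; the wavelet choice, the thresholding rule (mean plus two standard deviations) and the crude size filter standing in for the trained classifier are illustrative assumptions, not the paper's exact settings.

      import numpy as np
      import pywt
      from scipy import ndimage

      def candidate_text_mask(frame, wavelet="haar", level=1, min_area=20):
          """Return a boolean map of candidate text pixels (at sub-band resolution)."""
          # Single-level 2-D wavelet transform; coeffs[1] holds the high-frequency
          # sub-bands (LH, HL, HH) at the finest level.
          coeffs = pywt.wavedec2(frame.astype(float), wavelet=wavelet, level=level)
          details = coeffs[1]
          # Composite energy map obtained by merging the high-frequency sub-bands.
          energy = sum(np.abs(d) for d in details)
          # Threshold the energy map to obtain candidate text pixels.
          edge_map = energy > energy.mean() + 2.0 * energy.std()
          # Keep connected components above a minimum area; the paper filters the
          # components with a trained classifier rather than this size test.
          labels, n = ndimage.label(edge_map)
          sizes = ndimage.sum(edge_map, labels, range(1, n + 1))
          return np.isin(labels, 1 + np.flatnonzero(sizes >= min_area))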
  • Publication
    Leveraging information from imperfect examples: Common action sequence mining from a mix of incorrect performances
    (2018-12-18)
    Jain, Hiteshi
    Just as a good representation and theory are needed to explain human actions, so are the action videos used to learn good segmentation techniques. To accurately model complex actions such as diving, figure skating, and yoga practices, videos depicting the action performed by human experts are required. A lack of experts in any domain leads to a reduced number of videos and hence improper learning. In this work we attempt to utilize imperfect amateur performances to obtain more confident representations of human action sequences. We introduce a novel Community Detection-based unsupervised framework that provides mechanisms to interpret video data and address its limitations to produce better action representations. Human actions are composed of distinguishable key poses which form dense communities in graph structures. Anomalous poses performed for a longer duration can also form such dense communities, but they can be identified by their rare occurrence across action videos and rejected. Further, we propose a technique to learn the temporal order of these key poses from these imperfect videos, where the inter-community links help reduce the search space of possible pose sequences. Our framework is seen to improve the segmentation performance for complex human actions with the help of some imperfect performances. The efficacy of our approach is illustrated on two complex action datasets, Sun Salutation and Warm-up exercise, which have been developed using random executions from amateur performers.
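    A minimal Python sketch of the pose-community step, assuming per-frame pose descriptors pooled from several (possibly imperfect) performance videos; the cosine-similarity graph, the 0.9 threshold and modularity-based community detection via networkx are illustrative stand-ins for the framework described above.

      import numpy as np
      import networkx as nx

      def key_pose_communities(poses, sim_threshold=0.9, min_size=10):
          """poses: (N, D) per-frame pose descriptors pooled from many videos.
          Returns dense communities, which correspond to candidate key poses."""
          # Cosine similarity between all pairs of pose descriptors.
          normed = poses / (np.linalg.norm(poses, axis=1, keepdims=True) + 1e-8)
          sim = normed @ normed.T
          # Connect sufficiently similar poses in a graph.
          g = nx.Graph()
          g.add_nodes_from(range(len(poses)))
          ii, jj = np.where(np.triu(sim, k=1) > sim_threshold)
          g.add_edges_from(zip(ii.tolist(), jj.tolist()))
          # Dense communities are key-pose candidates; a community formed by an
          # anomalous held pose would be rejected later by checking how many
          # distinct videos it spans (not shown here).
          communities = nx.algorithms.community.greedy_modularity_communities(g)
          return [c for c in communities if len(c) >= min_size]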
  • Publication
    A framework to assess sun salutation videos
    (2016-12-18)
    Jain, Hiteshi
    There are many exercises which are repetitive in nature and must be performed with perfection to derive maximum benefit. Sun Salutation, or Surya Namaskar, is one of the oldest known yoga practices. It is a sequence of ten actions or 'asanas' in which the actions are synchronized with breathing, and each action and its transition should be performed with minimal jerks. Essentially, it is important that this yoga practice be performed with Grace and Consistency. In this context, Grace is the ability of a person to perform an exercise with smoothness, i.e., without sudden movements or jerks during posture transitions, and Consistency measures the repeatability of an exercise in every cycle. We propose an algorithm that assesses how well a person practices Sun Salutation in terms of grace and consistency. Our approach works by training individual HMMs for each asana using STIP features [11], followed by automatic segmentation and labeling of the entire Sun Salutation sequence using a concatenated-HMM. The metrics of grace and consistency are then defined in terms of posture transition times. The assessments made by our system are compared with those of a yoga trainer to derive the accuracy of the system. We introduce a dataset of Sun Salutation videos comprising 30 sequences of perfect Sun Salutation performed by seven experts, and we use this dataset to train our system. While Sun Salutation can be judged on multiple parameters, we focus mainly on judging Grace and Consistency.
    Scopus© Citations 7
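    A minimal Python sketch of the assessment stage, assuming the concatenated-HMM has already segmented each cycle into asanas and produced posture transition times and joint trajectories; the scoring rules below (coefficient of variation across cycles for Consistency, mean jerk for Grace) are illustrative stand-ins for the paper's metrics.

      import numpy as np

      def consistency_score(transition_times):
          """transition_times: (cycles, transitions) matrix of durations in seconds.
          Lower spread across cycles means higher consistency (score closer to 1)."""
          cv = transition_times.std(axis=0) / (transition_times.mean(axis=0) + 1e-8)
          return float(np.clip(1.0 - cv.mean(), 0.0, 1.0))

      def grace_score(joint_positions, fps=30.0):
          """joint_positions: (frames, joints, 2) trajectory during one transition.
          Penalises large jerk (third derivative), i.e. sudden movements."""
          velocity = np.diff(joint_positions, axis=0) * fps
          accel = np.diff(velocity, axis=0) * fps
          jerk = np.diff(accel, axis=0) * fps
          mean_jerk = np.linalg.norm(jerk, axis=-1).mean()
          return float(1.0 / (1.0 + mean_jerk))  # 1 corresponds to a perfectly smooth transition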
  • Publication
    Gradient sensitive kernel for image denoising, using Gaussian Process Regression
    (2016-06-10)
    Dey, Arka Ujjal
    We target the problem of image denoising using Gaussian Process Regression (GPR). Being a non-parametric regression technique, GPR has received much attention in the recent past, and here we further explore its versatility by applying it to a denoising problem. The focus is primarily on the design of a local gradient-sensitive kernel that captures pixel similarity in the context of image denoising. This novel kernel formulation is used to shape the smoothness of the joint GP prior. We apply the GPR denoising technique to small patches and then stitch these patches back together; this keeps the priors local and relevant and also helps us deal with the computational complexity of GPR. We demonstrate that our GPR-based technique gives better PSNR values than existing popular denoising techniques.
    Scopus© Citations 1
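    A minimal Python sketch of patch-wise GP regression for denoising with a kernel that down-weights similarity across strong gradients; the kernel form, its hyper-parameters and the noise variance are assumptions rather than the paper's exact formulation, and stitching the denoised patches back together is left to the caller.

      import numpy as np

      def gradient_sensitive_kernel(coords, grads, ell=2.0, gamma=5.0):
          """coords: (N, 2) pixel positions; grads: (N,) local gradient magnitudes."""
          d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
          g2 = (grads[:, None] - grads[None, :]) ** 2
          # Spatially close pixels with similar local gradients are highly correlated.
          return np.exp(-d2 / (2 * ell ** 2)) * np.exp(-gamma * g2)

      def gpr_denoise_patch(patch, noise_var=0.01):
          """Return the GP posterior mean for one small noisy patch (H x W, float)."""
          h, w = patch.shape
          ys, xs = np.mgrid[0:h, 0:w]
          coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
          gy, gx = np.gradient(patch)
          grads = np.hypot(gy, gx).ravel()
          y = patch.ravel()
          K = gradient_sensitive_kernel(coords, grads)
          alpha = np.linalg.solve(K + noise_var * np.eye(len(y)), y)
          return (K @ alpha).reshape(h, w)  # posterior mean at the observed pixels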
  • Publication
    Table extraction from document images using fixed point model
    (2014-12-14)
    Bansal, Anukriti
    ;
    Dutta Roy, Sumantra
    The paper presents a novel learning-based framework to identify tables in scanned document images. The approach is designed as a structured labeling problem, which learns the layout of the document and labels its various entities as table header, table trailer, table cell, or non-table region. We develop features which encode the foreground block characteristics and the contextual information. These features are provided to a fixed point model which learns the inter-relationships between the blocks. The fixed point model attains a contraction mapping and provides a unique label to each block. We compare the results with Conditional Random Fields (CRFs); unlike CRFs, the fixed point model captures the context information in terms of the neighbourhood layout more efficiently. Experiments on images from the UW-III (University of Washington) dataset, the UNLV dataset, and our own dataset of document images with multi-column page layouts show the applicability of our algorithm to layout analysis and table detection.
    Scopus© Citations 8
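    A minimal Python sketch of the fixed point labelling idea, assuming per-block appearance scores and a block adjacency structure have already been computed; the label set matches the abstract, while the context pooling, the contraction-style update and the convergence test are illustrative choices.

      import numpy as np

      LABELS = ["table-header", "table-trailer", "table-cell", "non-table"]

      def fixed_point_labeling(unary, neighbors, context_weight=0.5, max_iters=50, tol=1e-4):
          """unary: (N, L) appearance scores per block; neighbors: list of index lists.
          Repeatedly refines each block's label belief from its neighbours' beliefs
          until the labelling (approximately) stops changing, i.e. a fixed point."""
          def softmax(z):
              z = z - z.max(axis=1, keepdims=True)
              e = np.exp(z)
              return e / e.sum(axis=1, keepdims=True)

          q = softmax(unary)
          for _ in range(max_iters):
              context = np.stack([q[nbrs].mean(axis=0) if nbrs else np.zeros(q.shape[1])
                                  for nbrs in neighbors])
              q_new = softmax(unary + context_weight * context)
              if np.abs(q_new - q).max() < tol:
                  q = q_new
                  break
              q = q_new
          return [LABELS[i] for i in q.argmax(axis=1)]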
  • Publication
    Topological features for recognizing printed and handwritten Bangla characters
    (2011-10-13)
    Bag, Soumen
    ;
    Bhowmick, Partha
    In this paper, we present novel topological features based on the structural shape of a character. We detect the convex-shaped segments formed by the various strokes. The convex segments are then represented with shape primitives from a repertoire, and the character is represented as a spatial layout of convex segments. We formulate feature templates for Bangla characters, and a given character is assigned the label of the best matching feature template. We have tested the method on benchmark datasets of printed and handwritten Bangla basic and compound character images. Our results demonstrate the efficacy of our approach. Copyright © 2011 ACM.
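    A minimal Python sketch of template matching over shape primitives, assuming each character image has already been decomposed into convex segments and each segment mapped to a primitive with a coarse zone; the primitive names, the two hypothetical templates and the overlap score are invented for illustration only.

      from collections import Counter

      # Hypothetical feature templates: character label -> bag of (primitive, zone) pairs.
      TEMPLATES = {
          "char_A": Counter({("vertical", "right"): 1, ("C", "left"): 1, ("horizontal", "top"): 1}),
          "char_B": Counter({("U", "bottom"): 1, ("horizontal", "top"): 1}),
      }

      def classify(segments):
          """segments: list of (primitive, zone) pairs extracted from one character."""
          observed = Counter(segments)

          def score(template):
              # Jaccard-style overlap between the observed primitives and a template.
              inter = sum((observed & template).values())
              union = sum((observed | template).values())
              return inter / union if union else 0.0

          return max(TEMPLATES, key=lambda label: score(TEMPLATES[label]))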
  • Publication
    Document retrieval with unlimited vocabulary
    (2015-02-19)
    Ranjan, Viresh
    ;
    Jawahar, C. V.
    In this paper, we describe a classifier-based retrieval scheme for efficiently and accurately retrieving relevant documents. We use SVM classifiers for word retrieval, and argue that classifier-based solutions can be superior to OCR-based solutions in many practical situations. To overcome the practical limitations of the classifier-based solution, namely limited vocabulary support and the availability of training data, we design a one-shot learning scheme for dynamically synthesizing classifiers: given a set of SVM classifiers, we appropriately join them to create novel classifiers. This extends the classifier-based retrieval paradigm to an unlimited number of classes (words) present in a language. We validate our method on multiple datasets and compare it with popular alternatives like OCR and word spotting. Even for a language like English, where OCRs are fairly advanced, our method yields comparable or even superior results. Our results are significant since we do not use any language-specific post-processing to obtain this performance. For better accuracy of the retrieved list, we use query expansion. This also allows us to seamlessly adapt our solution to new fonts, styles and collections.
    Scopus© Citations 5
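    A hedged Python sketch of synthesizing a classifier for an out-of-vocabulary query word from existing linear SVM word classifiers plus a single exemplar descriptor of the new word; the similarity-weighted combination used here is an illustrative stand-in for the paper's one-shot synthesis scheme.

      import numpy as np

      def synthesize_classifier(exemplar_feat, known_weights, known_exemplars, top_k=5):
          """exemplar_feat: (D,) descriptor of one sample of the new query word.
          known_weights: (W, D) linear SVM weight vectors for in-vocabulary words.
          known_exemplars: (W, D) descriptors of those words' exemplars."""
          sims = known_exemplars @ exemplar_feat
          sims = sims / (np.linalg.norm(known_exemplars, axis=1) *
                         np.linalg.norm(exemplar_feat) + 1e-8)
          top = np.argsort(sims)[-top_k:]
          weights = sims[top] / (sims[top].sum() + 1e-8)
          # Join the most related classifiers into a new (D,) weight vector.
          return (weights[:, None] * known_weights[top]).sum(axis=0)

      def retrieve(classifier, doc_word_feats, top_n=10):
          """Rank word images in the collection by the synthesized classifier's score."""
          scores = doc_word_feats @ classifier
          return np.argsort(scores)[::-1][:top_n]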
  • Publication
    Core Region Detection for Off-Line Unconstrained Handwritten Latin Words Using Word Envelops
    (2017-07-02)
    Pandey, Shilpa
    Zone extraction is acclaimed as a significant pre-processing step in handwriting analysis. This paper presents a new method for separating ascenders and descenders from an unconstrained handwritten word and identifying its core region. The method estimates the correct core region in the presence of complexities such as long horizontal strokes, skewed words, capitalized first letters, hill-and-dale writing, jumping baselines, words with long descender curves, cursive handwriting, calligraphic words, title-case words, and very short words, as shown in Fig. 1. It extracts two envelopes from the word image and selects the sample points that constitute the core-region envelope. The method is tested on the CVL, ICDAR-2013, ICFHR-2012, and IAM benchmark datasets of handwritten words written by multiple writers. We also created our own dataset of 100 words authored by 2 writers, comprising all the above-mentioned handwriting complexities. Due to the non-availability of ground truth for core-region extraction, we created it manually for all the datasets. Our work reports an accuracy of 90.16% for correctly identifying all three zones in 17,100 Latin words written by 802 individuals. Our core-region detection method obtains promising results when compared with current state-of-the-art methods.
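    A minimal Python sketch of envelope-based core-region estimation for a binarised word image; taking the medians of the upper and lower envelopes as the core-region boundaries is a simplified stand-in for the paper's sample-point selection, which is what copes with skew, ascenders and long descenders.

      import numpy as np

      def word_envelopes(binary_word):
          """Per-column top (upper envelope) and bottom (lower envelope) ink rows."""
          cols_with_ink = np.where(binary_word.any(axis=0))[0]
          upper = np.array([binary_word[:, c].argmax() for c in cols_with_ink])
          lower = np.array([binary_word.shape[0] - 1 - binary_word[::-1, c].argmax()
                            for c in cols_with_ink])
          return upper, lower

      def core_region(binary_word):
          """Estimate (top_row, bottom_row) of the core region (the x-height band)."""
          upper, lower = word_envelopes(binary_word)
          top = int(np.median(upper))     # ascenders pull only a few columns far above this
          bottom = int(np.median(lower))  # descenders pull only a few columns far below this
          return top, bottom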
  • Publication
    Generating synthetic handwriting using n-gram letter glyphs
    (2016-12-18)
    Dey, Arka Ujjal
    We propose a framework for the synthesis of natural semi-cursive handwritten Latin script that can find application in text personalization or in the generation of synthetic data for recognition systems. Our method is based on the generation of synthetic n-gram letter glyphs and their subsequent concatenation. We propose a non-parametric, data-driven generation scheme that is able to mimic the variation observed in handwritten glyph samples and synthesize natural-looking synthetic glyphs. These synthetic glyphs are then stitched together to form complete words using a spline-based concatenation scheme. Further, as a refinement, our method is able to generate pen-lifts, giving our results a natural semi-cursive look. Through subjective experiments and detailed analysis of the results, we demonstrate the effectiveness of our formulation in generating natural-looking synthetic script.
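    A minimal Python sketch of the spline-based concatenation step, assuming each glyph is available as an ordered pen trajectory of (x, y) points; the horizontal gap, the number of boundary points and the smoothing factor are illustrative choices.

      import numpy as np
      from scipy.interpolate import splprep, splev

      def stitch_glyphs(glyph_a, glyph_b, gap=5.0, boundary_pts=3, n_samples=20):
          """Return glyph_a, a smooth connecting stroke, and the shifted glyph_b."""
          # Place glyph_b to the right of glyph_a with a small horizontal gap.
          offset = np.array([glyph_a[:, 0].max() + gap - glyph_b[:, 0].min(), 0.0])
          glyph_b = glyph_b + offset
          # Fit a parametric spline through the trailing points of A and the
          # leading points of B, then sample it to get the connecting ligature.
          anchors = np.vstack([glyph_a[-boundary_pts:], glyph_b[:boundary_pts]])
          tck, _ = splprep([anchors[:, 0], anchors[:, 1]], s=1.0)
          u = np.linspace(0.0, 1.0, n_samples)
          ligature = np.stack(splev(u, tck), axis=1)
          return glyph_a, ligature, glyph_b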
  • Publication
    Associating field components in heterogeneous handwritten form images using Graph Autoencoder
    (2019-01-01)
    Srivastava, Divya
    We propose a graph-based deep network for predicting the associations between field labels and field values in heterogeneous handwritten form images. We consider forms in which the field label is printed text and the field value is handwritten text. Inspired by the relationship-prediction capability of graphical models, we use a Graph Autoencoder to perform the intended field-label to field-value association in a given form image. To the best of our knowledge, this is the first attempt to perform label-value association in handwritten form images using a machine learning approach. We have prepared a handwritten form image dataset comprising 300 images from 30 different templates, with 10 images per template. Our framework is evaluated with different network parameters and shows promising results.
    Scopus© Citations 1
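    A hedged Python (PyTorch) sketch of a Graph Autoencoder for association (link) prediction, assuming each field component is a graph node with a feature vector and that some label-value links are known for training; the two-layer GCN encoder and inner-product decoder follow the standard GAE recipe and are not necessarily the paper's exact architecture.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      def normalize_adj(adj):
          """Symmetrically normalised adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
          a = adj + torch.eye(adj.size(0))
          d_inv_sqrt = a.sum(dim=1).pow(-0.5)
          return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

      class GraphAutoencoder(nn.Module):
          def __init__(self, in_dim, hidden_dim=64, latent_dim=32):
              super().__init__()
              self.w1 = nn.Linear(in_dim, hidden_dim, bias=False)
              self.w2 = nn.Linear(hidden_dim, latent_dim, bias=False)

          def encode(self, x, a_norm):
              h = F.relu(a_norm @ self.w1(x))
              return a_norm @ self.w2(h)

          def forward(self, x, a_norm):
              z = self.encode(x, a_norm)
              return torch.sigmoid(z @ z.t())  # reconstructed association matrix

      # Usage sketch with made-up sizes: 12 field components with 128-d features
      # (e.g. text and layout embeddings of printed labels and handwritten values).
      x = torch.randn(12, 128)
      adj = torch.zeros(12, 12)
      adj[0, 6] = adj[6, 0] = 1.0              # one known label-value association
      model = GraphAutoencoder(in_dim=128)
      pred = model(x, normalize_adj(adj))
      loss = F.binary_cross_entropy(pred, adj + torch.eye(12))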