  • Word Spotting in Cluttered Environment
    (2020-01-01)
    Srivastava, Divya
    ;
    In this paper, we present a novel problem: handwritten word spotting in a cluttered environment, where a word is cluttered by a strike-through line stroke. These line strokes can be straight, slanted, broken, continuous, or wavy. The Vertical Projection Profile (VPP) feature and its modified version, the combinatorics Vertical Projection Profile (cVPP) feature, are extracted and aligned by a modified Dynamic Time Warping (DTW) algorithm. No dataset is available for the proposed problem, so we prepared our own. We compare our method with Rath and Manmatha [6] and PHOCNet [17] for handwritten word spotting in the presence of strike-through, and achieve better results.
    Scopus© Citations 1
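The feature-and-alignment idea in the abstract above can be illustrated with a minimal sketch. It assumes binarized word images with ink pixels equal to 1, and it uses the plain VPP and textbook DTW only; the cVPP feature and the authors' modified DTW are not specified in the abstract and are not reproduced here. The function names are hypothetical.

```python
import numpy as np

def vertical_projection_profile(binary_img):
    """Count ink pixels (value 1) in each column of a binary word image."""
    return np.asarray(binary_img).sum(axis=0).astype(float)

def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (absolute-difference local cost)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Toy example: the candidate is the query stretched by one repeated column.
query = np.array([[1, 0, 1],
                  [1, 1, 0]])
candidate = np.array([[1, 0, 0, 1],
                      [1, 1, 1, 0]])
score = dtw_distance(vertical_projection_profile(query),
                     vertical_projection_profile(candidate))
```

In a word-spotting setting, candidates would be ranked by this score, with lower values indicating a closer match to the query word.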
  • Cell Extraction and Horizontal-Scale Correction in Structured Documents
    (2020-01-01)
    Srivastava, Divya
    ;
    Preprocessing is an important task in document image analysis. In structured documents such as forms and cheques, there is a predefined space, called a frame field or cell, in which the user fills in an entry. When the user writes, nonuniform inter-character spacing becomes an issue. Often, the starting characters of a word are written with sparse spacing and the later ones with progressively more compact spacing, so as to fit the word within the frame field. To deal with this variation in intra-word spacing, horizontal-scale correction is applied to the extracted form fields. The effectiveness of the system is demonstrated by applying it as a preprocessing step in the recognition system proposed in (Almazán et al. in Pattern Anal Mach Intell 36(12):2552–2566, 2014 [2]). The recognition framework yields reduced error rates with this normalization.
  • TransDocAnalyser: A Framework for Semi-structured Offline Handwritten Documents Analysis with an Application to Legal Domain
    (2023-01-01)
    Chakraborty, Sagar
    ;
    ;
    Ghosh, Saptarshi
    State-of-the-art offline Optical Character Recognition (OCR) frameworks perform poorly on semi-structured handwritten domain-specific documents due to their inability to localize and label form fields with domain-specific semantics. Existing techniques for semi-structured document analysis have primarily used datasets comprising invoices, purchase orders, receipts, and identity-card documents for benchmarking. In this work, we build the first semi-structured document analysis dataset in the legal domain by collecting a large number of First Information Report (FIR) documents from several police stations in India. This dataset, which we call the FIR dataset, is more challenging than most existing document analysis datasets, since it combines a wide variety of handwritten text with printed text. We also propose an end-to-end framework for offline processing of handwritten semi-structured documents, and benchmark it on our novel FIR dataset. Our framework uses an encoder-decoder architecture for localizing and labelling the form fields and for recognizing the handwritten content. The encoder consists of Faster-RCNN and Vision Transformers. Further, the Transformer-based decoder is trained with a domain-specific tokenizer. We also propose a post-correction method to handle recognition errors pertaining to domain-specific terms. Our proposed framework achieves state-of-the-art results on the FIR dataset, outperforming several existing models.
    Scopus© Citations 2
  • Survey of Structural Analysis in Mathematical Expression Recognition
    (2023-01-01)
    Aggarwal, Ridhi
    ;
    Pandey, Shilpa
    ;
    ;
    Automated identification of mathematical expressions (MEs) is essential in transforming scientific and engineering documents into electronic form. Even though character and symbol recognizers have achieved commendable performance in digitizing documents, structure analysers still face a challenge in correctly interpreting MEs. This review paper compares the salient aspects of past works dealing with structure analysis of printed and handwritten MEs. To the best of our knowledge, no previous work has done a systematic study of structural analysis methods in mathematical expression recognition. We present distinguishing aspects of different grammars and their production rules for semantic parsing of MEs. Our study contributes by providing information on the existing datasets, their desirable properties, different evaluation measures, distinguishing aspects of the techniques used, and future research directions in structural analysis.
  • Beyond visual semantics: Exploring the role of scene text in image understanding
    (2021-09-01)
    Dey, Arka Ujjal
    ;
    Ghosh, Suman K.
    ;
    Valveny, Ernest
    ;
    Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only the visual features, neglecting to leverage the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We not only extract and encode visual and scene text cues but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images with scene text content to demonstrate its effectiveness. In the retrieval framework, we augment the contextual semantic representation with scene text cues to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous scene text recognition, we also apply query-based attention to the text channel. We show that our multi-channel approach, involving contextual semantics and scene text, improves upon the absolute accuracy of the current state-of-the-art methods on the Advertisement Images Dataset by 8.9% in the relevant statement retrieval task and by 5% in the topic classification task.
    Scopus© Citations 10
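The query-based attention mentioned in the abstract above can be pictured with a small sketch. It assumes scaled-free dot-product scoring followed by a softmax, which is one common way to attend over tokens; the abstract does not state the scoring function the authors use, and the name `query_attention` and the toy embeddings are hypothetical.

```python
import numpy as np

def query_attention(query, token_embs):
    """Weight scene-text token embeddings by their similarity to a query embedding.

    Returns (weights, pooled): softmax weights over tokens and the weighted
    sum of token embeddings. Dot-product scoring is an assumption here.
    """
    query = np.asarray(query, dtype=float)
    token_embs = np.asarray(token_embs, dtype=float)
    scores = token_embs @ query
    scores = scores - scores.max()          # stabilize the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ token_embs

# Toy example: the first token aligns with the query, the last opposes it.
weights, pooled = query_attention([1.0, 0.0],
                                  [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

Tokens that disagree with the query receive small weights, which is how such a mechanism can suppress irrelevant or misrecognized scene text.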
  • Action Quality Assessment Using Siamese Network-Based Deep Metric Learning
    (2021-06-01)
    Jain, Hiteshi
    ;
    ;
    Sharma, Avinash
    Automated vision-based score estimation models can be used to provide an alternate opinion to avoid judgment bias. Existing works have learned score estimation models by regressing the video representation to the ground-truth score provided by judges. However, such regression-based solutions lack interpretability in terms of giving reasons for the awarded score. One way to make the scores more explicable is to compare the given action video with a reference video, capturing the temporal variations vis-à-vis the reference video and mapping those variations to the final score. In this work, we propose a new action scoring system termed Reference Guided Regression (RGR), which comprises (1) a Deep Metric Learning Module that learns the similarity between any two action videos based on their ground-truth scores given by the judges, and (2) a Score Estimation Module that uses the first module to find the resemblance of a video to a reference video and give the assessment score. The proposed scoring model is tested on Olympic diving and gymnastic vault events and outperforms the existing state-of-the-art scoring models.
    Scopus© Citations 27
  • DocDescribor: Digits + Alphabets + Math Symbols - A Complete OCR for Handwritten Documents
    (2020-01-01)
    Aggarwal, Ridhi
    ;
    Jain, Hiteshi
    ;
    ;
    This paper presents an Optical Character Recognition (OCR) system for documents with English text and mathematical expressions. Neural network architectures using CNN layers and/or dense layers achieve high accuracy in the character recognition task. However, these models require a large amount of training data, with a balanced number of samples for each class. Recognition of mathematical symbols is challenging because the available training data is scarce and imbalanced. To address this issue, we pose the character recognition problem as a Distance Metric Learning problem. We propose a Siamese-CNN Network that learns discriminative features to identify whether the two images in a pair contain similar or dissimilar characters. The network is then used to recognize different characters by character matching, where test images are compared to sample images of any target class, which may or may not have been included during training. Thus, our model can scale to new symbols easily. The proposed approach is invariant to the author’s handwriting. Our model has been tested on images extracted from a dataset of scanned answer scripts collected by us. Our approach achieves performance comparable to other architectures using convolutional or dense layers while using less training data.
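The character-matching step described in the abstract above can be sketched as nearest-reference classification in an embedding space. This is an illustration only: it assumes embeddings have already been produced by some trained network, uses Euclidean distance as the comparison, and the name `classify_by_matching` and the toy vectors are hypothetical rather than taken from the paper.

```python
import numpy as np

def classify_by_matching(test_emb, references):
    """Return the label whose reference embedding is closest to test_emb.

    `references` maps each class label to the embedding of one sample image.
    """
    test_emb = np.asarray(test_emb, dtype=float)
    return min(references,
               key=lambda label: np.linalg.norm(test_emb - np.asarray(references[label], dtype=float)))

# Toy 2-D embeddings standing in for the output of a trained network.
references = {"x": [0.0, 0.0], "+": [1.0, 1.0]}
first = classify_by_matching([0.9, 1.1], references)

# Supporting a new symbol only requires adding a reference sample.
references["∑"] = [5.0, 5.0]
second = classify_by_matching([4.8, 5.1], references)
```

Because classification reduces to comparison against stored samples, a class unseen during training can be added without retraining, which matches the scaling claim in the abstract.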
  • Structural Analysis of Offline Handwritten Mathematical Expressions
    (2020-01-01)
    Aggarwal, Ridhi
    ;
    ;
    Structural analysis helps in parsing mathematical expressions. Various approaches for structural analysis have been reported in the literature, but they mainly deal with online and printed expressions. In this work, a two-dimensional stochastic context-free grammar is used for the structural analysis of offline handwritten mathematical expressions in a document image. The spatial relations between characters in an expression are incorporated so that the structural variability in handwritten expressions can be tackled.
    Scopus© Citations 1
  • Simultaneous denoising and super resolution of document images
    (2024-03-01)
    Srivastava, Divya
    ;
    In this paper, we propose a unified approach for denoising and super-resolution of document images. The approach is a one-shot, unpaired technique in which a single unpaired example is used as a reference for training a SinGAN (Shaham et al., in: Proceedings of the IEEE/CVF international conference on computer vision, 2019) model. The training is carried out in two steps. First, we use a clean reference image to train a SinGAN to learn the characteristics of the clean image. Then we perform super-resolution and denoising of the given test image using another SinGAN. Our formulation of the loss function helps in this task by prompting the generated images to have characteristics similar to those of the clean reference image. We conduct experiments on publicly available datasets (Kaggle Dirty Documents Images and DIBCO) and obtain promising results. We also evaluate the performance of our model for OCR and obtain a higher recognition rate than competing methods.
  • EKTVQA: Generalized Use of External Knowledge to Empower Scene Text in Text-VQA
    (2022-01-01)
    Dey, Arka Ujjal
    ;
    Valveny, Ernest
    ;
    The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely seen or completely unseen scene text content of an image. We address this zero-shot nature of the task by proposing the generalized use of external knowledge to augment our understanding of the scene text. We design a framework to extract, validate, and reason with knowledge using a standard multimodal transformer for vision-language understanding tasks. Through empirical evidence and qualitative results, we demonstrate how external knowledge can highlight instance-only cues and thus help deal with training data bias, improve answer entity type correctness, and detect multiword named entities. We obtain results comparable to the state of the art on three publicly available datasets under the constraints of similar upstream OCR systems and training data.
    Scopus© Citations 1