  • Publication
    An improved contour-based thinning method for character images
    (2011-10-15)
    Bag, Soumen;
    Digital skeletons of character images, generated by thinning, have a wide range of applications in shape analysis and classification. However, thinning character images is challenging, and removing spurious strokes or deformities during thinning is a difficult problem. In this paper, we propose a contour-based thinning method for skeletonizing printed, noisy, isolated character images. The method uses shape characteristics of text to obtain a skeleton that closely follows the true character shape, which helps preserve the local features and true shapes of the character images. As a by-product of our thinning approach, the skeleton is also segmented into strokes in vector form, so no further stroke segmentation is required. Experiments on printed English, Bengali, Hindi, and Tamil characters show much better results than other thinning methods, without any post-processing. © 2011 Elsevier B.V. All rights reserved.
    Scopus© Citations 22
  • Publication
    Nearest neighbour classification of Indian sign language gestures using kinect camera
    (2016-02-01)
    Ansari, Zafar Ahmed;
    People with speech disabilities communicate in sign language and therefore have trouble mingling with the able-bodied. There is a need for an interpretation system that can act as a bridge between them and those who do not know their sign language. A functional, unobtrusive Indian sign language recognition system was implemented and tested on real-world data. A vocabulary of 140 symbols was collected from 18 subjects, totalling 5041 images. The vocabulary consisted mostly of two-handed signs drawn from a wide repertoire of technical and everyday words. The system was implemented using the Microsoft Kinect, which ensures that ambient lighting conditions and object colour have a negligible effect on the system's performance. The work proposes a novel, low-cost and easy-to-use application for Indian Sign Language recognition using the Microsoft Kinect camera. In the fingerspelling category of our dataset, we achieved recognition rates above 90% for 13 signs and 100% for 3 signs, with 16 distinct letters (A, B, D, E, F, G, H, K, P, R, T, U, W, X, Y, Z) recognised at an average accuracy of 90.68%.
    Scopus© Citations 37
  • Publication
    Skeletonizing character images using a modified medial axis-based strategy
    (2011-11-01)
    Bag, Soumen;
    In this paper, we propose a thinning methodology for character images. Its novelty lies in its ability to adapt to the local character shape while constructing the thinned skeleton. Our method avoids many of the distortions in character shapes that existing thinning algorithms normally introduce. The proposed methodology is based on the medial axis of the character, and the resulting skeleton is one pixel wide. As a by-product of our thinning approach, the skeleton is also segmented into strokes in vector form, so no further stroke segmentation is required. We have conducted experiments with printed and handwritten characters in several scripts, including English, Bengali, Hindi, Kannada, and Tamil, and obtain fewer spurious branches than other thinning methods. Our method does not use any post-processing. © 2011 World Scientific Publishing Company.
    Scopus© Citations 10
  • Publication
    Survey of Structural Analysis in Mathematical Expression Recognition
    (2023-01-01)
    Aggarwal, Ridhi; Pandey, Shilpa; ;
    Automated identification of mathematical expressions (MEs) is essential for transforming scientific and engineering documents into electronic form. Although character and symbol recognizers have achieved commendable performance in digitizing documents, structure analysers still struggle to interpret mathematical expressions correctly. This review paper compares the salient aspects of past work on structure analysis of printed and handwritten MEs. To the best of our knowledge, no previous work has systematically studied structural analysis methods in mathematical expression recognition. We present the distinguishing aspects of different grammars and their production rules for semantic parsing of MEs. Our study also provides information on existing datasets and their desirable properties, different evaluation measures, distinguishing aspects of the techniques used, and future research directions in structural analysis.
  • Publication
    Beyond visual semantics: Exploring the role of scene text in image understanding
    (2021-09-01)
    Dey, Arka Ujjal; Ghosh, Suman K.; Valveny, Ernest;
    Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only the visual features and neglect the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We not only extract and encode visual and scene text cues but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images with scene text content to demonstrate its effectiveness. In the retrieval framework, we augment the contextual semantic representation with scene text cues to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous scene text recognition, we also apply query-based attention to the text channel. We show that our multi-channel approach, involving contextual semantics and scene text, improves the absolute accuracy of current state-of-the-art methods on the Advertisement Images Dataset by 8.9% in the relevant-statement retrieval task and by 5% in the topic classification task.
    Scopus© Citations 10
  • Publication
    MOWL: An ontology representation language for web-based multimedia applications
    (2013-12-01)
    Mallik, Anupama; Ghosh, Hiranmay; ;
    Several multimedia applications need to reason with concepts and their media properties in specific domain contexts. Media properties of concepts exhibit some unique characteristics that cannot be handled by the conceptual modeling followed in existing ontology representation and reasoning schemes. We have proposed a new perceptual modeling technique for reasoning with media properties observed in multimedia instances and the latent concepts. Our knowledge representation scheme uses a causal model of the world in which concepts manifest in media properties with uncertainties. We introduce a probabilistic reasoning scheme for belief propagation across domain concepts through observation of media properties. To support the perceptual modeling and reasoning paradigm, we propose a new ontology language, Multimedia Web Ontology Language (MOWL). Our primary contribution in this article is to establish the need for the new ontology language and to introduce the semantics of its novel language constructs. We establish the generality of our approach with two disparate knowledge-intensive applications involving reasoning with media properties of concepts. © 2013 ACM.
    Scopus© Citations 22
  • Publication
    Guide Me: Recognition and Servoing on Mobiles
    (2018-12-01)
    Abdulhafez, Abdulhafez;
    In this paper, we design and implement a human–machine interaction application that enables a visually challenged person to locate and manipulate personal objects in her/his neighborhood. In this setting, we need to develop a tool (embedded in a mobile phone) capable of sensing, computing, and guiding the human arm toward the object. This involves solving two subproblems: (1) recognizing objects in the input images, and (2) generating control signals that guide the human to the desired destination. For the former, we adapt the bag-of-words framework for recognition and matching on mobile phones. For the latter, we have developed a moment-based human servoing algorithm that generates commands to help the visually impaired person localize her/his hand with respect to the object of interest. All necessary computations take place on the mobile phone, and the proposed object recognition and vision-based control design are deployed on a low- to mid-end mobile phone. This can lead to a wide range of applications. With our proposed design and implementation, we demonstrate that the application is effective and accurate, with highly reliable convergence across different experimental settings.
  • Publication
    Action Quality Assessment Using Siamese Network-Based Deep Metric Learning
    (2021-06-01)
    Jain, Hiteshi; ; Sharma, Avinash
    Automated vision-based score estimation models can provide an alternative opinion to avoid judgment bias. Existing works have learned score estimation models by regressing the video representation onto the ground-truth scores provided by judges. However, such regression-based solutions lack interpretability, since they give no reasons for the awarded score. One way to make the scores more explicable is to compare the given action video with a reference video, capturing the temporal variations vis-à-vis the reference video and mapping those variations to the final score. In this work, we propose a new action scoring system termed Reference Guided Regression (RGR), which comprises (1) a Deep Metric Learning Module that learns the similarity between any two action videos based on the ground-truth scores given by the judges, and (2) a Score Estimation Module that uses the first module to measure the resemblance of a video to a reference video and produce the assessment score. The proposed scoring model is tested on Olympic diving and gymnastic vaults, and it outperforms existing state-of-the-art scoring models.
    Scopus© Citations 27
  • Publication
    Recognition of Bangla compound characters using structural decomposition
    (2014-03-01)
    Bag, Soumen; ; Bhowmick, Partha
    In this paper, we propose a novel character recognition method for Bangla compound characters. Accurate recognition of compound characters is difficult because of their complex shapes. Our strategy is to decompose a compound character into skeletal segments. The compound character is then recognized by extracting convex shape primitives and applying a template matching scheme. The novelty of our approach lies in formulating appropriate character decomposition rules for segmenting the character skeleton into stroke segments and then grouping them to extract meaningful shape components. Our technique is applicable to both printed and handwritten characters. The proposed method performs well for complex-shaped compound characters that confused existing methods. © 2013 Elsevier Ltd. All rights reserved.
    Scopus© Citations 35
  • Publication
    Simultaneous denoising and super resolution of document images
    (2024-03-01)
    Srivastava, Divya;
    In this paper, we propose a unified approach to denoising and super-resolution of document images. The approach is a one-shot, unpaired technique in which a single unpaired example is used as the reference for training a SinGAN (Shaham et al., in: Proceedings of the IEEE/CVF international conference on computer vision, 2019) model. Training is carried out in two steps. First, we use a clean reference image to train a SinGAN to learn the characteristics of the clean image. Then, we perform super-resolution and denoising of the given test image using another SinGAN. Our unique formulation of the loss function aids this task by encouraging the generated images to have characteristics similar to those of the reference clean image. We conduct experiments on publicly available datasets (Kaggle Dirty Documents Images and DIBCO) and obtain promising results. We also evaluate our model's performance for OCR and obtain a higher recognition rate than competing methods.
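For readers unfamiliar with thinning, which underlies the 2011 contour-based thinning entry, here is a minimal sketch of the classic Zhang-Suen parallel thinning algorithm. It is a well-known baseline of the kind such papers compare against, not the authors' contour-based method. It assumes a binary image stored as a list of lists of 0/1 with a one-pixel background margin; the function name `zhang_suen_thin` is hypothetical.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of lists of 0/1) with the Zhang-Suen algorithm.

    Border pixels are never deleted, so the input is assumed to have a
    one-pixel background margin.
    """
    img = [row[:] for row in image]  # work on a copy, leave the input intact
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: north first, then clockwise around the pixel
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for sub_iteration in (0, 1):
            to_clear = []  # deletions are applied only after the whole pass
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # number of foreground neighbours
                    # a = number of 0 -> 1 transitions in the cyclic sequence P2..P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if sub_iteration == 0:
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_clear.append((r, c))
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Usage: thin a 3-pixel-thick horizontal bar surrounded by background.
bar = [[1 if 1 <= r <= 3 and 1 <= c <= 7 else 0 for c in range(9)] for r in range(5)]
skeleton = zhang_suen_thin(bar)
```

Unlike the contour-based and medial-axis-based methods listed above, this baseline has no notion of strokes and typically needs post-processing to prune spurious branches.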
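The Indian sign language entry classifies gestures by nearest neighbour. The following is a minimal 1-nearest-neighbour sketch, assuming each gesture has already been reduced to a fixed-length numeric feature vector. The abstract does not describe the actual Kinect features, so the vectors here are purely hypothetical. A per-class accuracy helper shows one plausible way to report per-sign recognition rates. Both function names are illustrative.

```python
import math
from collections import Counter


def nearest_neighbour_label(query, train_features, train_labels):
    """Return the label of the training vector closest to `query` (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_features, train_labels):
        dist = math.dist(query, features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def per_class_accuracy(predicted, actual):
    """Fraction of correctly recognised samples for each true class."""
    totals, correct = Counter(actual), Counter()
    for p, a in zip(predicted, actual):
        if p == a:
            correct[a] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Usage with made-up 2-D feature vectors for two fingerspelled letters.
train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]]
labels = ["A", "A", "B"]
prediction = nearest_neighbour_label([4.5, 4.9], train, labels)
```

A 1-NN classifier needs no training step beyond storing labelled examples, which keeps it cheap, but its cost at query time grows with the size of the stored vocabulary.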
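The scene-text entry applies "query-based attention to the text channel" to suppress irrelevant or misrecognised words. The abstract does not give the exact formulation. As an assumption, the sketch below uses standard scaled dot-product attention: a query vector scores each scene-text word vector, a softmax turns the scores into weights, and the weighted sum forms the pooled text representation. Vectors are plain Python lists of equal length, and `query_attention` is a hypothetical name.

```python
import math


def query_attention(query, word_vectors):
    """Scaled dot-product attention of a query vector over scene-text word vectors.

    Returns the attention weights and the attention-weighted text vector.
    """
    d = len(query)
    scores = [sum(q * w for q, w in zip(query, vec)) / math.sqrt(d) for vec in word_vectors]
    top = max(scores)  # subtract the maximum before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[j] for w, vec in zip(weights, word_vectors)) for j in range(d)]
    return weights, pooled


# Usage: the first word agrees with the query, the second opposes it.
weights, pooled = query_attention([1.0, 0.0], [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
```

Words whose embeddings disagree with the query receive small weights, so an erroneous OCR token contributes little to the pooled vector instead of being discarded outright.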
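The MOWL entry describes a causal model in which concepts manifest as media properties with uncertainty, and in which beliefs are updated by observing those properties. The sketch below is a deliberately simplified illustration of that idea using plain Bayes' rule for a single concept. It is not MOWL's actual semantics or its belief-propagation scheme. It assumes observations are conditionally independent given the concept, and all names and probabilities are hypothetical.

```python
def posterior_belief(prior, likelihoods, observations):
    """Posterior P(concept | observations) under a two-state causal model.

    prior:        P(concept present)
    likelihoods:  {property: (P(observed | concept), P(observed | no concept))}
    observations: {property: True if the media property was detected, else False}
    Observations are assumed conditionally independent given the concept.
    """
    p_present, p_absent = prior, 1.0 - prior
    for prop, seen in observations.items():
        p_if_present, p_if_absent = likelihoods[prop]
        if seen:
            p_present *= p_if_present
            p_absent *= p_if_absent
        else:
            p_present *= 1.0 - p_if_present
            p_absent *= 1.0 - p_if_absent
    return p_present / (p_present + p_absent)


# Usage: a hypothetical concept that usually (but not always) shows a given texture.
belief = posterior_belief(0.5, {"texture": (0.9, 0.1)}, {"texture": True})
```

Detecting a property that the concept usually produces raises the belief in the concept. Missing that property lowers it, which matches the abstract's reasoning direction from observed media properties back to latent concepts.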
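The "Guide Me" entry mentions a moment-based servoing algorithm that issues commands to bring the user's hand to the object. Here is a much-simplified sketch of that general idea, assuming the hand and the object have already been segmented into binary masks. It uses only zeroth- and first-order image moments to get centroids, then maps the offset to a coarse command. The command vocabulary, the `tolerance` parameter, and both function names are hypothetical. The paper's actual control law is not given in the abstract.

```python
def centroid(mask):
    """Centroid (x, y) of a binary mask from its raw image moments m00, m10, m01."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                m00 += 1
                m10 += x
                m01 += y
    if m00 == 0:
        return None  # nothing segmented
    return m10 / m00, m01 / m00


def guidance_command(hand_mask, object_mask, tolerance=2.0):
    """Map the hand-to-object centroid offset (in pixels) to a coarse command."""
    hand, target = centroid(hand_mask), centroid(object_mask)
    if hand is None or target is None:
        return "search"
    dx, dy = target[0] - hand[0], target[1] - hand[1]
    if abs(dx) <= tolerance and abs(dy) <= tolerance:
        return "stop"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"  # image y grows downwards


# Usage: hand on the left edge, object on the right edge of a 10x10 frame.
hand = [[1 if r in (4, 5) and c in (0, 1) else 0 for c in range(10)] for r in range(10)]
obj = [[1 if r in (4, 5) and c in (8, 9) else 0 for c in range(10)] for r in range(10)]
command = guidance_command(hand, obj)
```

Correcting the larger axis first keeps the spoken instructions simple. A real servoing loop would repeat this per frame until it returns "stop".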
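The action quality assessment entry learns similarity between action videos with a Siamese deep metric learning module driven by the judges' scores. The abstract does not state RGR's loss. As a stand-in, the sketch below uses the standard contrastive loss for Siamese networks and treats a pair as "similar" when the judges' scores differ by at most a hypothetical `score_gap` threshold. Embeddings are assumed to be precomputed vectors, and the function names are illustrative.

```python
import math


def contrastive_loss(emb_a, emb_b, score_a, score_b, score_gap=5.0, margin=1.0):
    """Contrastive loss for one pair of video embeddings.

    A pair counts as "similar" when the judges' scores differ by at most
    `score_gap`; similar pairs are pulled together, dissimilar pairs are
    pushed at least `margin` apart.
    """
    d = math.dist(emb_a, emb_b)
    if abs(score_a - score_b) <= score_gap:
        return d ** 2
    return max(0.0, margin - d) ** 2


def batch_loss(pairs):
    """Mean contrastive loss over (emb_a, emb_b, score_a, score_b) tuples."""
    return sum(contrastive_loss(*p) for p in pairs) / len(pairs)


# Usage: two pairs of hypothetical 2-D embeddings with judge scores.
pairs = [([0.0, 0.0], [3.0, 4.0], 80, 82), ([0.0, 0.0], [3.0, 4.0], 50, 90)]
loss = batch_loss(pairs)
```

Once embeddings are trained this way, the distance to a reference video of known score becomes an interpretable signal for the downstream score estimate, which is the motivation the abstract gives for comparing against a reference.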