Recent Additions
  • Publication
    Hypertext Databases and Data Mining
    (1999-06-01)
    Chakrabarti, Soumen
    The volume of unstructured text and hypertext data far exceeds that of structured data. Text and hypertext are used for digital libraries, product catalogs, reviews, newsgroups, medical reports, customer service reports, and the like. Currently measured in billions of dollars, worldwide internet activity is expected to reach a trillion dollars by 2002. Database researchers have kept some cautious distance from this action. The goal of this tutorial is to expose database researchers to text and hypertext information retrieval (IR) and mining systems, and to discuss emerging issues in the overlapping areas of databases, hypertext, and data mining.
  • Publication
    Snakes and Sandwiches: Optimal Clustering Strategies for a Data Warehouse
    (1999-06-01)
    Jagadish, H. V.; Lakshmanan, Laks V.S.; Srivastava, Divesh
    Physical layout of data is a crucial determinant of performance in a data warehouse. The optimal clustering of data on disk, for minimizing expected I/O, depends on the query workload. In practice, we often have a reasonable sense of the likelihood of different classes of queries, e.g., 40% of the queries concern calls made from some specific telephone number in some month. In this paper, we address the problem of finding an optimal clustering of records of a fact table on disk, given an expected workload in the form of a probability distribution over query classes. Attributes in a data warehouse fact table typically have hierarchies defined on them (by means of auxiliary dimension tables). The product of the dimensional hierarchy levels forms a lattice and leads to a natural notion of query classes. Optimal clustering in this context is a combinatorially explosive problem with a huge search space (doubly exponential in the number of hierarchy levels). We identify an important subclass of clustering strategies called lattice paths, and present a dynamic programming algorithm for finding the optimal lattice path clustering, in time linear in the lattice size. We additionally propose a technique called snaking, which, when applied to a lattice path, always reduces its cost. For a representative class of star schemas, we show that for every workload, there is a snaked lattice path which is globally optimal. Further, we prove that the clustering obtained by applying snaking to the optimal lattice path is never much worse than the globally optimal snaked lattice path clustering. We complement our analyses and validate the practical utility of our techniques with experiments using TPC-D benchmark data.
  • Publication
    Querying Network Directories
    (1999-06-01)
    Jagadish, H. V.; Lakshmanan, Laks V.S.; Milo, Tova; Srivastava, Divesh; Vista, Dimitra
    Hierarchically structured directories have recently proliferated with the growth of the Internet, and are being used to store not only address books and contact information for people, but also personal profiles, network resource information, and network and service policies. These systems provide a means for managing scale and heterogeneity, while allowing for conceptual unity and autonomy across multiple directory servers in the network, in a way far superior to what conventional relational or object-oriented databases offer. Yet, in deployed systems today, much of the data is modeled in an ad hoc manner, and many of the more sophisticated "queries" involve navigational access. In this paper, we develop the core of a formal data model for network directories, and propose a sequence of efficiently computable query languages with increasing expressive power. The directory data model can naturally represent rich forms of heterogeneity exhibited in the real world. Answers to queries expressible in our query languages can exhibit the same kinds of heterogeneity. We present external memory algorithms for the evaluation of queries posed in our directory query languages, and prove the efficiency of each algorithm in terms of its I/O complexity. Our data model and query languages share the flexibility and utility of the recent proposals for semi-structured data models, while at the same time effectively addressing the specific needs of network directory applications, which we demonstrate by means of a representative real-life example.
  • Publication
    Optimization of Constrained Frequent Set Queries with 2-variable Constraints
    (1999-06-01)
    Lakshmanan, Laks V.S.; Ng, Raymond; Han, Jiawei; Pang, Alex
    Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of various kinds of rules including associations, introduced the notion of constrained frequent set queries (CFQs), and developed effective pruning optimizations for CFQs with 1-variable (1-var) constraints. While 1-var constraints are useful for constraining the antecedent and consequent separately, many natural examples of CFQs illustrate the need for constraining the antecedent and consequent jointly, for which 2-variable (2-var) constraints are indispensable. Developing pruning optimizations for CFQs with 2-var constraints is the subject of this paper. But this is a difficult problem because: (i) in 2-var constraints, both variables keep changing and, unlike 1-var constraints, there is no fixed target for pruning; (ii) as we show, "conventional" monotonicity-based optimization techniques do not apply effectively to 2-var constraints. The contributions are as follows. (1) We introduce a notion of quasi-succinctness, which allows a quasi-succinct 2-var constraint to be reduced to two succinct 1-var constraints for pruning. (2) We characterize the class of 2-var constraints that are quasi-succinct. (3) We develop heuristic techniques for non-quasi-succinct constraints. Experimental results show the effectiveness of all our techniques. (4) We propose a query optimizer for CFQs and show that for a large class of constraints, the computation strategy generated by the optimizer is ccc-optimal, i.e., minimizing the effort incurred w.r.t. constraint checking and support counting.
  • Publication
    Turbo-charging Vertical Mining of Large Databases
    (2000-01-01)
    Shenoy, Pradeep; Bhalotia, Gaurav; Haritsa, Jayant R.; Bawa, Mayank; Sudarshan, S.; Shah, Devavrat
    In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called "snakes" and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
Most viewed
  • Publication
    On Mechanical, Physical, and Bioactivity Characteristics of Material Extrusion Printed Polyether Ether Ketone
    (2023-07-01)
    Kumar, Ranvijay; Singh, Gurminder; Chinappan, Amutha; Ghomi, Erfan Rezvani; Singh, Sunpreet; Sandhu, Kamalpreet; Ramakrishna, Seeram; Narayan, Roger; Katakam, Prakash
    High-performance polyether ether ketone (PEEK) thermoplastic is considered to be one of the most desirable materials for biomedical applications, including oral implantology, prosthodontics, dental implants, and orthopaedics. The processing of PEEK through material extrusion (ME) as a 3D printing process has been preferred due to its affordability, better process parameters, and mass customization. In the present study, attempts have been made to study the effects of various input process parameters of an in-house modified ME system on tensile strength, surface finish, and bioactivity. The effects of the ME input process parameters, including nozzle temperature (Nt), printing speed (Ps), layer thickness (Lt), and build-platform temperature (Bt), on the aforementioned characteristics of 3D printed PEEK specimens have been studied by employing Taguchi’s statistical analysis. The in-vitro cell viability test has been performed using Sprague–Dawley rat bone marrow-derived cells for 21 days. In addition, scanning electron microscopic analysis has been performed at various stages of this experimental study to support micro-characterization. This study indicated that the selected input process parameters strongly influence the tensile strength and surface finish of the as-printed specimens. The optimized print setting advised by the genetic algorithm (GA) included: Nt-440 °C, Ps-10 mm/min, Lt-0.1 mm, and Bt-270 °C. Further, the in-vitro results confirmed the bioactivity of the printed PEEK specimens with the tendency of cell viability. The novelty of the work is to develop a statistical model relating ME parameters for PEEK to surface finish and tensile strength and to verify the bioactivity of the printed parts.
  • Publication
    Dynamics of Methyl Radical Formation Following 266 nm Dissociative Photoionization of Xylenes and Mesitylene
    (2022-03-31)
    Bejoy, Namitha Brijit; Kawade, Monali; Singh, Sumitra; Patwari, G. Naresh
    The 266 nm dissociative photoionization of three xylene isomers and mesitylene leading to the formation of methyl radical was examined. The total translational energy distribution profiles [P(ET)] for the methyl radical were almost identical for all three isomers of xylene and mesitylene, while a substantial difference was observed for the corresponding P(ET) profile of the co-fragment produced by loss of one methyl group in m-xylene. This observation is attributed to the formation of the methyl radical from alternate channels induced by the probe. The P(ET) profiles were rationalized based on the dissociation of the {sp2}C-C{sp3} bond in the cationic state, wherein the {sp2}C-C{sp3} bond dissociation energy is substantially lower relative to the neutral ground state. The dissociation in the cationic state follows a resonant three-photon absorption process, resulting in a maximum translational energy of about 1.6-1.8 eV for the photofragments in the center-of-mass frame. Fitting of the P(ET) profiles to an empirical function reveals that the dynamics of {sp2}C-C{sp3} bond dissociation is insensitive to the position of substitution but marginally dependent on the number of methyl groups.
  • Publication
    Catapulting of topological defects through elasticity bands in active nematics
    (2022-06-28)
    Kumar, Nitin; Zhang, Rui; Redford, Steven A.; de Pablo, Juan J.; Gardel, Margaret L.
    Active materials are those in which individual, uncoordinated local stresses drive the material out of equilibrium on a global scale. Examples of such assemblies can be seen across scales from schools of fish to the cellular cytoskeleton and underpin many important biological processes. Synthetic experiments that recapitulate the essential features of such active systems have been the object of study for decades as their simple rules allow us to elucidate the physical underpinnings of collective motion. One system of particular interest has been active nematic liquid crystals (LCs). Because of their well-understood passive physics, LCs provide a rich platform to interrogate the effects of active stress. The flows and steady state structures that emerge in active LCs have been understood to result from a competition between nematic elasticity and the local activity. However, most investigations of such phenomena consider only the magnitude of the elastic resistance and not its peculiarities. Here we investigate a nematic liquid crystal and selectively change the ratio of the material's splay and bend elasticities. We show that increases in the nematic's bend elasticity specifically drive the material into an exotic steady state where elongated regions of acute bend distortion or “elasticity bands” dominate the structure and dynamics. We show that these bands strongly influence defect dynamics, including the rapid motion or “catapulting” along the disintegration of one of these bands thus converting bend distortion into defect transport. Thus, we report a novel dynamical state resultant from the competition between nematic elasticity and active stress.
  • Publication
    Comprehensive Workflow of Mass Spectrometry-based Shotgun Proteomics of Tissue Samples
    (2021-11-13)
    Verma, Ayushi; Kumar, Vipin; Ghantasala, Saicharan; Mukherjee, Shuvolina; Srivastava, Sanjeeva
    Recent advances in mass spectrometry have resulted in deep proteomic analysis along with the generation of robust and reproducible datasets. However, despite the considerable technical advancements, sample preparation from biospecimens such as patient blood, CSF, and tissue still poses considerable challenges. For identifying biomarkers, tissue proteomics often provides an attractive sample source to translate the research findings from the bench to the clinic. It can reveal potential candidate biomarkers for early diagnosis of cancer and neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, etc. Tissue proteomics also yields a wealth of systemic information based on the abundance of proteins and helps to address interesting biological questions. Quantitative proteomics analysis can be grouped into two broad categories: a label-based and a label-free approach. In the label-based approach, proteins or peptides are labeled using stable isotopes such as SILAC (stable isotope labeling with amino acids in cell culture) or by chemical tags such as ICAT (isotope-coded affinity tags), TMT (tandem mass tag) or iTRAQ (isobaric tag for relative and absolute quantitation). Label-based approaches have the advantage of more accurate quantitation of proteins, and, with isobaric labels, multiple samples can be analyzed in a single experiment. The label-free approach provides a cost-effective alternative to label-based approaches. Hundreds of patient samples belonging to a particular cohort can be analyzed and compared with other cohorts based on clinical features. Here, we have described an optimized quantitative proteomics workflow for tissue samples using label-free and label-based proteome profiling methods, which is crucial for applications in life sciences, especially biomarker discovery-based projects.
  • Publication
    Geochemical evaluation of fluoride and nitrate contamination in groundwater using graphical method, Karnataka state, India
    (2022-06-09)
    Rizvi, Syed Shams; Khader, Mohammed Aslam Mohammad Abdul; Bala, Rashmi; Nallusamy, Babu; Khan, Mohammed Muqtada Ali; Mansor, Hafzan Eva; James, Elvaene; Shamsuddin, Mohd Khairul Nizar
    Nitrate and fluoride contamination in groundwater is one of the major emerging problems of the northern Karnataka region of India. Fluoride pollutants in groundwater are responsible for dental and skeletal fluorosis, whereas nitrate contamination is responsible for methemoglobinemia (blue baby disease) and for respiratory, kidney, and thyroid disorders in children as well as adults. The study has been carried out to identify the geochemical processes and mechanisms responsible for releasing nitrate and fluoride pollution in the region. Sampling has been done in both the pre-monsoon and post-monsoon seasons. In the pre-monsoon samples, fluoride concentration varies from 0.65 mg/L to 1.97 mg/L and nitrate concentration from 12 mg/L to 96 mg/L. In the post-monsoon samples, fluoride and nitrate vary from 0.76 mg/L to 1.53 mg/L and from 16 mg/L to 77 mg/L, respectively. The worst fluoride-affected villages are Jidga, Kamanalli, Padsawli, Sakkarga, and Nagelagaon. Several diseases such as dental and skeletal fluorosis have been identified in the region. Intertrappean beds and an enhanced rate of weathering of fluoride-bearing minerals may be responsible for fluoride contamination, whereas excess use of fertilizer is generally responsible for nitrate contamination in the region.