Publications
Permanent URI for this collection
Browse
Browsing Publications by Issue Date
Now showing 1 - 20 of 3518
Results Per Page
Sort Options
- PublicationSnakes and Sandwiches: Optimal Clustering Strategies for a Data Warehouse(1999-06-01)
;Jagadish, H. V. ;Lakshmanan, Laks V.S.Srivastava, DiveshPhysical layout of data is a crucial determinant of performance in a data warehouse. The optimal clustering of data on disk, for minimizing expected I/O, depends on the query workload. In practice, we often have a reasonable sense of the likelihood of different classes of queries, e.g., 40% of the queries concern calls made from some specific telephone number in some month. In this paper, we address the problem of finding an optimal clustering of records of a fact table on disk, given an expected workload in the form of a probability distribution over query classes.Attributes in a data warehouse fact table typically have hierarchies defined on them (by means of auxiliary dimension tables). The product of the dimensional hierarchy levels forms a lattice and leads to a natural notion of query classes. Optimal clustering in this context is a combinatorially explosive problem with a huge search space (doubly exponential in number of hierarchy levels). We identify an important subclass of clustering strategies called lattice paths, and present a dynamic programming algorithm for finding the optimal lattice path clustering, in time linear in the lattice size. We additionally propose a technique called snaking, which when applied to a lattice path, always reduces its cost. For a representative class of star schemas, we show that for every workload, there is a snaked lattice path which is globally optimal. Further, we prove that the clustering obtained by applying snaking to the optimal lattice path is never much worse than the globally optimal snaked lattice path clustering. We complement our analyses and validate the practical utility of our techniques with experiments using TPC-D benchmark data. - PublicationHypertext Databases and Data Mining(1999-06-01)Chakrabarti, SoumenThe volume of unstructured text and hypertext data far exceeds that of structured data. Text and hypertext are used for digital libraries, product catalogs, reviews, newsgroups, medical reports, customer service reports, and the like. Currently measured in billions of dollars, the worldwide internet activity is expected to reach a trillion dollars by 2002. Database researchers have kept some cautious distance from this action. The goal of this tutorial is to expose database researchers to text and hypertext information retrieval (IR) and mining systems, and to discuss emerging issues in the overlapping areas of databases, hypertext, and data mining.
- PublicationQuerying Network Directories(1999-06-01)
;Jagadish, H. V. ;Lakshmanan, Laks V.S. ;Milo, Tova ;Srivastava, DiveshVista, DimitraHeirarchically structured directories have recently proliferated with the growth of the Internet, and are being used to store not only address books and contact information for people, but also personal profiles, network resource information, and network and service policies. These systems provide a means for managing scale and heterogeneity, while allowing for conceptual unity and autonomy across multiple directory servers in the network, in a way for superior to what conventional relational or object-oriented databases offer. Yet, in deployed systems today, much of the data is modeled in an ad hoc manner, and many of the more sophisticated "queries"involve navigational access.In this paper, we develop the core of a formal data model for network directories, and propose a sequence of efficiently computable query languages with increasing expressive power. The directory data model can naturally represent rich forms of heterogeneity exhibited in the real world. Answers to queries expressible in our query languages can exhibit the same kinds of heterogeneity. We present external memory algorithms for the evaluation of queries posed in our directory query languages, and prove the efficiency of each algorithm in terms of its I/O complexity. Our data model and query languages share the flexibility and utility of the recent proposals for semi-structured data models, while at the same time effectively addressing the specific needs of network directory applications, which we demonstrate by means of a representative real-life example. - PublicationOptimization of Constrained Frequent Set Queries with 2-variable Constraints(1999-06-01)
;Lakshmanan, Laks V.S. ;Ng, Raymond ;Han, JiaweiPang, AlexCurrently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of various kinds of rules including associations, introduced the notion of constrained frequent set queries (CFQs), and developed effective pruning optimizations for CFQs with 1-variable (1-var) constraints.While 1-var constraints are useful for constraining the antecedent and consequent separately, many natural examples of CFQs illustrate the need for constraining the antecedent and consequent jointly, for which 2-variable (2-var) constraints are indispensable. Developing pruning optimizations for CFQs with 2-var constraints is the subject of this paper. But this is a difficult problem because: (i) in 2-var constraints, both variables keep changing and, unlike 1-var constraints, there is no fixed target for pruning; (ii) as we show, "conventional"monotonicity-based optimization techniques do not apply effectively to 2-var constraints.The contributions are as follows. (1) We introduce a notion of quasi-succinctness, which allows a quasi-succinct 2-var constraint to be reduced to two succinct 1-var constraints for pruning. (2) We characterize the class of 2-var constraints that are quasi-succinct. (3) We develop heuristic techniques for non-quasi-succinct constraints. Experimental results show the effectiveness of all our techniques. (4) We propose a query optimizer for CFQs and show that for a large class of constraints, the computation strategy generated by the optimizer is ccc-optimal, i.e., minimizing the effort incurred w.r.t. constraint checking and support counting. - PublicationTurbo-charging Vertical Mining of Large Databases(2000-01-01)
;Shenoy, Pradeep ;Bhalotia, Gaurav ;Haritsa, Jayant R. ;Bawa, Mayank ;Sudarshan, S.Shah, DevavratIn a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular character istics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called "snakes" and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and hor izontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practi cally infeasible, horizontal mining algorithm. - PublicationOn-line Reorganization in Object Databases(2000-01-01)
;Lakhamraju, Mohana K. ;Rastogi, Rajeev ;Seshadri, S.Sudarshan, S.Reorganization of objects in an object databases is an important component of several operations like compaction, clustering, and schema evolution. The high availability requirements (24 X 7 operation) of certain application domains requires reorganization to be performed on-line with minimal intereference to concurrently executing transactions. In this paper, we address the problem of on-line reorganization in object databases, where a set of objects have to be migrated from one location to another. Specifically, we consider the case where objects in the database may contain physical references to other objects. Relocating an object in this case involves finding the set of objects (parents) that refer to it, and modifying the references in each parent. We propose an algorithm called the Incremental Reorganization Algorithm (IRA) that achieves the above task with minimal interference to concurrently executing transactions. The IRA algorithm holds locks on at most two distinct objects at any point of time. We have implemented IRA on Brahma, a storage manager developed at IIT Bombay, and conducted an extensive performance study. Our experiments reveal that IRA makes on-line reorganization feasible, with very little impact on the response times of concurrently executing transactions and on overall system throughput. We also describe how the IRA algorithm can handle system failures. - Publication
- PublicationSupporting Behavioral Contracts for COM Components(2001-01-01)
;Bhagat, SonalJoshi, Rushikesh K.Specifying behavioral specifications for components apart from the conventional syntactic interface specifications can be very useful in component based system development. Preconditions and postconditions describe one form of behavioral aspects of components. We discuss a tool and an implementation mechanism to incorporate behavioral contracts expressed in terms of preconditions and postconditions for COM components. A method invocation on a component is executed only if the precondition is satisfied. Similarly, the results are successfully returned upon successful execution of the postcondition. A design criterion was to facilitate contract specifications for existing components with least amount of changes at client and server side code. The tool requires that the component should implement an additional interface called IAccess if the behavioral contract needs component state. No modification is required to existing clients of the component. - PublicationProxy-Based Acceleration of Dynamically Generated Content on the World Wide Web: An Approach and Implementation(2002-06-03)
;Datta, Anindya ;Dutta, Kaushik ;Thomas, Helen ;Vandermeer, Debra ;Suresha,Ramamritham, KrithiAs Internet traffic continues to grow and web sites become increasingly complex, performance and scalability are major issues for web sites. Web sites are increasingly relying on dynamic content generation applications to provide web site visitors with dynamic, interactive, and personalized experiences. However, dynamic content generation comes at a cost - - each request requires computation as well as communication across multiple components.To address these issues, various dynamic content caching approaches have been proposed. Proxy-based caching approaches store content at various locations outside the site infrastructure and can improve Web site performance by reducing content generation delays, firewall processing delays, and bandwidth requirements. However, existing proxy-based caching approaches either (a) cache at the page level, which does not guarantee that correct pages are served and provides very limited reusability, or (b) cache at the fragment level, which requires the use of pre-defined page layouts. To address these issues, several back end caching approaches have been proposed, including query result caching and fragment level caching. While back end approaches guarantee the correctness of results and offer the advantages of fine-grained caching, they neither address firewall delays nor reduce bandwidth requirements.In this paper, we present an approach and an implementation of a dynamic proxy caching technique which combines the benefits of both proxy-based and back end caching approaches, yet does not suffer from their above-mentioned limitations. Our dynamic proxy caching technique allows granular, proxy-based caching where both the content and layout can be dynamic. Our analysis of the performance of our approach indicates that it is capable of providing significant reductions in bandwidth. We have also deployed our proposed dynamic proxy caching technique at a major financial institution. The results of this implementation indicate that our technique is capable of providing order-of-magnitude reductions in bandwidth and response times in real-world dynamic Web applications. - PublicationDispersion of two-photon absorption coefficient for a CdS-ZnO nanocomposite thin film using antiresonant ring interferometer(2003-01-01)
;Vasa, P. ;Taneja, P. ;Ayyub, P.Singh, B. P.We discuss the dispersion of two-photon absorption coefficient for a CdS-ZnO nanocomposite thin film using a new, sensitive, simple and single beam technique called antiresonant ring interferometric spectroscopy. - PublicationPhotoluminescence properties of CdS-ZnO nanocomposite thin films(2003-01-01)
;Vasa, P. ;Taneja, P. ;Ayyub, P.Singh, B. P.We discuss preparation and photoluminescence properties of CdS-ZnO nanocomposite thin films. Composite material is found to have higher photoluminescence yield and spatial coherence compared to the nanoparticle thin films of CdS and ZnO. - PublicationAniPQO: Almost Non-intrusive Parametric Query Optimization for Nonlinear Cost Functions(2003-01-01)
;Hulgeri, ArvindSudarshan, S.This chapter proposes a heuristic solution for the parametric query optimization (PQO) problem for the case when the cost functions may be nonlinear in the given parameters. This solution is minimally intrusive in the sense that an existing query optimizer can be used with minor modifications. The chapter implements the heuristic and the results of the tests on the TPCD benchmark indicate that the heuristic is very effective. The minimal intrusiveness, generality in terms of cost functions and number of parameters and good performance indicates that the solution is of significant practical importance. The cost of a query plan depends on many parameters, such as predicate selectivities and available memory, whose values may not be known at optimization time. PQO optimizes a query into a number of candidate plans, each optimal for some region of the parameter space. - PublicationText Representation with WordNet Synsets using Soft Sense Disambiguation(2003-01-01)
;Ramakrishnanan, GaneshBhattacharyya, PushpakText information processing depends critically on the proper representation of texts. A common and naive way of representing a text is as a bag of its component words. This representation suffers primarily from two drawbacks, viz., polysemy and synonymy which arise because of the ambiguity of the words and the lack of information about the relations between the words. This paper presents a model for representing a text in terms of the synsets in the WordNet- the lexical knowledge base of English words along with the semantic relations. These synsets stand for concepts which correspond to the words of the text. In particular, a soft sense disambiguation approach has been proposed. The text representation so obtained is found to convey the key ideas that the texts deal with. WordNet relations with other words in the sentence are exploited to disambiguate the senses. This scheme has been evaluated using a goodness measure based the information content of the representation of the text. As an actual application, the problem of text classification has been taken up, and the results are encouraging. - PublicationAn Efficient and Resilient Approach to Filtering and Disseminating Streaming Data(2003-01-01)
;Shah, Shetal ;Dharmarajan, ShyamshankarRamamritham, KrithiThis chapter discusses an efficient and resilient approach to filtering and disseminating streaming data. It considers techniques for creating a resilient and efficient content distribution network for such dynamically changing streaming data. The chapter addresses the problem of maintaining the coherency of dynamic data items in a network of repositories: Data disseminated to one repository is filtered by that repository and disseminated to repositories dependent on it. The method is resilient to link failures and repository failures. This resiliency implies that data fidelity is not lost even when the repository from which a user obtains data experiences failures. Experimental evaluation, using real world traces of streaming data, demonstrates that (1) the cost of adding this redundancy is low, and (2) surprisingly, in many cases, adding resiliency-enhancing features actually improves the fidelity provided by the system even in cases when there are no failures. To enhance fidelity, the chapter proposes efficient techniques for filtering data arriving at one repository and for scheduling the dissemination of filtered data to another repository. The results show that the combination of resiliency enhancing and efficiency improving techniques in fact help derive the potential that push based systems are said to have in delivering 100% fidelity. Without them, computational and communication delays inherent in dissemination networks can lead to a large fidelity loss even in push based dissemination. - PublicationMessage Scheduling on a Wormhole-Switched Linear Client-Server Network(2006-01-01)
;Yang, Bing ;Gumaste, Ashwin ;Lu, EnyueZheng, S. Q.The advantage of wormhole switching in interconnection networks is its distance insensitivity of communication latency under light traffic. However, this property vanishes when traffic is heavy. We consider the performance of a linear wormhole-switched network used as a real-time client-server network. Messages generated by client hosts are periodically transmitted to a central server within a predictable delivery time. We present two algorithms for generating feasible message transmission schedules and compare their performances. It is shown that trade-off exists between quality of schedules and the network utilization. Several open problems are posed. - PublicationTHE WEBER QUANTIZER: PERCEPTUAL CODING FOR NETWORKED TELEPRESENCE AND TELEACTION(2007-01-01)
;Hinterseer, P. ;Kammerl, J. ;Steinbach, E.Chaudhuri, S.We present a theoretical analysis of a perceptual coding approach for networked telepresence and teleaction. Our so called Weber quantizer is based on Weber’s Law and can be used in haptic data communication to eliminate changes which can not be perceived by the human operator. The main advantage of the Weber quantizer is that it minimizes the number of samples or packets to be transmitted. Basic properties like the resulting sample rate and the MSE of the proposed Weber quantizer are derived analytically and proven correct by simulation for the case of a uniformly distributed input sequence. The contributions in this paper provide the basis for the analysis of more realistic signal models and constitute a first step towards the understanding of the relationship between the Weber quantizer, statistical error measures and actual human distortion perception. - PublicationInfluence of Intra Cavity Attenuation on the Linewidth of an Erbium Doped Fiber Ring Laser(2008-07-01)
;Venkitesh, DeepaRamarao, Vijayatunable erbium doped fiber ring laser is constructed by controlling the intra cavity loss without the use of wavelength selective filters. For a given pump power, the linewidth of such a filterless laser is a function of the intra cavity loss. The linewidth characteristics in the C and L bands are independently observed and analysed. The results of numerical simulations of fiber ring laser are used to analyse the linewidth characteristics observed in the C band. The characteristics in the L band are interpreted qualitatively through the spectral dependence of the threshold power. The linewidth characteristics of a filter-less modelocked fiber laser are also addressed. - PublicationAn Empirical Relationship for Determining Shear Wave Velocity in Granular Materials Accounting for Grain Morphology(2008-09-19)
;Patel, A. ;Bartake, P. P.Singh, D. N.It is well recognized that shear wave velocity in the soil mainly depends on its type, state of compaction, and the confining stress. However, the influence of other geometrical parameters of soil particles (such as size, shape, roundness, and sphericity) on the shear wave velocity has not received much attention of the earlier researchers. It must be noted that these parameters primarily control the delegation of the applied normal stress in the soil and hence would directly influence the magnitude of shear wave velocity in it. With this in view, investigations were conducted to determine shear wave velocity in different types of granular materials (sands, glass beads, and cenospheres) by employing bender elements. Details of the methodology adopted for this purpose are presented in this paper. Experimental data have been used to develop an empirical correlation predicting shear wave velocity in granular materials by inputting their mean grain size and shape, and the confining stress, under dry and saturated states. The utility and efficiency of these relationships has also been demonstrated. - PublicationDevelopment of a Motor-Based Differential Settlement Simulator Setup for a Geotechnical Centrifuge(2010-08-20)
;Rajesh, S.Viswanadham, B. V.S.This paper presents the design details of a motor-based differential settlement simulator (MDSS) setup for inducing differential settlement in a high gravity environment. The MDSS setup comprises of a motor, controller, screw jack, central platform, gear trains, connecting shaft, and bearings. Various features of the MDSS setup along with its design procedure are described. Calibration and performance tests were performed in a large beam geotechnical centrifuge facility available at Indian Institute of Technology Bombay for inducing differential settlement using the MDSS setup. A design chart was developed for obtaining prototype settlement rates from the speed of the motor and gravity level. The usage of this setup in assessing the deformation behavior of the clay barrier of landfill cover system under various settlement rates is briefly illustrated. The versatility of the developed setup for the applications involving ground loss or boundary displacement problems is highlighted. - PublicationEffective explosive energy utilization for engineering blasting - initial results of an inventive stemming plug, SPARSH(2011-01-01)
;Sazid, Md ;Saharan, M. R.Singh, T. N.