Search Research Outputs
Recent Additions
- Publication: Hypertext Databases and Data Mining (1999-06-01) The volume of unstructured text and hypertext data far exceeds that of structured data. Text and hypertext are used for digital libraries, product catalogs, reviews, newsgroups, medical reports, customer service reports, and the like. Currently measured in billions of dollars, the worldwide internet activity is expected to reach a trillion dollars by 2002. Database researchers have kept some cautious distance from this action. The goal of this tutorial is to expose database researchers to text and hypertext information retrieval (IR) and mining systems, and to discuss emerging issues in the overlapping areas of databases, hypertext, and data mining.
- Publication: Snakes and Sandwiches: Optimal Clustering Strategies for a Data Warehouse (1999-06-01) Physical layout of data is a crucial determinant of performance in a data warehouse. The optimal clustering of data on disk, for minimizing expected I/O, depends on the query workload. In practice, we often have a reasonable sense of the likelihood of different classes of queries, e.g., 40% of the queries concern calls made from some specific telephone number in some month. In this paper, we address the problem of finding an optimal clustering of records of a fact table on disk, given an expected workload in the form of a probability distribution over query classes. Attributes in a data warehouse fact table typically have hierarchies defined on them (by means of auxiliary dimension tables). The product of the dimensional hierarchy levels forms a lattice and leads to a natural notion of query classes. Optimal clustering in this context is a combinatorially explosive problem with a huge search space (doubly exponential in number of hierarchy levels). We identify an important subclass of clustering strategies called lattice paths, and present a dynamic programming algorithm for finding the optimal lattice path clustering, in time linear in the lattice size. We additionally propose a technique called snaking, which when applied to a lattice path, always reduces its cost. For a representative class of star schemas, we show that for every workload, there is a snaked lattice path which is globally optimal. Further, we prove that the clustering obtained by applying snaking to the optimal lattice path is never much worse than the globally optimal snaked lattice path clustering. We complement our analyses and validate the practical utility of our techniques with experiments using TPC-D benchmark data.
- Publication: Querying Network Directories (1999-06-01) Hierarchically structured directories have recently proliferated with the growth of the Internet, and are being used to store not only address books and contact information for people, but also personal profiles, network resource information, and network and service policies. These systems provide a means for managing scale and heterogeneity, while allowing for conceptual unity and autonomy across multiple directory servers in the network, in a way far superior to what conventional relational or object-oriented databases offer. Yet, in deployed systems today, much of the data is modeled in an ad hoc manner, and many of the more sophisticated "queries" involve navigational access. In this paper, we develop the core of a formal data model for network directories, and propose a sequence of efficiently computable query languages with increasing expressive power. The directory data model can naturally represent rich forms of heterogeneity exhibited in the real world. Answers to queries expressible in our query languages can exhibit the same kinds of heterogeneity. We present external memory algorithms for the evaluation of queries posed in our directory query languages, and prove the efficiency of each algorithm in terms of its I/O complexity. Our data model and query languages share the flexibility and utility of the recent proposals for semi-structured data models, while at the same time effectively addressing the specific needs of network directory applications, which we demonstrate by means of a representative real-life example.
- Publication: Optimization of Constrained Frequent Set Queries with 2-variable Constraints (1999-06-01) Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of various kinds of rules including associations, introduced the notion of constrained frequent set queries (CFQs), and developed effective pruning optimizations for CFQs with 1-variable (1-var) constraints. While 1-var constraints are useful for constraining the antecedent and consequent separately, many natural examples of CFQs illustrate the need for constraining the antecedent and consequent jointly, for which 2-variable (2-var) constraints are indispensable. Developing pruning optimizations for CFQs with 2-var constraints is the subject of this paper. But this is a difficult problem because: (i) in 2-var constraints, both variables keep changing and, unlike 1-var constraints, there is no fixed target for pruning; (ii) as we show, "conventional" monotonicity-based optimization techniques do not apply effectively to 2-var constraints. The contributions are as follows. (1) We introduce a notion of quasi-succinctness, which allows a quasi-succinct 2-var constraint to be reduced to two succinct 1-var constraints for pruning. (2) We characterize the class of 2-var constraints that are quasi-succinct. (3) We develop heuristic techniques for non-quasi-succinct constraints. Experimental results show the effectiveness of all our techniques. (4) We propose a query optimizer for CFQs and show that for a large class of constraints, the computation strategy generated by the optimizer is ccc-optimal, i.e., minimizing the effort incurred w.r.t. constraint checking and support counting.
- Publication: Turbo-charging Vertical Mining of Large Databases (2000-01-01) In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called "snakes" and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
Most viewed
- Publication: Berti: an Accurate Local-Delta Data Prefetcher (2022-01-01) Data prefetching is a technique that plays a crucial role in modern high-performance processors by hiding long latency memory accesses. Several state-of-the-art hardware prefetchers exploit the concept of deltas, defined as the difference between the cache line addresses of two demand accesses. Existing delta prefetchers, such as best offset prefetching (BOP) and multi-lookahead prefetching (MLOP), train and predict future accesses based on global deltas. We observed that the use of global deltas results in missed opportunities to anticipate memory accesses. In this paper, we propose Berti, a first-level data cache prefetcher that selects the best local deltas, i.e., those that consider only demand accesses issued by the same instruction. Thanks to a high-confidence mechanism that precisely detects the timely local deltas with high coverage, Berti generates accurate prefetch requests. Then, it orchestrates the prefetch requests to the memory hierarchy, using the selected deltas. Our empirical results using ChampSim and SPEC CPU2017 and GAP workloads show that, with a storage overhead of just 2.55 KB, Berti improves performance by 8.5% compared to a baseline IP-stride and 3.5% compared to IPCP, a state-of-the-art prefetcher. Our evaluation also shows that Berti reduces dynamic energy at the memory hierarchy by 33.6% compared to IPCP, thanks to its high prefetch accuracy.
- Publication: On the Question of S-S Bond Cleavage of 2,2′-Dithiodipyridine on Selective Ru and Os Platforms. MLCT or Hydride or Solvent Mediated Event (2022-09-12) This article deals with the S-S bond scission of the model substrate 2,2′-dithiodipyridine (DTDP) in the presence of a selective set of metal precursors: RuII(acac)2, [RuIICl2(PPh3)3], [RuIIHCl(CO)(PPh3)3], [RuII(H)2(CO)(PPh3)3], [RuII(bpy)2Cl2], [RuII(pap)2Cl2], [OsII(bpy)2Cl2], and [OsII(pap)2Cl2] (acac, acetylacetonate; bpy, 2,2′-bipyridine; pap, 2-phenylazopyridine). This led to the eventual formation of the corresponding mononuclear complexes containing the cleaved pyridine-2-thiolate unit in 1-4/[5]ClO4-[8]ClO4. The formation of the complexes was ascertained by their single-crystal X-ray structures, which also established sterically constrained four-membered chelate (average N1-M-S1 angle of 67.89°) originated from the in situ-generated pyridine-2-thiolate unit. Ruthenium(III)-derived one-electron paramagnetic complexes 1-2 (S = 1/2, magnetic moment/B.M. = 1.82(1)/1.81(2)) exhibited metal-based anisotropic electron paramagnetic resonance (EPR) (Δg: 1/2 = 0.64/0.93, ⟨g⟩: 1/2 = 2.173/2.189) and a broad 1H nuclear magnetic resonance (NMR) signature due to the contact shift effect. The spectroelectrochemical and electronic structural aspects of the complexes were analyzed experimentally in combination with theoretical calculations of density functional theory (DFT and TD-DFT). The unperturbed feature of DTDP even in refluxing ethanol over a period of 10 h can be attributed to the active participation of the metal fragments in facilitating S-S bond cleavage in 1-4/[5]ClO4-[8]ClO4. It also revealed the following three probable pathways toward S-S bond cleavage of DTDP as a function of metal precursors: (i) the metal-to-ligand charge-transfer (MLCT) (RuII → σ∗ of DTDP)-driven metal oxidation (RuII → RuIII) process in the case of relatively electron-rich metal fragments {RuII(acac)2} or RuIICl2 in 1 or 2, respectively; (ii) metal hydride-assisted formation of 3 or 4 with the concomitant generation of H2; and (iii) S-S bond reduction with the simultaneous oxidation of the solvent benzyl alcohol to benzaldehyde.
- Publication: Crystal plasticity constitutive modeling of tensile, creep and cyclic deformation in single crystal Ni-based superalloys (2022-11-01) A microstructure-sensitive crystal plasticity constitutive framework is proposed for simulating the tensile, cyclic, and creep response of single crystal Ni-based superalloys. In this framework, a non-Schmid model is used to account for the orientation- and temperature-dependent yield anisotropy. A model for the evolution of the slip system-level backstress is developed to simulate the cyclic response. The model accounts for the effect of microstructural features like precipitate volume fraction and size, and matrix channel width, on the associated hardening mechanisms. A physical model for creep deformation has also been developed that accounts for dislocation climb normal to the slip plane and the microstructure evolution due to rafting and isotropic coarsening of the precipitates. Application of the model is demonstrated by predicting the orientation- and temperature-dependent thermo-mechanical deformation of two single crystal superalloys, CMSX-4 and PWA-1484. The model is calibrated to the experimental tensile, cyclic and creep response for these alloys. Finally, the model is qualitatively validated by predicting the creep-fatigue interactions for different loading orientations and a range of thermo-mechanical conditions for CMSX-4 and PWA-1484.
- Publication: BPNN (ANN) Based Operating Speed Models for Horizontal Curves Using Naturalistic Driving Data (2022-01-01) Safety is a major concern when dealing with the geometry of the road. To improve safety, designers started using operating speed prediction models in geometric design. So far, most of the available work on speed models has focused on 85th percentile speeds, giving less importance to models for the other percentile speeds. However, other percentile speeds such as the 15th, 50th, 95th, and 98th do influence geometric parameters in geometric design, and they need further exploration. In this study, therefore, percentile speed models are developed using a Back Propagation Neural Network (BPNN) with naturalistic speed data collected on horizontal curves. The percentile speed (Vp) model developed yields better results, with an R² of 0.83. The developed model also showed that design speed has the most substantial influence on percentile speeds, with a Relative Parameter Influence (RI) of 16%. The percentile speed results obtained from the BPNN model follow a normal distribution (K-S test). We can therefore say that the developed model represents the naturalistic free-flow speed distribution.
- Publication