Recent Additions
- Publication: Hypertext Databases and Data Mining (1999-06-01). The volume of unstructured text and hypertext data far exceeds that of structured data. Text and hypertext are used for digital libraries, product catalogs, reviews, newsgroups, medical reports, customer service reports, and the like. Currently measured in billions of dollars, the worldwide internet activity is expected to reach a trillion dollars by 2002. Database researchers have kept some cautious distance from this action. The goal of this tutorial is to expose database researchers to text and hypertext information retrieval (IR) and mining systems, and to discuss emerging issues in the overlapping areas of databases, hypertext, and data mining.
- Publication: Snakes and Sandwiches: Optimal Clustering Strategies for a Data Warehouse (1999-06-01). Physical layout of data is a crucial determinant of performance in a data warehouse. The optimal clustering of data on disk, for minimizing expected I/O, depends on the query workload. In practice, we often have a reasonable sense of the likelihood of different classes of queries, e.g., 40% of the queries concern calls made from some specific telephone number in some month. In this paper, we address the problem of finding an optimal clustering of records of a fact table on disk, given an expected workload in the form of a probability distribution over query classes. Attributes in a data warehouse fact table typically have hierarchies defined on them (by means of auxiliary dimension tables). The product of the dimensional hierarchy levels forms a lattice and leads to a natural notion of query classes. Optimal clustering in this context is a combinatorially explosive problem with a huge search space (doubly exponential in number of hierarchy levels). We identify an important subclass of clustering strategies called lattice paths, and present a dynamic programming algorithm for finding the optimal lattice path clustering, in time linear in the lattice size. We additionally propose a technique called snaking, which when applied to a lattice path, always reduces its cost. For a representative class of star schemas, we show that for every workload, there is a snaked lattice path which is globally optimal. Further, we prove that the clustering obtained by applying snaking to the optimal lattice path is never much worse than the globally optimal snaked lattice path clustering. We complement our analyses and validate the practical utility of our techniques with experiments using TPC-D benchmark data.
- Publication: Querying Network Directories (1999-06-01). Hierarchically structured directories have recently proliferated with the growth of the Internet, and are being used to store not only address books and contact information for people, but also personal profiles, network resource information, and network and service policies. These systems provide a means for managing scale and heterogeneity, while allowing for conceptual unity and autonomy across multiple directory servers in the network, in a way far superior to what conventional relational or object-oriented databases offer. Yet, in deployed systems today, much of the data is modeled in an ad hoc manner, and many of the more sophisticated "queries" involve navigational access. In this paper, we develop the core of a formal data model for network directories, and propose a sequence of efficiently computable query languages with increasing expressive power. The directory data model can naturally represent rich forms of heterogeneity exhibited in the real world. Answers to queries expressible in our query languages can exhibit the same kinds of heterogeneity. We present external memory algorithms for the evaluation of queries posed in our directory query languages, and prove the efficiency of each algorithm in terms of its I/O complexity. Our data model and query languages share the flexibility and utility of the recent proposals for semi-structured data models, while at the same time effectively addressing the specific needs of network directory applications, which we demonstrate by means of a representative real-life example.
- Publication: Optimization of Constrained Frequent Set Queries with 2-variable Constraints (1999-06-01). Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of various kinds of rules including associations, introduced the notion of constrained frequent set queries (CFQs), and developed effective pruning optimizations for CFQs with 1-variable (1-var) constraints. While 1-var constraints are useful for constraining the antecedent and consequent separately, many natural examples of CFQs illustrate the need for constraining the antecedent and consequent jointly, for which 2-variable (2-var) constraints are indispensable. Developing pruning optimizations for CFQs with 2-var constraints is the subject of this paper. But this is a difficult problem because: (i) in 2-var constraints, both variables keep changing and, unlike 1-var constraints, there is no fixed target for pruning; (ii) as we show, "conventional" monotonicity-based optimization techniques do not apply effectively to 2-var constraints. The contributions are as follows. (1) We introduce a notion of quasi-succinctness, which allows a quasi-succinct 2-var constraint to be reduced to two succinct 1-var constraints for pruning. (2) We characterize the class of 2-var constraints that are quasi-succinct. (3) We develop heuristic techniques for non-quasi-succinct constraints. Experimental results show the effectiveness of all our techniques. (4) We propose a query optimizer for CFQs and show that for a large class of constraints, the computation strategy generated by the optimizer is ccc-optimal, i.e., minimizing the effort incurred w.r.t. constraint checking and support counting.
- Publication: Turbo-charging Vertical Mining of Large Databases (2000-01-01). In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this representation show performance improvements over their classical horizontal counterparts, but are either efficient only for certain database sizes, or assume particular characteristics of the database contents, or are applicable only to specific kinds of database schemas. We present here a new vertical mining algorithm called VIPER, which is general-purpose, making no special requirements of the underlying database. VIPER stores data in compressed bit-vectors called "snakes" and integrates a number of novel optimizations for efficient snake generation, intersection, counting and storage. We analyze the performance of VIPER for a range of synthetic database workloads. Our experimental results indicate significant performance gains, especially for large databases, over previously proposed vertical and horizontal mining algorithms. In fact, there are even workload regions where VIPER outperforms an optimal, but practically infeasible, horizontal mining algorithm.
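The vertical representation described in the last entry above can be illustrated with a minimal sketch. This is not VIPER: it omits the compressed "snakes" and all of the paper's optimizations, and simply uses plain Python integers as uncompressed bit-vectors. The function names `to_vertical` and `support` and the sample baskets are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Build one bit-vector per item; bit t is set if transaction t contains the item."""
    columns = defaultdict(int)
    for tid, basket in enumerate(transactions):
        for item in basket:
            columns[item] |= 1 << tid
    return dict(columns)


def support(columns, itemset):
    """Count transactions containing every item, by AND-ing the item columns."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must be non-empty")
    acc = columns.get(items[0], 0)
    for item in items[1:]:
        acc &= columns.get(item, 0)  # unknown items contribute an empty column
    return bin(acc).count("1")


# Hypothetical sample data: four market baskets (transactions 0..3).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
columns = to_vertical(baskets)
print(support(columns, {"bread", "milk"}))  # transactions 0 and 2 contain both
```

In this layout, the support of an itemset is obtained by intersecting columns rather than rescanning every transaction, which is the general idea that vertical mining algorithms build on.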
Most viewed
- Publication: Additive manufacturing of tungsten, tungsten-based alloys, and tungsten matrix composites (2023-03-01). Tungsten (W) materials are gaining increasing attention due to the extended applications of metallic systems in extreme environments. Given W’s unique characteristics, such as room-temperature brittleness, additive manufacturing (AM) techniques could give W materials greater design flexibility and manufacturability. Despite the growing focus on and thriving development of W-faced AM techniques, the mechanical performance of additively manufactured W parts is still unsatisfactory, so a critical review to further explore the possibilities of combining W and AM processes is urgently needed. In this review, we systematically explain the fundamentals of AM processes for W materials. Following the traditional classification, we further discuss the widely used AM processes including wire arc additive manufacturing (WAAM), electron beam melting (EBM), laser powder bed fusion (LPBF), laser direct energy deposition (laser DED), and other modified yet emergent AM techniques. Because additive manufacturing of W materials is sensitive to processing parameters, we illustrate the effects of various important processing parameters on AM process control and final part quality. With this detailed understanding, we present various categories of AM-compatible W materials (i.e., pure W, W alloys, and W composites) and summarize their general mechanical performance, distinct roles (particularly the roles of different alloying elements and added secondary-phase particles in W), and application-oriented benefits. After clarifying the current status, main challenges, and notable successes of additively manufacturing W materials, we further provide a concise outlook on the development of additively manufactured (AMed) W materials by integrating potential fabrication, measurement, alloy design, and application considerations. In summary, this critical review investigates the fundamental and practical problems crucially limiting the applications of AMed W materials, and the comprehensive discussion concentrates on the history of the development and combination of AM techniques and W design. This understanding is of great importance for achieving successful future applications of AMed W materials.
- Publication: All-sky search for continuous gravitational waves from isolated neutron stars using Advanced LIGO and Advanced Virgo O3 data (2022-11-15). We present results of an all-sky search for continuous gravitational waves which can be produced by spinning neutron stars with an asymmetry around their rotation axis, using data from the third observing run of the Advanced LIGO and Advanced Virgo detectors. Four different analysis methods are used to search in a gravitational-wave frequency band from 10 to 2048 Hz and a first frequency derivative from $-10^{-8}$ to $10^{-9}$ Hz/s. No statistically significant periodic gravitational-wave signal is observed by any of the four searches. As a result, upper limits on the gravitational-wave strain amplitude $h_0$ are calculated. The best upper limits are obtained in the frequency range of 100 to 200 Hz and they are $\sim 1.1 \times 10^{-25}$ at 95% confidence level. The minimum upper limit of $1.10 \times 10^{-25}$ is achieved at a frequency 111.5 Hz. We also place constraints on the rates and abundances of nearby planetary- and asteroid-mass primordial black holes that could give rise to continuous gravitational-wave signals.
- Publication: Search for Subsolar-Mass Binaries in the First Half of Advanced LIGO's and Advanced Virgo's Third Observing Run (2022-08-05). We report on a search for compact binary coalescences where at least one binary component has a mass between 0.2 $M_\odot$ and 1.0 $M_\odot$ in Advanced LIGO and Advanced Virgo data collected between 1 April 2019 1500 UTC and 1 October 2019 1500 UTC. We extend our previous analyses in two main ways: we include data from the Virgo detector and we allow for more unequal mass systems, with mass ratio $q \geq 0.1$. We do not report any gravitational-wave candidates. The most significant trigger has a false alarm rate of 0.14 yr$^{-1}$. This implies an upper limit on the merger rate of subsolar binaries in the range [220-24200] Gpc$^{-3}$ yr$^{-1}$, depending on the chirp mass of the binary. We use this upper limit to derive astrophysical constraints on two phenomenological models that could produce subsolar-mass compact objects. One is an isotropic distribution of equal-mass primordial black holes. Using this model, we find that the fraction of dark matter in primordial black holes in the mass range 0.2 $M_\odot$
- Publication: Integrated experimental and simulation approach to establish the effect of elemental segregation in Inconel 718 welds (2022-12-01). Inconel 718 as-welded structures are widely used in many critical applications. Welding will cause different types of detrimental phase formation and segregation of elements in the inter-dendritic region. The segregation of elements depends on the local curvatures during morphological evolution dictated by the thermal conditions prevailing during the process. In this study, we establish the segregation behavior of Inconel 718 as a function of solidification conditions. Microstructure evolution was correlated with Scheil's solidification predictions for different cooling rates and temperature gradients. The solidification pathway of segregated composition depends on the process parameters via the microstructure. Simulation results are validated with the experimental results, and the microstructure comparison shows good agreement. An integrated workflow is proposed to accelerate the welding process optimization and produce desired microstructure for multi-component alloys.
- Publication: Optimization of regeneration temperature for energy integrated water allocation networks (2022-06-01). The conservation of energy and water resources is vital to tackle climate change and ensure sustainability. Energy integrated water allocation networks simultaneously conserve these resources using heat exchangers and regeneration units. A hybrid solution strategy is proposed in this paper to achieve the minimum total annualized cost of these networks and offer market competitiveness. The proposed algorithm solves the mixed-integer non-linear programming problem formulated in this study through a heuristic-based non-linear programming technique. The ideology of Pinch Analysis is implemented to identify heuristics that explore the non-isothermal mixing potential of streams and optimize the regeneration temperatures. These heuristics can drastically reduce energy requirements, while the mathematical optimization of water reuse, recycling, and regeneration decreases water consumption. Capturing the trade-offs between energy consumption and heat exchangers ensures a lower total annualized cost. The potency of the algorithm developed in this work is exhibited using demonstrative examples from the literature. The variation in the operating cost, investment expense, and thereby, the total annualized costs with the temperature of regeneration units and the regeneration throughput are demonstrated in these examples. Optimal regeneration temperature can significantly reduce the total annualized costs of the overall system, as observed through these examples. In the single contaminant water network example, the total annualized cost is 45.3% lower compared to the literature.