Browsing by Type "mastersthesis"
Publication: Betrachtungen über ein distanzbasiertes Klassifikationsverfahren (2012). Schirneck, Martin.
This thesis develops a novel distance-based classification method for categorizing numerical data vectors under a binary class partition. The metrics employed are based on the volume of finite unions of d-dimensional balls in Euclidean space; the computation rule given treats the two-dimensional case concretely. Several topological properties of the new method are also demonstrated, of particular interest being the behaviour of the medial axis of the distance functions that arise. The findings suggest that the developed method is capable of interpolating between the k-nearest-neighbour method and a linear classifier.
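The geometric primitive behind such metrics, the distance from a point to a union of balls around class samples, can be sketched in a few lines. This is a hedged illustration only, not the volume-based metric of the thesis; with radius 0, comparing the class distances reduces to the 1-nearest-neighbour rule mentioned in the abstract.

```python
import math

def dist_to_ball_union(x, centers, r):
    """Euclidean distance from point x to the union of balls of
    radius r around the given centers (0 if x lies inside the union)."""
    d_min = min(math.dist(x, c) for c in centers)
    return max(0.0, d_min - r)

def classify(x, class_a, class_b, r=0.0):
    """Assign x to the class whose ball union is closer.
    For r = 0 this is exactly the 1-nearest-neighbour rule; the thesis's
    volume-based metrics are more refined than this sketch."""
    d_a = dist_to_ball_union(x, class_a, r)
    d_b = dist_to_ball_union(x, class_b, r)
    return 'A' if d_a <= d_b else 'B'

print(classify((0.0, 0.9), [(0.0, 0.0)], [(2.0, 0.0)]))  # closer to class A
```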
Publication: Enumeration Algorithms in Data Profiling (2022). Schirneck, Martin.
Data profiling is the extraction of metadata from relational databases. An important class of metadata are multi-column dependencies. They come associated with two computational tasks. The detection problem is to decide whether a dependency of a given type and size holds in a database. The discovery problem instead asks to enumerate all valid dependencies of that type. We investigate the two problems for three types of dependencies: unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We first treat the parameterized complexity of the detection variants. We prove that the detection of UCCs and FDs, respectively, is W[2]-complete when parameterized by the size of the dependency. The detection of INDs is shown to be one of the first natural W[3]-complete problems. We further settle the enumeration complexity of the three discovery problems by presenting parsimonious equivalences with well-known enumeration problems. Namely, the discovery of UCCs is equivalent to the famous transversal hypergraph problem of enumerating the hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. Finally, the discovery of INDs is shown to be equivalent to enumerating the satisfying assignments of antimonotone, 3-normalized Boolean formulas. In the remainder of the thesis, we design and analyze discovery algorithms for unique column combinations. Since this is as hard as the general transversal hypergraph problem, it is an open question whether the UCCs of a database can be computed in output-polynomial time in the worst case. For the analysis, we therefore focus on instances that are structurally close to databases in practice, most notably, inputs that have small solutions.
The equivalence between UCCs and hitting sets transfers the computational hardness, but also allows us to apply ideas from hypergraph theory to data profiling. We devise a discovery algorithm that runs in polynomial space on arbitrary inputs and achieves polynomial delay whenever the maximum size of any minimal UCC is bounded. Central to our approach is the extension problem for minimal hitting sets, that is, to decide for a set of vertices whether they are contained in any minimal solution. We prove that this is yet another problem that is complete for the complexity class W[3], when parameterized by the size of the set that is to be extended. We also give several conditional lower bounds under popular hardness conjectures such as the Strong Exponential Time Hypothesis (SETH). The lower bounds suggest that the running time of our algorithm for the extension problem is close to optimal. We further conduct an empirical analysis of our discovery algorithm on real-world databases to confirm that the hitting set perspective on data profiling has merits also in practice. We show that the resulting enumeration times undercut their theoretical worst-case bounds on practical data, and that the memory consumption of our method is much smaller than that of previous solutions. During the analysis we make two observations about the connection between databases and their corresponding hypergraphs. On the one hand, the hypergraph representations containing all relevant information are usually significantly smaller than the original inputs. On the other hand, obtaining those hypergraphs is the actual bottleneck of any practical application. The latter often takes much longer than enumerating the solutions, which is in stark contrast to the fact that the preprocessing is guaranteed to be polynomial while the enumeration may take exponential time.
To make the first observation rigorous, we introduce a maximum-entropy model for non-uniform random hypergraphs and prove that their expected number of minimal hyperedges undergoes a phase transition with respect to the total number of edges. The result also explains why larger databases may have smaller hypergraphs. Motivated by the second observation, we present a new kind of UCC discovery algorithm called Hitting Set Enumeration with Partial Information and Validation (HPIValid). It utilizes the fast enumeration times in practice in order to speed up the computation of the corresponding hypergraph. This way, we sidestep the bottleneck while maintaining the advantages of the hitting set perspective. An exhaustive empirical evaluation shows that HPIValid outperforms the current state of the art in UCC discovery. It is capable of processing databases that were previously out of reach for data profiling.
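The transversal hypergraph problem underlying UCC discovery can be illustrated with a naive enumeration sketch. This is not the algorithm of the thesis (which achieves polynomial delay for bounded solution size); it simply tests all vertex subsets in order of size and keeps the inclusion-minimal hitting sets.

```python
from itertools import combinations

def minimal_hitting_sets(hypergraph):
    """Enumerate all inclusion-minimal hitting sets of a hypergraph,
    given as a list of hyperedges (sets of vertices).
    Exponential brute force, for illustration only."""
    vertices = sorted(set().union(*hypergraph))
    hits = lambda s: all(s & e for e in hypergraph)
    solutions = []
    for r in range(len(vertices) + 1):
        for cand in combinations(vertices, r):
            s = set(cand)
            # s is minimal iff it hits every edge and no previously
            # found (hence smaller) solution is a proper subset of it
            if hits(s) and not any(sol < s for sol in solutions):
                solutions.append(s)
    return solutions

# Edges {1,2} and {2,3} have the minimal hitting sets {2} and {1,3}
print(minimal_hitting_sets([{1, 2}, {2, 3}]))
```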
Publication: Extraction Of Citation Data From Websites Based On Visual Cues (2016). Repke, Tim.
In this master's thesis a system for extracting meta-information, specifically citation data, from webpages is proposed. Machine learning models such as artificial neural networks and random forests are trained to classify elements on a given webpage based on visual cues. Visual properties of elements are analysed in detail in order to derive meaningful numerical features for classification. After applying sensible post-processing filters, the system is able to recall up to 80% of the desired data at a precision of up to 90%. Relying purely on visual cues, however, has its limitations for the robust extraction of some of the citation data. Possible approaches to address this are discussed at the end.
Publication: Geographical Exploration of Key Performance Indicators for Elderly Care Planning (2016). Postel, Malek.
Publication: Mechanisms for Network Creation (2018). Neubert, Stefan.
We introduce and analyze the possibilities of a new model for network creation by autonomous selfish agents: Unlike in typical network formation games such as the well-known model of Fabrikant et al. [PODC'03], the final network is not directly shaped by the players of a game. Instead, we design mechanisms that take edge preferences of agents as input for a social choice function and return a network that respects those preferences. In addition, it caters for compliance with global restrictions on the network topology and tries to establish several properties, such as global efficiency, maximizing the individual utility of agents, and building stable networks, especially in comparison to the result of an anarchic network formation. The mechanism also aims to produce Nash equilibria and encourages agents to honestly declare their preferences instead of playing strategically. The mechanism approach is a true superset of both centralized network design and uncoordinated network creation games. To the best of our knowledge, this is the first attempt to explore the realm in between those extremes.
Publication: New Conditions via Markov Chains: Approximating Partition Functions of Abstract Polymer Models without Cluster Expansion (2020). Pappik, Marcus.
Abstract polymer models are combinatorial structures that consist of a set of weighted objects, called polymers, together with a reflexive and symmetric incompatibility relation that describes which polymers cannot occur together. In this thesis we present the first Markov chain approach for sampling from the Gibbs distribution of abstract polymer models. Known Markov chains for polymer models from vertex and edge spin systems can be seen as special cases of this polymer Markov chain. We introduce the concept of polymer cliques and propose a generalized polymer mixing condition as a way to guarantee rapid mixing of our chain. The form of this condition is similar to conditions from cluster expansion approaches, such as the Kotecký-Preiss condition and the Fernández-Procacci condition, but it is less restrictive. To obtain an efficient sampling scheme for the Gibbs distribution from our polymer Markov chain, we prove that it suffices to draw each step of the chain approximately according to its transition probabilities. As one way to approximate each transition of the chain, we suggest truncating each polymer clique based on some notion of size. We introduce the clique truncation condition as a general tool to determine the maximum size of polymers that we have to consider for the steps of the chain. We prove that our sampling scheme can be used to construct an FPRAS for the partition function. By this, we obtain the first Markov chain Monte Carlo method that works for abstract polymer models in a similar regime as cluster expansion approaches and beyond, while avoiding their complicated analytical assumptions and arguments.
Further, we illustrate how our approach can be applied to algorithmic problems like the hard-core model on bipartite \(\alpha\)-expander graphs and the perfect matching polynomial to obtain new trade-offs between runtime and weight bounds. We emphasize that similar results can be obtained for a variety of other applications.
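For the hard-core model mentioned above, the textbook single-site (Glauber) chain gives a feel for Markov chain sampling of weighted independent sets. This is the standard heat-bath chain, not the polymer chain developed in the thesis.

```python
import random

def hardcore_glauber(adj, fugacity, steps, seed=0):
    """Single-site Glauber dynamics for the hard-core model: sample
    independent sets I of a graph with weight fugacity**|I|.
    adj maps each vertex to its set of neighbours.
    Standard textbook chain, shown only to illustrate the setting."""
    rng = random.Random(seed)
    occupied = set()
    p_in = fugacity / (1.0 + fugacity)
    vertices = list(adj)
    for _ in range(steps):
        v = rng.choice(vertices)
        occupied.discard(v)
        # heat-bath update: v may become occupied only if no
        # neighbour is occupied, with probability fugacity/(1+fugacity)
        if not (adj[v] & occupied) and rng.random() < p_in:
            occupied.add(v)
    return occupied

# 4-cycle: the returned set is always an independent set of the graph
sample = hardcore_glauber({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}},
                          fugacity=1.0, steps=1000)
print(sample)
```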
Publication: On Restrictions in Computational Language Learning (2015). Schirneck, Martin.
In 1990, Fulk proved that partial set-drivenness (rearrangement-independence) does not weaken the power of unrestricted computational language learning. The question arises whether this result still holds if paired with various learning restrictions. We investigate the influence of two main categories of such restrictions, namely content-based and delayable ones. An adaptation of Fulk's theorem is verified for content-based learning and some delayable restrictions regarding U-shaped learning. On the other hand, we give an example criterion of delayable learning, explanatory learning from text by a strongly monotone scientist, for which partial set-drivenness does reduce the learning power. Additionally, the interdependence of these restrictions with several other interaction operators and success criteria is explored.
Publication: On the Effectiveness of Data Reduction for Covering Problems in Real-World Transit Networks (2018). Fischbeck, Philipp.
Given a graph and a set of paths, we want to find the minimal set of vertices such that each path is covered by at least one chosen vertex. Although this problem is NP-hard, real-world instances can be solved almost completely by a set of simple reduction rules. We examine this behavior from a theoretical and empirical perspective. First, we show that the problem is easy to solve for forests and cycle graphs. However, the problem is NP-hard for a feedback vertex number of 2 and a treewidth of 3. This indicates that the explanation for the effectiveness does not lie in the graph representation of problem instances. Thus, we examine the Hitting Set problem that arises when ignoring the graph representation and interpreting a path as a mere set of vertices. Through this relation, we show that the problem remains NP-hard even for very strong restrictions. Hitting Set instances that have a representation as a path graph can be recognized as such in polynomial time. However, finding the graph representation with the fewest edges is NP-hard. Based on the analysis of publicly available transit datasets, we show that the real-world instances are clustered and have heterogeneous stations, with the number of lines per station distributed according to a power law. We describe a model to generate random problem instances with adjustable clustering and heterogeneity. We use this model to show that while the heterogeneity does positively influence the effectiveness of the reduction rules, the largest effect comes from the clustering. Lastly, we show a strong relation between the reduction rules for the Hitting Set problem and reduction rules for the Independent Set problem on the intersection graph of the family of sets.
We prove that the size of any independent set is a lower bound on the size of the minimum hitting set and show that the two bounds are very close for real-world instances. We show that the reduction rules need to be effective for Independent Set in order for them to be effective for Hitting Set.
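Two classic Hitting Set reduction rules, the unit rule and edge domination, convey the flavour of such preprocessing. The exact rule set analysed in the thesis may differ, so the sketch below is illustrative only.

```python
def reduce_hitting_set(edges):
    """Apply two classic Hitting Set reduction rules exhaustively:
    (1) unit rule: an edge {v} forces v into every solution;
    (2) edge domination: if edge e is a superset of edge f, drop e.
    Returns (forced_vertices, remaining_edges). Illustrative sketch;
    not necessarily the rule set studied in the thesis."""
    forced = set()
    edges = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # unit rule: singleton edges force their vertex
        for e in edges:
            if len(e) == 1:
                forced |= e
                changed = True
        # edges already hit by a forced vertex are satisfied
        if forced:
            edges = [e for e in edges if not (e & forced)]
        # edge domination: keep only the inclusion-minimal edges
        minimal = [e for e in edges if not any(f < e for f in edges)]
        if len(minimal) != len(edges):
            edges, changed = minimal, True
    return forced, edges

# {3} forces vertex 3, which hits {1,2,3}; {1,2} dominates {1,2,4}
print(reduce_hitting_set([{3}, {1, 2, 3}, {1, 2}, {1, 2, 4}]))
```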
Publication: Photorealistic Rendering of Day and Night Sky Phenomena in Interactive 3D Geovirtual Environments (2012). Limberger, Daniel.
Publication: Segmentierung mit Gaborfiltern zur Induktion struktureller Klassifikatoren auf Bilddaten (1997). Herbrich, Ralf.