Fischbeck, PhilippPhilippFischbeck2024-10-142024-10-142018https://knowledge.hpi.de/handle/123456789/2450Given a graph and a set of paths, we want to find the minimal set of vertices such that each path is covered by at least one chosen vertex. Although this problem is NP-hard, real-world instances can be solved almost completely by a set of simple reduction rules. We examine this behavior from a theoretical and empirical perspective. First, we show that the problem is easy to solve for forests and cycle graphs. However, the problem is NP-hard for a feedback vertex number of 2 and a treewidth of 3. This indicates that the explanation for the effectiveness does not lie in the graph representation of problem instances. Thus, we examine the Hitting Set problem that arises when ignoring the graph representation and interpreting a path as a mere set of vertices. Through this relation, we show that the problem remains NP-hard even for very strong restrictions. Hitting Set instances that have a representation as a path graph can be recognized as such in polynomial time. However, finding the graph representation with the fewest edges is NP-hard. Based on the analysis of publicly available transit datasets, we show that the real-world instances are clustered and have heterogeneous stations, with the number of lines per station distributed according to a power law. We describe a model to generate random problem instances with adjustable clustering and heterogeneity. We use this model to show that while the heterogeneity does positively influence the effectiveness of the reduction rules, the largest effect comes from the clustering. Lastly, we show a strong relation between the reduction rules for the Hitting Set problem and reduction rules for the Independent Set problem on the intersection graph of the family of sets. We prove that the size of any independent set is a lower bound on the size of the maximum hitting set and show that the two bounds are very close for real-world instances. We show that the reduction rules need to be effective for Independent Set in order for them to be effective for Hitting Set.On the Effectiveness of Data Reduction for Covering Problems in Real-World Transit Networksmastersthesis