Authors: Scheffer, Tobias; Herbrich, Ralf
Date available: 2024-10-14
Year: 1997
PDF: https://www.herbrich.me/papers/scheffer97.pdf
Handle: https://knowledge.hpi.de/handle/123456789/4403
Title: Unbiased Assessment of Learning Algorithms
Type: inproceedings
Abstract: In order to rank the performance of machine learning algorithms, many researchers conduct experiments on benchmark datasets. Since most learning algorithms have domain-specific parameters, it is a popular custom to adapt these parameters so as to obtain a minimal error rate on the test set. The same error rate is then used to rank the algorithm, which causes an optimistic bias. We quantify this bias, showing in particular that an algorithm with more parameters will probably be ranked higher than an equally good algorithm with fewer parameters. We demonstrate this result by showing the number of parameters and trials required in order to pretend to outperform C4.5 or FOIL, respectively, on various benchmark problems. We then describe how unbiased ranking experiments should be conducted.
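
The optimistic bias described in the abstract can be illustrated with a small simulation (not taken from the paper): if k parameter settings are all equally good, reporting the minimum of their observed test-set error rates underestimates the true error, and the underestimate grows with k. The sketch below assumes, for simplicity, independent binomial test-set evaluations; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_reported_error(true_error=0.20, n_test=200, k_settings=10, n_runs=10_000):
    """Mean of the minimum observed test error over k equally good parameter settings.

    Each setting has the same true error rate; an observed test-set error is
    drawn as Binomial(n_test, true_error) / n_test. Reporting the minimum over
    the k settings (i.e. tuning the parameters on the test set) is optimistically
    biased relative to the true error.
    """
    errors = rng.binomial(n_test, true_error, size=(n_runs, k_settings)) / n_test
    return errors.min(axis=1).mean()

# The reported error drops below the true error of 0.200 as k grows.
for k in (1, 5, 10, 50):
    print(f"k={k:3d} settings -> mean reported error {min_reported_error(k_settings=k):.3f}")
```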