Model Name | ARC-Challenge (B) | ARC-Challenge (S) | ARC-Challenge (Y) | GSM8k (B) | GSM8k (S) | GSM8k (Y) | Hellaswag (B) | Hellaswag (S) | Hellaswag (Y) | MMLU (S) | MMLU (Y) |
---|---|---|---|---|---|---|---|---|---|---|---|

Model Name | Benchmark | Type | Performance [%] | p-value [%] | \( \hat{\delta} \) [%] | \( \hat{\delta}_{0.95} \) [%] |
---|---|---|---|---|---|---|

@article{dekoninck2024constat,
title={ConStat: Performance-Based Contamination Detection in Large Language Models},
author={Jasper Dekoninck and Mark Niklas Müller and Martin Vechev},
year={2024},
archivePrefix={arXiv},
primaryClass={cs.LG}
}