Benchmark Harmonization and Model Similarity Analysis
Harmonizing major AI evaluation benchmarks (LiveBench, HELM, LMSYS Arena) and developing model similarity maps at FORESEER Lab, University of Michigan.
Developing novel techniques for calibrating large language model reasoning through sentence-level hidden-state interventions at MINE Lab, University of Notre Dame.
Developing comprehensive evaluation frameworks for web agents that assess both action sequences and value alignment at SaNDwich Lab (IBM–Notre Dame collaboration).
Improving machine learning verification benchmarks and addressing numerical reproducibility challenges with the alpha-beta-CROWN verification tool at UIUC.