Research

Trajectory-Level Web Agent Evaluation

Developing evaluation frameworks for web agents at SaNDwich Lab (IBM–Notre Dame collaboration) that assess full trajectories: both the sequence of actions an agent takes and how well those actions align with user values.