Com TW Now News 2024

BenchmarkAggregator: Comprehensive LLM testing from GPQA Diamond to Chatbot Arena, with effortless expansion

BenchmarkAggregator is an open-source framework for comprehensive LLM evaluation on demanding benchmarks such as GPQA Diamond, MMLU Pro, and Chatbot Arena. It aims to provide objective comparisons across the major language models, testing both the depth and the breadth of their capabilities. The framework is designed to be easy to extend and uses OpenRouter to access models through a single integration.
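The post does not say how BenchmarkAggregator combines results from different benchmarks. The following is only a minimal sketch that assumes one simple scheme: min-max normalize each benchmark across models, then average the normalized scores into a composite ranking. The function names (`normalize`, `aggregate`), the model names, and the sample scores are all hypothetical placeholders, not real results. Chatbot Arena is represented with Elo-style ratings to show why per-benchmark normalization is needed before averaging.

```python
from statistics import mean


def normalize(scores: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Min-max scale each benchmark to [0, 1] across all models (hypothetical scheme)."""
    benchmarks = set(next(iter(scores.values())).keys())
    # Every model must report the same benchmarks, or the averages are not comparable.
    for model, per_model in scores.items():
        if set(per_model) != benchmarks:
            raise ValueError(f"{model} is missing or has extra benchmarks")
    normalized: dict[str, dict[str, float]] = {model: {} for model in scores}
    for bench in benchmarks:
        values = [per_model[bench] for per_model in scores.values()]
        lo, hi = min(values), max(values)
        for model, per_model in scores.items():
            # If all models tie on a benchmark, it cannot separate them; give everyone 1.0.
            normalized[model][bench] = 1.0 if hi == lo else (per_model[bench] - lo) / (hi - lo)
    return normalized


def aggregate(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank models by the unweighted mean of their normalized benchmark scores."""
    normalized = normalize(scores)
    ranking = [(model, mean(per_bench.values())) for model, per_bench in normalized.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical inputs only: accuracies in percent, Arena as Elo-style ratings.
example_scores = {
    "model-a": {"GPQA Diamond": 50.0, "MMLU Pro": 70.0, "Chatbot Arena": 1200.0},
    "model-b": {"GPQA Diamond": 40.0, "MMLU Pro": 80.0, "Chatbot Arena": 1300.0},
    "model-c": {"GPQA Diamond": 60.0, "MMLU Pro": 60.0, "Chatbot Arena": 1250.0},
}

ranking = aggregate(example_scores)
for model, score in ranking:
    print(f"{model}: {score:.3f}")
```

In this sketch, extending the comparison to a new benchmark only means adding one more key to each model's dictionary; a real framework might instead weight benchmarks or handle missing results differently.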

submitted by /u/mrcenter1