
[RCAC Workshop] Evaluating LLMs: Benchmarks & Metrics

📅 Date: November 7, 2025 ⏰ Time: 10–11 AM 💻 Location: Virtual 🏫 Instructor: Erfan Fakhabi

Who Should Attend

This session is for practitioners, researchers, and students who are working with large language models and want to better understand how to measure their strengths and weaknesses. It’s relevant whether you’re fine-tuning models, deploying them in applications, or simply interested in how the field defines “good performance.”

What You’ll Learn

We’ll explore evaluation across multiple dimensions, including reasoning, language understanding, safety, and efficiency. You’ll see how benchmarks and evaluation suites such as MMLU, HumanEval, and HELM are used, what quantitative metrics (e.g., perplexity, latency, throughput) tell us, and why human evaluation still matters. The goal is to give you a structured overview of the evaluation landscape so you can think critically about LLM performance in your own context.
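To make one of these quantitative metrics concrete, here is a minimal sketch of computing perplexity on a single sentence. It assumes the Hugging Face transformers and torch packages and uses the public gpt2 checkpoint purely as an illustration; it is not part of the workshop materials.

```python
# Minimal sketch: perplexity of a short text under GPT-2.
# Assumes `transformers` and `torch` are installed; "gpt2" is an
# illustrative model choice, not one prescribed by the workshop.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model.eval()

text = "Large language models are evaluated on many benchmarks."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels supplied, the model returns the mean cross-entropy
    # loss over predicted tokens; perplexity is exp(loss).
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = torch.exp(outputs.loss)
print(f"Perplexity: {perplexity.item():.2f}")
```

Lower perplexity means the model assigns higher probability to the text; metrics like latency and throughput, by contrast, are measured at serving time rather than from model outputs.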

Level

Intermediate. Assumes some familiarity with NLP or machine learning concepts, but no deep background in evaluation research is required.

🔗 Register now: LINK
