
Stanford HELM
The industry-standard framework for holistic, multi-metric evaluation of large language models.

Simulating the world's intelligence to accelerate progress toward human-aligned AGI.

The open-source data curation platform for LLMs and Generative AI alignment.

Open-source RAG evaluation tool for assessing accuracy, context quality, and latency of RAG systems.

Precision property valuations powered by the industry's most comprehensive real estate database.

Real estate valuation technology and analytics.

The enterprise-grade stack for evaluating, logging, and refining AI applications with 10x developer velocity.