🧪 DeepEval
Open-source LLM evaluation framework for testing AI applications
DeepEval is an open-source evaluation framework specifically designed for testing and benchmarking LLM applications. It provides 14+ evaluation metrics out of the box—including faithfulness, answer relevancy, hallucination, toxicity, and bias—that can be run as unit tests in CI/CD pipelines. DeepEval supports RAG, agent, and fine-tuning evaluation, and its pytest integration makes LLM testing as straightforward as traditional software testing, as the sketch below illustrates.
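As a minimal sketch of the pytest workflow, the test below uses DeepEval's documented `assert_test` helper with `AnswerRelevancyMetric`. The input/output strings are hypothetical, and most DeepEval metrics use an LLM judge under the hood, so a configured model (e.g. an OpenAI API key) is assumed; exact class names and signatures may vary across versions.

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    # Hypothetical output; in practice, call your LLM application here.
    test_case = LLMTestCase(
        input="What does DeepEval do?",
        actual_output="DeepEval is an open-source framework for evaluating LLM apps.",
    )
    # Fails the test if the judged relevancy score falls below the threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Because this is a plain pytest test, it can run via `pytest` or `deepeval test run` in a CI/CD job, failing the build whenever a metric score drops below its threshold.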
Key Features
- ✓ 14+ eval metrics
- ✓ RAG evaluation
- ✓ CI/CD integration
- ✓ pytest compatible
- ✓ Hallucination detection
- ✓ Agent evaluation
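The RAG-oriented checks listed above follow the same pattern: the test case carries the retrieved chunks alongside the answer, and a grounding metric such as `FaithfulnessMetric` judges whether the output is supported by them. A minimal sketch with hypothetical data, under the same API-key assumption as above:

```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import FaithfulnessMetric

# Hypothetical RAG answer plus the chunks the retriever returned.
test_case = LLMTestCase(
    input="Is DeepEval open source?",
    actual_output="Yes, DeepEval is an open-source evaluation framework.",
    retrieval_context=["DeepEval is an open-source LLM evaluation framework."],
)

# evaluate() runs metrics outside pytest, e.g. for ad-hoc benchmarking.
evaluate(test_cases=[test_case], metrics=[FaithfulnessMetric(threshold=0.8)])
```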
#llm-evaluation #open-source #testing #rag #hallucination
Quick Info
- Category: Data & Analytics
- Pricing: Freemium