Verbex Bangla Mini
1. Model Overview
- Name: Verbex Bangla Mini
- Version: 1.0
- Authors: Verbex LLM Team
2. Short Description
Verbex Bangla Mini is a Bengali-specialized large language model optimized for tool calling and conversational tasks. In the evaluations reported in Section 4, it matches or exceeds GPT-4.1 and GPT-4.1-mini on every human-rated criterion and is competitive with GPT-4.1, GPT-4.1-mini, Sonnet-3.7, and Gemini-2.0-flash on most automatic tool-invocation metrics, with the highest Tool-Call Justification score among the models tested.
3. Intended Use
Primary Use Cases:
- Conversational agents (chatbots) in Bengali
- Automated tool invocation (e.g., database queries, API calls)
- Checking a calendar or knowledge base, booking calendar slots, sending email, calling an API, and similar actions
- Applications that need faster responses than GPT-4o or GPT-4.1
Not intended for:
- Highly sensitive domains (medical, legal) without human oversight
- Languages other than Bengali
4. Performance & Evaluation
Human Evaluation (scores on a 0 to 1.0 scale)
Evaluation Criteria:
- Out-of-Domain-Handling: Handling a query that is out of scope
- Red Teaming: Handling slang and inappropriate language
- Language Performance: Proper conversation in Bengali
Model | OOD | Red Teaming | Language |
---|---|---|---|
gpt-4.1 | 0.9 | 1.0 | 0.7 |
gpt-4.1-mini | 0.8 | 1.0 | 0.7 |
Verbex Bangla Mini | 0.9 | 1.0 | 0.8 |
Automatic Evaluation (scores on a 0 to 5.0 scale)
Evaluation Criteria:
- Conversation Flow: Correctly calls all the tools one by one in the correct order
- Tool Call Accuracy: Correctly calls all the tools, independent of order
- OOD Detection Accuracy: Correctly flags queries outside the model’s knowledge or scope
- Appropriateness: Produces safe, clear, grammatically sound, and context-relevant responses
- Graceful Refusal/Redirection: Politely declines or reroutes requests that it can’t or shouldn’t handle
- Consistency: Maintains coherent logic and avoids contradictions across related turns
- Alignment with Reference: Matches or improves on a given “ideal” answer without drifting from intent
- Tool-Call Justification: Invokes external tools correctly and explains their outputs clearly
Model | Flow | Tool Calling | OOD | Appropriateness | Redirect | Consistency | Alignment | Tool Justify |
---|---|---|---|---|---|---|---|---|
gemini-2.0-flash | 0.26 | 0.48 | 1.62 | 2.26 | 1.33 | 2.42 | 1.44 | 1.56 |
nemotron | 0.05 | 0.70 | 0.60 | 1.33 | 0.81 | 1.65 | 0.86 | 1.19 |
gpt-4o | 0.35 | 0.82 | 1.50 | 3.29 | 1.10 | 3.39 | 2.57 | 3.85 |
gpt-4o-mini | 0.37 | 0.76 | 1.56 | 3.21 | 1.67 | 3.30 | 2.53 | 3.39 |
gpt-4.1 | 0.30 | 0.84 | 2.00 | 3.35 | 1.62 | 3.79 | 2.67 | 3.49 |
gpt-4.1-mini | 0.35 | 0.83 | 1.00 | 3.02 | 1.18 | 3.16 | 2.51 | 3.45 |
sonnet-3.7 | 0.23 | 0.83 | 2.12 | 3.00 | 1.70 | 3.40 | 2.49 | 3.85 |
Verbex Bangla Mini | 0.33 | 0.82 | 0.70 | 3.14 | 1.33 | 3.35 | 2.37 | 3.86 |
The evaluation was conducted on a held-out Bengali test set of 42 complete conversations.
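The difference between the Conversation Flow and Tool Call Accuracy criteria (order-dependent versus order-independent) can be illustrated with a minimal sketch. This is only one possible reading of those two criteria, not the scoring code used to produce the table above. The function names and the tool names (`check_calendar`, `book_calendar`, `send_email`) are hypothetical, and each function returns a fraction between 0 and 1 for a single conversation.

```python
from collections import Counter
from typing import List


def tool_call_accuracy(expected: List[str], actual: List[str]) -> float:
    """Order-independent: fraction of expected tool calls found among the actual calls."""
    if not expected:
        # Assumption: with no expected calls, only an empty call list counts as correct.
        return 1.0 if not actual else 0.0
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)
    matched = sum(min(n, actual_counts[name]) for name, n in expected_counts.items())
    return matched / len(expected)


def conversation_flow(expected: List[str], actual: List[str]) -> float:
    """Order-dependent: 1.0 only if the calls match the expected sequence exactly."""
    return 1.0 if expected == actual else 0.0


# Hypothetical conversation: the right tools were called, but in the wrong order.
expected = ["check_calendar", "book_calendar", "send_email"]
actual = ["book_calendar", "check_calendar", "send_email"]

print(tool_call_accuracy(expected, actual))  # 1.0
print(conversation_flow(expected, actual))   # 0.0
```

Under this reading, a model can score well on Tool Call Accuracy while scoring poorly on Conversation Flow, which is the pattern most models show in the table.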
5. Limitations & Risks
- May hallucinate factual content when a tool call fails
- May sometimes answer questions on well-known topics without calling a tool
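One way an integrator might reduce the first risk is to return an explicit error payload to the model whenever a tool fails, so the model has a clear signal instead of an empty or missing result. The sketch below is an assumption about how that could look on the application side. The helper `run_tool_safely`, the payload fields, and the `check_calendar` tool are hypothetical and are not part of the Verbex platform.

```python
import json
from typing import Any, Callable, Dict


def run_tool_safely(tool: Callable[..., Any], arguments: Dict[str, Any]) -> str:
    """Run a tool and return a JSON string describing its result or its failure."""
    try:
        result = tool(**arguments)
        return json.dumps({"status": "ok", "result": result})
    except Exception as exc:
        # Report the failure explicitly so the model can apologize or retry
        # rather than invent an answer.
        return json.dumps({"status": "error", "message": f"{type(exc).__name__}: {exc}"})


def check_calendar(date: str) -> list:
    """Hypothetical tool that simulates an unavailable calendar service."""
    raise TimeoutError("calendar service did not respond")


payload = run_tool_safely(check_calendar, {"date": "2025-01-15"})
print(payload)
```

The agent prompt can then instruct the model to tell the user that the lookup failed whenever the payload's `status` is `error`.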
6. Usage Details
To use the model, select Verbex Bangla Mini 1.0 on the platform.
- The model is specifically optimized for Bengali language tasks and tool-calling scenarios, making it ideal for building conversational agents that need to interact with external systems.
- When writing a prompt, always use the Generate Prompt button for better performance.
Prompt Template Guidelines
This model has been fine-tuned using a specific prompt template designed to optimize its performance in Bangla voice-based appointment booking scenarios. While you may use alternative prompt styles, adhering to the recommended template is strongly encouraged for achieving the best results in terms of contextual understanding, task execution, and natural conversation flow.
Below is the reference prompt template. You may compose your agent’s prompt accordingly or generate it automatically using our Generate Prompt feature.