General Settings
Basic application settings and behavior.
Tool Model
The Tool Model is a dedicated model used for background AI operations, separate from your main chat model. This allows you to use a fast, cost-effective model for auxiliary tasks while using more capable (and expensive) models for conversation.
What Tool Model Does
The Tool Model is used for:
- Thread Title Generation: Automatically generating titles for new chat threads
- Tool Selection: Analyzing which tools to use for a given task
- Parameter Extraction: Parsing user requests into tool parameters
- Memory Operations: Processing memory storage and retrieval (unless overridden in Memory Settings)
- Background Tasks: Handling auxiliary AI operations
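The split described above amounts to simple routing: background operations go to the tool model, conversation goes to the chat model. As an illustrative sketch only (Alma's internal API is not public, so the model names and function below are assumptions, not actual code):

```python
# Illustrative sketch; the model names and routing function are assumptions,
# not Alma's actual implementation.
CHAT_MODEL = "gpt-4o"        # capable, slower, more expensive
TOOL_MODEL = "gpt-4o-mini"   # fast, cheap, good enough for mechanics

def pick_model(task: str) -> str:
    """Route background tasks to the tool model, conversation to the chat model."""
    background_tasks = {"title_generation", "tool_selection",
                        "parameter_extraction", "memory_ops"}
    return TOOL_MODEL if task in background_tasks else CHAT_MODEL
```

The point of the split is that every task in `background_tasks` is high-frequency and low-complexity, so it never needs the chat model's capability.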
Why Use a Separate Tool Model
| Benefit | Description |
|---|---|
| Cost Savings | Tool operations are frequent but simple; using a cheaper model reduces costs significantly |
| Speed | Smaller models respond faster, making tool operations feel instant |
| Efficiency | Your main model focuses on conversation while the tool model handles mechanics |
Why Both Speed and Quality Matter
A good Tool Model must balance speed and generation quality—you can't sacrifice one for the other.
Why Speed Matters
Tool operations happen constantly during your conversation:
- Every new chat needs a title generated
- Every message may trigger tool selection analysis
- Every tool call requires parameter extraction
If the Tool Model is slow (3+ seconds per operation), these micro-delays accumulate and make the entire experience feel sluggish. Users expect title generation and tool operations to feel instant—ideally completing in under 1 second.
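The accumulation is easy to quantify. Assuming a hypothetical message that triggers three background operations (title generation, tool selection, and parameter extraction; the latency figures are illustrative, not benchmarks):

```python
# Hypothetical per-operation latencies; real numbers vary by model and load.
OPS_PER_MESSAGE = 3          # title generation + tool selection + param extraction

slow_model_latency = 3.0     # seconds per operation
fast_model_latency = 0.5

slow_total = OPS_PER_MESSAGE * slow_model_latency  # 9.0 s of added wait
fast_total = OPS_PER_MESSAGE * fast_model_latency  # 1.5 s
```

Nine seconds of invisible overhead per message feels broken; 1.5 seconds is barely noticeable.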
Why Quality Still Matters
Despite needing speed, the Tool Model's tasks require genuine intelligence:
| Task | What Can Go Wrong with Low Quality |
|---|---|
| Thread Titles | Generic titles like "Chat about code" instead of meaningful summaries like "Debugging React useEffect infinite loop" |
| Tool Selection | Calling the wrong tool or missing when a tool should be used entirely |
| Parameter Extraction | Misinterpreting "search for files modified last week" as searching for the literal text "last week" |
| Memory Operations | Storing irrelevant information or failing to retrieve relevant context |
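The parameter-extraction row is worth spelling out. The difference between a good and a bad extraction of "search for files modified last week" is whether the model emits a structured date filter or the literal string (the JSON shapes here are illustrative, not Alma's actual tool schema):

```python
import json
from datetime import date, timedelta

# Illustrative parameter shapes; Alma's real tool schema may differ.
good = {"query": "", "modified_after": str(date.today() - timedelta(days=7))}
bad  = {"query": "last week"}  # literal-text misinterpretation

print(json.dumps(good))
print(json.dumps(bad))
```

A low-quality tool model produces the second shape, and the search silently returns nothing useful.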
A model that's fast but inaccurate will cause frustrating errors throughout your workflow. A model that's accurate but slow will make the app feel unresponsive.
The Sweet Spot
The recommended models (like gpt-4o-mini, claude-haiku-4-5, gemini-2.0-flash) hit the sweet spot: they're small enough to respond in under 1 second, but capable enough to handle tool operations accurately.
Choosing a Tool Model
Recommended models:
| Provider | Recommended | Why |
|---|---|---|
| OpenAI | gpt-4o-mini | Fast, excellent tool support |
| Anthropic | claude-haiku-4-5 | Very fast, good quality |
| Google | gemini-2.0-flash, gemini-1.5-flash | Extremely fast |
| DeepSeek | deepseek-chat | Fast and cost-effective |
Avoid Reasoning Models
Never use reasoning/thinking models as the Tool Model:
- OpenAI o1, o3 series
- Anthropic models with extended thinking enabled
- Any model designed for deep reasoning
These models are optimized for complex problem-solving and take significantly longer to respond (10-60+ seconds), making them completely unsuitable for tool operations that need instant responses.
Auto-Detection
If you don't configure a Tool Model, Alma automatically selects one based on your enabled providers:
- Checks enabled providers in priority order (OpenAI → Anthropic → Google → others)
- Selects the fastest recommended model available
- Falls back to your chat model if no suitable tool model is found
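The fallback logic above can be sketched as follows (the provider priority and recommended models come from this page; the function and data structures are illustrative, not Alma's code):

```python
# Illustrative sketch of the auto-detection order described in the docs.
PRIORITY = ["openai", "anthropic", "google"]   # then any other providers
RECOMMENDED = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-haiku-4-5",
    "google": "gemini-2.0-flash",
}

def auto_detect_tool_model(enabled_providers: set, chat_model: str) -> str:
    """Pick the first recommended model from an enabled provider,
    falling back to the main chat model if none is available."""
    for provider in PRIORITY:
        if provider in enabled_providers:
            return RECOMMENDED[provider]
    return chat_model  # no suitable tool model found
```

So with only Anthropic enabled you would get claude-haiku-4-5, and with no recognized provider the chat model does double duty.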
Testing Your Tool Model
Use the Test button next to the model selector to verify performance:
| Result | Response Time | Meaning |
|---|---|---|
| ✅ Good | < 2.5s | Optimal for tool operations |
| ⚠️ Slow | 2.5s - 5s | Usable but may feel sluggish |
| ❌ Unusable | > 5s | Too slow for responsive tool use |
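The thresholds in the table map to a straightforward classification (a sketch; the verdict labels match the Test button's results):

```python
def classify_latency(seconds: float) -> str:
    """Map a measured response time to the Test button's verdict."""
    if seconds < 2.5:
        return "Good"      # optimal for tool operations
    elif seconds <= 5.0:
        return "Slow"      # usable but may feel sluggish
    else:
        return "Unusable"  # too slow for responsive tool use
```

Note that these thresholds are looser than the under-1-second ideal mentioned earlier: a model can pass the test yet still feel noticeably slower than the recommended models.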
Language
Choose the interface language:
- English (en)
- Chinese (zh)
- Japanese (ja)
Changes take effect immediately.
Startup
Auto-Start
Launch Alma automatically when you log in to your computer.
Start Minimized
Start in the system tray instead of opening a window.
Window Behavior
Minimize to Tray
When minimizing the window, minimize to system tray instead of taskbar.
Close to Tray
When clicking the close button, minimize to tray instead of quitting the application.
TIP
These options are useful for keeping Alma running in the background for quick access via the Quick Chat shortcut.
Quick Chat
Hide on Blur
When the Quick Chat window loses focus, automatically hide it.
When enabled, the Quick Chat window will close when you click outside of it. When disabled, it stays visible until you explicitly close it.
