General Settings
Basic application settings and behavior.
Tool Model
The Tool Model is a dedicated model used for background AI operations, separate from your main chat model. This allows you to use a fast, cost-effective model for auxiliary tasks while using more capable (and expensive) models for conversation.
What Tool Model Does
The Tool Model is used for:
- Thread Title Generation: Automatically generating titles for new chat threads
- Tool Selection: Analyzing which tools to use for a given task
- Parameter Extraction: Parsing user requests into tool parameters
- Memory Operations: Processing memory storage and retrieval (unless overridden in Memory Settings)
- Background Tasks: Handling auxiliary AI operations
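The split described above amounts to simple routing: background operations go to the tool model, conversation goes to the chat model. As an illustrative sketch only (Alma's internal API is not public, so the model names and function below are assumptions, not actual code):

```python
# Illustrative sketch; the model names and routing function are assumptions,
# not Alma's actual implementation.
CHAT_MODEL = "gpt-4o"        # capable, slower, more expensive
TOOL_MODEL = "gpt-4o-mini"   # fast, cheap, good enough for mechanics

def pick_model(task: str) -> str:
    """Route background tasks to the tool model, conversation to the chat model."""
    background_tasks = {"title_generation", "tool_selection",
                        "parameter_extraction", "memory_ops"}
    return TOOL_MODEL if task in background_tasks else CHAT_MODEL
```

The point of the split is that every task in `background_tasks` is high-frequency and low-complexity, so it never needs the chat model's capability.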
Why Use a Separate Tool Model
| Benefit | Description |
|---|---|
| Cost Savings | Tool operations are frequent but simple; using a cheaper model reduces costs significantly |
| Speed | Smaller models respond faster, making tool operations feel instant |
| Efficiency | Your main model focuses on conversation while the tool model handles mechanics |
Why Both Speed and Quality Matter
A good Tool Model must balance speed and generation quality—you can't sacrifice one for the other.
Why Speed Matters
Tool operations happen constantly during your conversation:
- Every new chat needs a title generated
- Every message may trigger tool selection analysis
- Every tool call requires parameter extraction
If the Tool Model is slow (3+ seconds per operation), these micro-delays accumulate and make the entire experience feel sluggish. Users expect title generation and tool operations to feel instant—ideally completing in under 1 second.
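The accumulation is easy to quantify. Assuming a hypothetical message that triggers three background operations (title generation, tool selection, and parameter extraction; the latency figures are illustrative, not benchmarks):

```python
# Hypothetical per-operation latencies; real numbers vary by model and load.
OPS_PER_MESSAGE = 3          # title generation + tool selection + param extraction

slow_model_latency = 3.0     # seconds per operation
fast_model_latency = 0.5

slow_total = OPS_PER_MESSAGE * slow_model_latency  # 9.0 s of added wait
fast_total = OPS_PER_MESSAGE * fast_model_latency  # 1.5 s
```

Nine seconds of invisible overhead per message feels broken; 1.5 seconds is barely noticeable.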
Why Quality Still Matters
Despite needing speed, the Tool Model's tasks require genuine intelligence:
| Task | What Can Go Wrong with Low Quality |
|---|---|
| Thread Titles | Generic titles like "Chat about code" instead of meaningful summaries like "Debugging React useEffect infinite loop" |
| Tool Selection | Calling the wrong tool or missing when a tool should be used entirely |
| Parameter Extraction | Misinterpreting "search for files modified last week" as searching for the literal text "last week" |
| Memory Operations | Storing irrelevant information or failing to retrieve relevant context |
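The parameter-extraction row is worth spelling out. The difference between a good and a bad extraction of "search for files modified last week" is whether the model emits a structured date filter or the literal string (the JSON shapes here are illustrative, not Alma's actual tool schema):

```python
import json
from datetime import date, timedelta

# Illustrative parameter shapes; Alma's real tool schema may differ.
good = {"query": "", "modified_after": str(date.today() - timedelta(days=7))}
bad  = {"query": "last week"}  # literal-text misinterpretation

print(json.dumps(good))
print(json.dumps(bad))
```

A low-quality tool model produces the second shape, and the search silently returns nothing useful.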
A model that's fast but inaccurate will cause frustrating errors throughout your workflow. A model that's accurate but slow will make the app feel unresponsive.
The Sweet Spot
The recommended models (like gpt-4o-mini, claude-haiku-4-5, gemini-2.0-flash) hit the sweet spot: they're small enough to respond in under 1 second, but capable enough to handle tool operations accurately.
Choosing a Tool Model
Recommended models:
| Provider | Recommended | Why |
|---|---|---|
| OpenAI | gpt-4o-mini | Fast, excellent tool support |
| Anthropic | claude-haiku-4-5 | Very fast, good quality |
| Google | gemini-2.0-flash, gemini-1.5-flash | Extremely fast |
| DeepSeek | deepseek-chat | Fast and cost-effective |
Avoid Reasoning Models
Never use reasoning/thinking models as the Tool Model:
- OpenAI o1, o3 series
- Anthropic models with extended thinking enabled
- Any model designed for deep reasoning
These models are optimized for complex problem-solving and take significantly longer to respond (10-60+ seconds), making them completely unsuitable for tool operations that need instant responses.
Auto-Detection
If you don't configure a Tool Model, Alma automatically selects one based on your enabled providers:
- Checks enabled providers in priority order (OpenAI → Anthropic → Google → others)
- Selects the fastest recommended model available
- Falls back to your chat model if no suitable tool model is found
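The fallback logic above can be sketched as follows (the provider priority and recommended models come from this page; the function and data structures are illustrative, not Alma's code):

```python
# Illustrative sketch of the auto-detection order described in the docs.
PRIORITY = ["openai", "anthropic", "google"]   # then any other providers
RECOMMENDED = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-haiku-4-5",
    "google": "gemini-2.0-flash",
}

def auto_detect_tool_model(enabled_providers: set, chat_model: str) -> str:
    """Pick the first recommended model from an enabled provider,
    falling back to the main chat model if none is available."""
    for provider in PRIORITY:
        if provider in enabled_providers:
            return RECOMMENDED[provider]
    return chat_model  # no suitable tool model found
```

So with only Anthropic enabled you would get claude-haiku-4-5, and with no recognized provider the chat model does double duty.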
Testing Your Tool Model
Use the Test button next to the model selector to verify performance:
| Result | Response Time | Meaning |
|---|---|---|
| ✅ Good | < 2.5s | Optimal for tool operations |
| ⚠️ Slow | 2.5s - 5s | Usable but may feel sluggish |
| ❌ Unusable | > 5s | Too slow for responsive tool use |
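The thresholds in the table map to a straightforward classification (a sketch; the verdict labels match the Test button's results):

```python
def classify_latency(seconds: float) -> str:
    """Map a measured response time to the Test button's verdict."""
    if seconds < 2.5:
        return "Good"      # optimal for tool operations
    elif seconds <= 5.0:
        return "Slow"      # usable but may feel sluggish
    else:
        return "Unusable"  # too slow for responsive tool use
```

Note that these thresholds are looser than the under-1-second ideal mentioned earlier: a model can pass the test yet still feel noticeably slower than the recommended models.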
Language
Choose the interface language:
- English (en)
- Chinese (zh)
- Japanese (ja)
Changes take effect immediately.
Startup
Auto-Start
Launch Alma automatically when you log in to your computer.
Start Minimized
Start in the system tray instead of opening a window.
Window Behavior
Minimize to Tray
When minimizing the window, minimize to system tray instead of taskbar.
Close to Tray
When clicking the close button, minimize to tray instead of quitting the application.
TIP
These options are useful for keeping Alma running in the background for quick access via the Quick Chat shortcut.
Quick Chat
Hide on Blur
When the Quick Chat window loses focus, automatically hide it.
When enabled, the Quick Chat window will close when you click outside of it. When disabled, it stays visible until you explicitly close it.
