Chat Settings
Configure chat behavior and response parameters.
Chat Parameters
Temperature
Controls response randomness (0-2):
- 0: Deterministic, focused responses
- 1: Balanced (default)
- 2: Creative, varied responses
Lower values produce more consistent outputs, while higher values introduce more variety and creativity.
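To make the effect of temperature concrete, here is a minimal sketch of how sampling temperature is commonly applied to a model's raw scores before a token is picked. It is a general illustration, not Alma's or any provider's actual implementation; the function name and the example scores are assumptions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy: all probability goes to the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 1, 2):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature rises, the probabilities move closer together, so less likely tokens are picked more often. That is where the extra variety comes from.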
Max Tokens
Maximum length of AI responses, in tokens (1-8192).
Higher values allow longer responses but may increase cost and latency.
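The documented ranges for the two chat parameters can be checked before a request is sent. The sketch below assumes a hypothetical `ChatParameters` container; the class and field names are illustrative and are not part of Alma's settings format. No default is assumed for Max Tokens because the page does not state one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChatParameters:
    """Hypothetical container for the two chat parameters described above."""
    max_tokens: int
    temperature: float = 1.0  # documented default ("Balanced")

    def __post_init__(self):
        # Ranges are taken from the settings page: 0-2 and 1-8192.
        if not 0 <= self.temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        if not 1 <= self.max_tokens <= 8192:
            raise ValueError("max_tokens must be between 1 and 8192")

params = ChatParameters(max_tokens=2048, temperature=0.3)
print(params)
```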
Response Settings
Enable Streaming Response
- Enabled: Show responses as they generate (recommended)
- Disabled: Show complete response when done
Streaming makes the app feel more responsive and lets you read the response while it is still being generated.
Show Token Usage
Display token count for each message:
- Input tokens
- Output tokens
- Cached tokens (if applicable)
Useful for monitoring API costs.
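As a rough illustration of how these counts translate into cost, the sketch below multiplies each count by a per-token price. The prices are placeholders, not real rates, and the `TokenUsage` and `estimate_cost` names are hypothetical. It also assumes the three counts are reported as separate buckets; some providers report cached tokens as a subset of input tokens, so check your provider's billing rules.

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    """Hypothetical per-message token counts, mirroring the three values shown."""
    input_tokens: int
    output_tokens: int
    cached_tokens: int = 0

# Placeholder prices in USD per million tokens; substitute your provider's rates.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00, "cached": 0.30}

def estimate_cost(usage):
    """Estimate the cost of one message, treating the counts as separate buckets."""
    total = (
        usage.input_tokens * PRICE_PER_MILLION["input"]
        + usage.output_tokens * PRICE_PER_MILLION["output"]
        + usage.cached_tokens * PRICE_PER_MILLION["cached"]
    )
    return total / 1_000_000

usage = TokenUsage(input_tokens=1200, output_tokens=350, cached_tokens=800)
print(f"Estimated cost: ${estimate_cost(usage):.6f}")
```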
Enable Markdown Rendering
Render markdown formatting in messages:
- Headers
- Lists
- Code blocks with syntax highlighting
- Links
- Tables
Single Dollar LaTeX
When markdown is enabled, you can also enable single dollar sign LaTeX rendering:
- $x = y$ renders as inline math
- $$equation$$ renders as block math
TIP
Only enable this if you frequently work with mathematical content. It may interfere with regular text containing dollar signs.
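The interference mentioned in the tip is easy to reproduce with a simplified matcher. The regular expression below is an assumption made for illustration, not the pattern Alma's renderer uses. It treats any text between two single dollar signs on the same line as inline math.

```python
import re

# Simplified pattern: a "$", then text with no "$" or newline, then a closing "$".
# The lookarounds keep it from matching the "$$...$$" block delimiters.
INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)([^$\n]+?)\$(?!\$)")

def find_inline_math(text):
    """Return the spans a naive single-dollar matcher would treat as math."""
    return INLINE_MATH.findall(text)

print(find_inline_math("Solve $x = y$ for x."))
print(find_inline_math("It costs $5 today and $10 tomorrow."))
```

In the second sentence, the text between the two prices is captured as if it were a formula, which is why ordinary prose with dollar amounts can render incorrectly when this option is on.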
Default Tool Selection
Configure how tools are enabled by default for new conversations:
Auto (Recommended)
The AI automatically chooses appropriate tools based on the conversation context. This is the default setting.
All
All available tools are enabled for every conversation. Use this if you want maximum capability.
None
No tools are enabled by default. You'll need to manually enable tools for each conversation. Use this for simple chat-only interactions.
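The three modes can be summarized as a small selection function. This is a sketch under assumptions: `ToolMode`, `default_tools`, and the `pick_relevant` callback are hypothetical names, and the callback only stands in for whatever the AI does to choose tools in Auto mode.

```python
from enum import Enum

class ToolMode(Enum):
    AUTO = "auto"
    ALL = "all"
    NONE = "none"

def default_tools(mode, available, pick_relevant):
    """Return the tools enabled for a new conversation under the given mode."""
    if mode is ToolMode.ALL:
        return list(available)
    if mode is ToolMode.NONE:
        return []
    # AUTO: defer to the chooser (a stand-in for the AI's own selection).
    return pick_relevant(available)

tools = ["web_search", "calculator", "file_reader"]  # hypothetical tool names
print(default_tools(ToolMode.ALL, tools, lambda ts: ts[:1]))
print(default_tools(ToolMode.NONE, tools, lambda ts: ts[:1]))
print(default_tools(ToolMode.AUTO, tools, lambda ts: ts[:1]))
```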
Sound Effects
Play synthesized sounds during AI response streaming.
Enable Sound Effects
Toggle streaming sound effects on/off.
Sound Preset
Choose a synth preset:
- Classic: Warm triangle wave with gentle reverb
- Ethereal: Soft sine wave with spacious atmosphere
- Digital: Crisp square wave with minimal effects
- Retro: Nostalgic sawtooth with warm filtering
Click Preview to hear each preset before selecting.
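Each preset is built on one of the four basic oscillator shapes. The sketch below shows the textbook formulas for those shapes only; it is not Alma's synthesizer and leaves out the reverb and filtering the presets mention. The `oscillator` function and its `phase` parameter are assumptions made for illustration.

```python
import math

def oscillator(shape, phase):
    """Return a sample in [-1, 1] for a basic waveform at a phase in cycles."""
    p = phase % 1.0  # position within one cycle, 0 <= p < 1
    if shape == "sine":
        return math.sin(2 * math.pi * p)
    if shape == "triangle":
        return 1.0 - 4.0 * abs(p - 0.5)  # rises to +1 mid-cycle, then falls
    if shape == "square":
        return 1.0 if p < 0.5 else -1.0
    if shape == "sawtooth":
        return 2.0 * p - 1.0  # linear ramp from -1 up to +1
    raise ValueError(f"unknown shape: {shape}")

# Print eight samples across one cycle for each base shape.
for shape in ("triangle", "sine", "square", "sawtooth"):
    samples = [round(oscillator(shape, i / 8), 2) for i in range(8)]
    print(f"{shape:9s}", samples)
```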
Volume
Adjust the volume of sound effects (0-100%).
Auto Compact
Automatically manage long conversations by compacting context when it approaches the model's limit.
Enable Auto Compact
When enabled, Alma automatically summarizes older parts of the conversation to stay within context limits while preserving important information.
Usage Threshold
Trigger compaction when context usage exceeds this percentage (5-95%).
- Lower values (e.g., 60%): Compact earlier, more aggressive memory management
- Higher values (e.g., 90%): Compact later, preserve more context
Keep Recent Messages
Number of recent messages to always keep unchanged (2-20).
These messages will never be compacted, ensuring the most recent context is always available.
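The threshold and keep-recent settings can be read as a two-step rule: check usage against the threshold, then split the conversation into an older part to summarize and a recent part to keep verbatim. The sketch below assumes hypothetical function names and a caller-supplied `summarize` callback. How Alma actually counts tokens and writes summaries is not described here.

```python
def should_compact(used_tokens, context_limit, threshold_percent):
    """True when context usage exceeds the threshold percentage."""
    # Integer comparison avoids floating-point rounding at the boundary.
    return used_tokens * 100 > threshold_percent * context_limit

def compact(messages, keep_recent, summarize):
    """Replace older messages with one summary; keep the newest ones unchanged."""
    if keep_recent >= len(messages):
        return list(messages)  # nothing old enough to compact
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent

history = [f"message {i}" for i in range(1, 11)]
if should_compact(used_tokens=85_000, context_limit=100_000, threshold_percent=80):
    history = compact(history, keep_recent=4,
                      summarize=lambda old: f"[summary of {len(old)} messages]")
print(history)
```

With a threshold of 80% and four recent messages kept, the ten-message history collapses to one summary entry followed by the last four messages.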
TIP
Auto Compact is especially useful for very long conversations that you want to continue without losing important context.
