Chat Settings

Configure chat behavior and response parameters.

Chat Parameters

Temperature

Controls response randomness (0-2):

  • 0: Deterministic, focused responses
  • 1: Balanced (default)
  • 2: Creative, varied responses

Lower values produce more consistent outputs, while higher values introduce more variety and creativity.
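To make the effect concrete, here is a minimal sketch of how temperature typically reshapes a model's token probabilities. It illustrates the general sampling technique, not Alma's or any provider's actual implementation; the function name and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 is treated as greedy decoding,
        # so all probability goes to the top-scoring token.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 1, 2):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature rises, the probabilities move closer together, so less likely tokens are picked more often.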

Max Tokens

Maximum length of an AI response, in tokens (1-8192).

Higher values allow longer responses but may increase cost and latency.
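If you manage these values in a script or config file, checking them against the documented ranges up front can catch mistakes early. The sketch below uses a hypothetical ChatParameters class; it is not Alma's settings schema. Only the ranges and the temperature default of 1 come from this page.

```python
from dataclasses import dataclass

TEMPERATURE_RANGE = (0.0, 2.0)  # documented range for Temperature
MAX_TOKENS_RANGE = (1, 8192)    # documented range for Max Tokens


@dataclass
class ChatParameters:
    """Hypothetical container for the two chat parameters."""

    max_tokens: int            # no default: this page does not state one
    temperature: float = 1.0   # documented default ("Balanced")

    def __post_init__(self):
        low, high = TEMPERATURE_RANGE
        if not low <= self.temperature <= high:
            raise ValueError(f"temperature must be between {low} and {high}")
        low, high = MAX_TOKENS_RANGE
        if not low <= self.max_tokens <= high:
            raise ValueError(f"max_tokens must be between {low} and {high}")


params = ChatParameters(max_tokens=2048, temperature=0.3)
print(params)
```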

Response Settings

Enable Streaming Response

  • Enabled: Show responses as they generate (recommended)
  • Disabled: Show complete response when done

Streaming improves perceived performance: text appears as soon as it is generated, so you can follow the response as it takes shape instead of waiting for it to complete.

Show Token Usage

Display token count for each message:

  • Input tokens
  • Output tokens
  • Cached tokens (if applicable)

Useful for monitoring API costs.
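As a rough illustration, the three counts can be turned into a cost estimate like this. The prices are hypothetical placeholders, and the sketch assumes the three counts do not overlap; check your provider's pricing page and billing rules for real figures.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one message, as shown by Show Token Usage."""

    input_tokens: int
    output_tokens: int
    cached_tokens: int = 0


# Hypothetical prices in USD per million tokens, not real provider rates.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00, "cached": 0.30}


def estimate_cost(usage, prices=PRICE_PER_MILLION):
    """Estimate the USD cost of one message.

    Assumption: input, output, and cached counts are disjoint.
    """
    total = (
        usage.input_tokens * prices["input"]
        + usage.output_tokens * prices["output"]
        + usage.cached_tokens * prices["cached"]
    )
    return total / 1_000_000


usage = TokenUsage(input_tokens=1000, output_tokens=500, cached_tokens=2000)
print(f"Estimated cost: ${estimate_cost(usage):.4f}")
```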

Enable Markdown Rendering

Render markdown formatting in messages:

  • Headers
  • Lists
  • Code blocks with syntax highlighting
  • Links
  • Tables

Single Dollar LaTeX

When Markdown rendering is enabled, you can also enable single-dollar-sign LaTeX rendering:

  • $x = y$ renders as inline math
  • $$equation$$ renders as block math

TIP

Only enable this if you frequently work with mathematical content. It may interfere with regular text containing dollar signs.
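The interference is easy to reproduce with a simplified matching rule. The regular expression below is an assumption made for illustration, not Alma's actual parser: it treats any text between two single dollar signs on the same line as inline math.

```python
import re

# Simplified single-dollar rule (an assumption, not Alma's parser):
# a "$" that is not part of "$$", then text without "$" or newlines,
# then a closing "$" that is not part of "$$".
INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)([^$\n]+?)\$(?!\$)")


def find_inline_math(text):
    """Return the spans that this rule would render as inline math."""
    return INLINE_MATH.findall(text)


# Intended math is detected as expected.
print(find_inline_math("Euler's identity is $e^{i\\pi} + 1 = 0$."))

# Ordinary prices are also captured: the text between the two
# dollar signs is treated as a formula.
print(find_inline_math("Tickets cost $5 for kids and $10 for adults."))
```

In the second sentence, everything between the two price markers is matched, which is why text with several dollar amounts can render incorrectly when this option is on.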

Default Tool Selection

Configure how tools are enabled by default for new conversations:

By default, the AI automatically chooses appropriate tools based on the conversation context. This is the recommended setting.

All

All available tools are enabled for every conversation. Use this if you want maximum capability.

None

No tools are enabled by default. You'll need to manually enable tools for each conversation. Use this for simple chat-only interactions.

Sound Effects

Play synthesized sounds during AI response streaming.

Enable Sound Effects

Turn streaming sound effects on or off.

Sound Preset

Choose a synth preset:

  • Classic: Warm triangle wave with gentle reverb
  • Ethereal: Soft sine wave with spacious atmosphere
  • Digital: Crisp square wave with minimal effects
  • Retro: Nostalgic sawtooth with warm filtering

Click Preview to hear each preset before selecting.

Volume

Adjust the volume of sound effects (0-100%).

Auto Compact

Automatically manage long conversations by compacting context when it approaches the model's limit.

Enable Auto Compact

When enabled, Alma automatically summarizes older parts of the conversation to stay within context limits while preserving important information.

Usage Threshold

Trigger compaction when context usage exceeds this percentage (5-95%).

  • Lower values (e.g., 60%): Compact earlier, more aggressive memory management
  • Higher values (e.g., 90%): Compact later, preserve more context

Keep Recent Messages

Number of recent messages to always keep unchanged (2-20).

These messages will never be compacted, ensuring the most recent context is always available.
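The two settings work together: the threshold decides when compaction starts, and Keep Recent Messages decides which messages are left alone. The sketch below shows one plausible way to combine them. The function names are hypothetical, token counting and the summarization step are left out, and Alma's actual logic may differ.

```python
def should_compact(used_tokens, context_limit, threshold_percent):
    """Return True when context usage exceeds the Usage Threshold."""
    if not 5 <= threshold_percent <= 95:
        raise ValueError("threshold_percent must be between 5 and 95")
    # Integer comparison avoids floating-point rounding at the boundary.
    return used_tokens * 100 > threshold_percent * context_limit


def split_for_compaction(messages, keep_recent):
    """Split messages into (to_summarize, kept_unchanged)."""
    if not 2 <= keep_recent <= 20:
        raise ValueError("keep_recent must be between 2 and 20")
    if len(messages) <= keep_recent:
        return [], list(messages)  # nothing old enough to compact
    return list(messages[:-keep_recent]), list(messages[-keep_recent:])


# Hypothetical conversation and context numbers.
messages = ["m1", "m2", "m3", "m4", "m5", "m6"]
if should_compact(used_tokens=90_000, context_limit=100_000, threshold_percent=80):
    old, recent = split_for_compaction(messages, keep_recent=2)
    print("Summarize:", old)
    print("Keep as-is:", recent)
```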

TIP

Auto Compact is especially useful for very long conversations that you want to continue without losing important context.