This guide explains how to use the TinyLlama CLI for chatting with local language models.
After installation, simply run:
./tinyllama.sh
This will start the interactive CLI.
On launch, the CLI displays the model picker:
╔════════════════════════════════════════════════════════════════╗
║                          Model Picker                          ║
╠════════════════════════════════════════════════════════════════╣
║ Choose an installed model to launch                            ║
║ Each folder under ./models with a config.json is listed below. ║
║ Use [A] for Auto mode - selects based on your query.           ║
║                                                                ║
║  #  Installed Model           Path                             ║
║ ────────────────────────────────────────────────────────────── ║
║  A  Auto                      Smart selection based on task    ║
║  1  TinyLlama-1.1B-Chat-v1.0  ./models/TinyLlama-1.1B...       ║
║  2  NVIDIA-Nemotron-3-Nano    ./models/NVIDIA-Nemotron...      ║
╚════════════════════════════════════════════════════════════════╝
Press A to enable Auto mode. The CLI will automatically select the best model based on your query:
# Select a specific model when starting
python ai_cli.py --model tinyllama
python ai_cli.py --model NVIDIA-Nemotron-3-Nano-4B-GGUF
python ai_cli.py --model auto # Smart selection
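For intuition, Auto mode can be thought of as a routing function from the query text to a model name. The keyword heuristic below is an illustrative assumption, not the actual logic in ai_cli.py:

```python
# Hypothetical sketch of Auto-mode routing. The keywords and the choice of
# which model handles which task are assumptions for illustration only.
def pick_model(query: str) -> str:
    q = query.lower()
    code_hints = ("code", "function", "python", "bug", "implement")
    if any(hint in q for hint in code_hints):
        # Route code-heavy prompts to the larger model.
        return "NVIDIA-Nemotron-3-Nano-4B-GGUF"
    # Default to the small, fast chat model.
    return "TinyLlama-1.1B-Chat-v1.0"
```

The real selector may weigh more signals (prompt length, conversation history), but the shape is the same: classify the query, then return a model path.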
Once a model is loaded, the CLI shows a welcome banner:
╔════════════════════════════════════════════════════════════════╗
║                          TinyLlama CLI                         ║
║                           Local Chat                           ║
╠════════════════════════════════════════════════════════════════╣
║ Try /help /settings /save /exit                                ║
║                                                                ║
║ Model: TinyLlama-1.1B-Chat-v1.0 (GPU)                          ║
╚════════════════════════════════════════════════════════════════╝
Simply type your message and press Enter. The model will respond with generated text.
Example:
You: What is Python?
TinyLlama: Python is a high-level, interpreted programming language...
The CLI supports markdown rendering for model responses:
Code blocks render in monospace.

The CLI provides several built-in commands:
/help
Shows all available commands and their descriptions.
/settings
Display and modify generation parameters.
Shows a panel with current settings:
| Setting | Value |
|---|---|
| temperature | 0.65 |
| top_p | 0.9 |
| top_k | 40 |
| repetition_penalty | 1.1 |
| max_new_tokens | 256 |
| do_sample | True |
/clear
Clears the chat history (starts a fresh conversation).
/save
Saves the current transcript to transcripts/ and exports training data.
/exit
Exits the CLI and saves the transcript automatically.
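Commands like these are typically intercepted by a small dispatcher before input reaches the model. A minimal sketch, with hypothetical handler behavior and only a few commands shown:

```python
# Hypothetical slash-command dispatcher; ai_cli.py's real handlers may differ.
def handle_command(line: str, history: list) -> str:
    cmd = line.strip().split()[0]
    if cmd == "/help":
        return "Commands: /help /settings /save /clear /exit"
    if cmd == "/clear":
        history.clear()          # drop prior turns, keep the session alive
        return "History cleared."
    if cmd == "/exit":
        return "exit"            # caller breaks the REPL loop on this
    return f"Unknown command: {cmd} (try /help)"
```

Anything not starting with a known command falls through to the model as a normal chat message.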
The CLI automatically tunes generation settings based on your prompts:
| Prompt Type | Temperature | Top-P | Max Tokens |
|---|---|---|---|
| Factual | 0.45 | 0.82 | 220 |
| Code | 0.40 | 0.85 | 280 |
| Creative | 0.88 | 0.95 | 320 |
| Math | 0.00 | 1.00 | 96 |
| Long Context | 0.55 | 0.86 | 192 |
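The presets in the table above amount to a simple lookup from prompt type to parameters. This sketch mirrors the table's values; the fallback-to-factual behavior is an assumption:

```python
# Adaptive presets from the table above, keyed by prompt type.
# Values come from the table; the fallback choice is an assumption.
PRESETS = {
    "factual":      {"temperature": 0.45, "top_p": 0.82, "max_new_tokens": 220},
    "code":         {"temperature": 0.40, "top_p": 0.85, "max_new_tokens": 280},
    "creative":     {"temperature": 0.88, "top_p": 0.95, "max_new_tokens": 320},
    "math":         {"temperature": 0.00, "top_p": 1.00, "max_new_tokens": 96},
    "long_context": {"temperature": 0.55, "top_p": 0.86, "max_new_tokens": 192},
}

def tune(prompt_type: str) -> dict:
    # Unrecognized prompt types fall back to the conservative factual preset.
    return PRESETS.get(prompt_type, PRESETS["factual"])
```

Note the math preset sets temperature to 0.00, which effectively makes decoding greedy for deterministic answers.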
You can modify settings by editing the code in ai_cli.py:
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    temperature: float = 0.65
    top_p: float = 0.9
    top_k: int = 40
    repetition_penalty: float = 1.1
    max_new_tokens: int = 256
    do_sample: bool = True
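Because GenerationConfig is a standard dataclass, you can also derive a variant in code with dataclasses.replace rather than editing the defaults. A self-contained sketch (the "creative" values here are just an example):

```python
from dataclasses import dataclass, replace

@dataclass
class GenerationConfig:
    temperature: float = 0.65
    top_p: float = 0.9
    top_k: int = 40
    repetition_penalty: float = 1.1
    max_new_tokens: int = 256
    do_sample: bool = True

# Derive a "creative" variant; untouched fields keep their defaults.
creative = replace(GenerationConfig(), temperature=0.88, max_new_tokens=320)
```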
| Key | Action |
|---|---|
| Enter | Send message |
| Ctrl+C | Interrupt generation |
| Ctrl+L | Clear screen |
| Ctrl+D | Exit (same as /exit) |
Chat transcripts are saved to:
transcripts/
├── 2024-01-15_143022.json
├── 2024-01-16_091545.json
└── ...
Each transcript contains:
Every time you use /save or /exit, the CLI exports training data:
training_data/tinyllama_sft.jsonl
Each line is a JSON object with:
- id: Unique identifier
- source_transcript: Transcript file name
- created_at: ISO timestamp
- messages: Chat messages array

Tips for better results:
Be specific: “Explain how Python decorators work with examples” works better than “Explain decorators”
Break down complex requests: Split into multiple messages for better results
Check your parameters: run /settings to verify the current generation settings