This document provides a complete reference for all CLI commands, scripts, and Python APIs in this project.
## `download_model.py`

Downloads models from the Hugging Face Hub.

```shell
python download_model.py [OPTIONS]
```

| Option | Description | Example |
|---|---|---|
| `--model KEY` | Model key or custom model ID | `--model tinyllama` |

Available model keys:

- `tinyllama`: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- `smollm2`: `HuggingFaceTB/SmolLM2-135M`
- `qwen`: `Qwen/Qwen2.5-0.5B-Instruct`
- `nvidia_nemotron`: `nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF`

A custom Hugging Face model ID (e.g. `meta/Llama-3-8B`) can also be passed.

Exit codes:

| Code | Description |
|---|---|
| 0 | Success |
| 1 | Error (invalid model, download failed, etc.) |
## `ai_cli.py`

The main chat CLI.

```shell
python ai_cli.py [OPTIONS]
```

| Option | Description | Example |
|---|---|---|
| `--model MODEL` | Model folder name, path, or `auto` | `--model tinyllama` |
```shell
# Auto-select model based on query
python ai_cli.py --model auto

# Use specific model
python ai_cli.py --model TinyLlama-1.1B-Chat-v1.0

# Use local path
python ai_cli.py --model ./models/my-model/
```
## `tinyllama.sh`

Bootstrap script for automated setup.

```shell
./tinyllama.sh [OPTIONS]
```

| Option | Description |
|---|---|
| `--bootstrap-only` | Download deps and model; don't start the CLI |
| `--model MODEL` | Auto-download a specific model |
```shell
# Full bootstrap + launch
./tinyllama.sh

# Download only
./tinyllama.sh --bootstrap-only

# Download specific model
./tinyllama.sh --model nvidia_nemotron
```
## Python API

### `TinyLlamaCLI`

Main chat interface class.

```python
from pathlib import Path

from ai_cli import TinyLlamaCLI

# Initialize
cli = TinyLlamaCLI(
    model_dir=Path("models/TinyLlama-1.1B-Chat-v1.0"),
    model_label="TinyLlama",
)

# Run the CLI
cli.run()
```
| Parameter | Type | Description |
|---|---|---|
| `model_dir` | `Path` | Path to the model directory |
| `model_label` | `str` | Display label for the model |
#### `run()`

Starts the interactive chat loop.

```python
cli.run()
```

#### `_generate_response()`

Generates a response to user input.

```python
response = cli._generate_response("Hello!")
```

#### `_save_transcript()`

Saves the current chat transcript.

```python
cli._save_transcript()
```

#### `_export_training_data()`

Exports training data in JSONL format.

```python
cli._export_training_data()
```
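The exported JSONL can then be consumed line by line. A sketch assuming each line is one JSON object with a `messages` array (the `TrainingDataRecord` shape documented below); the filename here is hypothetical, since the CLI chooses the actual export path:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location; the CLI picks the real export filename.
export_path = Path(tempfile.mkdtemp()) / "training_data.jsonl"

# Write one sample record in the assumed one-object-per-line shape,
# then read every line back.
record = {
    "id": "demo-1",
    "messages": [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there."},
    ],
}
export_path.write_text(json.dumps(record) + "\n")

records = [json.loads(line) for line in export_path.read_text().splitlines()]
print(records[0]["messages"][1]["content"])  # Hi there.
```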
### `GenerationConfig`

Configuration for text generation.

```python
from ai_cli import GenerationConfig

cfg = GenerationConfig(
    temperature=0.65,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_new_tokens=256,
    do_sample=True,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `temperature` | `float` | 0.65 | Sampling temperature (0 = deterministic) |
| `top_p` | `float` | 0.9 | Nucleus sampling threshold |
| `top_k` | `int` | 40 | Top-k sampling cutoff |
| `repetition_penalty` | `float` | 1.1 | Penalty applied to repeated tokens |
| `max_new_tokens` | `int` | 256 | Maximum tokens to generate |
| `do_sample` | `bool` | True | Use sampling vs. greedy decoding |
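For fully deterministic output, sampling is typically switched off. A minimal sketch using a stand-in dataclass with the same fields (not the project's actual class, which lives in `ai_cli`):

```python
from dataclasses import dataclass


@dataclass
class GenerationConfig:
    # Stand-in mirroring the documented fields and defaults.
    temperature: float = 0.65
    top_p: float = 0.9
    top_k: int = 40
    repetition_penalty: float = 1.1
    max_new_tokens: int = 256
    do_sample: bool = True


# Greedy (deterministic) decoding: sampling off, temperature unused.
greedy = GenerationConfig(temperature=0.0, do_sample=False)
print(greedy.do_sample)  # False
```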
### `TinyLlamaOptimizer`

Automatic tuning of generation settings.

```python
from ai_cli import TinyLlamaOptimizer

cfg = TinyLlamaOptimizer.tune(
    user_input="Explain Python decorators",
    turns=1,
)
```

#### `tune()`

Automatically tunes settings based on the input.

```python
cfg = TinyLlamaOptimizer.tune("Write a poem", 0)
```
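The actual tuning heuristics live in `ai_cli`. Purely as an illustration (not the project's logic), a `tune`-like function might raise the temperature for creative prompts and lower it for explanatory ones:

```python
def tune_sketch(user_input: str, turns: int) -> dict:
    # Illustrative heuristic only, not TinyLlamaOptimizer's real behavior.
    creative = any(w in user_input.lower() for w in ("poem", "story", "imagine"))
    return {
        # Higher temperature for creative prompts, lower for factual ones.
        "temperature": 0.9 if creative else 0.5,
        # Allow a longer first reply, shorter follow-ups.
        "max_new_tokens": 512 if turns == 0 else 256,
    }


print(tune_sketch("Write a poem", 0))
print(tune_sketch("Explain Python decorators", 1))
```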
### `MODEL_CHOICES`

Dictionary of pre-configured models.

```python
from download_model import MODEL_CHOICES

print(MODEL_CHOICES)
# {'tinyllama': 'TinyLlama/TinyLlama-1.1B-Chat-v1.0', ...}
```
### `model_dir_for()`

Gets the local directory for a model.

```python
from download_model import model_dir_for

path = model_dir_for("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
# Returns: Path('models/TinyLlama-1.1B-Chat-v1.0')
```
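Judging by the example above, the mapping appears to drop the organization prefix and keep only the repository name under `models/`. A one-line sketch of that assumed behavior:

```python
from pathlib import Path


def model_dir_sketch(model_id: str) -> Path:
    # Assumed behavior: keep only the part after the last "/".
    return Path("models") / model_id.split("/")[-1]


print(model_dir_sketch("TinyLlama/TinyLlama-1.1B-Chat-v1.0"))
# models/TinyLlama-1.1B-Chat-v1.0
```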
### `parse_args()`

Parses command-line arguments.

```python
from ai_cli import parse_args

args = parse_args()
# args.model contains the --model value
```
### Constants

```python
from ai_cli import DEFAULT_MODEL_ID
# "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

from ai_cli import DEFAULT_MODEL_DIR
# Path("models/TinyLlama-1.1B-Chat-v1.0")

from ai_cli import SYSTEM_PROMPT
# "You are a helpful, concise AI assistant..."
```
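A system prompt like this is typically the first entry when seeding a conversation. A sketch assuming the role/content message shape used elsewhere in this document (the prompt string here is a stand-in):

```python
# Stand-in value; the real constant is ai_cli.SYSTEM_PROMPT.
SYSTEM_PROMPT = "You are a helpful, concise AI assistant..."


def new_conversation(user_text: str) -> list[dict]:
    # Seed the history with the system prompt, then the first user turn.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


msgs = new_conversation("Hello!")
print(msgs[0]["role"])  # system
```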
### `search_web()`

Searches the web for information.

```python
from web_search import search_web

results = search_web("Python decorators tutorial")
for result in results:
    print(result.title, result.url)
```

Returns a list of `WebResult` objects:

```python
@dataclass
class WebResult:
    title: str
    url: str
    snippet: str
```
### `should_search_web()`

Determines whether a query needs a web search.

```python
from web_search import should_search_web

if should_search_web("latest AI news"):
    # Search the web
    pass
```
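The real decision logic lives in `web_search`; as an illustration only, such a check often flags time-sensitive queries by keyword:

```python
def should_search_sketch(query: str) -> bool:
    # Illustrative heuristic, not web_search's actual implementation:
    # flag queries that mention recency or news.
    recency_markers = ("latest", "today", "news", "current")
    return any(marker in query.lower() for marker in recency_markers)


print(should_search_sketch("latest AI news"))       # True
print(should_search_sketch("what is a decorator"))  # False
```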
## Data Structures

### `ChatMessage`

```python
@dataclass
class ChatMessage:
    role: str       # "system", "user", "assistant"
    content: str
    timestamp: str  # ISO 8601
```

### `Transcript`

```python
@dataclass
class Transcript:
    model: str
    model_path: str
    started_at: str
    settings: dict
    messages: list[ChatMessage]
```

### `TrainingDataRecord`

```python
@dataclass
class TrainingDataRecord:
    id: str
    source_transcript: str
    created_at: str
    messages: list[dict]  # [{"role": "...", "content": "..."}]
```
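These dataclasses compose and serialize with the standard library alone. A self-contained sketch (the classes are redeclared here so the snippet runs on its own; field values are examples):

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ChatMessage:
    role: str
    content: str
    timestamp: str  # ISO 8601


@dataclass
class Transcript:
    model: str
    model_path: str
    started_at: str
    settings: dict
    messages: list  # of ChatMessage


now = datetime.now(timezone.utc).isoformat()
t = Transcript(
    model="TinyLlama",
    model_path="models/TinyLlama-1.1B-Chat-v1.0",
    started_at=now,
    settings={"temperature": 0.65},
    messages=[ChatMessage("user", "Hello!", now)],
)

# asdict() recurses into nested dataclasses, giving a JSON-ready dict.
print(json.dumps(asdict(t), indent=2))
```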