This guide explains how to download and manage models for TinyLlama CLI.
TinyLlama CLI comes with several pre-configured models:
| Key | Model ID | Size | Description |
|---|---|---|---|
| tinyllama | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | ~1GB | Lightweight chat model, fast inference |
| smollm2 | HuggingFaceTB/SmolLM2-135M | ~270MB | Very small, resource-efficient |
| qwen | Qwen/Qwen2.5-0.5B-Instruct | ~1GB | Good quality, multilingual support |
| nvidia_nemotron | nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF | ~4GB | NVIDIA’s efficient nano model |
Recommended starting points:

- tinyllama - it’s fast and works well on most machines
- smollm2 - runs smoothly on systems with 8GB+ RAM
- qwen or nvidia_nemotron - higher quality outputs

Run the downloader without arguments to see the interactive picker:
python download_model.py
You’ll see:
╔════════════════════════════════════════════════════════════════╗
║ Model Downloader ║
╠════════════════════════════════════════════════════════════════╣
║ Available: tinyllama smollm2 qwen nvidia_nemotron choose more ║
╚════════════════════════════════════════════════════════════════╝
╔════════════════════════════════════════════════════════════════╗
║ Model Picker ║
╠════════════════════════════════════════════════════════════════╣
║ # Key Model ║
║ ───────────────────────────────────────────────────────────── ║
║ 1 tinyllama TinyLlama/TinyLlama-1.1B-Chat-v1.0 ║
║ 2 smollm2 HuggingFaceTB/SmolLM2-135M ║
║ 3 qwen Qwen/Qwen2.5-0.5B-Instruct ║
║ 4 nvidia_nemotron nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF ║
║ 5 choose more Type any HuggingFace model ID ║
╚════════════════════════════════════════════════════════════════╝
Select a number (1-5) to download that model.
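The picker’s number-to-model mapping can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation; the `pick` function name and prompt wording are assumptions:

```python
# Built-in model keys, in the same order as the picker menu.
MODELS = {
    "tinyllama": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "smollm2": "HuggingFaceTB/SmolLM2-135M",
    "qwen": "Qwen/Qwen2.5-0.5B-Instruct",
    "nvidia_nemotron": "nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF",
}

def pick(choice: str) -> str:
    """Map a menu number (1-5) to a model ID; the last option
    prompts for a custom HuggingFace model ID."""
    keys = list(MODELS)
    n = int(choice)
    if 1 <= n <= len(keys):
        return MODELS[keys[n - 1]]
    if n == len(keys) + 1:  # "choose more"
        return input("Enter a HuggingFace model ID: ").strip()
    raise ValueError(f"choose a number between 1 and {len(keys) + 1}")
```

For example, `pick("1")` resolves to TinyLlama/TinyLlama-1.1B-Chat-v1.0, while `pick("5")` falls through to the free-form prompt.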
Download a specific model directly:
# Using the short key
python download_model.py --model tinyllama
python download_model.py --model smollm2
python download_model.py --model qwen
python download_model.py --model nvidia_nemotron
Select option “5” (choose more) in the interactive picker, or use a custom model ID:
# Any model from HuggingFace Hub
python download_model.py --model meta/Llama-3-8B
python download_model.py --model mistralai/Mistral-7B-Instruct-v0.2
python download_model.py --model EleutherAI/gpt-neo-2.7B
The model ID should be in the format organization/model-name:
| Organization | Example Model IDs |
|---|---|
| meta | Llama-3-8B, Llama-2-13B-chat |
| mistralai | Mistral-7B-Instruct-v0.2, Mixtral-8x7B |
| EleutherAI | gpt-neo-2.7B, gpt-j-6B |
| TinyLlama | TinyLlama-1.1B-Chat-v1.0 |
| Qwen | Qwen2.5-0.5B-Instruct, Qwen2.5-1.5B |
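A quick sanity check for the organization/model-name format can be sketched like this (the regex and helper name are illustrative; the downloader may validate IDs differently):

```python
import re

# Matches "organization/model-name", e.g. "EleutherAI/gpt-neo-2.7B".
MODEL_ID_RE = re.compile(r"^[\w.-]+/[\w.-]+$")

def is_valid_model_id(model_id: str) -> bool:
    """Return True if model_id looks like organization/model-name."""
    return bool(MODEL_ID_RE.match(model_id))
```

A bare key such as `tinyllama` fails this check, which is one way to distinguish built-in keys from custom Hub IDs.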
Some models on HuggingFace Hub are “gated” and require permission to download. For these, you must authenticate with a HuggingFace token, set as an environment variable:
# Linux/macOS
export HF_TOKEN="your_huggingface_token_here"
# Windows (PowerShell)
$env:HF_TOKEN = "your_huggingface_token_here"
Or use the alternative variable:
export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token_here"
If no token is found in environment variables, the downloader will prompt you:
HuggingFace Token (optional)
Press Enter to skip (anonymous download may fail for gated models):
Enter your token or press Enter to skip.
Alternatively, store the token in a .env file:
# Copy example
cp .env.example .env
# Edit .env and add:
HF_TOKEN=your_token_here
| Option | Description | Example |
|---|---|---|
| --model | Built-in model key or any HuggingFace model ID | --model tinyllama, --model meta/Llama-3-8B |
The downloader automatically resumes interrupted downloads:
# Just run again - it will pick up where left off
python download_model.py --model tinyllama
By default, models are stored in:
models/
├── TinyLlama-1.1B-Chat-v1.0/
│ ├── config.json
│ ├── model.safetensors
│ └── ...
├── NVIDIA-Nemotron-3-Nano-4B-GGUF/
│ └── ...
└── ...
You can change the location by modifying the code or using symbolic links.
Models are stored in the ./models directory:
ls -la models/
To remove a downloaded model:
# Delete a specific model folder
rm -rf models/TinyLlama-1.1B-Chat-v1.0
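If you script the removal, it is worth guarding against paths that escape the models/ directory. A minimal sketch (the `remove_model` helper is hypothetical, not part of the CLI):

```python
import shutil
from pathlib import Path

MODELS_DIR = Path("models").resolve()

def remove_model(folder_name: str) -> None:
    """Delete models/<folder_name>, refusing paths outside models/."""
    target = (MODELS_DIR / folder_name).resolve()
    # Guard against e.g. "../important" deleting something outside models/.
    if MODELS_DIR not in target.parents:
        raise ValueError(f"refusing to delete outside {MODELS_DIR}: {target}")
    shutil.rmtree(target)
```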
# Or use the tinyllama.sh helper (if implemented)
./tinyllama.sh --remove-model tinyllama
When you run ai_cli.py, it will show installed models:
# Installed Model Path
─────────────────────────────────────────
1 TinyLlama-1.1B-Chat-v1.0 ./models/TinyLlama-1.1B-Chat-v1.0
2 NVIDIA-Nemotron-3-Nano ./models/NVIDIA-Nemotron-3-Nano-4B-GGUF
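Listing installed models amounts to scanning the models/ directory for subfolders. A sketch of that logic (the `installed_models` helper is illustrative; ai_cli.py may differ):

```python
from pathlib import Path

def installed_models(models_dir: str = "models") -> list[tuple[str, str]]:
    """Return (name, path) pairs for each model folder under models_dir,
    mirroring the table ai_cli.py prints."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    # Only directories count as models; stray files are ignored.
    return sorted((p.name, str(p)) for p in root.iterdir() if p.is_dir())
```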
Error: HfApiInvalidUsername:
Solution: Set your HuggingFace token (see HuggingFace Token section)
Error: Repository not found: https://huggingface.co/api/models/bad/model/id
Solution: Check the model ID is correct. Some models are renamed or moved.
Error: No space left on device
Solution: Free up space or download smaller models (smollm2, tinyllama)
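Before a large download, you can check free disk space up front; a minimal sketch using the standard library (the headroom factor is an assumption, since downloads temporarily need extra space):

```python
import shutil

def free_gb(path: str = ".") -> float:
    """Free disk space at path, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9

def enough_space_for(size_gb: float, path: str = ".") -> bool:
    # 10% headroom for temporary files during download (assumption).
    return free_gb(path) > size_gb * 1.1
```

For example, the ~4GB nvidia_nemotron model would call for `enough_space_for(4)` before starting.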
The downloader uses resume_download=True to continue interrupted downloads. For slow connections, consider: