This guide covers all the ways to install and set up TinyLlama GUI on your system.
The fastest way to get started is using the bootstrap script:
cd /path/to/tinyllama-cli
./tinyllama.sh
The bootstrap script will automatically:
- Create a Python virtual environment (.venv)
- Install dependencies from requirements.txt
- Download a model and launch the CLI

If you prefer to set up manually or need more control, follow these steps:
git clone https://github.com/your-repo/tinyllama-cli.git
cd tinyllama-cli
# Create a virtual environment
python3 -m venv .venv
# Activate it (Linux/macOS)
source .venv/bin/activate
# Activate it (Windows)
.venv\Scripts\activate.bat
pip install -r requirements.txt
# Set HuggingFace token for gated models
export HF_TOKEN="your_huggingface_token_here"
# Or use the alternative variable name
export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token_here"
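Either variable name works because the token can be resolved with a simple fallback. A minimal sketch of that pattern (the helper name is hypothetical, not part of the repo):

```python
import os

def resolve_hf_token():
    # Hypothetical helper: prefer HF_TOKEN, fall back to
    # HUGGINGFACEHUB_API_TOKEN, and return None if neither is set.
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
```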
You can also copy the example environment file:
cp .env.example .env
# Edit .env and add your token
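If you are not using a loader like python-dotenv, a simple KEY=value .env file can be read with the standard library alone. A minimal sketch (illustrative only, not the repo's actual loader):

```python
import os

def load_dotenv_minimal(path=".env"):
    # Hypothetical minimal loader: read KEY=value lines, skip blanks
    # and comments, and set only variables not already in the environment.
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```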
python download_model.py
This will prompt you to select a model. See Model Download for more options.
After building (or using a pre-built version), run:
./tinyllama.sh
Or directly:
cd tinyllama_gui
npm run start
The tinyllama.sh script automates the entire setup process. Here’s what it does:
- Creates a virtual environment (.venv) in the project directory
- Installs dependencies from requirements.txt

# Basic usage - walks through setup and launches CLI
./tinyllama.sh
# Bootstrap only (download model but don't start CLI)
./tinyllama.sh --bootstrap-only
# Auto-download specific model
./tinyllama.sh --model tinyllama
| Option | Description |
|---|---|
| `--bootstrap-only` | Download dependencies and model, but don't start the CLI |
| `--model MODEL` | Auto-download a specific model (`tinyllama`, `smollm2`, `qwen`, `nvidia_nemotron`) |
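Flags like these are typically handled with a small `case` loop. A hedged sketch of such parsing, illustrative only and not the actual tinyllama.sh source:

```shell
# Illustrative flag parsing for the options in the table above.
parse_args() {
  BOOTSTRAP_ONLY=0
  MODEL=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --bootstrap-only) BOOTSTRAP_ONLY=1 ;;
      --model)          MODEL="$2"; shift ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}
```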
TinyLlama CLI uses the following environment variables:
# HuggingFace token (required for some models like Llama)
export HF_TOKEN="your_token_here"
# Alternative variable name
export HUGGINGFACEHUB_API_TOKEN="your_token_here"
# Custom model directory (default: ./models)
export MODEL_DIR="./custom_models"
# Custom transcripts directory (default: ./transcripts)
export TRANSCRIPTS_DIR="./custom_transcripts"
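The directory variables fall back to the documented defaults when unset. A minimal sketch of that resolution (the function name is hypothetical):

```python
import os

def resolve_dirs():
    # Hypothetical helper showing the documented defaults:
    # ./models and ./transcripts unless overridden by the environment.
    model_dir = os.environ.get("MODEL_DIR", "./models")
    transcripts_dir = os.environ.get("TRANSCRIPTS_DIR", "./transcripts")
    return model_dir, transcripts_dir
```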
Add to your ~/.bashrc or ~/.zshrc:
export HF_TOKEN="your_huggingface_token_here"
Then reload:
source ~/.bashrc
On Windows (PowerShell):
$env:HF_TOKEN = "your_huggingface_token_here"
To make it permanent, add to your PowerShell profile.
For GPU-accelerated inference:
pip install torch --index-url https://download.pytorch.org/whl/cu118
The CLI will automatically use CUDA if available.
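"Automatically use CUDA" typically means a device check like the following. This is a hedged sketch, not the repo's exact code; it also degrades gracefully when torch isn't installed:

```python
def pick_device():
    # Prefer CUDA when torch is installed and a GPU is visible;
    # otherwise fall back to CPU.
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```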
TinyLlama CLI includes optional C++ extensions for performance optimization. To build them:
# Install pybind11
pip install pybind11
# For macOS, you may need Xcode command line tools
xcode-select --install
# Navigate to the cpp_extensions directory
cd cpp_extensions
# Build the extension in-place
python setup.py build_ext --inplace
This will create tinyllama_cpp.cpython-XXX-darwin.so on macOS (a .so with a linux-gnu suffix on Linux, or a .pyd on Windows).
Copy the built extension to your virtual environment:
# For macOS
cp tinyllama_cpp.cpython-314-darwin.so .venv/lib/python3.14/site-packages/
# For Linux
cp tinyllama_cpp.cpython-310-x86_64-linux-gnu.so .venv/lib/python3.10/site-packages/
Verify the extension loads:
.venv/bin/python -c "import tinyllama_cpp; print(tinyllama_cpp.VERSION)"
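Because the extension is optional, calling code usually guards the import so the CLI still works without it. A minimal sketch of that pattern (not necessarily how the repo does it):

```python
# Fall back to pure-Python code paths when the compiled
# extension is missing from site-packages.
try:
    import tinyllama_cpp
    HAVE_CPP_EXT = True
except ImportError:
    tinyllama_cpp = None
    HAVE_CPP_EXT = False
```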
Error: unsupported option ‘-fopenmp’
On macOS with clang, OpenMP may not be available. Edit cpp_extensions/setup.py and comment out the -fopenmp flag:
extra_compile_args = [
    "-O3",
    "-march=native",
    "-ffast-math",
    # "-fopenmp",  # Comment this out on macOS
    "-std=c++17",
]
Error: character too large for enclosing character literal
This is a known issue with Unicode characters in C++ on some compilers. The source code has been updated to use static_cast<char>() for Unicode characters like × (0xD7) and ÷ (0xF7).
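The pattern looks like this; a small standalone sketch of the fix, not the repo's actual source:

```cpp
#include <cassert>

// In a UTF-8 source file, writing '×' directly puts two bytes inside a
// char literal, which some compilers reject as "character too large for
// enclosing character literal"; casting the Latin-1 code point keeps it
// a single byte.
inline char multiplication_sign() { return static_cast<char>(0xD7); }  // ×
inline char division_sign()       { return static_cast<char>(0xF7); }  // ÷
```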