This guide covers all the ways to install and set up TinyLlama GUI on your system.
The fastest way to get started is using the bootstrap script:
cd /path/to/tinyllama-cli
./tinyllama.sh
The bootstrap script will automatically:
- Create a Python virtual environment (.venv)
- Install dependencies from requirements.txt
- Download a model and launch the CLI

If you prefer to set up manually or need more control, follow these steps:
git clone https://github.com/your-repo/tinyllama-cli.git
cd tinyllama-cli
# Create a virtual environment
python3 -m venv .venv
# Activate it (Linux/macOS)
source .venv/bin/activate
# Activate it (Windows)
.venv\Scripts\activate.bat
pip install -r requirements.txt
# Set HuggingFace token for gated models
export HF_TOKEN="your_huggingface_token_here"
# Or use the alternative variable name
export HUGGINGFACEHUB_API_TOKEN="your_huggingface_token_here"
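Either variable name works because the token can be resolved with a simple fallback. A minimal sketch of that pattern (the helper name is hypothetical, not part of the repo):

```python
import os

def resolve_hf_token():
    # Hypothetical helper: prefer HF_TOKEN, fall back to
    # HUGGINGFACEHUB_API_TOKEN, and return None if neither is set.
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
```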
You can also copy the example environment file:
cp .env.example .env
# Edit .env and add your token
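If you are not using a loader like python-dotenv, a simple KEY=value .env file can be read with the standard library alone. A minimal sketch (illustrative only, not the repo's actual loader):

```python
import os

def load_dotenv_minimal(path=".env"):
    # Hypothetical minimal loader: read KEY=value lines, skip blanks
    # and comments, and set only variables not already in the environment.
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```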
python download_model.py
This will prompt you to select a model. See Model Download for more options.
After building (or using a pre-built version), run:
./tinyllama.sh
Or directly:
cd tinyllama_gui
npm run start
The tinyllama.sh script automates the entire setup process. Here’s what it does:
- Creates a virtual environment (.venv) in the project directory
- Installs dependencies from requirements.txt

# Basic usage - walks through setup and launches CLI
./tinyllama.sh
# Bootstrap only (download model but don't start CLI)
./tinyllama.sh --bootstrap-only
# Auto-download specific model
./tinyllama.sh --model tinyllama
| Option | Description |
|---|---|
| `--bootstrap-only` | Download dependencies and model, but don't start the CLI |
| `--model MODEL` | Auto-download a specific model (`tinyllama`, `smollm2`, `qwen`, `nvidia_nemotron`) |
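Flags like these are typically handled with a small `case` loop. A hedged sketch of such parsing, illustrative only and not the actual tinyllama.sh source:

```shell
# Illustrative flag parsing for the options in the table above.
parse_args() {
  BOOTSTRAP_ONLY=0
  MODEL=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --bootstrap-only) BOOTSTRAP_ONLY=1 ;;
      --model)          MODEL="$2"; shift ;;
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
}
```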
TinyLlama CLI uses the following environment variables:
# HuggingFace token (required for some models like Llama)
export HF_TOKEN="your_token_here"
# Alternative variable name
export HUGGINGFACEHUB_API_TOKEN="your_token_here"
# Custom model directory (default: ./models)
export MODEL_DIR="./custom_models"
# Custom transcripts directory (default: ./transcripts)
export TRANSCRIPTS_DIR="./custom_transcripts"
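The directory variables fall back to the documented defaults when unset. A minimal sketch of that resolution (the function name is hypothetical):

```python
import os

def resolve_dirs():
    # Hypothetical helper showing the documented defaults:
    # ./models and ./transcripts unless overridden by the environment.
    model_dir = os.environ.get("MODEL_DIR", "./models")
    transcripts_dir = os.environ.get("TRANSCRIPTS_DIR", "./transcripts")
    return model_dir, transcripts_dir
```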
Add to your ~/.bashrc or ~/.zshrc:
export HF_TOKEN="your_huggingface_token_here"
Then reload:
source ~/.bashrc
On Windows (PowerShell):
$env:HF_TOKEN = "your_huggingface_token_here"
To make it permanent, add to your PowerShell profile.
For GPU-accelerated inference:
pip install torch --index-url https://download.pytorch.org/whl/cu118
The CLI will automatically use CUDA if available.
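"Automatically use CUDA" typically means a device check like the following. This is a hedged sketch, not the repo's exact code; it also degrades gracefully when torch isn't installed:

```python
def pick_device():
    # Prefer CUDA when torch is installed and a GPU is visible;
    # otherwise fall back to CPU.
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```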
TinyLlama CLI includes optional C++ extensions for performance optimization. To build them:
# Install pybind11
pip install pybind11
# For macOS, you may need Xcode command line tools
xcode-select --install
# Navigate to the cpp_extensions directory
cd cpp_extensions
# Build the extension in-place
python setup.py build_ext --inplace
This will create tinyllama_cpp.cpython-XXX-darwin.so on macOS (a .so with a linux-gnu suffix on Linux, or a .pyd on Windows).
Copy the built extension to your virtual environment:
# For macOS
cp tinyllama_cpp.cpython-314-darwin.so .venv/lib/python3.14/site-packages/
# For Linux
cp tinyllama_cpp.cpython-310-x86_64-linux-gnu.so .venv/lib/python3.10/site-packages/
Verify the extension loads:
.venv/bin/python -c "import tinyllama_cpp; print(tinyllama_cpp.VERSION)"
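Because the extension is optional, calling code usually guards the import so the CLI still works without it. A minimal sketch of that pattern (not necessarily how the repo does it):

```python
# Fall back to pure-Python code paths when the compiled
# extension is missing from site-packages.
try:
    import tinyllama_cpp
    HAVE_CPP_EXT = True
except ImportError:
    tinyllama_cpp = None
    HAVE_CPP_EXT = False
```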
Error: unsupported option ‘-fopenmp’
On macOS with clang, OpenMP may not be available. Edit cpp_extensions/setup.py and comment out the -fopenmp flag:
extra_compile_args = [
    "-O3",
    "-march=native",
    "-ffast-math",
    # "-fopenmp",  # Comment this out on macOS
    "-std=c++17",
]
Error: character too large for enclosing character literal
This is a known issue with Unicode characters in C++ on some compilers. The source code has been updated to use static_cast<char>() for Unicode characters like × (0xD7) and ÷ (0xF7).
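The pattern looks like this; a small standalone sketch of the fix, not the repo's actual source:

```cpp
#include <cassert>

// In a UTF-8 source file, writing '×' directly puts two bytes inside a
// char literal, which some compilers reject as "character too large for
// enclosing character literal"; casting the Latin-1 code point keeps it
// a single byte.
inline char multiplication_sign() { return static_cast<char>(0xD7); }  // ×
inline char division_sign()       { return static_cast<char>(0xF7); }  // ÷
```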