dolphin-ui

unified AI interface -- chat / image gen / voice synthesis

status: deployed at ec2:8200 -- voice loop: planning

overview

dolphin-ui is a web interface that unifies three AI services in a single dark-themed dashboard. it connects a self-hosted LLM (Ollama on GPU), self-hosted image generation (Stable Diffusion on a T4), and text-to-speech (the external ElevenLabs API) behind a Flask backend on questboard-ec2.

a companion project, voice-dolphin, extends the concept into a real-time voice conversation loop running on Rocky with STT via Whisper.

architecture

                    +---------------------------+
                    |     browser (client)      |
                    |   tabs: chat|image|audio  |
                    +-------------+-------------+
                                  | HTTP
                    +-------------v-------------+
                    |   questboard-ec2:8200     |
                    |   Flask app (app.py)      |
                    |   nginx proxy :8201       |
                    |   100.83.251.119          |
                    +----+--------+--------+----+
                         |        |        |
              +----------v-+ +---v------+ +v----------------+
              |  picass0    | | picass0  | |  ElevenLabs     |
              |  Ollama     | | imggen   | |  TTS API        |
              |  :11434     | | :7860    | |  (external)     |
              |  dolphin-   | | SD v1.5  | |                 |
              |  llama3:8b  | | T4 GPU   | |                 |
              +-------------+ +----------+ +-----------------+
               100.127.18.29  100.127.18.29   api.elevenlabs.io

services

chat -- Ollama / dolphin-llama3:8b

Streaming chat via the Ollama REST API. Messages are proxied from the Flask backend to picass0. Supports model switching -- any model pulled into Ollama is listed by the /api/models endpoint and can be selected in the UI.

model: dolphin-llama3:8b (Q4_K_M, ~6GB VRAM)
uncensored dolphin fine-tune on llama3 base
streaming SSE responses via /api/chat
conversation history maintained client-side
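
A minimal sketch of the proxying step, assuming Ollama's /api/chat is called with streaming enabled and answers with newline-delimited JSON chunks. The function name `ollama_lines_to_sse`, the `{"token": ...}` event payload, and the `[DONE]` sentinel are illustrative assumptions, not taken from app.py.

```python
import json
from typing import Iterable, Iterator


def ollama_lines_to_sse(lines: Iterable[bytes]) -> Iterator[str]:
    """Convert Ollama's newline-delimited JSON chunks into SSE events.

    Each non-empty input line is one JSON object; streamed chat chunks carry
    the next piece of text under message.content and set done=true at the end.
    """
    for raw in lines:
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        token = chunk.get("message", {}).get("content", "")
        if token:
            payload = json.dumps({"token": token})  # hypothetical event shape
            yield f"data: {payload}\n\n"
        if chunk.get("done"):
            yield "data: [DONE]\n\n"  # hypothetical end-of-stream marker
            break
```

In a Flask route, a generator like this could be wrapped as `Response(ollama_lines_to_sse(upstream.iter_lines()), mimetype="text/event-stream")`, where `upstream` is a `requests.post(..., stream=True)` response from picass0. Since history is kept client-side, the full `messages` list would be forwarded on every call.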

image gen -- Stable Diffusion v1.5 on T4

Image generation via a custom Flask endpoint on picass0. Uses the diffusers library with runwayml/stable-diffusion-v1-5 in float16. Model loads lazily on first request.

GPU: Tesla T4 (16GB VRAM)
resolution: 256-1024px per side, in 64px increments
max steps: 50
attention slicing enabled for memory efficiency
returns base64-encoded PNG
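
The limits and lazy loading above can be sketched as follows. This is an assumption-laden outline, not the contents of picass0_imggen.py: the helper names, the default of 30 steps at 512px, and the choice to clamp out-of-range values (rather than reject them) are all hypothetical.

```python
import base64
import io
import threading

MIN_SIDE, MAX_SIDE, SIDE_STEP = 256, 1024, 64
MAX_STEPS = 50

_pipe = None
_pipe_lock = threading.Lock()


def normalize_side(value: int) -> int:
    """Clamp a side length to 256-1024 and round down to a multiple of 64."""
    clamped = max(MIN_SIDE, min(MAX_SIDE, int(value)))
    return clamped - (clamped % SIDE_STEP)


def normalize_steps(value: int) -> int:
    """Clamp the step count to 1-50."""
    return max(1, min(MAX_STEPS, int(value)))


def get_pipeline():
    """Load the model on the first request only, then reuse it."""
    global _pipe
    with _pipe_lock:
        if _pipe is None:
            # Heavy imports are deferred so the service starts quickly.
            import torch
            from diffusers import StableDiffusionPipeline

            pipe = StableDiffusionPipeline.from_pretrained(
                "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
            )
            pipe = pipe.to("cuda")
            pipe.enable_attention_slicing()
            _pipe = pipe
    return _pipe


def generate_png_base64(prompt, negative_prompt="", steps=30, width=512, height=512):
    """Run one generation and return the image as a base64-encoded PNG string."""
    image = get_pipeline()(
        prompt,
        negative_prompt=negative_prompt or None,
        num_inference_steps=normalize_steps(steps),
        width=normalize_side(width),
        height=normalize_side(height),
    ).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")
```

The lock keeps two simultaneous first requests from loading the model twice, which matters on a single T4.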

audio -- ElevenLabs TTS

Text-to-speech via ElevenLabs API. Voice list is pulled dynamically. Supports monolingual v1, multilingual v2, and turbo v2.5 models. Audio returned as base64 MP3 and played in-browser with a pulsing visualizer.

default voice: Rachel (21m00Tcm4TlvDq8ikWAM)
API key sourced from vault: elevenlabs_miecz_api_key
remaining character quota reported via /api/health
audio history with replay in the UI
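
A hedged sketch of the upstream call, using only the standard library so it runs without extra packages (the project itself lists `requests` as a dependency). The environment variable name `ELEVENLABS_API_KEY`, the function names, and the default model choice are assumptions; the endpoint path and `xi-api-key` header follow the public ElevenLabs API.

```python
import base64
import json
import os
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"
DEFAULT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel
MODEL_IDS = {
    "eleven_monolingual_v1",
    "eleven_multilingual_v2",
    "eleven_turbo_v2_5",
}


def build_tts_request(text, voice_id=DEFAULT_VOICE_ID,
                      model_id="eleven_multilingual_v2", api_key=None):
    """Build (but do not send) the text-to-speech request."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"unsupported model_id: {model_id}")
    # Hypothetical env var name; the deploy script injects the key somehow.
    key = api_key or os.environ.get("ELEVENLABS_API_KEY", "")
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={
            "xi-api-key": key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize_base64(text, **kwargs):
    """Send the request and return the MP3 bytes as a base64 string."""
    request = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return base64.b64encode(resp.read()).decode("ascii")
```

Splitting request construction from the network call keeps the first half easy to check without spending characters from the quota.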

API endpoints

method	path	description
POST	/api/chat	streaming chat -- accepts `{messages, model}`, returns SSE token stream
GET	/api/models	list available Ollama models
POST	/api/image	image generation -- accepts `{prompt, negative_prompt, steps, width, height}`
POST	/api/tts	text-to-speech -- accepts `{text, voice_id, model_id}`, returns base64 audio
GET	/api/voices	list ElevenLabs voices
GET	/api/health	service health -- ollama, image gen, elevenlabs status + chars remaining
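
One way the /api/health aggregation could work, sketched under assumptions: the response keys (`ollama`, `image_gen`, `elevenlabs`, `chars_remaining`) and the function names are hypothetical, and the probes are passed in as callables so the logic can be exercised without any service running.

```python
from typing import Callable, Mapping, Optional


def is_up(check: Callable[[], object]) -> bool:
    """Run a probe; any exception counts as offline."""
    try:
        check()
        return True
    except Exception:
        return False


def build_health(
    ollama_check: Callable[[], object],
    imggen_check: Callable[[], object],
    elevenlabs_usage: Callable[[], Mapping[str, int]],
) -> dict:
    """Combine three probes into one status document."""
    chars_remaining: Optional[int] = None
    elevenlabs_ok = False
    try:
        usage = elevenlabs_usage()
        # Remaining quota = plan limit minus characters already used.
        chars_remaining = int(usage["character_limit"]) - int(usage["character_count"])
        elevenlabs_ok = True
    except Exception:
        pass  # leave elevenlabs marked offline with no quota figure
    return {
        "ollama": is_up(ollama_check),
        "image_gen": is_up(imggen_check),
        "elevenlabs": elevenlabs_ok,
        "chars_remaining": chars_remaining,
    }
```

In practice the probes might be short-timeout GETs against Ollama's /api/tags on picass0:11434, the image gen service on :7860, and ElevenLabs' /v1/user/subscription (which reports `character_count` and `character_limit`); whether app.py does exactly this is not confirmed by this document.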

deployment

infrastructure

component	host	port	service
Flask app	questboard-ec2	8200	`dolphin-ui.service`
nginx proxy	questboard-ec2	8201	nginx sites-enabled/dolphin
image gen API	picass0	7860	`picass0-imggen.service`
Ollama	picass0	11434	ollama systemd

deploy script

Run from Rocky: bash ~/dolphin-ui/deploy.sh

fetches ElevenLabs API key from vault
SCPs app.py + static/ to ec2:/home/ubuntu/dolphin-ui/
creates/updates dolphin-ui.service systemd unit
injects the ElevenLabs key as an environment variable
SCPs picass0_imggen.py to picass0
creates/updates picass0-imggen.service
configures nginx reverse proxy on :8201
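
To illustrate the unit-creation and key-injection steps, here is a rough sketch of how a systemd unit with the key in its environment could be rendered. deploy.sh is a bash script and its actual unit contents are not shown in this document, so the template, the `ELEVENLABS_API_KEY` variable name, the `ubuntu` user, and the `ExecStart` line are all assumptions.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=dolphin-ui Flask backend
After=network-online.target

[Service]
User=ubuntu
WorkingDirectory={workdir}
Environment={env_name}={api_key}
ExecStart=/usr/bin/python3 {workdir}/app.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def render_unit(api_key: str,
                workdir: str = "/home/ubuntu/dolphin-ui",
                env_name: str = "ELEVENLABS_API_KEY") -> str:
    """Return the text of a dolphin-ui.service unit with the key injected."""
    # An unquoted Environment= value must not contain whitespace.
    if not api_key or any(ch.isspace() for ch in api_key):
        raise ValueError("API key must be non-empty and contain no whitespace")
    return UNIT_TEMPLATE.format(workdir=workdir, env_name=env_name, api_key=api_key)
```

Writing the key into the unit file stores it in plain text on ec2; an `EnvironmentFile=` pointing at a root-only file is a common alternative if that becomes a concern.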

dependencies

# ec2 (questboard)
flask, requests

# picass0 (GPU)
flask, torch, diffusers, transformers, accelerate, safetensors

web UI

single-page app at static/index.html (591 lines, all inline). dark theme with cyan/purple/magenta accents.

design system

token	value	usage
`--bg`	`#0a0a0f`	page background
`--surface`	`#12121a`	panels, header
`--accent`	`#00d4ff`	primary cyan
`--accent2`	`#7b2ff7`	purple
`--accent3`	`#ff2d95`	magenta/pink
`--success`	`#00ff88`	health dots (online)
`--error`	`#ff4444`	health dots (offline)

tabs

chat -- message area + textarea input, streaming responses, model selection
image -- split panel: controls left (prompt, negative, size, steps), canvas right
audio -- split panel: controls left (text, voice picker, model), visualizer right with history

features

health bar in header -- live status dots for ollama, GPU, elevenlabs (polls every 30s)
streaming chat with SSE -- tokens render as they arrive
shift+enter for newlines in chat input
auto-resizing textarea
animated pulse visualizer during audio playback
audio history with replay buttons

voice-dolphin (companion project)

status: planning / prototype

a standalone voice conversation loop that runs on Rocky. extends dolphin-ui into real-time spoken interaction.

  +----------+    +----------+    +----------+    +----------+
  |   mic    |--->| whisper  |--->|  ollama  |--->|elevenlabs|---> speakers
  |  (Rocky) |    | STT      |    |  (SSH    |    |  TTS     |
  |          |    | (Rocky   |    |  tunnel  |    |  (API)   |
  |          |    |  CPU)    |    |  picass0)|    |          |
  +----------+    +----------+    +----------+    +----------+

the loop

capture mic input via sounddevice
detect silence (1.5s threshold) to end capture
transcribe with faster-whisper (base model, CPU, int8)
send transcript to dolphin-llama3 via Ollama (SSH tunnel to picass0)
send response to ElevenLabs TTS (voice: Stiles)
play MP3 via ffplay or powershell fallback
loop until "quit" / "bye" / ctrl+c
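
The capture and exit steps of the loop can be sketched with plain Python. This is not the code in dolphin.py: the function names, the RMS level of 0.01, and the exact-match rule for exit words are assumptions. Frames are taken to be sequences of float samples in [-1, 1], which is what sounddevice yields by default.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def capture_until_silence(
    frames: Iterable[Sequence[float]],
    frame_seconds: float,
    silence_seconds: float = 1.5,
    level: float = 0.01,  # hypothetical loudness threshold
) -> List[Sequence[float]]:
    """Collect frames until speech is followed by enough consecutive quiet."""
    kept: List[Sequence[float]] = []
    quiet_frames = 0
    heard_speech = False
    for frame in frames:
        kept.append(frame)
        if rms(frame) >= level:
            heard_speech = True
            quiet_frames = 0  # speech resets the silence timer
        elif heard_speech:
            quiet_frames += 1
            if quiet_frames * frame_seconds >= silence_seconds:
                break
    return kept


def wants_exit(transcript: str) -> bool:
    """True when the whole utterance is an exit word such as 'quit' or 'bye'."""
    return transcript.strip().lower().strip(".!?") in {"quit", "bye"}
```

Leading silence is kept but never ends the capture, so the loop waits for the speaker to start before the 1.5s timer applies.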

modes

voice mode (default) -- full mic to STT to LLM to TTS to speaker loop
text mode (--text) -- keyboard input, still uses TTS for output if available
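
A small sketch of how the two modes might be selected on the command line. Only the `--text` flag comes from this document; `parse_args`, `plan_io`, and the returned keys are hypothetical names.

```python
import argparse
from typing import Dict, List, Optional, Union


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the voice-dolphin command line."""
    parser = argparse.ArgumentParser(
        prog="dolphin.py",
        description="voice conversation loop: mic -> STT -> LLM -> TTS",
    )
    parser.add_argument(
        "--text",
        action="store_true",
        help="read input from the keyboard instead of the microphone",
    )
    return parser.parse_args(argv)


def plan_io(args: argparse.Namespace, tts_available: bool) -> Dict[str, Union[str, bool]]:
    """Decide where input comes from and whether replies are spoken."""
    return {
        "input": "keyboard" if args.text else "mic",
        # Text mode still speaks replies whenever TTS is usable.
        "speak": tts_available,
    }
```

For example, `plan_io(parse_args(["--text"]), tts_available=True)` describes a session with keyboard input and spoken output.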

status

script complete (.artificer/.afterHours/voice-dolphin/dolphin.py)
not deployed -- requires picass0 running + SSH tunnel + mic access
ElevenLabs keys may be expired -- validate and rotate before deploying
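
Since the loop depends on the tunnel being up, a preflight check along these lines could fail fast before the mic opens. It is a sketch only: the helper names are hypothetical, and it assumes the tunnel forwards local port 11434 to Ollama on picass0.

```python
import socket
from typing import Dict, List


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_requirements(checks: Dict[str, bool]) -> List[str]:
    """Return the names of the checks that failed, in insertion order."""
    return [name for name, ok in checks.items() if not ok]


# Example wiring (assumed local port; the tunnel itself would be opened
# separately, e.g. ssh -N -L 11434:localhost:11434 -i <key> <user>@picass0):
#
#   failed = missing_requirements({
#       "ollama tunnel": port_open("127.0.0.1", 11434),
#   })
#   if failed:
#       raise SystemExit("not ready: " + ", ".join(failed))
```

A successful connect only shows that something is listening on the local port; it does not prove picass0 itself is awake, so a follow-up request to Ollama would still be needed for a full check.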

open questions

picass0 idle timeout (10min) too short for chat sessions
voice cloning via ElevenLabs -- supported but unexplored
wake word detection ("hey dolphin")
persistent conversation memory for the dolphin
LoRA personality packs for dolphin-llama3

file map

file	location	purpose
`app.py`	`~/dolphin-ui/`	Flask backend -- routes, API proxying, health checks
`index.html`	`~/dolphin-ui/static/`	web UI -- 591 lines, all inline CSS/JS
`deploy.sh`	`~/dolphin-ui/`	deployment script -- ec2 + picass0 + nginx
`picass0_imggen.py`	`~/dolphin-ui/`	image gen API for picass0 GPU
`dolphin.py`	`~/.artificer/.afterHours/voice-dolphin/`	voice loop script (Rocky-side)
`voice-dolphin.md`	`~/.artificer/.afterHours/`	planning doc -- stack, models, build order
`brick_bot.py`	`~/dolphin-ui/`	Telegram agent (separate, shares deploy dir)
`mailbox_daemon.py`	`~/dolphin-ui/`	email listener for brick.sl (separate)

credentials

credential	vault entry / path	used by
ElevenLabs API key	`elevenlabs_miecz_api_key`	app.py (env var), dolphin.py
ElevenLabs fallback	`elevenlabs_ecoute_api_key`	dolphin.py (fallback)
picass0 SSH key	`~/forge3/questboard-key.pem`	deploy.sh, dolphin.py tunnel

known issues

no git repo -- project is loose files, not version controlled
no requirements.txt -- deps installed inline via deploy.sh
ElevenLabs keys may be expired (last validated: unknown)
picass0 is on-demand -- image gen and chat require GPU instance to be running
no auth on web UI -- anyone on tailnet can access
image gen uses SD v1.5 (old) -- could upgrade to SDXL if VRAM allows

documented by goulard3120 // tunnelTime // 2026-05-07