Live Product

TubeText:
AI Transcript Engine

An async backend that orchestrates multiple AI services to extract, summarize, and translate YouTube video content — turning hours of video into actionable text in seconds.

TubeText Demo

Stack

FastAPI, Python, LangChain

AI Services

OpenAI, Deepgram, Cerebras

Database

PostgreSQL + SQLAlchemy

Deployment

Docker + Railway

The Challenge

YouTube is one of the richest knowledge sources on the internet — tutorials, lectures, interviews, deep dives on any topic. But most videos are long, full of filler, and hard to skim. Getting to the content that matters means watching everything.

The technical challenge was building an async pipeline that orchestrates multiple AI services (transcription, summarization, translation) without blocking — while keeping response times fast enough for a real-time user experience.
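The orchestration pattern described above can be sketched with `asyncio`. This is a minimal illustration, not the app's real API: the helper names and bodies are stand-ins for the actual service calls.

```python
import asyncio

# Illustrative stand-ins for the real service calls.
async def transcribe(url: str) -> str:
    await asyncio.sleep(0)  # stand-in for a captions fetch / Deepgram call
    return f"transcript of {url}"

async def summarize(transcript: str) -> str:
    await asyncio.sleep(0)  # stand-in for the LangChain summary agent
    return f"summary: {transcript}"

async def translate(transcript: str, language: str) -> str:
    await asyncio.sleep(0)  # stand-in for the Cerebras translation call
    return f"[{language}] {transcript}"

async def process_video(url: str, language: str) -> dict:
    # Transcription must finish first; summarization and translation
    # depend only on the transcript, so they can run concurrently.
    transcript = await transcribe(url)
    summary, translation = await asyncio.gather(
        summarize(transcript),
        translate(transcript, language),
    )
    return {"transcript": transcript, "summary": summary, "translation": translation}

result = asyncio.run(process_video("https://youtu.be/example", "Spanish"))
```

The key point is that the only sequential dependency is on the transcript itself; everything downstream of it is fanned out with `asyncio.gather` instead of awaited one call at a time.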

AI Pipeline Architecture

TubeText AI Pipeline Architecture

Full AI pipeline from URL to translated transcript

LangChain Summary Agent

Summaries are generated through a LangChain agent with YAML-driven prompts. This keeps prompt engineering separate from application logic, making it easy to iterate on prompt quality without touching code.

agents/summarize_agent.py
from langchain.agents import create_agent
import yaml

# Load prompt from YAML config
def load_prompts():
    with open("agents/prompts.yaml", "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

prompts = load_prompts()

summary_agent = create_agent(
    model="openai:gpt-5-mini",
    system_prompt=prompts["SUMMARIZE_PROMPT"]
)

# Invoked async from the route handler
result = await summary_agent.ainvoke(
    {"messages": [{"role": "user", "content": transcript_text}]}
)
summary = result["messages"][-1].content
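For reference, the YAML config the agent loads might be shaped like this. The keys match the code above; the prompt contents here are illustrative, not the production prompts.

```yaml
# agents/prompts.yaml -- shape only; contents are illustrative
SUMMARIZE_PROMPT: |
  You are a summarization assistant. Given a video transcript,
  produce a concise summary that captures the key points.
TRANSLATE_PROMPT: |
  You are a translator. Translate the user's text faithfully,
  preserving proper nouns and technical terms.
```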

Real-Time Streaming Translation

Translation uses Server-Sent Events (SSE) to stream results segment by segment. Cerebras GPT-OSS-120b was chosen for its fast inference speed, giving users real-time feedback instead of waiting for the entire transcript to translate.

agents/translate_agent.py
from cerebras.cloud.sdk import AsyncCerebras
import yaml, os

# Load the translation prompt from the shared YAML config
with open("agents/prompts.yaml", encoding="utf-8") as f:
    prompts = yaml.safe_load(f)
translate_prompt = prompts["TRANSLATE_PROMPT"]
client = AsyncCerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

async def translate(text: str, language: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[
            {"role": "system", "content": translate_prompt},
            {"role": "user", "content": f"Translate to {language}:\n\n{text}"},
        ],
    )
    return response.choices[0].message.content
routes/translate_router.py
import json

from fastapi.responses import StreamingResponse

# Inside the route handler; `request` and CHUNK_SIZE are defined above.
async def event_generator():
    """Yield SSE events as each segment is translated."""
    for i in range(0, len(request.segments), CHUNK_SIZE):
        chunk_text = " ".join(seg.text for seg in request.segments[i:i + CHUNK_SIZE])
        translated = await translate(chunk_text, request.language)
        yield f"data: {json.dumps({'translation': translated})}\n\n"
    yield f"data: {json.dumps({'done': True})}\n\n"

return StreamingResponse(
    event_generator(),
    media_type="text/event-stream",
    headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)

Async Database Layer

The entire data layer is async — PostgreSQL via asyncpg with SQLAlchemy 2.0 async sessions. Alembic handles schema migrations. Usage tracking resets automatically on calendar month boundaries without any background jobs.

database/connection.py
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker
import os

# Auto-convert to asyncpg driver
database_url = os.getenv("DATABASE_URL")
if database_url and database_url.startswith("postgresql://"):
    database_url = database_url.replace("postgresql://", "postgresql+asyncpg://", 1)

engine = create_async_engine(database_url)
SessionLocal = async_sessionmaker(
    bind=engine,
    autoflush=False,
    expire_on_commit=False,  # avoid implicit refresh/lazy loads after commit
)

async def get_db():
    async with SessionLocal() as db:
        yield db

System Design Tradeoffs

Every architectural choice has a cost. Here are the key tradeoffs I made and why.

architecture_decisions.md
# ADR-001: Signed cookies for anonymous users
Decision:  Track anonymous usage via signed HTTP-only cookies
Why:       No DB pollution — only a counter, no rows until signup
Tradeoff:  Counter resets if cookie clears (acceptable for free tier)

# ADR-002: JWT in HTTP-only cookies
Decision:  Store auth tokens as HTTP-only cookies, not localStorage
Why:       XSS-safe — browser sends automatically, JS can't access
Tradeoff:  Can't read token from JS (by design)

# ADR-003: PostgreSQL as single data store
Decision:  Async PostgreSQL via asyncpg — no Redis, no cache layer
Why:       Simpler infra, fewer moving parts to maintain
Tradeoff:  Requires careful handling of SQLAlchemy async patterns

# ADR-004: SSE for translation streaming
Decision:  Server-Sent Events over WebSockets
Why:       Simpler than WebSockets for one-way data flow
Tradeoff:  HTTP 200 already sent — mid-stream errors need SSE error events

# ADR-005: Cerebras for translation
Decision:  Cerebras GPT-OSS-120b instead of OpenAI for translation
Why:       Fastest inference speed available, free tier
Tradeoff:  Less model variety, models can be deprecated

# ADR-006: Deepgram for premium transcription
Decision:  Deepgram Nova-3 for videos without captions
Why:       Handles any video regardless of caption availability
Tradeoff:  Adds yt-dlp + ffmpeg dependency — heavier infra