Self-Hosting AI Tools: Build Your Own AI-Powered Platform
RnR Vibe runs 25+ AI-powered tools on a single laptop. No cloud API bills, no external dependencies, no monthly costs. Here's the architecture and how you can build something similar.
The Stack
- Frontend & API: Next.js 16 (App Router)
- LLM: Ollama with Gemma 3 4B
- Image Generation: Stable Diffusion (AUTOMATIC1111)
- Hosting: Vercel (static) + Cloudflare Tunnel (AI services)
- Total monthly cost: $0
Architecture Overview
Browser → Vercel (static pages, serverless API routes)
↓
Cloudflare Tunnel
↓
Local Machine
├── Ollama (port 11434) — LLM inference
└── Stable Diffusion (port 7860) — Image generation
Static pages and the Next.js framework run on Vercel for free. When a user triggers an AI feature, the API route reaches back through a Cloudflare Tunnel to the local machine where the models actually run.
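One wrinkle in this setup: a serverless function on Vercel cannot reach `localhost` on your machine, so the backend address has to be configurable — `localhost` during local development, the tunnel hostname in production. A minimal sketch, assuming an environment variable named `OLLAMA_URL` (a name chosen here for illustration):

```typescript
// Resolve the LLM backend URL. Locally this falls back to the default
// Ollama port; on Vercel, set OLLAMA_URL (hypothetical name) in the
// dashboard to the Cloudflare Tunnel hostname, e.g. https://llm.mysite.com.
export function resolveOllamaUrl(env: Record<string, string | undefined>): string {
  return env.OLLAMA_URL ?? "http://localhost:11434";
}
```

Passing `process.env` into this function keeps it trivially testable without touching real environment state.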
Step 1: Set Up Your AI Backend
Ollama for Text
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model (Gemma 3 4B here)
ollama pull gemma3:4b
# Verify it's running
curl http://localhost:11434/api/tags
Stable Diffusion for Images
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
# Edit webui-user.bat (Windows) or webui-user.sh (Linux/macOS):
# set COMMANDLINE_ARGS=--api --xformers
./webui.sh   # or run webui-user.bat on Windows
Step 2: Build API Routes
The key pattern: your Next.js API routes act as a bridge between the frontend and local AI services.
// app/api/chat/route.ts
// In production this must point at the tunnel hostname, not localhost —
// Vercel's serverless functions cannot reach your machine directly.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json();

  const response = await fetch(`${OLLAMA_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma3:4b",
      messages: [
        { role: "system", content: systemPrompt },
        ...messages,
      ],
      stream: true,
    }),
  });

  // Stream the response back to the client
  return new Response(response.body, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
For image generation, the same pattern applies — proxy requests to Stable Diffusion's API.
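As a sketch of that proxy, here is a payload builder for AUTOMATIC1111's /sdapi/v1/txt2img endpoint. The field names match the WebUI API; the default values are illustrative, not tuned:

```typescript
// Request body for AUTOMATIC1111's POST /sdapi/v1/txt2img endpoint.
// The response contains an `images` array of base64-encoded PNGs.
interface Txt2ImgPayload {
  prompt: string;
  negative_prompt: string;
  steps: number;
  width: number;
  height: number;
  cfg_scale: number;
}

// Build a payload with illustrative defaults; callers override as needed.
export function buildTxt2ImgPayload(
  prompt: string,
  overrides: Partial<Txt2ImgPayload> = {}
): Txt2ImgPayload {
  return {
    prompt,
    negative_prompt: "",
    steps: 20,
    width: 512,
    height: 512,
    cfg_scale: 7,
    ...overrides,
  };
}
```

A route handler would POST this JSON to the Stable Diffusion host (port 7860 locally) and read the base64 images out of the response's `images` array.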
Step 3: Server-Sent Events for Streaming
Long-running AI operations need streaming. SSE (Server-Sent Events) is simpler than WebSockets and works well for unidirectional data.
// Server side (inside a route handler; generateImage is your own helper)
const encoder = new TextEncoder();
const stream = new ReadableStream({
  async start(controller) {
    controller.enqueue(encoder.encode('event: status\ndata: {"message":"Generating..."}\n\n'));

    // Do the AI work
    const result = await generateImage(prompt);

    controller.enqueue(encoder.encode(`event: image\ndata: ${JSON.stringify(result)}\n\n`));
    controller.enqueue(encoder.encode("event: done\ndata: {}\n\n"));
    controller.close();
  },
});

return new Response(stream, {
  headers: {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  },
});
Critical tip: Set compress: false in your next.config.ts to prevent gzip from buffering SSE events in production.
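That tip as a minimal next.config.ts:

```typescript
// next.config.ts — disable gzip so SSE events flush to the client
// immediately instead of sitting in the compression buffer.
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  compress: false,
};

export default nextConfig;
```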
Step 4: Expose with Cloudflare Tunnel
Cloudflare Tunnel lets your local machine serve traffic without port forwarding or a static IP.
# Install cloudflared (e.g. brew install cloudflared on macOS;
# see Cloudflare's docs for other platforms)
# Authenticate, then create a tunnel
cloudflared tunnel login
cloudflared tunnel create my-tunnel
# Point your DNS records at the tunnel
cloudflared tunnel route dns my-tunnel llm.mysite.com
# Configure routes
# ~/.cloudflared/config.yml
tunnel: <tunnel-id>
ingress:
  - hostname: mysite.com
    service: http://localhost:3000
  - hostname: llm.mysite.com
    service: http://localhost:11434
  - service: http_status:404
# Run the tunnel
cloudflared tunnel run my-tunnel
Step 5: Deploy Static to Vercel
Push your Next.js project to GitHub and connect it to Vercel. The static pages and serverless functions deploy automatically. AI-dependent API routes reach back to your machine through the tunnel.
# Install Vercel CLI
npm i -g vercel
# Deploy
vercel --prod
Handling Downtime
Since AI features depend on your local machine being on, you need a graceful fallback:
// Check if the AI backend is available before doing real work
// (point this at the tunnel hostname in production, not localhost)
try {
  const ping = await fetch(`${process.env.OLLAMA_URL ?? "http://localhost:11434"}/api/tags`, {
    signal: AbortSignal.timeout(3000),
  });
  if (!ping.ok) throw new Error("backend unhealthy");
} catch {
  return Response.json(
    { error: "AI services are currently offline. They run on local hardware and will be back soon." },
    { status: 503 }
  );
}
Non-AI pages (blog posts, guides, static tools) remain available 24/7 through Vercel.
What We Learned
Things that worked well:
- Ollama's API is dead simple to integrate
- Cloudflare Tunnel is rock solid and free
- SSE streaming gives users real-time progress feedback
- Vercel's free tier handles static + serverless perfectly
Things that bit us:
- Next.js production gzip buffering breaks SSE streams (fix: compress: false)
- Stable Diffusion's --nowebui flag doesn't work reliably; just use --api
- Large base64 images split across SSE chunks need careful parsing
- Model switching in SD takes 15-30 seconds, so check whether the model is already loaded first
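The chunk-splitting issue deserves a concrete fix: a network chunk can end anywhere, including mid-frame, so the client should buffer raw text and only parse once it sees the blank-line terminator. A minimal sketch (SseBuffer is a name invented here, not a library class):

```typescript
// Incremental SSE parser: accumulate raw chunks and emit only complete
// frames, so a base64 payload split across chunks is reassembled intact.
export class SseBuffer {
  private buffer = "";

  // Feed one raw chunk; returns every complete { event, data } frame found.
  push(chunk: string): { event: string; data: string }[] {
    this.buffer += chunk;
    const frames: { event: string; data: string }[] = [];
    let idx: number;
    // A frame ends at a blank line; anything after it stays buffered.
    while ((idx = this.buffer.indexOf("\n\n")) !== -1) {
      const raw = this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx + 2);
      let event = "message";
      let data = "";
      for (const line of raw.split("\n")) {
        if (line.startsWith("event: ")) event = line.slice(7);
        else if (line.startsWith("data: ")) data += line.slice(6);
      }
      frames.push({ event, data });
    }
    return frames;
  }
}
```

Feeding two half-frames through `push` yields nothing until the terminator arrives, at which point the reassembled frame comes back whole.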
Cost Breakdown
| Component | Monthly Cost |
|-----------|--------------|
| Vercel hosting | $0 (free tier) |
| Cloudflare Tunnel | $0 (free) |
| Ollama | $0 (runs locally) |
| Stable Diffusion | $0 (runs locally) |
| Domain name | ~$1/month |
| Total | ~$1/month |
Compare this to cloud APIs: at even moderate usage, you'd spend $50-200/month on OpenAI + Midjourney/DALL-E.
Start Building
You don't need a data center to run AI tools. A decent laptop, some free software, and a weekend of setup gives you a platform that rivals paid services. Every tool on RnR Vibe is proof that this approach works.