Self-Hosting AI Tools: Build Your Own AI-Powered Platform
RnR Vibe runs 25+ AI-powered tools on a single laptop. No cloud API bills, no external dependencies, no monthly costs. Here's the architecture and how you can build something similar.
The Stack
- Frontend & API: Next.js 16 (App Router)
- LLM: Ollama with Gemma 3 4B
- Image Generation: Stable Diffusion (AUTOMATIC1111)
- Hosting: Vercel (static) + Cloudflare Tunnel (AI services)
- Total monthly cost: $0
Architecture Overview
Browser → Vercel (static pages, serverless API routes)
↓
Cloudflare Tunnel
↓
Local Machine
├── Ollama (port 11434) — LLM inference
└── Stable Diffusion (port 7860) — Image generation
Static pages and the Next.js framework run on Vercel for free. When a user triggers an AI feature, the API route reaches back through a Cloudflare Tunnel to the local machine where the models actually run.
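One wrinkle in this setup: a serverless function on Vercel cannot reach `localhost` on your machine, so the backend address has to be configurable — `localhost` during local development, the tunnel hostname in production. A minimal sketch, assuming an environment variable named `OLLAMA_URL` (a name chosen here for illustration):

```typescript
// Resolve the LLM backend URL. Locally this falls back to the default
// Ollama port; on Vercel, set OLLAMA_URL (hypothetical name) in the
// dashboard to the Cloudflare Tunnel hostname, e.g. https://llm.mysite.com.
export function resolveOllamaUrl(env: Record<string, string | undefined>): string {
  return env.OLLAMA_URL ?? "http://localhost:11434";
}
```

Passing `process.env` into this function keeps it trivially testable without touching real environment state.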
Step 1: Set Up Your AI Backend
Ollama for Text
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model (Gemma 3 4B here)
ollama pull gemma3:4b
# Verify it's running
curl http://localhost:11434/api/tags
Stable Diffusion for Images
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
# Edit webui-user.bat (Windows) or webui-user.sh (Linux/macOS):
# set COMMANDLINE_ARGS=--api --xformers
./webui.sh   # or run webui-user.bat on Windows
Step 2: Build API Routes
The key pattern: your Next.js API routes act as a bridge between the frontend and local AI services.
// app/api/chat/route.ts
// In production this must point at the tunnel hostname, not localhost —
// Vercel's serverless functions cannot reach your machine directly.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json();

  const response = await fetch(`${OLLAMA_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma3:4b",
      messages: [
        { role: "system", content: systemPrompt },
        ...messages,
      ],
      stream: true,
    }),
  });

  // Stream the response back to the client
  return new Response(response.body, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
For image generation, the same pattern applies — proxy requests to Stable Diffusion's API.
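As a sketch of that proxy, here is a payload builder for AUTOMATIC1111's /sdapi/v1/txt2img endpoint. The field names match the WebUI API; the default values are illustrative, not tuned:

```typescript
// Request body for AUTOMATIC1111's POST /sdapi/v1/txt2img endpoint.
// The response contains an `images` array of base64-encoded PNGs.
interface Txt2ImgPayload {
  prompt: string;
  negative_prompt: string;
  steps: number;
  width: number;
  height: number;
  cfg_scale: number;
}

// Build a payload with illustrative defaults; callers override as needed.
export function buildTxt2ImgPayload(
  prompt: string,
  overrides: Partial<Txt2ImgPayload> = {}
): Txt2ImgPayload {
  return {
    prompt,
    negative_prompt: "",
    steps: 20,
    width: 512,
    height: 512,
    cfg_scale: 7,
    ...overrides,
  };
}
```

A route handler would POST this JSON to the Stable Diffusion host (port 7860 locally) and read the base64 images out of the response's `images` array.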
Step 3: Server-Sent Events for Streaming
Long-running AI operations need streaming. SSE (Server-Sent Events) is simpler than WebSockets and works well for unidirectional data.
// Server side (inside a route handler; generateImage is your own helper)
const encoder = new TextEncoder();
const stream = new ReadableStream({
  async start(controller) {
    controller.enqueue(encoder.encode('event: status\ndata: {"message":"Generating..."}\n\n'));

    // Do the AI work
    const result = await generateImage(prompt);

    controller.enqueue(encoder.encode(`event: image\ndata: ${JSON.stringify(result)}\n\n`));
    controller.enqueue(encoder.encode("event: done\ndata: {}\n\n"));
    controller.close();
  },
});

return new Response(stream, {
  headers: {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  },
});
Critical tip: Set compress: false in your next.config.ts to prevent gzip from buffering SSE events in production.
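That tip as a minimal next.config.ts:

```typescript
// next.config.ts — disable gzip so SSE events flush to the client
// immediately instead of sitting in the compression buffer.
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  compress: false,
};

export default nextConfig;
```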
Step 4: Expose with Cloudflare Tunnel
Cloudflare Tunnel lets your local machine serve traffic without port forwarding or a static IP.
# Install cloudflared (e.g. brew install cloudflared on macOS;
# see Cloudflare's docs for other platforms)
# Authenticate, then create a tunnel
cloudflared tunnel login
cloudflared tunnel create my-tunnel
# Point your DNS records at the tunnel
cloudflared tunnel route dns my-tunnel llm.mysite.com
# Configure routes
# ~/.cloudflared/config.yml
tunnel: <tunnel-id>
ingress:
  - hostname: mysite.com
    service: http://localhost:3000
  - hostname: llm.mysite.com
    service: http://localhost:11434
  - service: http_status:404
# Run the tunnel
cloudflared tunnel run my-tunnel
Step 5: Deploy Static to Vercel
Push your Next.js project to GitHub and connect it to Vercel. The static pages and serverless functions deploy automatically. AI-dependent API routes reach back to your machine through the tunnel.
# Install Vercel CLI
npm i -g vercel
# Deploy
vercel --prod
Handling Downtime
Since AI features depend on your local machine being on, you need a graceful fallback:
// Check if the AI backend is available before doing real work
// (point this at the tunnel hostname in production, not localhost)
try {
  const ping = await fetch(`${process.env.OLLAMA_URL ?? "http://localhost:11434"}/api/tags`, {
    signal: AbortSignal.timeout(3000),
  });
  if (!ping.ok) throw new Error("backend unhealthy");
} catch {
  return Response.json(
    { error: "AI services are currently offline. They run on local hardware and will be back soon." },
    { status: 503 }
  );
}
Non-AI pages (blog posts, guides, static tools) remain available 24/7 through Vercel.
What We Learned
Things that worked well:
- Ollama's API is dead simple to integrate
- Cloudflare Tunnel is rock solid and free
- SSE streaming gives users real-time progress feedback
- Vercel's free tier handles static + serverless perfectly
Things that bit us:
- Next.js production gzip buffering breaks SSE streams (fix: compress: false)
- Stable Diffusion's --nowebui flag doesn't work reliably; just use --api
- Large base64 images split across SSE chunks need careful parsing
- Model switching in SD takes 15-30 seconds, so check whether the model is already loaded first
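The chunk-splitting issue deserves a concrete fix: a network chunk can end anywhere, including mid-frame, so the client should buffer raw text and only parse once it sees the blank-line terminator. A minimal sketch (SseBuffer is a name invented here, not a library class):

```typescript
// Incremental SSE parser: accumulate raw chunks and emit only complete
// frames, so a base64 payload split across chunks is reassembled intact.
export class SseBuffer {
  private buffer = "";

  // Feed one raw chunk; returns every complete { event, data } frame found.
  push(chunk: string): { event: string; data: string }[] {
    this.buffer += chunk;
    const frames: { event: string; data: string }[] = [];
    let idx: number;
    // A frame ends at a blank line; anything after it stays buffered.
    while ((idx = this.buffer.indexOf("\n\n")) !== -1) {
      const raw = this.buffer.slice(0, idx);
      this.buffer = this.buffer.slice(idx + 2);
      let event = "message";
      let data = "";
      for (const line of raw.split("\n")) {
        if (line.startsWith("event: ")) event = line.slice(7);
        else if (line.startsWith("data: ")) data += line.slice(6);
      }
      frames.push({ event, data });
    }
    return frames;
  }
}
```

Feeding two half-frames through `push` yields nothing until the terminator arrives, at which point the reassembled frame comes back whole.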
Cost Breakdown
| Component | Monthly Cost |
|-----------|--------------|
| Vercel hosting | $0 (free tier) |
| Cloudflare Tunnel | $0 (free) |
| Ollama | $0 (runs locally) |
| Stable Diffusion | $0 (runs locally) |
| Domain name | ~$1/month |
| Total | ~$1/month |
Compare this to cloud APIs: at even moderate usage, you'd spend $50-200/month on OpenAI + Midjourney/DALL-E.
Start Building
You don't need a data center to run AI tools. A decent laptop, some free software, and a weekend of setup gives you a platform that rivals paid services. Every tool on RnR Vibe is proof that this approach works.