Building the qz-l AI Chat Assistant Using Google Gemini 2.5 Flash Lite + Next.js
Learn how qz-l.com's new AI chat assistant works using Gemini 2.5 Flash Lite, Next.js API routes, and function calling --- all free on the Google AI tier.
I recently added a new feature to qz-l.com: an AI-powered chat assistant that can shorten URLs, show analytics, search blog posts, and help users navigate the service --- all through natural language.
This post explains how the assistant works under the hood using the modern @google/genai SDK and the free-tier model gemini-2.5-flash-lite, which runs entirely at zero cost within Google's usage limits.
Why Build an AI Assistant?
qz-l's mission is simple: privacy-first URL shortening with analytics.
But users often ask:
- "How do I create a short link?"
- "Where's the dashboard?"
- "Can I delete a URL?"
- "What does this blog post say?"
Instead of building a whole help UI, I added a chat interface that can perform real actions using function calling.
Model Choice: gemini-2.5-flash-lite (Free)
The assistant uses:
- SDK: @google/genai
- Model: gemini-2.5-flash-lite
- Platform: Google AI Studio (free tier)
Why this model?
- ✓ Completely free within quota
- ✓ Very fast and low latency
- ✓ Full function calling support
- ✓ Perfect for automation + chat
- ✓ Stable enough for production workloads
Inspired by this resource list: https://github.com/cheahjs/free-llm-api-resources
System Architecture
The assistant lives inside a Next.js App Router API route:
/api/chat
High-level flow:
User Message
↓
Next.js API (/api/chat)
↓
Gemini LLM (with system prompt + tools)
↓
If function call → server executes logic
↓
LLM formats final Markdown response
↓
Chat UI displays answer (links, QR codes, etc.)
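Before anything reaches Gemini, the route has to convert the UI's chat history into the `contents` array the SDK expects. A minimal sketch, assuming the UI sends `{ role, text }` messages (that shape is my assumption, not the actual qz-l contract):

```js
// Sketch: map UI chat messages into Gemini's `contents` format.
// The { role, text } message shape is an assumed UI contract.
function toContents(messages) {
  return messages.map((m) => ({
    // Gemini only knows "user" and "model" roles.
    role: m.role === "assistant" ? "model" : "user",
    parts: [{ text: m.text }],
  }));
}
```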
⚙️ Using @google/genai
```js
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey });

const result = await ai.models.generateContent({
  model: "gemini-2.5-flash-lite",
  contents,
  config: {
    temperature: 0.7,
    maxOutputTokens: 1024,
    systemInstruction: {
      role: "system",
      parts: [{ text: SYSTEM_PROMPT }],
    },
    tools: [
      {
        functionDeclarations: [
          shortenUrlDeclaration,
          getUrlAnalyticsDeclaration,
          listRecentUrlsDeclaration,
          deleteUrlDeclaration,
          searchBlogPostsDeclaration,
        ],
      },
    ],
  },
});
```
Function Calling
```js
const shortenUrlDeclaration = {
  name: "shortenUrl",
  description: "Generate a shortened URL",
  parameters: {
    type: "object",
    properties: {
      longUrl: { type: "string" },
    },
    required: ["longUrl"],
  },
};
```
Example function call returned by the model:
```json
{
  "functionCall": {
    "name": "shortenUrl",
    "args": { "longUrl": "https://google.com" }
  }
}
```
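On the server, each declared tool name maps to real logic. A dispatch table keeps that wiring simple; the handler bodies below are placeholders, not the real qz-l implementations:

```js
// Sketch: dispatch a functionCall from the model to server-side logic.
// Handler bodies are placeholders, not the actual qz-l code.
const toolHandlers = {
  shortenUrl: async ({ longUrl }) => ({
    longUrl,
    shortUrl: "https://qz-l.com/" + Math.random().toString(36).slice(2, 8), // fake slug
  }),
  getUrlAnalytics: async ({ slug }) => ({ slug, clicks: 0 }), // placeholder
};

async function executeFunctionCall(call) {
  const handler = toolHandlers[call.name];
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  return handler(call.args);
}
```

Because every handler runs inside the API route, the model never touches the database or any secrets; it only sees the JSON the handler chooses to return.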
The Function-Call Loop
1. Ask Gemini for the next message
2. Detect whether it requested a function
3. Execute the function on the server
4. Append the result to the conversation
5. Call Gemini again for the final answer
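The steps above can be sketched as a loop. To keep the sketch self-contained, the Gemini call (`generate`) and the tool table (`tools`) are injected as plain functions; in the real route these would wrap `ai.models.generateContent` and the server-side handlers:

```js
// Sketch of the function-call loop. `generate` stands in for a call to
// ai.models.generateContent returning the first response part; `tools`
// maps tool names to server-side handlers. Both are illustrative.
async function runChat(contents, generate, tools, maxTurns = 5) {
  for (let turn = 0; turn < maxTurns; turn++) {
    const part = await generate(contents); // 1. ask Gemini for the next message

    if (part.functionCall) {
      const { name, args } = part.functionCall; // 2. it requested a function
      const response = await tools[name](args); // 3. execute on the server
      contents.push({ role: "model", parts: [part] }); // 4a. record the call
      contents.push({
        role: "user",
        parts: [{ functionResponse: { name, response } }], // 4b. append the result
      });
      continue; // 5. call Gemini again for the final answer
    }
    return part.text; // plain text → final Markdown answer
  }
  throw new Error("Too many function-call turns");
}
```

The `maxTurns` cap is a safety valve so a confused model can't loop forever calling tools.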
๐ฏ Why This Approach Works
- ✓ Zero cost within the Gemini free-tier quota
- ✓ Fast, low-latency responses
- ✓ Structured tool calls: the model returns typed JSON arguments instead of free text
- ✓ Secure: every real action runs server-side, never in the browser
- ✓ Easy to extend by adding new function declarations
Current Capabilities
- Shorten URLs
- Generate QR codes
- Fetch analytics
- Delete links
- Show recent URLs
- Search blog posts
- Explain features
Coming Enhancements
- Auth-protected actions
- Rate limiting
- Streaming responses
- Better UI
Final Thoughts
You don't need expensive models to build production AI features --- just solid architecture and a good system prompt.