Building the qz-l AI Chat Assistant Using Google Gemini 2.5 Flash Lite + Next.js
Learn how qz-l.com's new AI chat assistant works using Gemini 2.5 Flash Lite, Next.js API routes, and function calling --- all free on the Google AI tier.
I recently added a new feature to qz-l.com: an AI-powered chat assistant that can shorten URLs, show analytics, search blog posts, and help users navigate the service --- all through natural language.
This post explains how the assistant works under the hood using the modern @google/genai SDK and the free-tier model gemini-2.5-flash-lite, which runs entirely at zero cost within Google's usage limits.
Why Build an AI Assistant?
qz-l's mission is simple: privacy-first URL shortening with analytics.
But users often ask:
- "How do I create a short link?"
- "Where's the dashboard?"
- "Can I delete a URL?"
- "What does this blog post say?"
Instead of building a whole help UI, I added a chat interface that can perform real actions using function calling.
Model Choice: gemini-2.5-flash-lite (Free)
The assistant uses:
- SDK: @google/genai
- Model: gemini-2.5-flash-lite
- Platform: Google AI Studio (free tier)
Why this model?
- ✓ Completely free within quota
- ✓ Very fast and low latency
- ✓ Full function calling support
- ✓ Perfect for automation + chat
- ✓ Stable enough for production workloads
Inspired by this resource list: https://github.com/cheahjs/free-llm-api-resources
System Architecture
The assistant lives inside a Next.js App Router API route:
/api/chat
High-level flow:
User Message
↓
Next.js API (/api/chat)
↓
Gemini LLM (with system prompt + tools)
↓
If function call → server executes logic
↓
LLM formats final Markdown response
↓
Chat UI displays answer (links, QR codes, etc.)
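Before anything reaches Gemini, the route has to convert the UI's chat history into the `contents` array the SDK expects. A minimal sketch, assuming the UI sends `{ role, text }` messages (that shape is my assumption, not the actual qz-l contract):

```js
// Sketch: map UI chat messages into Gemini's `contents` format.
// The { role, text } message shape is an assumed UI contract.
function toContents(messages) {
  return messages.map((m) => ({
    // Gemini only knows "user" and "model" roles.
    role: m.role === "assistant" ? "model" : "user",
    parts: [{ text: m.text }],
  }));
}
```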
⚙️ Using @google/genai
```js
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey });

const result = await ai.models.generateContent({
  model: "gemini-2.5-flash-lite",
  contents,
  config: {
    temperature: 0.7,
    maxOutputTokens: 1024,
    systemInstruction: {
      role: "system",
      parts: [{ text: SYSTEM_PROMPT }],
    },
    tools: [
      {
        functionDeclarations: [
          shortenUrlDeclaration,
          getUrlAnalyticsDeclaration,
          listRecentUrlsDeclaration,
          deleteUrlDeclaration,
          searchBlogPostsDeclaration,
        ],
      },
    ],
  },
});
```
Function Calling
```js
const shortenUrlDeclaration = {
  name: "shortenUrl",
  description: "Generate a shortened URL",
  parameters: {
    type: "object",
    properties: {
      longUrl: { type: "string" },
    },
    required: ["longUrl"],
  },
};
```
Example function call returned by the model:
```json
{
  "functionCall": {
    "name": "shortenUrl",
    "args": { "longUrl": "https://google.com" }
  }
}
```
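On the server, each declared tool name maps to real logic. A dispatch table keeps that wiring simple; the handler bodies below are placeholders, not the real qz-l implementations:

```js
// Sketch: dispatch a functionCall from the model to server-side logic.
// Handler bodies are placeholders, not the actual qz-l code.
const toolHandlers = {
  shortenUrl: async ({ longUrl }) => ({
    longUrl,
    shortUrl: "https://qz-l.com/" + Math.random().toString(36).slice(2, 8), // fake slug
  }),
  getUrlAnalytics: async ({ slug }) => ({ slug, clicks: 0 }), // placeholder
};

async function executeFunctionCall(call) {
  const handler = toolHandlers[call.name];
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  return handler(call.args);
}
```

Because every handler runs inside the API route, the model never touches the database or any secrets; it only sees the JSON the handler chooses to return.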
The Function-Call Loop
1. Ask Gemini for the next message
2. Detect whether it requested a function
3. Execute the function on the server
4. Append the result to the conversation
5. Call Gemini again for the final answer
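The steps above can be sketched as a loop. To keep the sketch self-contained, the Gemini call (`generate`) and the tool table (`tools`) are injected as plain functions; in the real route these would wrap `ai.models.generateContent` and the server-side handlers:

```js
// Sketch of the function-call loop. `generate` stands in for a call to
// ai.models.generateContent returning the first response part; `tools`
// maps tool names to server-side handlers. Both are illustrative.
async function runChat(contents, generate, tools, maxTurns = 5) {
  for (let turn = 0; turn < maxTurns; turn++) {
    const part = await generate(contents); // 1. ask Gemini for the next message

    if (part.functionCall) {
      const { name, args } = part.functionCall; // 2. it requested a function
      const response = await tools[name](args); // 3. execute on the server
      contents.push({ role: "model", parts: [part] }); // 4a. record the call
      contents.push({
        role: "user",
        parts: [{ functionResponse: { name, response } }], // 4b. append the result
      });
      continue; // 5. call Gemini again for the final answer
    }
    return part.text; // plain text → final Markdown answer
  }
  throw new Error("Too many function-call turns");
}
```

The `maxTurns` cap is a safety valve so a confused model can't loop forever calling tools.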
๐ฏ Why This Approach Works
- ✓ Zero cost within the Gemini free-tier quota
- ✓ Fast, low-latency responses
- ✓ Structured tool calls: the model returns typed JSON arguments instead of free text
- ✓ Secure: every real action runs server-side, never in the browser
- ✓ Easy to extend by adding new function declarations
Current Capabilities
- Shorten URLs
- Generate QR codes
- Fetch analytics
- Delete links
- Show recent URLs
- Search blog posts
- Explain features
Coming Enhancements
- Auth-protected actions
- Rate limiting
- Streaming responses
- Better UI
Final Thoughts
You don't need expensive models to build production AI features --- just solid architecture and a good system prompt.