Portfolio Chatbot Architecture
Structured, reliable chatbot for my portfolio with routing, form-only contact, and clear boundaries.
I recently added a chatbot to my portfolio to help visitors navigate my work and get in touch. Visitors can ask about my experience or send a message without leaving the page.
In this post I cover routing, a two-tier model strategy, form-only contact, build-time context, rate limiting, persona boundaries, and state and security.
The system is built as a directed graph. Every message is classified by intent, then routed to either a Q&A path or a programmatic contact flow.
1. Deterministic Routing: Why a Router?
I wanted two distinct behaviors. The bot should answer questions based on my experience or offer a clear path to contact me.
Mixing these into one giant prompt was possible, but messy and prone to hallucinated instructions. By using a router, the system stays simple: one step classifies the intent, and the next step runs a specialized task. This separation makes it easier to add rate limits, fallback logic, and a "chat unavailable" path without tangling the core logic.
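The router idea can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the node names are placeholders, and the classifier here is a keyword heuristic standing in for the fast model call.

```typescript
// Sketch of the router step; node names and the heuristic classifier are
// placeholders for the real model calls.
type Intent = "question" | "contact";

// Stand-in for the fast classifier model so the sketch runs offline.
async function classifyIntent(message: string): Promise<Intent> {
  return /\b(contact|email|hire|reach)\b/i.test(message) ? "contact" : "question";
}

// One classification step, then dispatch to exactly one specialized node.
async function route(message: string): Promise<"answerNode" | "contactFlowNode"> {
  const intent = await classifyIntent(message);
  return intent === "contact" ? "contactFlowNode" : "answerNode";
}
```

Because routing is its own step, rate limits and fallbacks can wrap the classifier without touching the answer logic.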
2. The Multi-Tier Model Strategy
To optimize for both speed and cost, I use a two-tier model setup:
The classifier (fast and cheap): A model optimized for low-latency, single-token-style output handles the initial routing. It only needs to output one word: question or contact.
The responder (strong and capable): A model used for open-ended generation over a large context is only invoked for the answer node, where it has to combine my resume, projects, and blog posts into a coherent reply.
The fallback: If the fast model is down, the system falls back to the stronger model for classification but applies a stricter rate limit to protect my quota. If both fail, the UI directs the user to the standalone contact page.
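The fallback chain above can be expressed as one function. This is a hedged sketch: the model clients are injected stand-ins, and "unavailable" represents the state where the UI directs the user to the standalone contact page.

```typescript
type Intent = "question" | "contact";

// Sketch of the classification fallback chain; the model clients are injected
// stand-ins, not real SDK calls.
async function classifyWithFallback(
  message: string,
  fastModel: (m: string) => Promise<Intent>,
  strongModel: (m: string) => Promise<Intent>,
  fallbackAllowed: () => boolean, // stricter rate limit guarding the strong model
): Promise<Intent | "unavailable"> {
  try {
    return await fastModel(message); // normal path: cheap and low-latency
  } catch {
    if (!fallbackAllowed()) return "unavailable"; // protect the quota
    try {
      return await strongModel(message); // degraded path: capable but costly
    } catch {
      return "unavailable"; // both down: point to the standalone contact page
    }
  }
}
```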
3. Form-Only Contact
At first I tried a conversational lead-gen flow, asking for a name, then an email, then a message. It was brittle. Short names or test messages often triggered a "topic switch" misclassification.
So, I switched back to a standard UI. When a user expresses contact intent, the chat input is disabled and a structured form is shown directly in the chat thread.
This removes the brittle message-by-message back-and-forth and keeps data collection in a UI pattern users already know.
Additionally, an "Ask something else" button dismisses the form and clears the contact state. The server also appends a short message so the user is informed of the switch.
This is helpful if the user wants to clarify something before messaging or changes their mind about contacting me. It also helps on the off chance that the AI hallucinates a contact intent: the user can bring the focus back.
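The handoff between chat and form boils down to a tiny state machine. This is an illustrative reducer; the field and event names are my assumptions, not the actual implementation.

```typescript
// Illustrative state machine for the contact handoff; field and event names
// are assumptions, not the actual implementation.
type ChatState = { inputDisabled: boolean; showContactForm: boolean };
type ChatEvent = "CONTACT_INTENT" | "ASK_SOMETHING_ELSE" | "FORM_SUBMITTED";

function reduce(state: ChatState, event: ChatEvent): ChatState {
  switch (event) {
    case "CONTACT_INTENT":
      // Disable free-text input and render the structured form in the thread.
      return { inputDisabled: true, showContactForm: true };
    case "ASK_SOMETHING_ELSE":
    case "FORM_SUBMITTED":
      // Clear contact state and hand control back to the chat input.
      return { inputDisabled: false, showContactForm: false };
  }
}
```

Keeping this transition explicit is what makes the "escape hatch" reliable: there is no prompt the model can misread.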
4. RAG-Lite: Build-Time Context and Caching
I did not need a vector database: a portfolio's content is small, and the responder model's context window comfortably fits all of it.
Instead I use a build-time blog cache:
A script fetches my latest blog posts from my external CMS during the build.
The result is stored as a local JSON file.
The answer node reads this file at request time and injects it into the system prompt.
If the blog fetch fails at build time, the script keeps the previous cache or writes an empty list so the build still succeeds. That way there is no runtime dependency on my CMS, and context (links and titles) stays accurate and fast. The answer prompt is also structured into sections (work history, projects, skills, blog, hire/contact) so the model can focus on the relevant part and cite sources by name and link, and visitors can click through to a project or post.
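The build step and the request-time read together fit in a short script. This is a sketch under assumptions: the cache path and the fetcher are placeholders for my real CMS call.

```typescript
// Build-script sketch; the cache path and fetcher are placeholders for the
// real CMS call made during the build.
import { existsSync, readFileSync, writeFileSync } from "node:fs";

async function buildBlogCache(
  fetchPosts: () => Promise<unknown[]>,
  cachePath: string,
): Promise<void> {
  try {
    const posts = await fetchPosts();
    writeFileSync(cachePath, JSON.stringify(posts));
  } catch {
    // CMS outage: keep the previous cache if one exists, otherwise write an
    // empty list so the build still succeeds with no runtime CMS dependency.
    if (!existsSync(cachePath)) writeFileSync(cachePath, "[]");
  }
}

// At request time the answer node just reads the local file.
function loadBlogContext(cachePath: string): unknown[] {
  return JSON.parse(readFileSync(cachePath, "utf8"));
}
```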
Injecting prebuilt context also avoids the extra latency a runtime retrieval step would add.
5. Resilience: Layered Rate Limiting
To protect my free-tier quotas and limit spam, I use three layers of rate limiting:
Standard chat limit: A cap on general Q&A requests.
Fallback limit: A tighter cap when the stronger model is used for classification (e.g. when the fast model is down).
Contact limit: A strict, per-identifier limit on form submissions to reduce email spam.
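All three layers can share one limiter keyed by layer and identifier. This is a minimal fixed-window sketch; the caps and the one-hour window are illustrative numbers, not my actual quotas, and a production version would live in a shared store rather than process memory.

```typescript
// Minimal fixed-window limiter sketch; caps and the one-hour window are
// illustrative numbers, not my actual quotas.
type Layer = "chat" | "fallback" | "contact";

const LIMITS: Record<Layer, number> = { chat: 20, fallback: 5, contact: 3 };
const WINDOW_MS = 60 * 60 * 1000;
const windows = new Map<string, { count: number; resetAt: number }>();

function allow(layer: Layer, id: string, now = Date.now()): boolean {
  const key = `${layer}:${id}`; // each layer is tracked per identifier
  const win = windows.get(key);
  if (!win || now >= win.resetAt) {
    // First request in a fresh window.
    windows.set(key, { count: 1, resetAt: now + WINDOW_MS });
    return true;
  }
  if (win.count >= LIMITS[layer]) return false; // over this layer's cap
  win.count += 1;
  return true;
}
```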
6. Persona Boundaries and Adversarial Questions
A portfolio bot needs a clear persona and explicit boundaries.
Out-of-bounds: The prompt defines the bot as representing me, a full stack developer. It only answers about my experience, projects, skills, blog, and hiring or contact. For anything else it politely declines and points to the contact form.
That keeps the bot on topic and avoids answering things I don’t want it to (e.g. medical or legal). Since scope lives in one place, it is easy to tighten or relax later.
Negative framing: For questions like "Why shouldn't I hire her?" the model is instructed not to repeat the negative framing. Instead it pivots toward fit: it asks about the visitor's tech stack or project needs, or suggests a call to discuss.
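The boundary rules live in one place in the system prompt. Below is a condensed, illustrative sketch of that section; the real prompt is longer and names me specifically.

```typescript
// Condensed sketch of the boundary section of the system prompt.
// The wording here is illustrative, not the production prompt.
const BOUNDARY_RULES = `
You represent a full stack developer's portfolio.
- Only answer about: work history, projects, skills, blog posts, hiring/contact.
- For anything else, politely decline and point to the contact form.
- Never repeat negative framings (e.g. "why shouldn't I hire...").
  Pivot toward fit: ask about the visitor's tech stack or project needs,
  or suggest a call to discuss.
`.trim();
```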
7. State Management and Security
Persistence: Conversations are keyed by a unique identifier. State (message history, contact flags) lives in a server-side key-value store with a 24-hour TTL. When the user reopens the chat, the client fetches the thread by that identifier so history loads without storing it all in the browser. The server stays stateless and the client stays light.
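The persistence contract is simple enough to sketch. Here an in-memory Map stands in for the external key-value store, and the TTL check happens on read; the type shape is my assumption, not the real schema.

```typescript
// Sketch of conversation persistence with a 24-hour TTL. An in-memory Map
// stands in for the external key-value store; the Thread shape is assumed.
type Thread = { messages: string[]; contactMode: boolean };

const TTL_MS = 24 * 60 * 60 * 1000;
const store = new Map<string, { thread: Thread; expiresAt: number }>();

function saveThread(id: string, thread: Thread, now = Date.now()): void {
  store.set(id, { thread, expiresAt: now + TTL_MS });
}

function loadThread(id: string, now = Date.now()): Thread | null {
  const entry = store.get(id);
  if (!entry || now > entry.expiresAt) return null; // expired or unknown id
  return entry.thread;
}
```

With a real KV store the TTL is usually set on write instead, so expired threads are evicted automatically.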
Safety: Since the model outputs markdown, the client converts it to HTML and passes it through a strict sanitization layer that only allows safe tags like <a>, <code>, and <strong>. Citations and links from the model become clickable without trusting raw HTML, which helps prevent XSS attacks.
Thanks for reading! Let me know what you liked. Is there anything that I could do differently? Have you implemented something similar?


