Why I Moved AI Out of NestJS and Into a Dedicated Python LangGraph Service
When your AI pipeline lives inside your main backend, everything feels fine — until it doesn't. I hit that wall on MealPlan AI, an AI-powered meal planning platform where NestJS handled both business logic and LLM orchestration.
The AI SDK chain was a black box. No observability into token costs. No crash recovery mid-generation. No way to validate LLM outputs before persisting them. When a 7-day meal plan generation failed on day 5, the entire thing restarted from scratch.
Something had to change.
The Architecture Split
I separated the system into two services with clear responsibilities:
- NestJS — business logic, auth, payments, job orchestration via BullMQ
- Python FastAPI — all LLM orchestration, validation, and AI-specific tooling
Why Python? LangGraph, LangChain, and the broader AI ecosystem are Python-first. Fighting that with TypeScript wrappers added complexity without adding value.
The 5-Node LangGraph StateGraph
The core of the Python service is a LangGraph StateGraph with five nodes:
- prepare_context — loads dietary restrictions, participant profiles, calorie targets, and retrieves relevant recipes via RAG (2.2M recipes from RecipeNLG, 80K foods from USDA)
- generate_day — calls the LLM with structured prompts including diversity history and calorie distribution targets
- validate_day — two-layer validation: programmatic restriction checking first, optional LLM self-validation second
- emit_day — streams a DAY_COMPLETED SSE event so NestJS can persist immediately via JSONB atomic append
- update_history — tracks dishes, ingredients, and cuisines to enforce diversity across the full meal plan
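The flow through the five nodes can be sketched with plain functions over a shared state dict. This is a framework-free illustration, not the actual service code: the field names (context, draft, days, history) and the sample data are hypothetical, and the real implementation wires these as LangGraph nodes.

```python
# Minimal sketch of the five-node flow as plain functions mutating a shared
# state dict. Field names and sample data are illustrative only; the real
# service registers these as LangGraph StateGraph nodes.

def prepare_context(state):
    # In production: load restrictions, profiles, calorie targets, RAG recipes.
    state["context"] = {"restrictions": ["vegetarian"], "calorie_target": 2000}
    return state

def generate_day(state, day):
    # In production: an LLM call with diversity history in the prompt.
    state["draft"] = {"day": day, "meals": ["lentil curry", "salad", "soup"]}
    return state

def validate_day(state):
    # Layer 1: programmatic restriction check (layer 2 is LLM self-validation).
    banned = {"chicken", "beef"}
    state["valid"] = not any(b in m for m in state["draft"]["meals"] for b in banned)
    return state

def emit_day(state):
    # In production: stream a DAY_COMPLETED SSE event so NestJS can persist.
    state.setdefault("days", []).append(state["draft"])
    return state

def update_history(state):
    # Track dishes to enforce diversity across the full plan.
    state.setdefault("history", set()).update(state["draft"]["meals"])
    return state

def run_plan(num_days):
    state = {}
    prepare_context(state)
    for day in range(1, num_days + 1):
        generate_day(state, day)
        validate_day(state)
        if state["valid"]:
            emit_day(state)
            update_history(state)
    return state

plan = run_plan(7)
```

In the real graph, a failed validation routes back to generate_day with the violations injected into the prompt rather than silently skipping the day.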
AsyncPostgresSaver checkpointing lets us resume from exactly where we stopped — no wasted LLM calls.
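The resume behavior can be illustrated with a toy in-memory checkpointer. The real service uses LangGraph's AsyncPostgresSaver keyed by thread; the dict-based saver and the crash simulation below exist only to demonstrate the pattern of skipping already-generated days.

```python
# Toy checkpointer illustrating resume-from-last-completed-day.
# Production uses AsyncPostgresSaver; this in-memory dict only shows the
# pattern: state is saved after each day, so a restart skips finished work.

class MemoryCheckpointer:
    def __init__(self):
        self.store = {}

    def save(self, thread_id, state):
        self.store[thread_id] = dict(state)

    def load(self, thread_id):
        return dict(self.store.get(thread_id, {"days": []}))

def generate_day(day):
    if day == 5:
        raise RuntimeError("LLM provider timeout")  # simulated mid-plan crash
    return {"day": day, "meals": ["breakfast", "lunch", "dinner"]}

def run_plan(saver, thread_id, num_days, crash=True):
    state = saver.load(thread_id)
    start = len(state["days"]) + 1  # resume exactly where we stopped
    llm_calls = 0
    for day in range(start, num_days + 1):
        llm_calls += 1
        try:
            payload = generate_day(day) if crash else {"day": day, "meals": []}
        except RuntimeError:
            saver.save(thread_id, state)  # checkpoint survives the crash
            return state, llm_calls
        state["days"].append(payload)
        saver.save(thread_id, state)
    return state, llm_calls

saver = MemoryCheckpointer()
state, first_calls = run_plan(saver, "plan-1", 7)                # fails on day 5
state, resume_calls = run_plan(saver, "plan-1", 7, crash=False)  # resumes at day 5
```

The resumed run issues only three generation calls (days 5 through 7) instead of seven, which is precisely the "no wasted LLM calls" property.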
Type Safety Across Languages
A dual-language service creates a type drift risk. I solved this with a one-directional pipeline: Zod schemas (TypeScript) export to JSON Schema, which generates Pydantic models (Python). A CI workflow runs on every PR to catch drift before it reaches production.
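The drift check at the end of that pipeline can be as simple as a structural comparison of the two JSON Schema exports. This is a hypothetical sketch of such a CI step; the variable names and inline schemas are illustrative.

```python
# Hypothetical CI drift check: compare the JSON Schema exported from the
# Zod side against the snapshot the Pydantic models were generated from.
# A mismatch fails the PR before type drift reaches production.
import json

def schemas_match(zod_export: str, pydantic_snapshot: str) -> bool:
    # json.loads + == gives a structural comparison that ignores key order
    # and whitespace differences between the two exports.
    return json.loads(zod_export) == json.loads(pydantic_snapshot)

zod_side = '{"type": "object", "properties": {"calories": {"type": "number"}}}'
py_side  = '{"type": "object", "properties": {"calories": {"type": "integer"}}}'

drifted = not schemas_match(zod_side, py_side)  # number vs integer is drift
```

Because the pipeline is one-directional, the Zod export is always the source of truth; the Python side only ever regenerates, never edits, its models.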
What This Unlocked
Splitting AI into its own service wasn't just a refactor — it enabled features that would have been painful to build in the monolithic setup:
- Langfuse observability — every LLM call traced with token counts, costs, and latency. Self-hosted, full control over data.
- Incremental persistence — each day saves immediately. A PARTIALLY_COMPLETED status lets users see progress and resume interrupted plans.
- Granular regeneration — separate endpoints for regenerating a single day (full graph) or a single meal (direct LLM call), with user feedback injected into prompts.
- RAG retrieval — full-text search against RecipeNLG and USDA datasets with pgvector fallback for semantic search.
- 329 Python tests — pytest-asyncio covering every node, validator, and edge case independently from the NestJS test suite.
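Testing nodes in isolation is what makes that suite cheap to run: the programmatic validation layer needs no LLM at all. The sketch below uses plain asyncio and assertions rather than pytest-asyncio, and check_restrictions is a hypothetical stand-in for the real validator inside validate_day.

```python
# Sketch of a node-level test: the programmatic restriction check can be
# exercised without any LLM call. check_restrictions is a hypothetical
# stand-in for the validator inside validate_day.
import asyncio

async def check_restrictions(day, restrictions):
    banned = {"vegetarian": {"chicken", "beef", "pork", "fish"}}
    forbidden = set().union(*(banned.get(r, set()) for r in restrictions))
    violations = [m for m in day["meals"]
                  if any(b in m.lower() for b in forbidden)]
    return {"valid": not violations, "violations": violations}

async def main():
    day = {"meals": ["Tofu stir-fry", "Chicken soup", "Bean chili"]}
    return await check_restrictions(day, ["vegetarian"])

result = asyncio.run(main())
```

With pytest-asyncio the same body becomes an `async def test_` function and the asyncio.run scaffolding disappears.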
Key Takeaways
- Separate AI from business logic early. The longer you wait, the harder the extraction. AI services have different scaling, testing, and deployment needs.
- Use LangGraph for multi-step AI workflows. A linear chain breaks down when you need validation loops, conditional retries, and state management across steps.
- Invest in type contracts across languages. Zod-to-JSON-Schema-to-Pydantic catches bugs at build time that would otherwise surface as silent data corruption.
- Stream incrementally, persist incrementally. Users shouldn't wait for a 30-second generation to complete before seeing anything. SSE + atomic JSONB appends make this straightforward.
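The atomic append half of that takeaway can be expressed as a single Postgres UPDATE, since the jsonb || operator concatenates arrays server-side with no read-modify-write race. The table, column, and status names below are hypothetical, not the actual schema.

```python
# Illustrative SQL for the atomic JSONB append: jsonb || jsonb concatenates
# arrays in one statement, so each completed day lands without a
# read-modify-write cycle. Table and column names are hypothetical.
import json

APPEND_DAY_SQL = """
UPDATE meal_plans
SET days = days || %s::jsonb,
    status = CASE WHEN jsonb_array_length(days || %s::jsonb) >= total_days
                  THEN 'COMPLETED' ELSE 'PARTIALLY_COMPLETED' END
WHERE id = %s
"""

def append_day_params(plan_id, day_payload):
    # Serialize once; the same payload feeds both jsonb placeholders.
    payload = json.dumps([day_payload])  # wrap in an array so || appends
    return (payload, payload, plan_id)

params = append_day_params("plan-1", {"day": 3, "meals": []})
```

Driving this from the DAY_COMPLETED SSE handler means each day is durable the moment the event arrives, independent of whether the remaining days ever generate.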