Client: Personal projects
Period: September 2025 to Present
Stack: Python, Claude, Gemini, Firecrawl, PDF generation, Excel, JSON
"AI agent" gets used loosely. Most "agents" are a single LLM call wrapped in a UI. What I mean by an agent is a productized pipeline: a deterministic input goes in, the LLM does the part only an LLM can do (judgement, scoring, summarization), and a structured artifact comes out. A PDF, an Excel sheet, a JSON file. No chat. No drift. Reusable.
That is the bar. Every agent in the library either clears it or gets cut.
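A minimal sketch of that shape, under stated assumptions: the names `AuditResult`, `run_agent`, and `fake_llm` are hypothetical, and the LLM step is injected as a plain callable so the example runs without any provider SDK. It is an illustration of the pattern, not the production code.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Callable


@dataclass
class AuditResult:
    """Hypothetical artifact schema; a real agent would carry more fields."""
    url: str
    score: int
    summary: str


def run_agent(url: str, llm_step: Callable[[str], dict], out_dir: Path) -> Path:
    """Deterministic input in, one LLM step, structured artifact out."""
    raw = llm_step(url)  # the only non-deterministic step in the pipeline
    # Coerce into the fixed schema so the artifact always has the same shape.
    result = AuditResult(url=url, score=int(raw["score"]), summary=str(raw["summary"]))
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "audit.json"
    out_path.write_text(json.dumps(asdict(result), indent=2), encoding="utf-8")
    return out_path


def fake_llm(url: str) -> dict:
    """Stand-in for a real model call (assumption, for illustration only)."""
    return {"score": 72, "summary": f"Placeholder summary for {url}"}
```

Calling `run_agent("https://example.com", fake_llm, Path("out"))` writes `out/audit.json` and returns its path. There is no chat loop anywhere in the flow.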
Input: a single URL. Output: a 5-category scored audit (Schema, E-E-A-T, Citation-Readiness, Content Structure, Technical SEO) plus a downloadable PDF report. The agent runs the page through structured scoring, classifies AI-visibility risks, and writes the report in client-ready language. I run this agent on every prospect site before a discovery call so I walk in with specific gaps, not generic advice.
Input: a company URL. Output: a full competitive landscape with named competitors, pricing snapshots, strengths and weaknesses per competitor, and a comparative matrix, packaged as a PDF report. The agent does discovery, normalization, and writeup in one pass. I use the same agent during scoping for any client engagement where competitive context matters.
Input: a product description or industry. Output: keyword volumes, Google Trends momentum, competitor advertising keywords, CPC and intent signals. Delivered as a structured report. The agent runs the actual search-data lookups and the LLM does the synthesis. I use it to validate whether a niche has real demand before I recommend a build.
Input: a target site and 3 to 5 competitor sites. Output: pages and topics competitors publish that the target does not, classified into a unified topic taxonomy, with strategic prioritization. The agent crawls, classifies, scores, and writes the gap analysis. I use this when scoping content engagements or running competitive content audits.
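Once pages are classified into a shared taxonomy, the gap itself reduces to a set difference. The sketch below assumes topics are already normalized labels and ranks gaps by how many competitors cover them. That ranking rule and the function name `topic_gaps` are assumptions for illustration, not necessarily how the agent prioritizes.

```python
from collections import defaultdict


def topic_gaps(
    target_topics: set[str],
    competitor_topics: dict[str, set[str]],
) -> list[tuple[str, int]]:
    """Topics competitors cover that the target does not, most-covered first."""
    coverage: dict[str, int] = defaultdict(int)
    for topics in competitor_topics.values():
        # Set difference: what this competitor publishes and the target lacks.
        for topic in topics - target_topics:
            coverage[topic] += 1
    # Sort by competitor count (descending), then alphabetically for stable output.
    return sorted(coverage.items(), key=lambda item: (-item[1], item[0]))


target = {"pricing", "onboarding"}
competitors = {
    "a.example": {"pricing", "integrations", "security"},
    "b.example": {"integrations", "onboarding"},
    "c.example": {"security", "integrations", "case studies"},
}
print(topic_gaps(target, competitors))
# [('integrations', 3), ('security', 2), ('case studies', 1)]
```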
The decisions that matter are not flashy.
Prompts in code, versioned in git. Every prompt lives in a SKILL.md or a Python module. Drift gets caught by diff, not by memory.
Eval at the edges. Each LLM step has a small validation layer that catches obvious failures: wrong output shape, missing required fields, hallucinated entities. When a step fails, the pipeline knows and either retries or routes to a fallback path.
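A validation layer of this kind can be sketched as follows. The field names in `REQUIRED_FIELDS`, the `known_entities` check (an entity counts as hallucinated if it was not seen during discovery), and the retry count are all assumptions for illustration.

```python
import json
from typing import Callable

# Hypothetical required fields for one competitor record.
REQUIRED_FIELDS = {"competitor", "pricing", "strengths"}


def validate(raw: str, known_entities: set[str]) -> dict:
    """Raise ValueError on wrong shape, missing fields, or unknown entities."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["competitor"] not in known_entities:
        raise ValueError(f"unknown entity: {data['competitor']!r}")
    return data


def run_step(
    call_llm: Callable[[], str],
    fallback: Callable[[], dict],
    known_entities: set[str],
    max_attempts: int = 2,
) -> dict:
    """Retry the LLM step on validation failure, then route to the fallback."""
    for _ in range(max_attempts):
        try:
            return validate(call_llm(), known_entities)
        except ValueError:
            continue  # a real pipeline would also log the failure here
    return fallback()
```

The fallback can be as simple as returning a record flagged for manual review, so a failed step never passes silently into the final artifact.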
Artifact-first thinking. Every agent's output is a real file (PDF, Excel, JSON). A messy Excel sheet beats a beautiful chat response, because the sheet can be re-processed, archived, or fed to the next step. Chat responses are inert.
Cost monitoring via Portkey. Every LLM call is observable. Token-budget thresholds prevent runaway runs. Models get swapped per task (Gemini for cheap classification, Claude for harder reasoning).
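The budget and routing logic can be illustrated in plain Python. This sketch deliberately does not use Portkey's API. `TokenBudget`, `BudgetExceeded`, `pick_model`, and the routing table are hypothetical names, and the model labels are family names rather than real model identifiers.

```python
from dataclasses import dataclass

# Hypothetical routing table: cheap model for classification, stronger for reasoning.
MODEL_FOR_TASK = {"classify": "gemini", "reason": "claude"}


class BudgetExceeded(RuntimeError):
    """Raised when a run would go past its token budget."""


@dataclass
class TokenBudget:
    limit: int
    used: int = 0

    def charge(self, tokens: int) -> None:
        """Record token usage, refusing any charge that would exceed the limit."""
        if self.used + tokens > self.limit:
            raise BudgetExceeded(
                f"{self.used + tokens} tokens would exceed the limit of {self.limit}"
            )
        self.used += tokens


def pick_model(task: str) -> str:
    """Route a task type to a model, defaulting to the stronger one."""
    return MODEL_FOR_TASK.get(task, MODEL_FOR_TASK["reason"])
```

Calling `budget.charge(...)` before each LLM call turns a runaway loop into an explicit exception that the pipeline can handle.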
Beyond the four headline agents, I run 6+ smaller workflows in production. One is a personal stock-alerts pipeline that watches my portfolio config and fires Telegram alerts when a price crosses an entry or exit threshold. Another is a morning portfolio digest that pulls overnight news and posts a single ranked summary. Each follows the same productization pattern: defined input, LLM reasoning step, structured artifact out.
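The deterministic half of an alerts workflow like this could look as follows. The `Rule` fields, the comparison directions, and the message format are assumptions, and both the Telegram delivery and the LLM step are left out of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical portfolio rule: buy at or below one price, sell at or above another."""
    ticker: str
    entry_below: float
    exit_above: float


def check_rules(rules: list[Rule], prices: dict[str, float]) -> list[str]:
    """Return one alert message per rule whose threshold has been crossed."""
    alerts: list[str] = []
    for rule in rules:
        price = prices.get(rule.ticker)
        if price is None:
            continue  # no quote for this ticker, nothing to evaluate
        if price <= rule.entry_below:
            alerts.append(f"{rule.ticker}: entry ({price:.2f} <= {rule.entry_below:.2f})")
        elif price >= rule.exit_above:
            alerts.append(f"{rule.ticker}: exit ({price:.2f} >= {rule.exit_above:.2f})")
    return alerts
```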
This is the same pattern I would build for your team's internal ops, sales workflows, or content production. Not chat. Not Zapier-wired LLM calls. A library of well-shaped agents that each do one thing reliably and emit a real artifact you can act on or archive.
If your team has 5 to 10 manual workflows that someone is doing in a chat window and then copy-pasting somewhere, that is the exact shape I would productize. Same engineering instinct, scaled to your business.
Book a 30-min call. We will talk through the architecture and what it would take to ship something like this for you.