Overview
You will build the LLM-powered bill-parsing and consumption-anomaly detection systems at the core of AirBills. You will own the eval harness, ground-truth labeling, and model selection, and you will be measured on accuracy, latency, and cost.
What you’ll do
- Build and operate LLM pipelines for utility-bill extraction (PDF, image, scanned, multi-utility).
- Own the eval set, ground-truth labeling, and regression testing for parsing accuracy.
- Build anomaly-detection models for usage and cost spikes across customer portfolios.
- Drive model / provider selection, prompt strategy, and cost / latency tradeoffs.
- Partner with backend engineers to productionize models behind reliable APIs.
- Establish ML observability — track quality, drift, and unit economics.
What we’re looking for
- 4+ years of ML in production (not just research / notebooks).
- 1+ year working with LLMs in production: evals, prompt design, structured output.
- Strong Python and modern ML tooling; comfortable in a TypeScript codebase too.
- Pragmatic about model choice — knows when fine-tuning helps and when it does not.
- Comfortable owning quality metrics and being on the hook for them.
Nice to have
- Vision-LLM or document-extraction experience (Gemini, Claude, GPT-4o on PDFs / images).
- Background in time-series anomaly detection.
- Has shipped an LLM-graded eval harness from scratch.
Compensation & location
Base salary (mid-point): $200,000
Range: $175,000 – $235,000 base + equity
Location: San Francisco
Final offer depends on experience, location, and interview signal. Equity grants come with every offer. We also cover health, dental, and vision insurance, and provide a learning budget.
About the team
We are a remote-first team, and bill parsing is one of our highest-leverage AI bets. You will partner closely with PM, Platform, and Ops to ship models that hold up in production.