Senior AI Systems Engineer — Agentic Workflow Runtime

Knowtions Research Inc. o/a Lydia AI - 2 jobs

Toronto, ON

Posted yesterday

Job details:

Remote

Management

Lydia AI - ORCA
Lydia AI is building ORCA, an agentic AI platform that helps businesses research, analyze, and present complex information using AI-powered workflows.
ORCA helps enterprises scale expert-led work by encoding how complex work gets done. Our platform turns source materials, business rules, human judgment, and review processes into repeatable agentic workflows that produce completed work products such as decks, reports, evidence packs, plans, campaigns, and other enterprise deliverables.
The Opportunity
We are looking for a senior engineer to help scale the core runtime behind these workflows. This is not a prompt engineering role. This is an engineering-heavy role for someone who can design, build, debug, and scale production systems where LLMs, tools, retrieval, workflow state, databases, and human review need to work reliably together.
The Role
As a Senior AI Systems Engineer, you will help build and scale ORCA’s agentic workflow runtime. You will work across backend services, agent orchestration, retrieval, workflow state, persistence, service boundaries, and production reliability.
We are looking for someone roughly 10+ years into their engineering career, with meaningful experience building and scaling production systems.
Key Responsibilities

Build and scale production-grade Python backend services
Improve ORCA’s agent runtime, gateway, connector, and persistence architecture
Design reliable agent workflows using tool calling, function calling, structured outputs, retries, and fallbacks
Build retrieval and source-grounding systems for enterprise source materials
Own PostgreSQL / Supabase schema design, migrations, RLS, query performance, and workflow state persistence
Define clean API and service boundaries across runtime, connectors, frontend surfaces, and downstream tools
Instrument agent runs with logs, traces, evals, and regression tests
Improve reliability, latency, cost, debuggability, and production recovery
Harden tenant isolation, permission boundaries, auditability, and enterprise security
Mentor engineers and raise the technical bar across the platform

What We’re Looking For
You should have deep experience with:

Production backend engineering, ideally in Python, including async execution, task lifecycle, retries, timeouts, cancellation, background workers, and production debugging
Agentic AI systems, including LLM tool use, function calling, prompt routing, multi-turn state, context handling, retries, fallbacks, and multi-step workflow execution
Retrieval and source grounding, including RAG, embeddings, semantic search, source search, context assembly, document grounding, and retrieval quality
PostgreSQL and data modeling, including schema design, migrations, indexing, query performance, transaction boundaries, and debugging query plans
API and service architecture, including service contracts, versioning, idempotency, failure handling, and clean boundaries between systems
Evaluation and observability, including traces, logs, eval sets, golden cases, regression testing, and root-cause analysis for agent failures
Enterprise security, including tenant isolation, permission boundaries, audit logs, least-privilege access, and data leakage prevention
Operational debugging, including the ability to reason from logs, database state, service behavior, and imperfect production environments

ORCA currently includes some non-standard runtime components, including macOS / launchd-managed services, so you should be willing to operate outside standard Linux-only assumptions when needed.
Strong-to-Have Experience

LangGraph, LangChain, CrewAI, MCP, or similar orchestration frameworks
Anthropic, OpenAI, or other tool-use / function-calling model APIs
Multi-model routing
Telegram Bot API or chat-based workflow interfaces
Connector server design
Document ingestion, structured extraction, or source-pack processing
Supabase RLS at scale
Multi-tenant SaaS authorization
CI/CD, deployment, monitoring, and infrastructure hygiene
Frontend awareness, especially around surfacing workflow state and agent review states to users

You’ll Be a Strong Fit If You

Have scaled production systems before, not just built prototypes
Are comfortable owning architecture and implementation
Can debug complex failures across services, models, tools, and databases
Think carefully about workflow state, permissions, and data boundaries
Understand that agentic systems require engineering discipline, not just better prompts
Can help a fast-moving product mature into a reliable enterprise platform

Ideal Background
We expect the right candidate to be approximately 10+ years into their engineering career, with experience in backend systems, platform engineering, infrastructure, AI systems, or enterprise SaaS.
