Senior AI Systems Engineer — Agentic Workflow Runtime

Knowtions Research Inc. o/a Lydia AI

Toronto, ON

Posted yesterday

Job details:

Remote
Management

Lydia AI - ORCA
Lydia AI is building ORCA, an agentic AI platform that helps businesses research, analyze, and present complex information using AI-powered workflows.
ORCA helps enterprises scale expert-led work by encoding how complex work gets done. Our platform turns source materials, business rules, human judgment, and review processes into repeatable agentic workflows that produce completed work products such as decks, reports, evidence packs, plans, campaigns, and other enterprise deliverables.
The Opportunity
We are looking for a senior engineer to help scale the core runtime behind these workflows. This is not a prompt engineering role. It is an engineering-heavy role for someone who can design, build, debug, and scale production systems where LLMs, tools, retrieval, workflow state, databases, and human review need to work reliably together.
The Role
As a Senior AI Systems Engineer, you will help build and scale ORCA’s agentic workflow runtime. You will work across backend services, agent orchestration, retrieval, workflow state, persistence, service boundaries, and production reliability.
We are looking for someone about 10 or more years into their engineering career, with meaningful experience building and scaling production systems.
Key Responsibilities
  • Build and scale production-grade Python backend services
  • Improve ORCA’s agent runtime, gateway, connector, and persistence architecture
  • Design reliable agent workflows using tool calling, function calling, structured outputs, retries, and fallbacks
  • Build retrieval and source-grounding systems for enterprise source materials
  • Own PostgreSQL / Supabase schema design, migrations, RLS, query performance, and workflow state persistence
  • Define clean API and service boundaries across runtime, connectors, frontend surfaces, and downstream tools
  • Instrument agent runs with logs, traces, evals, and regression tests
  • Improve reliability, latency, cost, debuggability, and production recovery
  • Harden tenant isolation, permission boundaries, auditability, and enterprise security
  • Mentor engineers and raise the technical bar across the platform

What We’re Looking For
You should have deep experience with:
  • Production backend engineering, ideally in Python, including async execution, task lifecycle, retries, timeouts, cancellation, background workers, and production debugging
  • Agentic AI systems, including LLM tool use, function calling, prompt routing, multi-turn state, context handling, retries, fallbacks, and multi-step workflow execution
  • Retrieval and source grounding, including RAG, embeddings, semantic search, source search, context assembly, document grounding, and retrieval quality
  • PostgreSQL and data modeling, including schema design, migrations, indexing, query performance, transaction boundaries, and debugging query plans
  • API and service architecture, including service contracts, versioning, idempotency, failure handling, and clean boundaries between systems
  • Evaluation and observability, including traces, logs, eval sets, golden cases, regression testing, and root-cause analysis for agent failures
  • Enterprise security, including tenant isolation, permission boundaries, audit logs, least-privilege access, and data leakage prevention
  • Operational debugging, including the ability to reason from logs, database state, service behavior, and imperfect production environments

ORCA currently includes some non-standard runtime components, including macOS / launchd-managed services, so you should be willing to operate outside standard Linux-only assumptions when needed.
Strong-to-Have Experience
  • LangGraph, LangChain, CrewAI, MCP, or similar orchestration frameworks
  • Anthropic, OpenAI, or other tool-use / function-calling model APIs
  • Multi-model routing
  • Telegram Bot API or chat-based workflow interfaces
  • Connector server design
  • Document ingestion, structured extraction, or source-pack processing
  • Supabase RLS at scale
  • Multi-tenant SaaS authorization
  • CI/CD, deployment, monitoring, and infrastructure hygiene
  • Frontend awareness, especially around surfacing workflow state and agent review states to users

You’ll Be a Strong Fit If You
  • Have scaled production systems before, not just built prototypes
  • Are comfortable owning architecture and implementation
  • Can debug complex failures across services, models, tools, and databases
  • Think carefully about workflow state, permissions, and data boundaries
  • Understand that agentic systems require engineering discipline, not just better prompts
  • Can help a fast-moving product mature into a reliable enterprise platform

Ideal Background
We expect the right candidate to be approximately 10+ years into their engineering career, with experience in backend systems, platform engineering, infrastructure, AI systems, or enterprise SaaS.
