Data Platform Software Lead

Cognichip Inc.
Toronto, ON
Posted 3 days ago
Job Details:
Full-time
Experienced

Job Description

About the Role:

We are seeking a skilled and pragmatic Data Platform Engineer to architect and scale intelligent data systems that support our AI and ML pipelines, with a specific focus on code-based text datasets. You will play a central role in building the infrastructure that powers data ingestion, transformation, and delivery for our models. This includes developing systems for web-scale data discovery and crawling, designing robust data pipelines, and enabling our scientists to experiment and iterate with confidence. If you are excited by building scalable, ML-ready data platforms at the intersection of engineering and AI, we want to hear from you.

Core Responsibilities:

  • Design and implement scalable data infrastructure to ingest, transform, and manage large-scale code datasets, ensuring high reliability and modularity.
  • Build systems and tools for automated web crawling, parsing, deduplication, and metadata extraction from open-source and public code repositories.
  • Develop robust data pipelines for ingesting and processing structured text datasets using distributed compute frameworks. Monitor quality, throughput, and performance.
  • Build tools to support data visualization, sampling, and analytics to drive better model outcomes and data understanding.
  • Collaborate across research, infrastructure, and compliance teams to meet technical, operational, and regulatory requirements.

Required Skills

  • 5+ years of software engineering experience in data-intensive environments
  • Proven experience building and maintaining scalable data systems and infrastructure
  • Experience with web crawling, scraping frameworks, and large-scale data ingest
  • Comfortable with AWS or other cloud environments, including storage, containerized compute, and security
  • Working experience with a data-centric tech stack, including Python, Go, or Scala; Spark or Ray; Airflow or Prefect; Kafka; Redis; PostgreSQL or ClickHouse; and GitHub APIs
  • Understanding of how datasets feed into AI/ML workflows

Preferred Qualifications

  • Experience curating and preparing code-based datasets for language models or code intelligence applications
  • Familiarity with code parsing, tokenization, embedding, and static analysis
  • Prior experience in a startup or fast-paced, high-ownership engineering environment
  • Strong written and verbal communication skills

What We Offer

  • Opportunity to shape the technical direction of a disruptive AI startup
  • Work with cutting-edge technologies in AI/ML and cloud computing
  • Competitive compensation package including equity
  • High-caliber, talented collaborators from diverse disciplines
  • Collaborative and innovative startup culture
