Lead Data Engineer

SapienSecure
Vancouver, BC
Job Details:
Full-time
Experienced

Company Description

SapienSecure is a leader in healthcare management technology, leveraging cutting-edge artificial intelligence, NLP, and vision AI to revolutionize healthcare operations. Our focus is on automating Revenue Cycle Management (RCM), billing processes, and patient flow optimization to set new standards in the industry. We solve complex problems in billing auditing and automation, requisition automation, and data de-identification using advanced technologies like NER, ML, OCR, and NLP.

Role Description

This is a full-time role for a Lead Data Engineer at SapienSecure in Vancouver, BC. The Lead Data Engineer will be responsible for data engineering, data modeling, ETL processes, data warehousing, and data analytics, developing and optimizing the data systems behind our healthcare management technology.

The position will be based in Vancouver near Vancouver General Hospital, with a hybrid work arrangement.

As a Data Engineer at SapienSecure, you will have the opportunity to create and implement innovative solutions to some of the greatest challenges facing public healthcare in Canada. If you've been looking for a way to use your skills to make a huge impact, this role will give you that chance. It will require strong data intuition and creative problem solving to select and build the right AI solutions and systems.

This role involves the analysis of complex and nuanced medical reports, clinical notes, and other related healthcare texts. While experience with the medical domain would be a huge advantage, we have many physicians on our team who can help get the right candidate up to speed. The sheer volume of text demands efficient data management practices, and using NLP techniques to pre-sort and extract the most valuable data, so that downstream labelling tasks move faster, will be critical.
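
A rough sketch of what this kind of pre-sorting can look like is below: it ranks notes by named-entity density so labellers see the highest-value text first. The model choice, threshold, and sample text are illustrative assumptions, not our actual pipeline (in practice a clinical model such as scispaCy would likely replace the general-purpose one).

    import spacy

    # General-purpose English pipeline; a clinical model (e.g. scispaCy)
    # would likely replace this in practice.
    nlp = spacy.load("en_core_web_sm")

    def prioritize(notes, min_entities=2):
        """Keep notes dense in named entities so labellers see the
        highest-value text first."""
        keep = []
        # nlp.pipe streams documents in batches, which matters at volume
        for text, doc in zip(notes, nlp.pipe(notes, batch_size=64)):
            if len(doc.ents) >= min_entities:
                keep.append(text)
        return keep

    sample = [
        "Patient seen at Vancouver General Hospital on 2024-01-05 with chest pain.",
        "No significant findings.",
    ]
    print(prioritize(sample))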

To get data into our models, the candidate will be tasked with implementing data pipelines and ETL techniques to extract large volumes of data. They will be responsible for liaising with clients and project-managing the integration with client databases, RIS/PACS, and other data sources. Once the data is out of those systems, it will have to be transformed into our internal data format.
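
To make the extract-transform-load flow concrete, here is a minimal Python sketch. The table name, columns, and JSONL output are hypothetical stand-ins: real sources would be client databases and RIS/PACS interfaces, and our internal format is not described in this posting.

    import json
    import sqlite3

    # In-memory stand-in for a client database; real sources would be
    # client systems reached over their own interfaces.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (report_id INTEGER, report_text TEXT, created_at TEXT)")
    conn.execute("INSERT INTO reports VALUES (1, '  CT chest without contrast. No acute findings.  ', '2024-01-05')")

    def extract(conn):
        # Stream rows instead of loading the whole table into memory
        cur = conn.execute("SELECT report_id, report_text, created_at FROM reports")
        yield from cur

    def transform(row):
        # Hypothetical internal format; the real schema is not public
        report_id, text, created_at = row
        return {"id": report_id, "text": text.strip(),
                "created_at": created_at, "source": "client_db"}

    def load(records, path):
        with open(path, "w") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")

    load((transform(r) for r in extract(conn)), "reports.jsonl")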

We use cutting-edge NLP technologies, such as fine-tuning our own large language models, to achieve state-of-the-art performance on complex tasks. While this role is not directly responsible for the research and development of those models, experience with LLMs and prompt engineering, machine learning algorithms, or the underlying libraries used in model inference will let the candidate take a bigger part in discussions about architectures and design approaches to the NLP itself.
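
To give a feel for the inference side, the sketch below runs a chat-style prompt through the HuggingFace transformers pipeline. The checkpoint named is an arbitrary small instruction-tuned model chosen for illustration, not one of our fine-tuned models, and the requisition-parsing prompt is made up.

    from transformers import pipeline

    # Arbitrary small instruction-tuned checkpoint, for illustration only
    generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

    messages = [
        {"role": "system", "content": "Extract the exam type and body part "
                                      "from the requisition. Answer as JSON."},
        {"role": "user", "content": "CT chest with contrast, query pulmonary embolism."},
    ]
    result = generator(messages, max_new_tokens=64)
    # The pipeline returns the full chat; the last turn is the model's reply
    print(result[0]["generated_text"][-1]["content"])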

Lastly, as we actively deploy and run AI in production environments, the candidate may be required to set up production pipelines for these services. From wrapping inference code in REST APIs, to deploying models in the cloud, to packaging models for clients to run on their own Kubernetes or container platforms, experience with production-quality software practices will be valuable.
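
As one illustration of wrapping inference code in a REST API, here is a minimal FastAPI sketch. The endpoint path and the placeholder classifier are assumptions made for the example, not a description of our production service.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Report(BaseModel):
        text: str

    def classify(text: str) -> dict:
        # Placeholder for real model inference (e.g. a loaded transformer)
        return {"label": "needs_review", "score": 0.50}

    @app.post("/v1/classify")
    def classify_report(report: Report) -> dict:
        return classify(report.text)

    # Run locally:     uvicorn service:app --port 8000
    # For Kubernetes:  build a container image around this service and
    #                  ship it with a standard Deployment + Service manifest.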

Responsibilities

  • Design and implement scalable data architectures to support machine learning training.
  • Develop and maintain ETL processes to ensure efficient data flow from various sources into the data warehouse or deployed models.
  • Analyze complex datasets to identify trends, patterns, and insights that can improve model performance.
  • Collaborate with SMEs and product managers in an Agile environment to define data requirements and deliver high-quality solutions.
  • Create documentation for data models, processes, and workflows to facilitate knowledge sharing within the team.
  • Monitor data quality and implement validation checks to ensure the accuracy and reliability of datasets (a brief sketch of such checks follows this list).
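
As a small illustration of the last point, the pandas sketch below flags rows that fail a few hypothetical quality rules; the rules themselves are assumptions, since the real requirements would come out of work with SMEs and product managers.

    import pandas as pd

    # Hypothetical checks; real quality rules would come from the data
    # requirements defined with SMEs.
    def failing_rows(df: pd.DataFrame) -> pd.DataFrame:
        checks = pd.DataFrame({
            "missing_text": df["text"].isna() | (df["text"].str.strip() == ""),
            "duplicate_id": df["id"].duplicated(keep=False),
            "too_short": df["text"].str.len().fillna(0) < 20,
        })
        return df[checks.any(axis=1)]

    df = pd.DataFrame({
        "id": [1, 2, 2, 3],
        "text": ["CT chest without contrast. No acute findings.", "", "ok", None],
    })
    print(failing_rows(df))  # rows flagged for review before training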

Experience

  • Proven experience in data engineering or a related field, with a strong focus on text feature engineering.
  • Proficiency in programming languages such as Python and Java for data manipulation and analysis.
  • Familiarity with data warehousing concepts and technologies, including cloud deployment.
  • Experience working with analytics tools to derive actionable insights from large datasets.
  • Knowledge of Agile methodologies and experience working in an Agile team environment is preferred.
  • Strong analytical skills, with the ability to forecast future trends based on historical data analysis.

Desired Skills

  • Domain knowledge of the radiology and/or general medical industry including familiarity with privacy standards
  • Use of traditional NLP techniques (NLTK, SpaCy, BERT)
  • Experience in application and evaluation of LLMs (prompt engineering, RAG, quantization, vLLM)
  • Expertise in one or more of the following technologies:
      • Deep learning tools (e.g. TensorFlow, PyTorch)
      • Transformers (HuggingFace library, Unsloth, Axolotl)
      • MLOps and CI/CD techniques and tools (e.g. GitLab CI, MLflow, WandB)
      • Cloud technologies (e.g., AWS, GCP)
  • Data engineering, data modeling, and ETL skills
  • Data analytics expertise
  • Experience implementing and optimizing data systems
  • Strong problem-solving and analytical skills

We are a hybrid team with a small office at WeWork for collaboration.
