We are the only solution on the market today that offers Intelligent Remediation: it identifies and prioritizes the vulnerabilities to fix first, assesses the risk that an update will introduce breaking changes, securely builds open source packages from source, and streamlines the build-and-deploy process to get fixes into production quickly and easily.
All from the trusted partner that pioneered and continues to lead enterprise adoption and use of open source software.
This position is available to remote workers anywhere in North America.
This position is open to experienced candidates with a proven track record in data engineering. We build our systems to find, understand, and secure all of the open source code on the planet, and we're looking for someone who knows how to find, ingest, model, and manage this kind of data to help us secure the Internet's source code!
Our audacious goal is to build all of the open source software released on the internet completely from source, in an automated and repeatable way. As a Senior Data Engineer, you will design, build, and maintain data pipelines that ensure reliable and accessible information across the company. You will create and improve data models, enhance dashboards, and work closely with teams to drive a data-first culture.
Our data is our most powerful, most important asset. Your work will influence our decisions and our success at every level.
This position is a mixture of development and operations; good coding and communication skills are as essential as data management and modeling expertise.
Key Responsibilities
- Develop and Maintain Scalable Data Infrastructure. You will design, build, and optimize data pipelines using tools like Airflow and Athena (see the sketch after this list), ensuring efficient data ingestion, transformation, and analysis while maintaining integrity and accessibility.
- Enhance Data Models and Visualization. You will expand the company-wide data model to improve reporting accuracy and develop dashboards in Tableau and other tools to provide meaningful insights across departments.
- Ensure Data Quality and Documentation. You will implement monitoring systems to measure data quality, annotate ingestion feeds with accuracy metrics, and write documentation to help teams access and interpret data effectively.
- Drive a Data-First Mindset Across Teams. You will collaborate with sales, marketing, product management, and engineering to uncover insights, answer complex business questions, and advocate for best practices in data usage and decision-making.
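To give a concrete flavor of the pipeline work described above, here is a minimal sketch (not our production code) of an Airflow DAG that runs a daily Athena aggregation; the DAG id, table names, and S3 bucket are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.athena import AthenaOperator

# Hypothetical daily pipeline: aggregate an ingestion feed into a
# reporting table via Athena. All names below are illustrative only.
with DAG(
    dag_id="package_metadata_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    aggregate = AthenaOperator(
        task_id="aggregate_ingestion_feed",
        database="raw",
        # {{ ds }} is Airflow's templated logical date (YYYY-MM-DD).
        query="""
            INSERT INTO reporting.package_counts
            SELECT ecosystem, count(*) AS packages, date '{{ ds }}' AS day
            FROM raw.ingestion_feed
            WHERE ingested_on = date '{{ ds }}'
            GROUP BY ecosystem
        """,
        output_location="s3://example-athena-results/",  # hypothetical bucket
    )
```

In practice a DAG like this would also carry the data-quality checks and accuracy annotations described in the responsibilities above.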
We rely on data from the systems we've built as well as everything we can discover about open source software in the world.
Skills, Knowledge and Expertise
- At least five years of experience working with ETL processes, data pipelines, and data infrastructure, demonstrating your ability to manage complex data systems.
- Proficient in Python (preferred) or another programming language such as Go, Perl, Ruby, or Java, using code to automate and streamline data processes.
- Experience working with Athena, Snowflake, or similar data management platforms, allowing you to build and optimize effective data storage solutions.
- Skilled in visualization tools such as Tableau, Power BI, or Google Data Studio, ensuring that you can present data in a clear and meaningful way.
- Strong analytical and problem-solving skills, with experience working with structured and unstructured datasets to uncover insights and trends.
- Excellent communicator, both in writing and in speech, able to translate technical concepts into understandable information for non-technical audiences.
- Practical approach to data ownership, ensuring that stakeholders have the resources and guidance they need to access and interpret data effectively.
If you have experience with any of the following, please make sure to highlight it in your cover letter:
- Build engineering, especially for languages such as Java, Go, and Rust
- Open Source projects and culture, especially dependency management and library maintenance
- Experience with open-source data processing tools such as Kafka, Hadoop, Airflow, PrestoDB, and similar technologies
- Background in cloud-based data engineering, particularly in AWS or GCP environments
- Previous experience working with sales or marketing analytics
- Contributions to open-source projects and communities
Benefits
- Working for a growing company that offers the environment and personal growth potential of a start-up as well as the stability of a successful business with established revenue.
- The chance to grow a team, and to grow with it, as we expand our data portfolio.
- The chance to collaborate with a smart, considerate, enthusiastic team of people.
- The chance to work on a project that will change the work lives of developers around the world, including your own!
- Competitive salary and bonus plan.
- Comprehensive benefits package and health/wellness credit program.
We're a polyglot company and embrace using the best language or tool for the task at hand. We gladly use Python, Elm, JavaScript, Golang, Bazel, Docker, Kubernetes, Haskell, Airflow, and other modern tools. Quality is as important as speed. We're building for the long run, so you'll need to enjoy writing tests and documentation too.
We use open source software whenever possible, and we also like to contribute back to the open source ecosystem. We embrace open sourcing both libraries and tools developed in-house where that makes sense.
Our day-to-day work practices are centered around GitHub, pull requests, code review, continuous testing, integration and deployment, and agile development. We coordinate with each other and the rest of the company using Slack for chat, Zoom for video calls and screen sharing, Jira for issue tracking, and Google Drive for shared documents. We're always looking to improve our practices, and we expect you to help us do so.