[Remote] Data Engineer
Note: This is a remote position open to candidates in the USA. You.com is building the AI Search Infrastructure that powers modern AI systems. They are seeking a hands-on Data Engineer to help build and scale their modern data platform, with a focus on developing reliable, high-performance data pipelines and systems.
Responsibilities
- Build and maintain scalable data pipelines (batch and streaming) using tools like Databricks, Spark, Kafka, and AWS services
- Design, develop, and optimize ETL/ELT workflows using DBT, PySpark, SQL, and tools like Fivetran
- Partner closely with marketing and growth teams to enable data use cases such as segmentation, campaign targeting, and lifecycle analytics
- Develop and maintain reverse ETL pipelines to sync data from the warehouse to tools like Salesforce, HubSpot, Braze, and other downstream systems
- Create and manage curated datasets to support analytics, reporting, and go-to-market initiatives
- Build and maintain dashboards and reporting layers to support marketing and business performance tracking
- Support AI/ML and agent-based applications by preparing and serving high-quality datasets for RAG pipelines and MCP (Model Context Protocol) integrations
- Monitor pipeline performance, troubleshoot issues, and ensure high data reliability and quality
- Implement data quality checks, validations, and alerting mechanisms across both ingestion and activation layers
- Collaborate with cross-functional teams to define data contracts and ensure consistency across systems
Skills
- 6+ years of experience in data engineering or a related field
- Strong hands-on experience with Databricks, AWS (S3, Glue, Athena, EMR, etc.), and Kafka
- Proficiency in Python (PySpark) and SQL for large-scale data processing
- Experience building and maintaining ETL/ELT pipelines (DBT/Airflow or similar experience preferred)
- Experience with data ingestion tools such as Fivetran (or similar)
- Familiarity with reverse ETL / data activation workflows and syncing data to tools like Salesforce, HubSpot, Braze
- Exposure to or experience with AI/ML data pipelines, including RAG architectures, vector databases, or embeddings workflows
- Familiarity with agent-based systems, MCP integrations, or LLM-powered applications is a strong plus
- Experience working with marketing, product, or growth teams on data use cases (segmentation, attribution, campaign analytics, etc.)
- Understanding of data modeling and working with large-scale datasets (batch and streaming)
- Experience creating dashboards and supporting reporting workflows (BI tools) for both internal and external audiences
- Strong problem-solving skills and ability to debug production data issues
- Strong communication skills and ability to work collaboratively across teams
Benefits
- Hubs in San Francisco and New York City offering regular in-person gatherings and co-working sessions
- Flexible PTO with U.S. holidays observed and a week-long shutdown in December to rest and recharge*
- Competitive health insurance plan covering 100% for the policyholder and 75% for dependents*
- 12 weeks of paid parental leave in the US*
- 401k program, 3% match - vested immediately!*
- $500 work-from-home stipend to be used within a year of your start date*
- $600 technology stipend to support a portion of our hybrid/remote team's cell phone and internet expenses*
- $1,200 per year Health & Wellness Allowance to support your personal goals*
*Certain perks and benefits are limited to full-time employees only