Job Description:
• Design and build data-driven tools that operate on large datasets stored in S3 and Snowflake
• Implement pipelines that:
  • Extract specific columns or datasets from Snowflake
  • Generate vector embeddings via APIs such as OpenAI's embeddings API
  • Store and manage embeddings in vector databases like Pinecone
  • Enable semantic search and similarity-based retrieval
• Develop enrichment workflows that:
  • Query structured data
  • Use LLM APIs to generate new derived columns
  • Write enriched results back into Snowflake
• Build reusable internal services and SDKs around embedding generation, prompt orchestration, and data augmentation
• Optimize performance and cost across AWS infrastructure
• Work closely with product and data teams to turn use cases into scalable engineering solutions
• Ensure reliability, observability, and maintainability of AI-powered pipelines
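The pipeline and enrichment responsibilities above can be sketched end-to-end in a few dozen lines. This is a minimal, self-contained illustration, not a production design: `embed` is a toy bag-of-words hash standing in for an embedding API call (e.g. OpenAI), `VectorIndex` is an in-memory stand-in for a vector database like Pinecone, and `llm_derive` stands in for an LLM call that generates a derived column. A real pipeline would read the `rows` dict from Snowflake and write `enriched` back to a warehouse table.

```python
import hashlib
import math

DIM = 256  # dimensionality of the toy embedding space


def embed(text: str) -> list[float]:
    """Stand-in for an embedding API call (e.g. OpenAI's embeddings
    endpoint): hash each lowercased token into a fixed-size vector."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class VectorIndex:
    """In-memory stand-in for a vector database such as Pinecone."""

    def __init__(self) -> None:
        self.items: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        self.items[doc_id] = embed(text)

    def query(self, text: str, top_k: int = 3) -> list[tuple[float, str]]:
        qv = embed(text)
        scored = [(cosine(qv, v), doc_id) for doc_id, v in self.items.items()]
        return sorted(scored, reverse=True)[:top_k]


def llm_derive(text: str) -> str:
    """Stand-in for an LLM API call that produces a derived column
    (a crude keyword rule here instead of a real model response)."""
    lowered = text.lower()
    return "infra" if "database" in lowered or "warehouse" in lowered else "other"


# Rows as if extracted from a Snowflake table.
rows = {
    "prod-1": "Pinecone is a vector database for similarity search",
    "prod-2": "Snowflake stores structured warehouse tables",
    "prod-3": "The office kitchen is stocked with snacks",
}

# 1) Embed and index the rows, then run a semantic-search query.
index = VectorIndex()
for doc_id, text in rows.items():
    index.upsert(doc_id, text)
results = index.query("vector database for embeddings", top_k=2)

# 2) Enrich each row with an LLM-derived column; these enriched rows
#    would then be written back into the warehouse.
enriched = {doc_id: {"text": t, "category": llm_derive(t)} for doc_id, t in rows.items()}
```

The same shape carries over to the real stack: `embed` becomes a batched call to an embeddings API, `VectorIndex.upsert`/`query` become vector-database calls, and `llm_derive` becomes a prompted LLM request, with the warehouse as source and sink.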
Requirements:
• 5+ years of software engineering experience
• Strong backend engineering skills (Python preferred; other modern languages acceptable)
• Solid experience with:
• AWS (IAM, Lambda, ECS/EKS, S3, networking, security best practices)
• Data warehousing (Snowflake preferred)
• API design and distributed systems
• Hands-on experience working with LLM APIs (e.g., OpenAI) and embedding workflows
• Experience with vector databases (Pinecone or similar)
• Strong understanding of data modeling, ETL/ELT patterns, and performance optimization
• Production experience in at least one startup environment
• Ability to operate independently and ship high-impact systems end-to-end
Benefits:
• Work on practical, production-grade AI systems
• Direct impact on how data is leveraged across the company
• Startup speed with real ownership and autonomy
• Opportunity to define the internal AI platform from the ground up