Experienced LLM Data Engineer for Generative AI Platform - Fully Remote Opportunity in the United States

Posted 2025-10-26
Remote, USA Full Time Immediate Start
Unlock Your Potential as a Pioneering LLM Data Engineer

Join our innovative team as a fully remote LLM Data Engineer and be at the forefront of shaping the future of Generative AI. We're seeking a highly skilled and passionate data engineering professional to design, develop, and maintain the data pipeline for our cutting-edge Generative AI platform. As an LLM Data Engineer, you will play a crucial role in driving the success of our AI initiatives and contribute to the development of groundbreaking AI applications.

About Our Company and the Role

Our company is a leader in the AI industry, and we're committed to harnessing the power of Large Language Models (LLMs) to revolutionize various sectors. As part of our AI Center of Excellence (COE) within the DX Tech & Digital division, you will report to the Director, AI Solutions & Development, and collaborate with cross-functional teams to deliver high-quality AI solutions. This is a unique opportunity to work on highly visible strategic projects and be an integral part of our AI innovation journey.

Key Responsibilities

Design, implement, and maintain an end-to-end, multi-stage data pipeline for LLMs, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes.
Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform, ensuring data quality and relevance.
Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization of both text and non-text data.
Benchmark and implement various vector stores, embedding techniques, and retrieval methods to enhance the performance of our AI models.
Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search).
Implement and maintain auto-tagging systems and data preparation processes for LLMs, ensuring efficient data management.
Develop tools for text and image data crawling, cleaning, and refinement, adhering to data privacy and security best practices.
Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models.
Work with data lakehouse architectures to optimize data storage and processing, leveraging technologies like Snowflake and various vector store technologies.
Integrate and optimize workflows using Snowflake and other relevant technologies, ensuring seamless data processing and analysis.

Essential Qualifications and Skills

To excel in this role, you should possess:

A Master's degree in Computer Science, Data Science, or a related field, demonstrating a strong foundation in data engineering and AI.
3-5 years of work experience in data engineering, preferably in AI/ML contexts, with a proven track record of delivering innovative AI applications.
Proficiency in Python, JSON, HTTP, and related tools, with a strong understanding of LLM architectures, training processes, and data requirements.
Experience with RAG systems, knowledge base construction, and vector databases, as well as familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts.
Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated), and knowledge of data crawling techniques and associated ethical considerations.
Strong problem-solving skills and the ability to work in a fast-paced, innovative environment, with excellent communication and collaboration skills.
Ability to translate business needs into technical solutions, with a passion for innovation and a commitment to ethical AI development.
Experience building LLM pipelines using frameworks like LangChain, LlamaIndex, Semantic Kernel, and OpenAI functions, and familiarity with LLM parameters and with metrics and methodologies for evaluating model outcomes.

Preferred Skills and Qualifications

To further enhance your candidacy, you may possess:

Experience with popular LLM/RAG frameworks, demonstrating a deep understanding of the latest advancements in AI.
Familiarity with distributed computing platforms (e.g., Apache Spark, Dask), and knowledge of data versioning and experiment tracking tools.
Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing, and understanding of data privacy and security best practices.
Practical experience implementing data lakehouse solutions, and proficiency in optimizing queries and data processes in Snowflake or Databricks.
Hands-on experience with different vector store technologies, and a strong understanding of their applications in AI.

Career Growth Opportunities and Learning Benefits

As an LLM Data Engineer at our company, you will have the opportunity to:

Work on cutting-edge AI projects, driving innovation and growth in the industry.
Collaborate with cross-functional teams, developing a deep understanding of the AI ecosystem and enhancing your communication and collaboration skills.
Develop and maintain a robust data pipeline, honing your skills in data engineering and AI.
Stay up-to-date with the latest advancements in LLM technologies, RAG systems, and vector databases, expanding your knowledge and expertise.

Work Environment and Company Culture

Our company values a culture of innovation, collaboration, and continuous learning. As a fully remote team member, you will enjoy:

Flexible working hours and a remote work setup, allowing you to work from anywhere in the United States.
A collaborative and dynamic work environment, with regular virtual team meetings and knowledge-sharing sessions.
Opportunities for professional growth and development, with access to training and mentorship programs.

Compensation, Perks, and Benefits

We offer a competitive salary and a comprehensive benefits package for US employees, including:

A salary that reflects your skills and experience.
A range of benefits, including health insurance, retirement plans, and paid time off.
Opportunities for career growth and professional development, with a commitment to supporting your long-term goals.

Join Our Team and Shape the Future of AI

If you're a passionate and experienced LLM Data Engineer looking to drive innovation and growth in the AI industry, we encourage you to apply for this opportunity. With your skills and expertise, you will play a crucial role in shaping the future of our Generative AI platform and contributing to the development of groundbreaking AI applications. Don't miss this chance to join our team and be part of a dynamic and forward-thinking organization.