JOB SUMMARY
Leads the team responsible for developing, deploying, and operating machine learning models, generative AI applications, autonomous agents, and related AI solutions across ERCOT's enterprise platforms. Oversees MLOps standards, production support, platform reliability, and governance for ML and GenAI assets. Balances delivery of new AI capabilities with operational excellence and ensures compliance with AI governance and model lifecycle controls. Partners closely with Data Operations, Data Engineering, Governance, Security, and business stakeholders to ensure safe, reliable, and efficient AI systems.
JOB DUTIES
- Responsible for hiring, coaching, training, and performance management of staff.
- Frequently interacts with reporting supervisors, customers, and/or functional peer group managers, normally involving matters between functional areas or customers.
- Responsible for the management of subordinate staff within a department. Typically has individual contributors as direct reports, but could have supervisory direct reports. Has full responsibility for direct reports.
- Generally provides input to budgeting and financial decisions that impact the department. Requests approval for financial actions beyond a limited scope.
ADDITIONAL JOB DUTIES
- Oversee end-to-end delivery of AI/ML and GenAI solutions, from design through deployment, ensuring enterprise-ready quality, reliability, and security.
- Set technical direction and architectural standards for ML models, GenAI applications, autonomous agents, RAG systems, multimodal solutions, and vector/semantic search capabilities.
- Own and govern MLOps standards, including CI/CD automation, deployment pipelines, monitoring, evaluation frameworks, and model lifecycle controls for both ML and GenAI assets.
- Lead and develop the AI & ML Engineering team, including hiring, onboarding, coaching, performance management, and establishing clear skill ladders and growth pathways.
- Manage production ML/GenAI operations and Level 3 support, leading root-cause investigations, incident command, post-incident reviews, and long-term problem management.
- Ensure compliance with ERCOT model governance and GenAI-specific controls, including risk tiering, documentation, lineage, prompt management, safety guardrails, and regulatory requirements.
- Guide platform engineering for AI/ML infrastructure, including Azure ML, Databricks ML, vector databases, LLM orchestration frameworks, and ML/GenAI observability tooling.
- Plan and prioritize intake, releases, and roadmaps for ML and GenAI initiatives in partnership with Product Owners and Data Operations leadership.
- Oversee vendor and contractor contributions to ensure quality, maintain architectural integrity, and achieve knowledge transfer into ERCOT's internal teams.
- Collaborate across Data Engineering, Architecture, Governance, Security, and business stakeholders to align AI/ML solutions with enterprise needs and regulatory responsibilities.
- Review and approve high-risk deployments and exceptions, ensuring compensating controls are in place for ML and GenAI systems.
- Establish and track performance, reliability, and cost metrics for ML infrastructure, LLM usage, GenAI applications, and overall MLOps health.
- Communicate operational status, risks, and trade-offs to executive stakeholders and technical partners with clarity and accountability.
EXPERIENCE
- 8+ years in ML operations, MLOps engineering, AI/ML development, data engineering, or software engineering with an ML/AI focus.
- 2+ years leading ML operations teams or technical teams in ML/AI environments.
- Demonstrated experience with enterprise-scale ML deployment, operations, and GenAI application development.
ADDITIONAL QUALIFICATIONS
- Experience building and deploying ML/GenAI solutions using platforms like Azure ML, Azure AI, Databricks ML, and Azure OpenAI.
- Strong background in LLMs, RAG/semantic search, and AI agent or multi-agent architectures.
- Proven MLOps expertise, including CI/CD for ML, model serving, monitoring, and production support.
- Leadership experience guiding technical teams and aligning engineering work with business and governance needs.
- Proficiency in Python and modern data/AI engineering practices, with familiarity in cloud infrastructure, vector databases, or AI observability tools.
EDUCATION
- Bachelor's Degree: Computer Science, Data Science, or related field (Required)
- Master's Degree: Computer Science, Data Science, or related field (Preferred)
- Or a combination of education and experience that provides knowledge equivalent to a major in such fields (Required)
Electric Reliability Council of Texas
Texas, United States
www.ercot.com


