JOB SUMMARY
Supports and ensures the reliability, performance, and integrity of enterprise data platforms across cloud (Azure Databricks, ADLS, Power BI) and on-premises systems (Oracle, Informatica, SAS, Cognos). Works as part of the Data Operations team to monitor data pipelines, troubleshoot incidents, support migrations and releases, and maintain platform stability.
Progressively assumes responsibility for operational ownership, automation, observability, governance, optimization, and reliability engineering practices to support business-critical data workloads.
JOB DUTIES
Level II & III
- Owns day-to-day operations of assigned data platforms or pipeline domains.
- Investigates and resolves moderately complex incidents with minimal supervision.
- Performs root cause analysis for recurring issues and documents corrective actions.
- Coordinates with infrastructure, security, and engineering teams during deployments and incident resolution.
- Supports migration activities, platform upgrades, and release validation.
- Implements monitoring enhancements and automated health checks.
- Participates in disaster recovery testing and data validation exercises.
- Optimizes SQL queries and pipeline performance under guidance.
- Assists in implementing cost-monitoring practices for cloud workloads.
- Communicates operational status and risks to stakeholders.
- Prioritizes workload to meet team service-level objectives.
Level Senior (in addition to Level II & III)
- Ensures end-to-end reliability of data platforms, including cloud (Azure Databricks, ADLS) and on-premises systems (Oracle, Informatica, SAS, Cognos).
- Leads incident response and root-cause analysis for critical data operations issues, driving permanent fixes.
- Architects monitoring and observability frameworks for data pipelines, BI refreshes, and platform health, leveraging golden signals and automated checks.
- Designs and implements automation and infrastructure-as-code solutions to streamline operational workflows and environment provisioning.
- Governs data migrations, releases, and upgrades, ensuring integrity, rollback strategies, and compliance with change management standards.
- Optimizes data pipelines, queries, and storage strategies for performance, scalability, and cost efficiency across hybrid environments.
- Develops and validates disaster recovery and business continuity plans for data platforms, meeting SLA objectives.
- Drives FinOps practices and capacity planning for data workloads, enforcing cost guardrails and resource optimization.
- Implements security and compliance controls for data operations, ensuring audit readiness and adherence to regulatory requirements.
- Serves as Subject Matter Expert (SME) for data reliability and operational excellence.
- Mentors analysts and curates knowledge assets, fostering best practices in governance and reliability.
- Conveys team strategy and operational goals through strong written and verbal communication.
QUALIFICATIONS
Education
- Bachelor's degree in Computer Science, Information Systems, or a related field, or an equivalent combination of education and experience.
Required Experience
Level II
- 2+ years of experience in data/platform operations or technical support.
Level III
- 4+ years of experience in enterprise data/platform operations.
- Proven experience with hybrid cloud/on-prem data ecosystems.
Level Senior
- 5+ years of experience in data/platform operations or site reliability with enterprise scope.
- Proven track record leading complex operational initiatives and incident responses.
- Extensive experience with enterprise-scale cloud and on-premises data platforms.
Electric Reliability Council of Texas
Texas United States
www.ercot.com


