Senior Data Engineer
Requisition ID # 140681
Job Category: Information Technology
Job Level: Individual Contributor
Business Unit: Information Technology
Work Type: Hybrid
Job Location: Oakland
The Decision Products team strives to utilize best-in-class modeling techniques and industry-leading data science to drive PG&E’s transition to the sustainable energy network of the future through data-driven decision making. This work moves beyond descriptive reporting and focuses on pushing the business forward through applied statistics, predictive and prescriptive analytics, and insightful tool design. The cornerstone of these high-value analytics is one of the largest smart meter usage databases in the industry, which, when combined with billing, program engagement, customer demographic, grid, and other data sources, has unprecedented potential.
Current and past projects include:
- Deployment of computer vision algorithms in tools that accelerate and automate asset inspection processes
- Predicting electric distribution equipment failure before it occurs, allowing for proactive maintenance
- Optimizing renewable resource portfolios, including location and resource adequacy considerations
- Supporting asset strategy decision making, including where PG&E should underground electrical assets
- Supervised and unsupervised machine learning models using Python and Spark, trained on AWS, deployed on Palantir Foundry
We are looking for a savvy Data Engineer to join our growing team of analytics experts. In this role you will work as part of cross-functional teams, including data scientists, other data engineers, technology experts, and subject matter experts, to develop data-driven solutions. Successful candidates will be responsible for building, expanding, and optimizing our data, data storage, and data pipelines. This individual will support team members (data scientists, software developers, etc.) and decision products to ensure that data delivery is reliable and optimized. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products. Through facilitative leadership, this role will help the team continue its history of success. Qualified candidates will have a unique opportunity to be at the forefront of the utility industry and gain a comprehensive view of the nation’s most advanced smart grid. It is the perfect role for someone who would like to continue to build on their professional experience and help advance PG&E’s sustainability goals.
- Enhance and maintain our current data pipelines and associated infrastructure
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Engage with different stakeholder teams to troubleshoot various database systems
- Build and maintain tools that monitor data and system health
- Identify, design, and implement internal process improvements to optimize production of results and enable cost savings
- Performance-tune and optimize data pipelines on Spark
- Create and maintain documentation describing the data catalog and data objects
- Bachelor’s degree in computer science, an engineering field, or equivalent work experience in an engineering field
- 5 years of experience with data engineering/ETL ecosystems such as Palantir Foundry, Spark, Informatica, SAP BODS, or OBIEE
- scikit-learn
- PySpark or equivalent big data processing framework
- Mastery of database design fundamentals
- Familiarity with a CI/CD tool
- Familiarity with an infrastructure as code tool
- Experience writing production-level code
- Experience writing health checks, unit tests, integration tests, schema validations
- Knowledge of time-series data set development
- Demonstrated commitment to teamwork and enabling others
- Proven ability to translate business desires into technical requirements
- Ability to communicate with various stakeholders and leadership
- Ability to break down ambiguous problems
- Familiarity with cloud computing security fundamentals
- Experience with the Palantir Foundry platform
- Experience working with data scientists and machine learning engineers
- Familiarity with model deployment
- Front-end tools: Power BI, Tableau