cv

General Information

Full Name Harish Kesava Rao
Languages English

Education

  • 2011
    Master of Science
    University of Arizona, Tucson, AZ, USA
    • Major in Management Information Systems.
    • Program ranked number 1 nationally among public graduate information systems programs.
    • Courses
      • Enterprise Data Management
      • Business Intelligence
      • Business Communication
      • Web Mining and Analytics
      • Data Mining
      • Software Design Patterns
      • Operations Management
  • 2007
    Bachelor of Technology
    Anna University
    • Major in Information Technology.

Experience

  • 2024 - Present
    Principal Data Engineer
    Atlassian
    • Planning the short-term and long-term technology roadmap for Data Engineering projects.
    • Guiding Data Engineers and Lead Data Engineers on design and data architecture for multiple streams.
    • Resolving ambiguity and arriving at clear, actionable decisions; helping balance trade-offs between velocity and quality.
    • Providing constructive and clear feedback during code reviews and design reviews.
    • Helping the team build robust, scalable, auditable pipelines to create a performant Data Lake on AWS (ECS, S3), Airflow and Databricks.
    • Key areas/skills.
      • Databricks
      • Delta Lake storage
      • AWS - S3, SQS, SNS, Kinesis
      • Spark - Batch, performance tuning
      • dbt, Jinja templates
  • 2022 - 2024
    Staff Software Engineer/Team Lead, Data Engineering
    Databricks
    • Managed multiple large-scale Data Engineering initiatives; mentored and advised Data Engineers.
    • Deployed data pipelines and associated resources to Databricks workspaces on AWS and Azure via Terraform (HCL).
    • Created Spark ingestion notebooks and tuned streaming and batch Spark jobs and clusters on Azure and AWS.
    • Ingested data from REST APIs and stored it on AWS S3 via standard Python frameworks.
    • Key areas/skills.
      • Databricks
      • Delta Lake
      • AWS - S3, SQS, SNS, Kinesis, CodeBuild
      • Azure - Blob Storage, Event Grid, Event Hubs
      • Spark - Streaming, batch, performance tuning
      • Terraform - resource management automation for Databricks, AWS and Azure resources.
      • Parquet, JSON file management
      • Hive metastore
  • 2021 - 2022
    Senior Data Engineer
    Salesforce
    • Augmented Tableau's license lifecycle analysis with AWS compute and storage alongside Snowflake.
    • Key areas/skills.
      • AWS - S3, EMR, PySpark.
      • Snowflake
      • Tableau integration with Python.
  • 2020 - 2021
    Senior Data Engineer
    Amazon Prime Video
    • First Data Engineer for Prime Video Search.
    • Designed and implemented a Data Lake for Prime Video Search using EMR, Spark, Scala, S3, Athena, Tableau, SageMaker.
    • Key areas/skills.
      • AWS - EMR, S3, SageMaker, Athena, PySpark.
  • 2017 - 2020
    Senior Data Engineer
    Indeed
    • Designed, standardized and automated DW/data pipelines using Postgres, Hive, Hadoop, Snowflake and Airflow.
    • Key areas/skills.
      • Python
      • Postgres
      • PySpark
      • Hive
      • Docker
      • Airflow
  • 2014 - 2017
    Senior ETL Engineer
    Informatica
    • Developed and deployed ETL pipelines and data warehouses on Oracle, MySQL, MS SQL Server, Netezza and Teradata using Informatica.
    • Used Python to implement pipelines to consume raw/unstructured data.
    • Key areas/skills.
      • Informatica PowerCenter, Data Quality, Metadata Manager, Data Replication, Big Data Edition, Cloud Edition.
      • Python
  • 2012 - 2013
    Presales Technical Consultant
    Informatica
    • Delivered product demos for prospects and ran technical proof-of-concept engagements.
    • Key areas/skills.
      • Informatica PowerCenter.

Open Source Projects

  • 2021 - Present
    Contributions to Apache Airflow
    • Contributions to various providers in Airflow.