Harish Kesava Rao

I build the data infrastructure that powers AI applications in production — lakehouse architecture, embedding pipelines, vector storage, and large-scale Spark data platforms. Over 12 years at Atlassian, Databricks, Amazon, Salesforce, and Indeed, I have shipped systems handling petabyte-scale data across AWS and Azure. My work sits at the intersection of data engineering and AI/ML enablement — from high-throughput ingestion to RAG frameworks, semantic search, and LLM data pipelines. I am an active open-source contributor to Apache Airflow, Delta Lake, and DataHub, and I write about data engineering and AI infrastructure on this site.

Research interests. I am drawn to open questions where data engineering meets machine learning: How do we design lakehouse storage formats and compaction strategies that remain efficient as embedding dimensions grow and vector indices must be refreshed at streaming latencies? What consistency and fault-tolerance guarantees does a distributed retrieval layer need to serve RAG pipelines reliably under production skew? How do scheduling and resource-allocation decisions in a Spark cluster change when the downstream consumer is an LLM endpoint? These questions span systems research and applied ML infrastructure, and they motivate both my open-source work and the problems I choose to write about.

news

Oct 29, 2025 [Open Source] PR approved in Delta Lake: improved error observability in SnapshotManager.getLogSegmentForVersion.
Mar 29, 2025 [Talks] Guest lecture to undergraduate students and faculty of an engineering college's Department of Artificial Intelligence and Data Science. Topic: Building a Career in Data.
Apr 29, 2024 [Update] Joined Atlassian India as Principal Data Engineer & Data Architect.
Apr 30, 2023 [Open Source] Created the Databricks Partition Sensor (for the Databricks Provider) for Apache Airflow.
Apr 2, 2023 [Open Source] First major contribution to Apache Airflow: the Databricks SQL Sensor.

latest posts