Harish Kesava Rao
I build the data infrastructure that powers AI applications in production — lakehouse architecture, embedding pipelines, vector storage, and large-scale Spark data platforms. Over 12 years at Atlassian, Databricks, Amazon, Salesforce, and Indeed, I have shipped systems handling petabyte-scale data across AWS and Azure. My work sits at the intersection of data engineering and AI/ML enablement — from high-throughput ingestion to RAG frameworks, semantic search, and LLM data pipelines. I am an active open-source contributor to Apache Airflow, Delta Lake, and DataHub, and I write about data engineering and AI infrastructure on this site.
news
| Oct 29, 2025 | [Open Source] PR approved: Improved error observability in SnapshotManager.getLogSegmentForVersion — Delta Lake. |
|---|---|
| Oct 29, 2025 | [Open Source] Submitted my first PR to Delta |
| Mar 29, 2025 | [Talks] Gave a guest lecture to undergraduate students and faculty of an engineering college’s Department of Artificial Intelligence and Data Science. Topic: Building a career in Data |
| Apr 29, 2024 | [Update] Joined Atlassian India as Principal Data Engineer & Data Architect. |
| Apr 30, 2023 | [Open Source] Created the Databricks Partition Sensor (for the Databricks Provider) for Apache Airflow. |
latest posts
| Dec 29, 2025 | Deploying Data Science applications - from a Data Engineer's perspective |
|---|---|
| Mar 1, 2023 | Building a data lake on Microsoft Azure. |
| Jun 1, 2021 | Building a data lake on Amazon Web Services. |
| Nov 23, 2019 | Deploying on-premise big data pipelines. |