Harish Kesava Rao

I build the data infrastructure that powers AI applications in production — lakehouse architecture, embedding pipelines, vector storage, and large-scale Spark data platforms. Over 12 years at Atlassian, Databricks, Amazon, Salesforce, and Indeed, I have shipped systems handling petabyte-scale data across AWS and Azure. My work sits at the intersection of data engineering and AI/ML enablement — from high-throughput ingestion to RAG frameworks, semantic search, and LLM data pipelines. I am an active open-source contributor to Apache Airflow, Delta Lake, and DataHub, and I write about data engineering and AI infrastructure on this site.

news

Oct 29, 2025 [Open Source] PR approved: Improved error observability in SnapshotManager.getLogSegmentForVersion (Delta Lake).
Oct 29, 2025 [Open Source] Submitted my first PR to Delta Lake.
Mar 29, 2025 [Talks] Gave a guest lecture to undergraduate students and faculty of an engineering college's Department of Artificial Intelligence and Data Science. Topic: Building a career in Data.
Apr 29, 2024 [Update] Joined Atlassian India as Principal Data Engineer & Data Architect.
Apr 30, 2023 [Open Source] Created the Databricks Partition Sensor for the Databricks provider in Apache Airflow.

latest posts