
Sai Dinesh Reddy Maluchuru

Data & ML Engineer with 8+ years building large-scale data platforms, ML pipelines, and cost-optimization solutions at Microsoft, Amazon, and JPMorgan Chase. Passionate about turning massive datasets into actionable insights and leveraging LLMs for evaluation frameworks, EDA automation, and prompt engineering. IIT Madras dual-degree graduate (IIT-JEE All India Rank 728).

8+ Years of Experience
3 Fortune 500 Companies
200+ Pipelines Built
50+ Engineers Mentored

Professional Experience

Microsoft Aug 2022 – Present
Data & ML Engineer 2
Hyderabad, India
  • ODSP Large-Scale Pipelines: Engineered high-throughput data pipelines for the OneDrive and SharePoint (ODSP) team, processing 200+ TB of raw logs and 300+ billion Blob transactions in a single day; pipeline optimizations reduced compute and storage consumption by 70%.
  • Blob Cost Analytics: Spearheaded the analysis of Blob storage costs, the single largest cost driver for ODSP, building Highcharts dashboards with granular cost breakdowns by service, region, and workload type; the analysis directly informed executive budget decisions and identified savings opportunities in the millions of dollars.
  • Team Enablement & LLM Advocacy: Led organization-wide training on data optimization techniques and conducted hands-on sessions on EDA using Claude and GPT-4, upskilling 50+ engineers; championed LLM adoption for data exploration, query generation, and automated insight extraction across the ODSP team.
  • OneNote Copilot — Video Evaluation: Designed and built an LLM-as-a-Judge evaluation framework for video content analysis in OneNote Copilot Notebook, defining rubrics and scoring pipelines; collaborated closely with ML and Applied Science teams to deploy, test, and iterate on evaluation projects in production environments.
  • OneNote Web Adoption Dashboard: Architected a full-stack web adoption dashboard and metrics platform for OneNote, integrating telemetry from multiple sources; enabled leadership to track engagement funnels, cohort retention, and feature adoption — directly influencing product roadmap prioritization.
  • Data Engineering Leadership: Stood up and led the Data Engineering division for OneNote Copilot Notebook from scratch; analyzed and secured capacity tokens to justify and launch the DE team, then established scalable infrastructure hosting 200+ pipelines and ML workflows with automated monitoring and alerting.
  • Windows Analytics & Passkey Adoption: Engineered a business-critical analytics system used by 28+ teams across the Windows org for reliability tracking, monitoring, and feature adoption; drove cross-platform metrics for Remote Approval and Windows Passkey, identifying high-value user cohorts and boosting adoption by 30%.
  • Store Personalization & ML Optimization: Cut ML training costs by 80% via Resilience LPG framework optimizations; re-architected the training pipeline to improve model accuracy while reducing experimentation cycle time from days to hours.
Azure Data Factory · Spark · KQL / Kusto · Power BI · Highcharts · LLM-as-a-Judge · Cosmos DB · Python
Amazon Aug 2019 – Aug 2022
Data Engineer
Hyderabad, India
  • Package Delivery Lifecycle Pipeline: Designed the end-to-end data pipeline modeling the complete lifecycle of a package — from order placement through hub processing to last-mile delivery — building three foundational datasets (Node Processing, Package Attributes, Package Items) at different granularities. These became the single source of truth for logistics analytics, used by operations, finance, and product teams worldwide.
  • Delivery Estimate Accuracy: Built the Delivery Estimate Accuracy framework for deliveries in India, attributing inaccurate delivery promises across millions of packages per day; root-cause analysis surfaced systemic patterns in estimate failures, improving promise accuracy and reducing customer-facing delivery misses.
  • Leave-at-Door Pilot (India): Partnered closely with engineering to instrument and analyze Amazon's "Leave Package at Door" pilot in India; built dashboards tracking adoption curves and delivery preferences (door vs. security desk vs. neighbor), generating insights that drove the nationwide rollout decision.
  • Fintech & Tax Compliance: Developed end-to-end data pipelines powering compliance and tax reporting across multiple regions, ensuring 99.9% data accuracy, full auditability, and regulatory adherence across complex financial workflows.
  • Platform Architecture & Orchestration: Designed scalable data ingestion solutions using Hoot, Redshift, and DJS/Airflow on AWS (ECS, Fargate, S3), supporting 80+ downstream consumer teams with sub-hour data freshness and automated retry/alerting mechanisms.
  • Global-Scale Optimization: Applied advanced optimization techniques (partition pruning, bucketing, broadcast joins, salting for skew) to scale pipelines for worldwide transaction volume; improved SLA adherence by 25% for mission-critical AMZL datasets while significantly reducing compute costs.
AWS (ECS, Fargate, S3) · Redshift · Hive / HiveQL · Spark · DJS / Airflow · Python · SQL
JPMorgan Chase Jul 2017 – Aug 2019
Associate
Hyderabad, India
  • Oracle-to-Spark Migration: Led the end-to-end migration of the entire Oracle data platform to Spark, re-architecting batch ETL jobs, resolving critical data skewness issues, and improving processing throughput by 3x while reducing infrastructure costs.
  • Data Ingestion & ETL Optimization: Engineered Sqoop-based ingestion pipelines and Hive batch jobs for large-scale data processing; applied Spark optimization techniques (repartitioning, salting, broadcast joins) to eliminate data skew, cutting job runtimes by 50% and reducing failure rates.
  • Loans, Cards & Merchant Analytics: Worked as a strategic analytics partner across Loans, Cards, and Chase Merchant Services business lines; developed a Cost-Based Allocation Model enabling precise P&L attribution, helping leadership understand profitability drivers across product segments.
  • Business Intelligence Dashboards: Built interactive Tableau dashboards for Mortgage Banking, Cards, and Merchant Services, serving 300+ daily users; the dashboards cut manual reporting effort by 40%, accelerated weekly business reviews, and supported data-driven leadership decisions.
  • Alteryx & ML Enablement: Spearheaded an Alteryx POC for automated data wrangling workflows, reducing data prep time by 60%; led team-wide learning sessions on machine learning fundamentals, demoing end-to-end supervised model training (Logistic Regression, Random Forest, XGBoost) on real business datasets.
Python · SQL · Spark · Hive · Sqoop · Tableau · Alteryx
General Electric May – Jul 2015
Summer Intern
Bangalore, India
  • Vehicle Fault Detection: Developed machine learning algorithms for detecting faults in vehicle engines using multi-sensor parameter data.
  • Data Mining: Implemented text mining techniques to filter noise and establish reliable ground truth labels for the fault detection system.
Machine Learning · Signal Processing · Python

Technical Skills

Languages & Query

SQL · Python · KQL · Spark SQL · HiveQL · Bash · R

Cloud & Big Data

Azure (Data Factory, Synapse, Cosmos DB, Kusto) · AWS (ECS, Fargate, S3, Redshift) · Spark · Hadoop · Hive · Sqoop

AI & Machine Learning

LLMs (Claude, GPT-4) · LLM-as-a-Judge · Supervised & Unsupervised ML · MLOps · Prompt Engineering

Data & Visualization

Power BI · Tableau · Highcharts · Alteryx · DJS / Airflow · Data Wrangling

Core Strengths

Petabyte-Scale Pipelines · Cost Optimization · Oracle-to-Spark Migration · EDA · Capacity Planning

Education

Dual Degree — B.Tech + M.Tech, Electrical Engineering
Indian Institute of Technology (IIT) Madras
2012 – 2017
A five-year integrated program at one of India's premier engineering institutions. Coursework covered signal processing, machine learning, statistical methods, and advanced mathematics.

Achievements

IIT-JEE 2012: All India Rank (AIR) 728 among 560,000+ candidates, placing in the top 0.13% nationally.
Technical Excellence — Recognized for outstanding engineering contributions at Microsoft; consistently rated "Exceeds Expectations."
M.Tech Thesis — Developed vehicle classification algorithms using image processing and ML, achieving 80% accuracy.
Publications & Blog — Authored technical articles on data engineering, Spark optimization, and ML best practices at dineshblog.com.
Mentorship — Actively mentor junior engineers on data engineering best practices, LLM adoption, and career development.

Get in Touch

Open to connecting with professionals passionate about data engineering, machine learning, and building scalable systems.

About This Blog

Digital Bits & Builds is my space for exploring ideas at the intersection of artificial intelligence, cloud engineering, and modern software development. I write about hands-on projects, deep technical dives, and lessons learned from building real-world applications.

Whether it's deploying ML models on Azure, experimenting with generative AI, or dissecting recommendation engines — every post is a reflection of curiosity-driven learning.
