I'm Nuthan, a Software Engineer

Lead Data & Software Engineer | Data Governance | Distributed Systems

My Work

Building large-scale data platforms and governance systems across Azure, AWS, and GCP.

/ About Me

I've been building data systems since 2015

Currently a Senior Software Engineer at Microsoft, leading data governance and agentic systems at enterprise scale.

11+ Years of experience
6+ Major projects
33+ Technical skills
4 Companies worked at

/ Experience

Professional Journey

Senior Software Engineer

Microsoft

Mar 2022 — Present · Hyderabad

Projects (4)

Asset Data Catalog

  • Optimized the data scan of 28B+ assets across 64 asset types for a 2-20x improvement; improved data freshness from 8 days to under 24 hours.
  • Reduced orphaned assets from 4% to <0.1% by identifying owners from new ownership metadata.
  • Established quality checks and data contracts across DataMap, DataGrid, and Trust360; halved live-site incidents and made TTR 80% faster.
  • Eliminated non-deterministic code paths in data processing jobs, enhancing testability, reliability and consistency.
Scope · C# · Azure Data Factory · Data Governance · Data Quality · Data Contracts · Data Lineage · PII Detection

Signal Processing and Auditing

  • Led Privacy Signals Team — data deletion, export delivery, and compliance at enterprise scale.
  • Routed signals to the responsible teams, audited the compliance of the actions taken, and reported KPIs across the organization.
C# · Kubernetes · Docker · Event Hub · Data Lake · Power BI · Azure Data Explorer · GDPR Compliance · Monitoring & Alerting

Trust360

  • Designed a generic, low-code Facet Processing framework for different privacy programs, with multiple pluggable, configurable sources and sinks.
  • Program owners can reuse and extend the Facet Processing framework to implement new privacy programs with minimal effort and reduced redundancy.
  • Designed the Trust360 data systems to provide a single source of truth for compliance reporting.
Python · Apache Spark · Azure Data Factory · Synapse · Data Lake · Data Governance · ETL/ELT Pipelines
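
To illustrate what "pluggable and configurable sources and sinks" can look like, here is a minimal sketch of a registry-based, config-driven pipeline. It is not the Trust360 implementation: the names `SOURCES`, `SINKS`, `run_facet`, the `inline` source, the `memory` sink, and the config layout are all hypothetical, chosen only to show the pattern.

```python
# Hypothetical registries: a plugin name maps to the function that implements it.
SOURCES = {}
SINKS = {}


def source(name):
    """Decorator that registers a source plugin under `name`."""
    def register(fn):
        SOURCES[name] = fn
        return fn
    return register


def sink(name):
    """Decorator that registers a sink plugin under `name`."""
    def register(fn):
        SINKS[name] = fn
        return fn
    return register


@source("inline")
def inline_source(options):
    """Yield the records listed directly in the config (stand-in for a real reader)."""
    yield from options["records"]


@sink("memory")
def memory_sink(options, records):
    """Append records to a caller-supplied list (stand-in for a real writer)."""
    options["target"].extend(records)


def run_facet(config):
    """Read from the configured source, keep the configured fields, write to every sink."""
    src = config["source"]
    records = SOURCES[src["type"]](src.get("options", {}))
    keep = set(config["fields"])
    # Materialize once so each sink sees the full result.
    projected = [{k: v for k, v in r.items() if k in keep} for r in records]
    for snk in config["sinks"]:
        SINKS[snk["type"]](snk.get("options", {}), projected)
    return len(projected)
```

In this sketch, a new program would only supply a config dictionary, and a new storage system would only need one decorated function; the pipeline code itself stays unchanged.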

Investigation Assistant

  • Architected an agent-based investigation and remediation framework for incidents, using TSGs and MCP tools.
  • Added tools for Azure Data Explorer and Synapse to enhance investigation and remediation capabilities.
  • The tools run in parallel, saving the hours developers would otherwise spend investigating and remediating incidents sequentially.
TypeScript · RAG · MCP · Node.js · Postgres · Azure AI Foundry · Agentic AI · LLM Orchestration

Software Engineer III

Walmart

Oct 2020 — Mar 2022 · Bangalore

Projects (1)

EBS Data Pipeline Framework

  • Built modular, config-driven data processing framework for Enterprise Business Services.
  • Optimized Spark data processing jobs, reducing execution time by ~30% and improving resource utilization.
  • Developed Go-based Code Freeze App integrating Slack and GitHub for deployment governance.
Spark · Scala · GCP · Go · Slack · CI/CD · ETL/ELT Pipelines

Software Development Engineer I

Sigmoid

Jan 2019 — Oct 2020 · Bangalore

Projects (4)

Ad Pre-bid Pipeline

  • Developed a streaming pipeline supporting 100k req/sec, 1 GB/sec (14k req/sec/core), and 10k concurrent users; presented at ApacheCon NA 2020.
  • Optimized auto scaling to reduce costs without impacting performance.
  • Configured a circuit breaker to redirect data to PubSub + Dataflow when Kafka + Kafka Connect are down, improving reliability.
Kafka · PubSub · DataFlow · Java · Kubernetes · Docker · Circuit Breaker · Event-Driven Architecture · Microservices
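
The circuit-breaker failover mentioned above can be sketched roughly as follows. This is a generic illustration of the pattern, not the production code (which, per the tags, was Java): `CircuitBreaker`, `primary`, `fallback`, `failure_threshold`, and `reset_after` are hypothetical names, with `primary` standing in for a Kafka producer call and `fallback` for a PubSub publish.

```python
import time


class CircuitBreaker:
    """Route messages to `primary`; divert to `fallback` while the circuit is open."""

    def __init__(self, primary, fallback, failure_threshold=3,
                 reset_after=30.0, clock=time.monotonic):
        self.primary = primary              # e.g. a Kafka send (assumed callable)
        self.fallback = fallback            # e.g. a PubSub publish (assumed callable)
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after      # seconds before a trial call is allowed
        self.clock = clock
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def _is_open(self):
        if self.opened_at is None:
            return False
        # After the cool-down, let one trial call through (half-open state).
        return self.clock() - self.opened_at < self.reset_after

    def send(self, message):
        if self._is_open():
            return self.fallback(message)
        try:
            result = self.primary(message)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Open (or re-open) the circuit and restart the cool-down.
                self.opened_at = self.clock()
            return self.fallback(message)
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, the primary path is not called at all, so a failing broker is not hammered with retries; after `reset_after` seconds a single trial call decides whether traffic returns to the primary path.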

NitroDB on BigQuery

  • Evaluated BigQuery's percentile function and deployed percentiles at scale using the t-digest algorithm, achieving a 10-100x performance gain on pre-aggregation.
  • Evaluated, tested and deployed NitroDB on BigQuery reducing costs by 30%.
Apache Spark · BigQuery · NitroDB · SQL

SigView Performance Optimization

  • Designed a responsive search feature for multi-tenant attribute search with multi-language support.
  • Implemented dynamic ranking of rollups to improve search latency.
  • Integrated Elasticsearch with NitroDB and created update/reindexing strategy.
Elasticsearch · NitroDB · Multi-Tenancy · API Design

Snowflake ETL

  • OSS committer on spark-snowflake connector — fixed critical data correctness bugs.
  • Built ETL pipeline and managed Snowflake costs, permissions, and GCS integration.
Apache Spark · Snowflake · Python · S3 · ETL/ELT Pipelines

Consultant P2

Capgemini

Jun 2015 — Jan 2019 · Bangalore

Projects (2)

Teradata to Hive Migration

  • Migrated complex business logic from Teradata to Hive for a leading energy client.
  • Optimized workloads using partitioning, ORC/Parquet file formats, and cost-based optimization.
  • Developed custom Java UDFs to handle complex transformations and close platform feature gaps.
Hive · Teradata · Apache Spark · HDFS · Java · ETL/ELT Pipelines · SQL

Real-Time Sensor Analytics

  • Built real-time sensor analytics pipeline to predict oil well failures for an energy client.
  • Ingested high-velocity sensor data via NiFi into HBase for low-latency random reads.
  • Processed streaming data with Spark Streaming and visualized predictions using D3.js.
Spark Streaming · NiFi · HBase · D3.js · Java · Stream Processing
/ Skills

Skills & Expertise

I do have a very particular set of skills, skills I have acquired over a very long career.

Languages: Python, Java, C#, Go, SQL, Scala
Data Platforms: Data Modelling, Databricks, Snowflake, Kafka, Data Lake, Apache Spark
Architecture: Event-Driven Architecture, Microservices, Kubernetes, Docker, API Design, Design Patterns
AI Agents: LLM Orchestration, Agentic AI, RAG, Azure AI Foundry
Engineering Excellence: Monitoring & Alerting, System Design, Telemetry, CI/CD, Code Quality, Automated Testing
Data Governance: Data Governance, PII Detection, Data Lineage, Data Quality, GDPR Compliance
/ Certifications

Credentials & Certifications

Databricks Fundamentals · Databricks Academy · 2026
Cloud Computing and Distributed Systems · 2023
AWS Certified Solutions Architect – Associate · 2022
/ Talks & Open Source

Community Contributions

🎤 Conference Talk

Planet-Scale Prebid Streaming Pipeline

ApacheCon North America 2020

Presented the architecture of a planet-scale streaming pipeline supporting 100k req/sec and 1 GB/sec throughput for OpenX's Prebid system.

Kafka · PubSub · Kubernetes · Java
🔧 Open Source

spark-snowflake Connector

snowflakedb/spark-snowflake

Contributed as a committer, fixing critical bugs involving data correctness in the official Spark-Snowflake connector.

Apache Spark · Snowflake · Scala
/ Education

My Education

Indian Institute of Technology (IIT) Kharagpur

Bachelor of Technology — Industrial & Systems Engineering

2011 — 2015

Sri Chaitanya Jr College, Gudavalli

XII — Board of Intermediate Education

2009 — 2011

Gowtham Concept School, Gudivada

X — Board of Secondary Education

2008 — 2009

/ Courses

Courses & Learning

Agentic AI with Azure AI Foundry

Microsoft Learn · 2025

Data Structures and Algorithms

IIT Kharagpur · 2015

Soft Computing and Heuristic Search

IIT Kharagpur · 2015

Java Memory Management

LinkedIn Learning

Google Cloud Platform Big Data and Machine Learning Fundamentals

Coursera

Apache Kafka Series - Kafka Cluster Setup & Administration

Udemy Academy

Feature Selection for Machine Learning

Udemy Academy

AWS Certified Developer - Associate

Udemy · 2018

AWS Certified Solutions Architect - Associate

Udemy · 2018

Python for Data Science and Machine Learning Bootcamp

Udemy Academy

Cloud Computing and Distributed Systems

IIT Patna