I'm Nuthan, a Software Engineer

Lead Data & Software Engineer | Data Governance | Distributed Systems

My Work

Building large-scale data platforms and governance systems across Azure, AWS, and GCP.

Follow Me

/ About Me

I've been building data systems since 2015

Currently at Microsoft as Senior Software Engineer, leading data governance and agentic systems at enterprise scale.

11+ Years of experience

Major projects

33+ Technical skills

Companies worked at

/ Experience

Professional Journey

Senior Software Engineer

Microsoft

Mar 2022 — Present · Hyderabad

Projects (4)

Asset Data Catalog

Optimized the data scan of 28B+ assets across 64 asset types for a 2-20x improvement; improved data freshness from 8 days to under 24 hours.
Reduced orphaned assets from 4% to <0.1% by identifying owners from new ownership metadata.
Established quality checks and data contracts across DataMap, DataGrid, and Trust360, halving live-site incidents and making TTR 80% faster.
Eliminated non-deterministic code paths in data processing jobs, improving testability, reliability, and consistency.

Scope · C# · Azure Data Factory · Data Governance · Data Quality · Data Contracts · Data Lineage · PII Detection
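To illustrate what removing a non-deterministic code path can look like, here is a minimal Python sketch. It is not the production code (which used Scope and C#); the function name `stale_assets` and the record fields `id` and `last_scanned` are hypothetical. The idea is to pass the current time in as an argument instead of reading the wall clock, and to return results in a fixed order so repeated runs and tests always agree.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def stale_assets(assets: Iterable[Dict], now: datetime, max_age_hours: float) -> List[str]:
    """Return IDs of assets not scanned within max_age_hours, in sorted order.

    `now` is injected by the caller rather than read from the system clock,
    and the output is sorted, so the same inputs always give the same output.
    """
    limit = timedelta(hours=max_age_hours)
    stale = [a["id"] for a in assets if now - a["last_scanned"] > limit]
    return sorted(stale)


# Hypothetical sample data with a fixed reference time.
reference_time = datetime(2024, 1, 10, tzinfo=timezone.utc)
sample = [
    {"id": "b", "last_scanned": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": "a", "last_scanned": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"id": "c", "last_scanned": datetime(2024, 1, 9, 12, tzinfo=timezone.utc)},
]
print(stale_assets(sample, reference_time, max_age_hours=24))
```

Because the clock is a parameter, a unit test can pin it to any instant and assert on an exact result.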

Signal Processing and Auditing

Led the Privacy Signals team, covering data deletion, export delivery, and compliance at enterprise scale.
Routed signals to the owning teams, audited the compliance of the actions taken, and reported KPIs across the organization.

C# · Kubernetes · Docker · Event Hub · Data Lake · Power BI · Azure Data Explorer · GDPR Compliance · Monitoring & Alerting

Trust360

Designed a generic, low-code Facet Processing Framework for different privacy programs. The framework supports multiple pluggable, configurable sources and sinks.
Program owners can reuse and extend the framework to implement new privacy programs with minimal effort and less redundancy.
Designed the Trust360 data systems to provide a single source of truth for compliance reporting.

Python · Apache Spark · Azure Data Factory · Synapse · Data Lake · Data Governance · ETL/ELT Pipelines
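The pluggable source-and-sink idea can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the Trust360 implementation: the registries `SOURCES` and `SINKS`, the `register` decorator, the `inline` source, the `memory` sink, and the config shape are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical registries mapping a config "type" string to an implementation.
SOURCES: Dict[str, Callable[[dict], Iterable[Record]]] = {}
SINKS: Dict[str, Callable[[dict, List[Record]], None]] = {}


def register(registry: dict, name: str):
    """Decorator that plugs a function into a registry under `name`."""
    def decorator(fn):
        registry[name] = fn
        return fn
    return decorator


@register(SOURCES, "inline")
def inline_source(options: dict) -> Iterable[Record]:
    # Reads records embedded directly in the config (stand-in for a real store).
    return list(options["records"])


@register(SINKS, "memory")
def memory_sink(options: dict, records: List[Record]) -> None:
    # Appends records to a caller-supplied list (stand-in for a real sink).
    options["target"].extend(records)


def run_pipeline(config: dict) -> int:
    """Read from every configured source, then write to every configured sink."""
    records: List[Record] = []
    for src in config["sources"]:
        records.extend(SOURCES[src["type"]](src.get("options", {})))
    for snk in config["sinks"]:
        SINKS[snk["type"]](snk.get("options", {}), records)
    return len(records)


collected: List[Record] = []
count = run_pipeline({
    "sources": [{"type": "inline", "options": {"records": [{"id": 1}, {"id": 2}]}}],
    "sinks": [{"type": "memory", "options": {"target": collected}}],
})
print(count, collected)
```

In this shape, a new privacy program only needs a new config, and a new storage system only needs one more registered function.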

Investigation Assistant

Architected an agent-based investigation and remediation framework for incidents using troubleshooting guides (TSGs) and MCP tools.
Added tools for Azure Data Explorer and Synapse to enhance investigation and remediation capabilities.
These tools run in parallel, saving developers the hours otherwise spent investigating and remediating incidents sequentially.

TypeScript · RAG · MCP · Node.js · Postgres · Azure AI Foundry · Agentic AI · LLM Orchestration
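Running investigation tools in parallel rather than one after another can be sketched as follows. The real project is written in TypeScript; this Python version is only an illustration, and the tool functions `query_data_explorer` and `query_synapse`, the `TOOLS` table, and the incident ID are hypothetical stand-ins that sleep instead of calling real services.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def query_data_explorer(incident_id: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a real query round trip
    return f"data explorer findings for {incident_id}"


async def query_synapse(incident_id: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a real query round trip
    return f"synapse findings for {incident_id}"


# Hypothetical tool table; each entry is an async callable taking an incident ID.
TOOLS: Dict[str, Callable[[str], Awaitable[str]]] = {
    "data_explorer": query_data_explorer,
    "synapse": query_synapse,
}


async def investigate(incident_id: str) -> Dict[str, str]:
    """Run every tool concurrently and collect the findings by tool name."""
    names = list(TOOLS)
    results = await asyncio.gather(*(TOOLS[name](incident_id) for name in names))
    return dict(zip(names, results))


findings = asyncio.run(investigate("INC-123"))
print(findings)
```

With `asyncio.gather`, total wait time is roughly that of the slowest tool instead of the sum of all tools.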

Software Engineer III

Walmart

Oct 2020 — Mar 2022 · Bangalore

Projects (1)

EBS Data Pipeline Framework

Built a modular, config-driven data processing framework for Enterprise Business Services.
Optimized Spark data processing jobs, reducing execution time by ~30% and improving resource utilization.
Developed a Go-based Code Freeze App integrating Slack and GitHub for deployment governance.

Spark · Scala · GCP · Go · Slack · CI/CD · ETL/ELT Pipelines

Software Development Engineer I

Sigmoid

Jan 2019 — Oct 2020 · Bangalore

Projects (4)

Ad Pre-bid Pipeline

Developed a streaming pipeline supporting 100k req/sec, 1GB/sec (14k req/sec/core), and 10k concurrent users; presented at ApacheCon North America 2020.
Optimized autoscaling to reduce costs without impacting performance.
Configured a circuit breaker to redirect data to PubSub + Dataflow, improving reliability when Kafka and Kafka Connect are down.

Kafka · PubSub · DataFlow · Java · Kubernetes · Docker · Circuit Breaker · Event-Driven Architecture · Microservices
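The circuit-breaker behaviour described above can be sketched like this. It is a simplified Python illustration, not the production Java setup: the `CircuitBreaker` class, its thresholds, and the `primary` and `fallback` callables (standing in for the Kafka and PubSub paths) are assumptions made for the example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Send via `primary`; after repeated failures, route to `fallback` for a cool-down."""

    def __init__(self, primary: Callable[[str], str], fallback: Callable[[str], str],
                 failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.primary = primary
        self.fallback = fallback
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def send(self, message: str) -> str:
        # While open and still cooling down, skip the primary path entirely.
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after_s:
            return self.fallback(message)
        try:
            result = self.primary(message)  # closed, or half-open trial after cool-down
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return self.fallback(message)
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def kafka_down(message: str) -> str:
    raise ConnectionError("primary path unavailable")


breaker = CircuitBreaker(kafka_down, lambda m: f"fallback:{m}", failure_threshold=2)
print([breaker.send(m) for m in ("a", "b", "c")])
```

Once the circuit is open, messages go straight to the fallback without waiting on a failing primary, and a single trial request after the cool-down decides whether to close it again.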

NitroDB on BigQuery

Evaluated BigQuery's percentile function and deployed percentiles at scale using the t-digest algorithm, achieving a 10-100x performance gain on pre-aggregation.
Evaluated, tested, and deployed NitroDB on BigQuery, reducing costs by 30%.

Apache Spark · BigQuery · NitroDB · SQL

SigView Performance Optimization

Designed a responsive search feature for multi-tenant attribute search with multi-language support.
Implemented dynamic ranking of rollups to improve search latency.
Integrated Elasticsearch with NitroDB and created an update and reindexing strategy.

Elasticsearch · NitroDB · Multi-Tenancy · API Design

Snowflake ETL

OSS committer on the spark-snowflake connector; fixed critical data correctness bugs.
Built an ETL pipeline and managed Snowflake costs, permissions, and GCS integration.

Apache Spark · Snowflake · Python · S3 · ETL/ELT Pipelines

Consultant P2

Capgemini

Jun 2015 — Jan 2019 · Bangalore

Projects (2)

Teradata to Hive Migration

Migrated complex business logic from Teradata to Hive for a leading energy client.
Optimized workloads using partitioning, ORC/Parquet file formats, and cost-based optimization.
Developed custom Java UDFs to handle complex transformations and close platform feature gaps.

Hive · Teradata · Apache Spark · HDFS · Java · ETL/ELT Pipelines · SQL

Real-Time Sensor Analytics

Built a real-time sensor analytics pipeline to predict oil well failures for an energy client.
Ingested high-velocity sensor data via NiFi into HBase for low-latency random reads.
Processed streaming data with Spark Streaming and visualized predictions using D3.js.

Spark Streaming · NiFi · HBase · D3.js · Java · Stream Processing

/ Skills

Skills & Expertise

I do have a very particular set of skills, skills I have acquired over a very long career.

/ Certifications

Credentials & Certifications

✦ Databricks Fundamentals

Databricks Academy · 2026

✦ Cloud Computing and Distributed Systems

2023

✦ AWS Certified Solutions Architect – Associate

2022

/ Talks & Open Source

Community Contributions

🎤 Conference Talk

Planet-Scale Prebid Streaming Pipeline

ApacheCon North America 2020

Presented the architecture of a planet-scale streaming pipeline supporting 100k req/sec and 1GB/sec throughput for OpenX's Prebid system.

Kafka · PubSub · Kubernetes · Java

🔧 Open Source

spark-snowflake Connector

snowflakedb/spark-snowflake

Contributed as a committer, fixing critical bugs involving data correctness in the official Spark-Snowflake connector.

Apache Spark · Snowflake · Scala

/ Education

My Education

Indian Institute of Technology (IIT) Kharagpur

Bachelor of Technology — Industrial & Systems Engineering

2011 — 2015

Sri Chaitanya Jr College, Gudavalli

XII — Board of Intermediate Education

2009 — 2011

Gowtham Concept School, Gudivada

X — Board of Secondary Education

2008 — 2009

/ Courses

Courses & Learning

Agentic AI with Azure AI Foundry

Microsoft Learn · 2025

Data Structures and Algorithms

IIT Kharagpur · 2015

Soft Computing and Heuristic Search

IIT Kharagpur · 2015

Java Memory Management

LinkedIn Learning

Google Cloud Platform Big Data and Machine Learning Fundamentals

Coursera

Apache Kafka Series - Kafka Cluster Setup & Administration

Udemy Academy

Feature Selection for Machine Learning

Udemy Academy

AWS Certified Developer - Associate

Udemy · 2018

AWS Certified Solutions Architect - Associate

Udemy · 2018

Python for Data Science and Machine Learning Bootcamp

Udemy Academy

Cloud Computing and Distributed Systems

IIT Patna

/ Blog

Thoughts & Writing

Microservices — Is it worth the hype and how can we benefit from it?

Medium

Microservices · Architecture · Distributed Systems

LSM Trees

Notion

Data Structures · Databases · Storage Engines

Isolation Levels in Databases

Notion

Databases · RDBMS · Distributed Systems

Big Brain — Mental Models for Engineers

Notion

System Design · Engineering Excellence
