I'm Nuthan, a
Software Engineer
Lead Data & Software Engineer | Data Governance | Distributed Systems
My Work
Building large-scale data platforms and governance systems across Azure, AWS, and GCP.
I've been building
data systems since 2015
Currently at Microsoft as Senior Software Engineer, leading data governance and agentic systems at enterprise scale.
Professional Journey
Senior Software Engineer
Microsoft · Mar 2022 – Present · Hyderabad
Projects (4)
Asset Data Catalog
- Optimized the data scan across 28B+ assets and 64 asset types for a 2-20x improvement; cut data-freshness lag from 8 days to under 24 hours.
- Reduced orphaned assets from 4% to <0.1% by identifying owners from new ownership metadata.
- Established quality checks and data contracts across DataMap, DataGrid, and Trust360, halving live-site incidents and making TTR 80% faster.
- Eliminated non-deterministic code paths in data processing jobs, enhancing testability, reliability and consistency.
Signal Processing and Auditing
- Led the Privacy Signals team, responsible for data deletion, export delivery, and compliance at enterprise scale.
- Routed signals to the responsible teams, audited the compliance of the actions taken, and reported KPIs across the organization.
Trust360
- Designed a generic low-code Facet Processing Framework for different privacy programs, with pluggable, configurable support for multiple sources and sinks.
- Program owners can reuse and extend the Facet Processing framework to implement new privacy programs with minimal effort and reduced redundancy.
- Designed the Trust360 data systems, which provide a single source of truth for compliance reporting.
Investigation Assistant
- Architected an agent-based investigation and remediation framework for incidents, using TSGs and MCP tools.
- Added tools for Azure Data Explorer and Synapse to enhance investigation and remediation capabilities.
- The tools run in parallel, saving developers the hours they would otherwise spend investigating and remediating incidents sequentially.
Software Engineer III
Walmart · Oct 2020 – Mar 2022 · Bangalore
Projects (1)
EBS Data Pipeline Framework
- Built a modular, config-driven data processing framework for Enterprise Business Services.
- Optimized Spark data processing jobs, reducing execution time by ~30% and improving resource utilization.
- Developed a Go-based Code Freeze app integrating Slack and GitHub for deployment governance.
Software Development Engineer I
Sigmoid · Jan 2019 – Oct 2020 · Bangalore
Projects (4)
Ad Pre-bid Pipeline
- Developed a streaming pipeline supporting 100k req/sec, 1 GB/sec (14k req/sec/core), and 10k concurrent users; presented at ApacheCon NA 2020.
- Optimized autoscaling to reduce costs without impacting performance.
- Configured a circuit breaker that redirects data to Pub/Sub + Dataflow when Kafka + Kafka Connect is down, improving reliability.
NitroDB on BigQuery
- Evaluated the percentile function in BigQuery and deployed percentiles at scale using the t-digest algorithm, achieving a 10-100x performance gain on pre-aggregation.
- Evaluated, tested, and deployed NitroDB on BigQuery, reducing costs by 30%.
SigView Performance Optimization
- Designed a responsive multi-tenant attribute search feature with multi-language support.
- Implemented dynamic ranking of rollups to improve search latency.
- Integrated Elasticsearch with NitroDB and created an update/reindexing strategy.
Consultant P2
Capgemini · Jun 2015 – Jan 2019 · Bangalore
Projects (2)
Teradata to Hive Migration
- Migrated complex business logic from Teradata to Hive for a leading energy client.
- Optimized workloads using partitioning, ORC/Parquet file formats, and cost-based optimization.
- Developed custom Java UDFs to handle complex transformations and close platform feature gaps.
Real-Time Sensor Analytics
- Built a real-time sensor analytics pipeline to predict oil well failures for an energy client.
- Ingested high-velocity sensor data via NiFi into HBase for low-latency random reads.
- Processed streaming data with Spark Streaming and visualized predictions using D3.js.
Skills & Expertise
I do have a very particular set of skills, skills I have acquired over a very long career.
Credentials & Certifications
Community Contributions
Planet-Scale Prebid Streaming Pipeline
ApacheCon North America 2020
Presented the architecture of a planet-scale streaming pipeline supporting 100k req/sec and 1 GB/sec throughput for OpenX's Prebid system.
spark-snowflake Connector
snowflakedb/spark-snowflake
Contributed as a committer, fixing critical bugs involving data correctness in the official Spark-Snowflake connector.
My Education
Indian Institute of Technology (IIT) Kharagpur
Bachelor of Technology — Industrial & Systems Engineering
2011 — 2015
Sri Chaitanya Jr College, Gudavalli
XII — Board of Intermediate Education
2009 — 2011
Gowtham Concept School, Gudivada
X — Board of Secondary Education
2008 — 2009
Courses & Learning
Agentic AI with Azure AI Foundry
Data Structures and Algorithms
Soft Computing and Heuristic Search
Java Memory Management
Google Cloud Platform Big Data and Machine Learning Fundamentals
Apache Kafka Series - Kafka Cluster Setup & Administration
Feature Selection for Machine Learning
AWS Certified Developer - Associate
AWS Certified Solutions Architect - Associate
Python for Data Science and Machine Learning Bootcamp
Cloud Computing and Distributed Systems
Thoughts & Writing
Microservices — Is it worth the hype and how can we benefit from it?
Medium
LSM Trees
Notion
Isolation Levels in Databases
Notion
Big Brain — Mental Models for Engineers
Notion