About
With over 15 years of experience delivering large-scale data and AI solutions across multiple industries and continents, I help companies architect, build, and modernise the data infrastructure that powers their decisions and AI ambitions. Currently based in Berlin, Germany, I focus on robust ETL pipelines, data lakehouses, and cloud-native systems that scale.
I bring both strategic clarity and hands-on execution, from migrating 100+ production pipelines to a modern lakehouse architecture to implementing monitoring, fault handling, and data quality governance. My career spans enterprise clients across retail, finance, healthcare, and telecommunications, delivering solutions on AWS, GCP, and Microsoft Azure at petabyte scale.
I'm also deepening my AI/ML practice through Georgia Tech's Machine Learning program and hands-on work with LLMs, deep learning, and agentic systems — with a focus on the infrastructure that makes AI reliable in production. I work as a consultant and am open to select full-time roles where I can drive meaningful impact.
Skills & Technologies
Big Data & Databases
- Apache Spark
- Databricks
- AWS Redshift
- PostgreSQL
- Oracle PL/SQL
- DynamoDB
- Google BigQuery
- Elastic Stack
Cloud & Orchestration
- AWS (S3, Lambda, EC2)
- Google Cloud Platform
- Microsoft Azure
- Apache Airflow
- MLflow
- Docker & Kubernetes
AI/ML & Programming
- AWS SageMaker & Bedrock
- LLMs & Agentic Engineering
- Deep Learning (RNN, CNN)
- PyTorch & Scikit-Learn
- Python
- Go
- FastAPI
DevOps & Monitoring
- GitLab CI/CD
- DataDog
- Grafana
- Bugsnag
- Sentry
Featured Projects
Data Lakehouse Migration — Retail / E-commerce
Led the migration of 100+ production ETL pipelines from legacy Oracle and Exasol systems to a modern data lakehouse architecture on AWS and Databricks. Introduced right-sized cluster management — replacing a one-size-fits-all approach — achieving up to 35% faster pipeline execution and direct reductions in cloud infrastructure costs. Implemented comprehensive monitoring, fault handling, and data quality governance throughout.
Unified AI & Data Infrastructure — Multinational Consumer Goods
Proposed and architected a unified cloud infrastructure on Azure and Databricks to consolidate data processing and AI model development across a multinational consumer goods company. The solution was adopted globally, with engineering teams worldwide implementing it in their respective tech departments. Collaborated with data scientists to deploy and productionise demand forecasting models, making them reliable and scalable in production.
Real-Time Event Streaming & Fraud Detection — Telecommunications
Built a near real-time data streaming pipeline processing events emitted by mobile devices, enabling instant monitoring of activity by demographics and region using attributes encoded in each event. The processed stream fed directly into an AI clustering model to detect and track suspicious activity patterns, giving the client operational intelligence it previously lacked.
Legacy Data Platform Migration — Retail
Fully migrated a retail client's data infrastructure from a legacy OLAP data cube model to a modern data lake architecture on Microsoft Azure. The new platform replaced rigid, hard-to-maintain cube structures with a flexible, scalable foundation — enabling faster analytics iteration and unlocking new data use cases for the business.
Petabyte-Scale Streaming & Batch Infrastructure — Media
Led end-to-end data infrastructure handling petabytes of data across real-time streaming and batch pipelines for a high-growth media streaming platform. Designed and administered an AWS Redshift data warehouse with significant query performance improvements, and managed the full Qubole platform — including Hive, Spark, and Airflow clusters — while maintaining data governance and quality standards at scale.
Podcast Transcript & Emotion Recognition Pipeline
Built an end-to-end subtitle generation pipeline integrating HuggingFace speech-to-text and emotion recognition models to automatically transcribe podcast audio across multiple European languages. Generated emotionally aware subtitles by mapping recognised emotional tone to text-to-speech output for localised, expressive delivery.
DQA: Data Quality Assessment Service
Designed and built an internal data quality service adopted by analysts and integrated across data pipelines. Provided automated summary statistics and configurable validation rules via a publish-subscribe architecture, exposing REST APIs and a Web UI for self-serve data quality monitoring across the organisation.
Experience
Leading data lakehouse migration of 100+ production pipelines from Oracle and Exasol to AWS and Databricks. Introduced right-sized cluster management achieving up to 35% faster pipeline execution and measurable cloud cost reduction. Architecting robust ETL infrastructure with comprehensive monitoring, fault handling, and data quality governance.
Architected cloud and on-premise data solutions for 10+ enterprise clients across retail, finance, healthcare, and telco on AWS, GCP, and Azure — delivering industry-specific outcomes including legacy platform migrations, real-time streaming pipelines, and AI infrastructure adopted globally. Collaborated with data science teams to build self-serve tooling covering infrastructure provisioning and data wrangling, containerised with Docker and Kubernetes.
Led end-to-end data infrastructure handling petabytes of data across real-time streaming and batch pipelines. Designed and administered an AWS Redshift data warehouse with significant improvements in query performance. Managed the Qubole platform, including Hive, Spark, and Airflow clusters, at scale.
Led full development of SORA, an in-house CRM system built to serve non-technical teams. Engineered near real-time data synchronisation from DynamoDB to Redshift using DynamoDB Streams, and built a dynamic query builder enabling business teams to access customer data without engineering support.
Contributed to revenue assurance initiatives identifying and recovering millions in lost revenue. Automated reconciliation processes and built monitoring systems to detect revenue leakage. Improved billing accuracy through Oracle PL/SQL development and BSCS iX administration.