About
I'm a Senior Data Engineer with over 15 years of experience designing and delivering large-scale data solutions across multiple industries and continents. Currently based in Berlin, Germany, I specialise in building robust ETL pipelines, data lakehouses, and cloud-native infrastructure that powers data-driven decision-making at scale.
I've led the migration of 100+ production pipelines to a modern data lakehouse architecture, implementing comprehensive monitoring, fault handling, and data quality governance. My career spans work with enterprise clients across retail, finance, healthcare, and telecommunications, delivering solutions on AWS, GCP, and Microsoft Azure that handle petabytes of data efficiently.
Currently, I am expanding into AI/ML through Georgia Tech's Machine Learning program and hands-on projects involving LLMs, deep learning, and agentic engineering. I'm passionate about bridging the gap between data engineering and machine learning operations, building the infrastructure that enables AI systems to thrive in production environments.
Skills & Technologies
Big Data & Databases
- Apache Spark
- Databricks
- AWS Redshift
- PostgreSQL
- Oracle PL/SQL
- DynamoDB
- Google BigQuery
- Elastic Stack
Cloud & Orchestration
- AWS (S3, Lambda, EC2)
- Google Cloud Platform
- Microsoft Azure
- Apache Airflow
- MLflow
- Docker & Kubernetes
AI/ML & Programming
- AWS SageMaker & Bedrock
- LLMs & Agentic Engineering
- Deep Learning (RNN, CNN)
- PyTorch & Scikit-Learn
- Python
- Go
- FastAPI
DevOps & Monitoring
- GitLab CI/CD
- Datadog
- Grafana
- Bugsnag
- Sentry
Featured Projects
Data Lakehouse Migration
Led the migration of 100+ production ETL pipelines from legacy Oracle and Exasol systems to a modern data lakehouse architecture on AWS and Databricks. Implemented comprehensive monitoring, fault handling, and data quality governance, delivering significant cost savings through infrastructure modernisation and optimised cloud resource utilisation.
Podcast Transcript & Emotion Recognition Pipeline
Built end-to-end subtitle generation pipeline integrating HuggingFace speech-to-text and emotion recognition models to automatically transcribe podcast audio across multiple European languages. Generated emotionally-aware subtitles by mapping recognised emotional tone to text-to-speech output for localised, expressive delivery.
DQA: Data Quality Assessment Service
Designed and built internal data quality service adopted by analysts and integrated across data pipelines at Sertis. Provided automated summary statistics and configurable validation rules via publish-subscribe architecture, exposing REST APIs and Web UI for self-serve data quality monitoring across the organisation.
Petabyte-Scale Infrastructure
Sole data engineer responsible for end-to-end data infrastructure handling petabytes of data across real-time streaming and batch pipelines. Designed and administered AWS Redshift data warehouse and managed Qubole platform, including Hive, Spark, and Airflow clusters, while maintaining data governance and quality standards at scale.
Demand Forecasting ML Pipeline
Delivered two-phase engagement for multinational consumer goods company. Phase one: designed ETL pipeline to Azure SQL DB enabling downstream analytics. Phase two: architected cloud infrastructure on Azure and Databricks to host and serve demand forecasting ML model for production deployment.
Experience
Leading data lakehouse migration of 100+ production pipelines from Oracle and Exasol to AWS and Databricks. Architecting robust ETL infrastructure with comprehensive monitoring, fault handling, and data quality governance. Driving cost optimisation through infrastructure modernisation.
Architected cloud and on-premise data solutions for 10+ enterprise clients across retail, finance, healthcare, and telco on AWS, GCP, and Azure. Developed internal self-serve tooling for data scientists covering infrastructure provisioning and data wrangling, containerised with Docker and Kubernetes.
Sole data engineer responsible for end-to-end infrastructure handling petabytes of data. Designed and administered AWS Redshift data warehouse with significant improvements in query performance. Managed Qubole platform including Hive, Spark, and Airflow clusters at scale.
Sole data and software engineer driving full development of SORA, an in-house CRM system. Engineered near real-time data synchronisation from DynamoDB to Redshift using DynamoDB Streams. Built dynamic query builder for non-technical teams to access customer data.
Contributed to revenue assurance initiatives identifying and recovering millions in lost revenue. Automated reconciliation processes and built monitoring systems to detect revenue leakage. Improved billing accuracy through Oracle PL/SQL development and BSCS iX administration.
Get In Touch
I'm open to new opportunities, collaborations, and conversations about data engineering and AI/ML. Feel free to reach out through any of the following channels: