Computational Epidemiologist • AI Researcher • ML Specialist • Data Scientist

Dr Reju Sam John

Transforming complex data into evidence that drives decisions

About Me

I'm a data scientist and AI/ML practitioner based in Auckland, New Zealand. With a PhD in Computational Modelling and 7+ years of experience, I build end-to-end data products — from ETL pipelines and predictive models to interactive dashboards and APIs — that turn complex data into business value.

My track record includes scalable ML pipelines achieving 90% forecasting accuracy, multi-module BI platforms, and CI/CD-enabled analytics products. I've published 11 peer-reviewed papers (169+ citations) in journals including Nature Communications and the Journal of the Royal Society Interface.

I thrive in cross-functional teams — translating between data science, engineering, and non-technical stakeholders to define problems, shape strategy, and deliver measurable outcomes. My domain depth in health and epidemiology is complemented by hands-on product-building experience across business analytics, clinical informatics, and AI evaluation.

I'm always interested in challenging problems at the intersection of data science, AI, and real-world impact — whether that's building intelligent products, advancing research, or driving strategy through analytics.

AI & Machine Learning

End-to-end ML pipelines from model development through deployment. Deep learning, NLP, ensemble methods, and predictive/prescriptive analytics.

Data Products & Pipelines

Building dashboards, APIs, and analytics platforms. Scalable ETL, data quality monitoring, CI/CD, and reproducible workflows.

Predictive Analytics

90% accuracy in trend forecasting using ensemble ML. Statistical modelling, time series analysis, and simulation at scale across 340 regions.

Stakeholder Communication

Translating complex technical outputs into reports, dashboards, and presentations for business, clinical, and policy audiences.

Work Experience

AI, Machine Learning & Data Science Consultant

Molecular Epidemiology and Public Health Laboratory

Current • Mar 2025 – Present
  • Design and deploy end-to-end AI/ML pipelines (Python, R) for predictive modelling, risk evaluation, and evidence-based decision support.
  • Translate analytical outputs into actionable reports and dashboards for technical and non-technical stakeholders; mentor researchers in reproducible workflows.
  • Define data requirements, quality standards, and governance frameworks; embed data sovereignty and equity principles in product design.
Python • R • ML Pipelines • Data Governance • Stakeholder Communication

Production Operator — Regulated Healthcare Manufacturing

Fisher & Paykel Healthcare

Sep 2024 – Feb 2026
  • Operated within an ISO-regulated medical device manufacturing environment, gaining direct operational knowledge of quality systems, risk documentation, and compliance requirements applicable to health technology evaluation.
ISO Compliance • Quality Systems • Medical Devices

Postdoctoral Fellow — Data Science, Network Theory & Health Informatics

The University of Auckland

Jan 2023 – Mar 2024
  • Built and managed a longitudinal database integrating clinical records, lab results, and behavioural data with full audit-ready documentation and data quality controls.
  • Designed large-scale computational models coupling multiple data sources — from network analysis to simulation — delivering actionable insights for intervention planning.
  • Developed reproducible, version-controlled analytical workflows (Python, Linux) and communicated findings to cross-functional technical and non-technical audiences.
Database Design • Data Quality • Network Analysis • Python • Linux

Postdoctoral Fellow — Computational Epidemiologist & Health Data Scientist

Massey University

Nov 2020 – Jan 2023
  • Led end-to-end development of predictive models across 340 regions, from data acquisition and feature engineering to model validation and deployment; published as a first-author paper in the Journal of the Royal Society Interface.
  • Built scalable ETL pipelines integrating multi-source datasets (APIs, web scraping, administrative records), achieving 90% forecasting accuracy with ensemble ML models.
  • Maintained structured databases with quality control, data validation, and downstream ML-readiness. Collaborated with international cross-functional teams (NZ, USA, Europe).
ETL Pipelines • Ensemble ML • Forecasting • APIs • Data Validation

Postdoctoral Fellow — Astrophysics Simulations & Computational Data Science

Inter-University Centre for Astronomy and Astrophysics (IUCAA), India

Aug 2018 – Nov 2020
  • Designed automated data processing pipelines for large-scale (1 TB+) datasets on HPC clusters; applied statistical inference and ML techniques to high-dimensional data for pattern detection and anomaly identification.
HPC • Data Pipelines • Statistical Inference • Big Data • Automation

Technical Skills

Programming & Tools

Python • R • SQL • Bash/Shell • C • Git/GitHub • Linux • LaTeX

AI & Machine Learning

Scikit-learn • TensorFlow • PyTorch • Deep Learning • NLP • Ensemble Methods • Time Series • LLM Integration

Data Engineering

ETL Pipelines • APIs / REST • Data Warehousing • Web Scraping • Data Validation • Pandas • NumPy • Airflow

Visualization & Applications

Power BI • Tableau • Streamlit • Plotly Dash • Flask • Matplotlib • Seaborn

Cloud & DevOps

AWS • Google Cloud • GitHub Actions • CI/CD • HPC Clusters • Docker

Domain Expertise

Epidemiology • Health Informatics • Clinical Trials • ICH GCP R3 • Health Equity • Te Tiriti o Waitangi

Selected Publications

11 peer-reviewed articles • 169+ citations • Google Scholar • ORCID

2024

High connectivity and human movement limits the impact of travel time on infectious disease transmission

John, R.S. et al.

Journal of the Royal Society Interface, 21(210)

First Author

2024

Modelling Lassa virus dynamics in West African Mastomys natalensis

John, R.S., Fatoyinbo, H.O., & Hayman, D.T.S.

Journal of the Royal Society Interface, 21(216)

First Author

2023

Identifying SARS-like coronavirus spillover risk hotspots

Muylaert, R.L. et al. (incl. John, R.S.)

Nature Communications

Nature Comms

2022

Transmission models indicate Ebola virus persistence in non-human primate populations is unlikely

Hayman, D.T.S., John, R.S., & Rohani, P.

Journal of the Royal Society Interface, 19(187)

Featured Projects

Data products, ML pipelines, and open-source research code

Biometrics Analytics Platform

Production-grade data product: 34,000+ measurements processed through a four-gate ETL pipeline with audit logging, Power BI star-schema export, interactive Streamlit dashboard, and CI/CD via GitHub Actions with pytest validation.

Python • ETL • Power BI • CI/CD • pytest

Predictive Metapopulation Model

Large-scale predictive model across 340 regions with multi-source data integration and parameter optimisation. Open-source, reproducible pipeline accompanying first-author publication.

Python • Predictive Modelling • Published

Zoonotic Dynamics Simulation

Simulation-based predictive modelling with parameter estimation and sensitivity analysis. Open-source research code accompanying peer-reviewed publication.

Python • Simulation • Published

Sales Intelligence Platform

Market basket analysis and temporal pattern detection with external data integration (weather API). Business-oriented data product demonstrating customer behaviour insights and revenue optimisation.

Python • Market Basket • Business Analytics

Education & Certifications

Education

Ph.D. in Physics
Computational Modelling, Data Analysis & Simulation
Pondicherry University, India
2011 – 2018

M.Sc. in Physics
Mahatma Gandhi University, India
2006 – 2008

Certifications

Understanding Te Tiriti o Waitangi
Groundwork • Sep 2025
ICH Good Clinical Practice R3
Global Health Training Centre • Jul 2025
AWS Machine Learning Essential Training
Mar 2025
Google Cloud: Building Data Pipelines
Apr 2025
Deep Learning • NLP with Python • Ensemble Learning
Apr 2025
Power BI • SQL, Tableau, Python & Spark
2024

Get in Touch

Let's build something meaningful together

Work With Me

I build end-to-end data products, from ETL pipelines and predictive models to dashboards and APIs. I bring 7+ years of experience and a PhD, with a track record of delivering measurable outcomes across cross-functional teams.

Send Me an Email

Research Collaboration?

My expertise spans computational modelling, AI evaluation, predictive analytics, and health informatics. I'm eager to collaborate on impactful, data-driven research across domains.

Propose a Collaboration