Projects with Code
This section showcases some of the projects I have completed as part of my training or just for fun. They include visualizations with Python, R, and Tableau, different machine learning approaches, and general data wrangling and scraping code. Click the image to get to the code on GitHub (opens in a new window).

Full Marketing Project Deliverable (Customer Segmentation, Report) – 2021
Goals: analyze customer base of subscription-based online game, segment players into categories for marketing campaigns, deliver report with actionable recommendations and code (notebook, conda environment, instructions)
tags: Jupyter notebook, customer segmentation, marketing, report, writing for non-technical audiences
Language: Python

Company Name Generator (Web Scraping, App) – 2020
Goals: scrape and clean a word list, build a small app that generates a composite company name from user input.
tags: Jupyter notebook, web scraping, beautifulsoup4, app, functional
Language: Python

Breast Cancer Classification (Supervised ML) – 2019
Goals: Analyze the “Breast Cancer Wisconsin (Diagnostic) Data Set” and create a machine learning model to predict diagnosis (benign or malignant tumor).
tags: supervised machine learning, Jupyter notebook, PCA, Random Forest, XGBoost, AdaBoost, binary classification, grid search, random search, confusion matrix
Language: Python

Reproduce Results of Scientific Article (Statistics) – 2019
Goal: Reproduce the analysis of Table 1 of a previously published study with the test used in the publication, then repeat it with a different statistical test and evaluate both methods
tags: statistics, hypothesis testing, R markdown, reproducibility crisis, exploratory data analysis, Shapiro-Wilk test for normality, power analysis, regression
Language: R

Image Labeling Code (Helper Code) – 2019
Goal: write script to access folder and pull random sample from list of images, get user input about image classification, save response and remove image from sample pool
tags: helper code, user input, image classification, automation, image labeling
Language: Python
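The labeling loop described in the goal can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the function names, the accepted file extensions, the CSV layout, and the `ask` parameter (which defaults to the built-in `input`) are all assumptions made for the example.

```python
import csv
import random
from pathlib import Path

# Assumed set of image extensions; adjust to the actual data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def list_images(folder):
    """Return the image files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )


def label_images(folder, out_csv, sample_size, ask=input, seed=None):
    """Draw a random sample of images, ask for one label each, save the answers."""
    rng = random.Random(seed)
    images = list_images(folder)
    # Random sample without replacement, capped at the number of images found.
    pool = rng.sample(images, min(sample_size, len(images)))
    labels = {}
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image", "label"])
        while pool:
            image = pool.pop()  # removes the image from the sample pool
            label = ask(f"Label for {image.name}: ").strip()
            writer.writerow([image.name, label])  # save the response
            labels[image.name] = label
    return labels
```

Under these assumptions, calling `label_images` with a folder, an output file name, and a sample size would prompt once per sampled image and write one row per answer. Passing a different `ask` callable makes the loop usable without keyboard input.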

Analyzing Yelp Dataset (Basic SQL) – 2019
Goal: Perform simple SQL operations and analyses on the Yelp dataset as part of the “SQL for Data Science” course by UC Davis on Coursera.
tags: select, group by, order by, join, average/sum, count
Language: SQL

Analyzing Tweets with Twitter API (APIs) – 2019
Goal: analyze and visualize different aspects of a selected Twitter account (frequency of tweets, hashtag usage)
tags: api, exploratory data analysis, real time data, hashtags
Language: Python
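The two aspects named in the goal, tweet frequency and hashtag usage, come down to simple counting once the tweets have been fetched. The sketch below is illustrative only and leaves out the API access entirely: it assumes the tweet texts and timestamps are already available as plain Python values, and the function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# A hashtag is taken to be "#" followed by word characters (an assumption).
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(texts):
    """Count hashtags across tweet texts, ignoring case."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def tweets_per_day(timestamps):
    """Count tweets per calendar day from datetime objects."""
    return Counter(ts.date() for ts in timestamps)
```

Both functions return `Counter` objects, so `most_common()` gives a ranking that can go straight into a bar chart.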

Analyzing Video Game Sales (Exploratory Analysis) – 2019
Goal: describe the data set, investigate how sales, ratings, and price are related, draw conclusions based on the analysis
tags: data sets, exploratory data analysis, market insights, matplotlib, seaborn, pandas
Language: Python

Visualizing Sales Data with Tableau (Business Intelligence) – 2019
Goal: create interactive visualizations, dashboards, and stories with Tableau Public using the Superstore dataset
tags: data sets, market insights, business intelligence, Tableau, storytelling, dashboards, design, KPI, interactive
Language: Tableau

Web Scraping Jobs Website (Web Scraping) – 2019
Goal: apply web scraping to obtain information about currently available jobs from a Swiss website that publishes job ads and aggregates ads from company websites
tags: web scraping, beautifulsoup4, seaborn, data cleaning
Language: Python

Projects without Code
My more elaborate projects were typically part of my work or used proprietary data sources, so I am unable to share the code. Instead, I give high-level explanations of what I did.
Real Estate Pricing Model (A Bit of Everything) – 2020-21
Description: Crawl real estate portals to extract information on current offerings. Clean, transform, and enrich the data using public sources and proprietary sources via API access. Impute missing data using mean/median imputation and KNN Imputer. Create reports and statistics about the Swiss housing situation to supplement a tax and cost map of the country. Create a machine learning model to predict asking prices for a range of homes. Prepare the model for deployment on AWS.
tags: data cleaning and transforming, AutoGluon, AutoML, machine learning, public data, statistics, KNN imputer, web scraping, beautifulsoup4, Selenium, pandas, data set generation
Language: Python
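The imputation step mentioned in the description can be illustrated with scikit-learn's `SimpleImputer` and `KNNImputer`. This is a sketch on made-up numbers, not the project code: the three columns, their values, and `n_neighbors=2` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical listings: [living area in m2, rooms, year built]; np.nan marks gaps.
listings = np.array([
    [80.0, 3.5, 1995.0],
    [120.0, np.nan, 2010.0],
    [np.nan, 4.5, 2005.0],
    [65.0, 2.5, np.nan],
    [100.0, 4.5, 2000.0],
])

# Median imputation: each gap gets the median of the observed values in its column.
median_filled = SimpleImputer(strategy="median").fit_transform(listings)

# KNN imputation: each gap gets the average of that column over the
# 2 most similar rows, compared on the features both rows have.
knn_filled = KNNImputer(n_neighbors=2).fit_transform(listings)

print(median_filled)
print(knn_filled)
```

Median imputation gives every gap in a column the same value, while the KNN variant adapts the value to similar listings. Because the neighbor search is distance-based, features on very different scales (such as area and year built) would probably need scaling first in a real pipeline.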
Business Insights Dashboard for Customer Data (Business Intelligence) – 2020
Description: Use the company’s database to gain insights into customer behavior and loyalty. I consulted with business stakeholders and the database administrator on the scope and functionality of the app solution. I designed an easy-to-use interactive dashboard in Power BI in line with corporate branding. We delivered a functional, useful dashboard including an extensive user manual.
tags: data cleaning and transforming, Power BI, visualization, data consulting, technical writing, statistics, linear modelling, interactive design, KPIs
Language: R, Power BI, Excel
Visualizations and Insights from Statistical Data (Statistics and Reports) – 2020
Description: Use data from the Federal Statistical Office to generate insights and trend analyses of hospitality statistics over recent years for a tourist region. I created appealing, uncluttered graphs and a concise report of the findings as a one-page overview.
tags: data cleaning and transforming, Excel, general interest writing, statistics
Language: Excel
Insights from Medical Device Data (Exploratory Data Analysis) – 2020
Description: Use an archive of medical device user data to get preliminary insights into user behavior and user population statistics. Visualize distributions and make recommendations for further data use.
tags: statistics, exploratory data analysis, action report, visualization
Language: R
Classification of Microscopy Images with Computer Vision (Deep Learning) – 2019
Description: Use deep learning and computer vision to classify fluorescence microscopy images for a research project involving the introduction of mutations into cells. I labeled a subset of the images by hand using the image labeling code listed above. I then compared how well different neural network architectures classified the images. Finally, I presented the results at a meetup (with the researcher who originally acquired the microscopy images) and later at an internal meeting of a large company.
tags: deep learning, computer vision, TensorFlow, Keras, CNNs, supervised machine learning, picture classification, microscopy, computational biology, Inception v3, AWS
Language: Python
Icons from the Noun Project.