Projects


Projects with Code


This section showcases some of the projects that I completed as part of my training or just for fun. They include visualizations with Python, R, and Tableau, several machine learning approaches, and general data wrangling and scraping code. Click a project's image to view the code on GitHub (opens in a new window).

Full Marketing Project Deliverable (Customer Segmentation, Report) – 2021

Goals: analyze customer base of subscription-based online game, segment players into categories for marketing campaigns, deliver report with actionable recommendations and code (notebook, conda environment, instructions)

tags: Jupyter notebook, customer segmentation, marketing, report, writing for non-technical audiences

Language: Python

Company Name Generator (Web Scraping, App) – 2020

Goals: scrape and clean a word list, then build a small app that generates a composite company name from user input.

tags: Jupyter notebook, web scraping, beautifulsoup4, app, functional

Language: Python

Breast Cancer Classification (Supervised ML) – 2019

Goals: Analyze the “Breast Cancer Wisconsin (Diagnostic) Data Set” and create a machine learning model to predict diagnosis (benign or malignant tumor).

tags: supervised machine learning, Jupyter notebook, PCA, Random Forest, XGBoost, AdaBoost, binary classification, grid search, random search, confusion matrix

Language: Python

Reproduce Results of Scientific Article (Statistics) – 2019

Goal: Reproduce the analysis in Table 1 of a previously published study using the test from the publication, redo it with a different statistical test, and evaluate the methods

tags: statistics, hypothesis testing, R markdown, reproducibility crisis, exploratory data analysis, Shapiro-Wilk test for normality, power analysis, regression

Language: R

Image Labeling Code (Helper Code) – 2019

Goal: write script to access folder and pull random sample from list of images, get user input about image classification, save response and remove image from sample pool

tags: helper code, user input, image classification, automation, image labeling

Language: Python
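
The workflow described in the goal (sample, ask, save, remove) can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names, the CSV output format, and the list of image suffixes are assumptions made for the example.

```python
import csv
import random
from pathlib import Path

# Assumed set of image file extensions; adjust to the data at hand.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def list_images(folder):
    """Return the image files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def label_images(folder, out_csv, n_samples, ask=input, seed=None):
    """Draw a random sample, ask for one label per image, and log each answer.

    Returns the images that are still unlabeled after this session.
    """
    rng = random.Random(seed)
    pool = list_images(folder)
    sample = rng.sample(pool, min(n_samples, len(pool)))
    # Append mode, so several labeling sessions can share one file.
    with open(out_csv, "a", newline="") as fh:
        writer = csv.writer(fh)
        for image in sample:
            label = ask(f"Label for {image.name}: ").strip()
            writer.writerow([image.name, label])
            pool.remove(image)  # a labeled image leaves the pool
    return pool
```

Passing the prompt function as the `ask` parameter keeps the loop testable without a keyboard: a call such as `label_images("images", "labels.csv", 20)` would prompt interactively, while a test can supply a function that returns a fixed label.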

Analyzing Yelp Dataset (Basic SQL) – 2019

Goal: Perform simple SQL operations and analyses on the Yelp dataset as part of the “SQL for Data Science” course by UC Davis on Coursera.

tags: select, group by, order by, join, average/sum, count

Language: SQL
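
To give a flavor of the operations listed in the tags, here is a small sketch that runs a join with grouping, counting, averaging, and ordering against an in-memory SQLite database from Python. The table names, column names, and rows are toy data invented for the example; they are not the actual Yelp schema or results from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema with a handful of made-up rows.
conn.executescript("""
    CREATE TABLE business (id TEXT PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE review (business_id TEXT, stars INTEGER);
    INSERT INTO business VALUES
        ('b1', 'Cafe A', 'Las Vegas'),
        ('b2', 'Diner B', 'Las Vegas'),
        ('b3', 'Bar C', 'Phoenix');
    INSERT INTO review VALUES
        ('b1', 5), ('b1', 4), ('b2', 3), ('b3', 2), ('b3', 4);
""")

# Join reviews to businesses, then aggregate per city.
query = """
    SELECT b.city,
           COUNT(*)     AS n_reviews,
           AVG(r.stars) AS avg_stars
    FROM business AS b
    JOIN review AS r ON r.business_id = b.id
    GROUP BY b.city
    ORDER BY avg_stars DESC;
"""
rows = conn.execute(query).fetchall()
for city, n_reviews, avg_stars in rows:
    print(city, n_reviews, round(avg_stars, 2))
```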

Analyzing Tweets with Twitter API (APIs) – 2019

Goal: analyze and visualize different aspects of a selected Twitter account (frequency of tweets, hashtag usage)

tags: api, exploratory data analysis, real time data, hashtags

Language: Python
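
The two aspects named in the goal come down to simple counting once the tweets have been downloaded. The sketch below assumes the tweets are already available as plain strings and `datetime` objects; it does not call the Twitter API, and the function names are made up for the illustration.

```python
import re
from collections import Counter
from datetime import datetime

# A hashtag is taken to be '#' followed by word characters (a simplification).
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count hashtags across tweet texts, ignoring case."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def tweets_per_day(timestamps):
    """Count tweets per calendar day from datetime objects."""
    return Counter(ts.date() for ts in timestamps)
```

Both functions return a `Counter`, so `most_common()` gives the ranking directly, and the counts can be passed on to a plotting library for the visualization step.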

Analyzing Video Game Sales (Exploratory Analysis) – 2019

Goal: describe the data set, investigate how sales, ratings, and price are related, draw conclusions based on the analysis

tags: data sets, exploratory data analysis, market insights, matplotlib, seaborn, pandas

Language: Python

Visualizing Sales Data with Tableau (Business Intelligence) – 2019

Goal: create interactive visualizations, dashboards, and stories with Tableau Public using the Superstore dataset

tags: data sets, market insights, business intelligence, Tableau, storytelling, dashboards, design, KPI, interactive

Language: Tableau

Web Scraping Jobs Website (Web Scraping) – 2019

Goal: apply web scraping to collect information about currently available jobs from a Swiss website that publishes job ads and aggregates ads from company websites

tags: web scraping, beautifulsoup4, seaborn, data cleaning

Language: Python


Projects without Code


More elaborate projects were typically part of my work or relied on proprietary data sources, so I am unable to share the code. Instead, I do my best to give high-level explanations of what I did.

Real Estate Pricing Model (a bit of everything) – 2020-21

Description: Crawl real estate portals to extract information on current offerings. Clean, transform, and enrich the data using public sources and proprietary sources via API access. Impute missing data using mean/median imputation and a KNN imputer. Create reports and statistics about the Swiss housing situation to supplement a tax and cost map of the country. Create a machine learning model to predict asking prices for a range of homes. Make the model deployment-ready for AWS.

tags: data cleaning and transforming, AutoGluon, AutoML, machine learning, public data, statistics, KNN imputer, web scraping, beautifulsoup4, Selenium, pandas, data set generation

Language: Python

Business Insights Dashboard for Customer Data (Business Intelligence) – 2020

Description: Use the company’s database to gain insights into customer behavior and loyalty. I took part in consulting with business stakeholders and the database administrator on the scope and functionality of the app solution. I designed an easy-to-use interactive dashboard in Power BI in line with corporate branding. We delivered a functional, useful dashboard, including an extensive user manual.

tags: data cleaning and transforming, Power BI, visualization, data consulting, technical writing, statistics, linear modelling, interactive design, KPIs

Language: R, Power BI, Excel

Visualizations and Insights from Statistical Data (Statistics and Reports) – 2020

Description: Use data from the Federal Statistical Office to generate insights and trend analyses of hospitality statistics over recent years for a tourist region. I created appealing, uncluttered graphs and a concise one-page report of the findings.

tags: data cleaning and transforming, Excel, general interest writing, statistics

Language: Excel

Insights from Medical Device Data (Exploratory Data Analysis) – 2020

Description: Use an archive of medical device user data to get preliminary insights into user behavior and user population statistics. Visualize distributions and make recommendations for further data use.

tags: statistics, exploratory data analysis, action report, visualization

Language: R

Classification of Microscopy Images with Computer Vision (Deep Learning) – 2019

Description: Use deep learning and computer vision to classify fluorescence microscopy images for a research project involving the introduction of mutations into cells. I labeled a subset of the images by hand using the image labeling code listed above. Then I compared how well different neural network architectures classified the images. Finally, I presented the results at a meetup (with the researcher who originally acquired the microscopy images) and later at an internal meeting of a large company.

tags: deep learning, computer vision, TensorFlow, Keras, CNNs, supervised machine learning, picture classification, microscopy, computational biology, Inception v3, AWS

Language: Python

Icons from the Noun Project.