Projects • Juliano Garcia

Projects

A Study on Gradient Boosting Classifiers

My undergraduate thesis with an experimental analysis of LightGBM hyperparameters in classification models

Gradient Boosting Machines (GBMs) are a family of supervised machine learning algorithms that have achieved state-of-the-art results on a wide range of problems and have won many machine learning competitions. When building any machine learning model, hyperparameter optimization can become a costly and time-consuming task, depending on the number of hyperparameters being tuned and the size of their search space. Machine learning users who are not experienced researchers or data science professionals can struggle to decide which hyperparameters and values to choose when they start tuning a model, especially with newer GBM implementations such as the XGBoost and LightGBM libraries. In this work, a large-scale experiment is conducted on 70 datasets from the OpenML platform, measuring the sensitivity of binary classifiers' evaluation metrics to changes in three LightGBM hyperparameters.

A solid statistical framework is applied to the study's results, analyzing them from three viewpoints: by hyperparameter, by dataset characteristics and by performance metric. The experiments reveal insightful relationships among the hyperparameters of gradient boosting classifiers. They show which hyperparameter combinations produced the largest changes in the metrics relative to the baseline, which metrics are most sensitive and which characteristics of the studied datasets stood out. These results are presented here to make it easier for machine learning users to build gradient boosting classifiers.
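The page does not say which three hyperparameters or which metrics the thesis studied. The sketch below is therefore only an illustration of one way to measure a metric's "change from the baseline." It computes the same binary-classification metrics for a baseline model and for a model with altered hyperparameters, then takes the difference. The function names, the choice of accuracy, precision, recall and F1, and the use of a plain difference are all assumptions. The predictions are assumed to come from models trained elsewhere, for example LightGBM with default versus modified settings.

```python
# Hypothetical sketch: how much do binary-classification metrics change when a
# model's hyperparameters change? Metric choice and "difference from baseline"
# are illustrative assumptions, not the thesis's exact procedure.

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Avoid division by zero when a class is never predicted."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def sensitivity(y_true, baseline_pred, tuned_pred):
    """Per-metric change of the tuned model relative to the baseline."""
    base = binary_metrics(y_true, baseline_pred)
    tuned = binary_metrics(y_true, tuned_pred)
    return {name: tuned[name] - base[name] for name in base}
```

Repeating this comparison for every hyperparameter combination and every dataset yields a table of metric changes. A statistical analysis like the one described above can then be run on that table.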

Women's Shoes Prices - Kaggle EDA

Exploratory data analysis of the dataset 'Women's Shoes Prices' by the Datafiniti Company

The purpose of this project is to study the Women’s Shoes Prices dataset from Kaggle, which contains a list of women’s shoes and the prices at which they were sold. The data was originally made available on Kaggle by the Datafiniti Company. The project mainly showcases basic EDA techniques (such as histograms and ECDFs) while also digging into more complex business questions, e.g. using fuzzy matching to find the most popular shoe colors in the dataset.
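The idea behind the color question is that free-text color fields contain variants and typos such as "blak" or "BLACK/white." These have to be mapped to a canonical color before counting. The sketch below uses Python's standard-library `difflib.get_close_matches` as a stand-in for whatever fuzzy-matching tool the project actually used. The canonical palette, the 0.75 similarity cutoff and the helper names are hypothetical. An ECDF helper is included for the other technique mentioned.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical canonical palette; the project's actual color list is not given.
CANONICAL_COLORS = ["black", "white", "brown", "red", "blue", "beige", "grey", "pink"]


def normalize_color(raw, cutoff=0.75):
    """Map a free-text color to the closest canonical color, or None."""
    cleaned = raw.strip().lower()
    # First look for a canonical color as a whole word, e.g. "Dark Brown".
    for word in cleaned.replace("/", " ").split():
        if word in CANONICAL_COLORS:
            return word
    # Otherwise fall back to fuzzy string similarity, which absorbs typos.
    matches = get_close_matches(cleaned, CANONICAL_COLORS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def top_colors(raw_colors, k=3):
    """Count normalized colors, ignoring values that match nothing."""
    normalized = (normalize_color(c) for c in raw_colors)
    counts = Counter(c for c in normalized if c is not None)
    return counts.most_common(k)


def ecdf(values):
    """Return sorted values and their cumulative proportions i/n,
    the usual step-plot form of an empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]
```

Matching exact words first keeps multi-color entries like "BLACK/white" from being scored as a single fuzzy string. A trade-off of this design is that only the first listed color is counted.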

Snacker - AI Recommender System

A snack recommendation service that uses collaborative filtering to recommend snacks based on your personal taste and geolocation

Snacker is a full-fledged social network for snack lovers all over the world. The website was built with the Flask framework on the back end, and the recommendation engine was implemented using a collaborative filtering technique called Matrix Factorization. The data was stored in MongoDB for prototyping and testing. More details about the methodology can be found here.
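Matrix Factorization represents each user and each snack as a short vector of latent factors. A predicted rating is the dot product of a user vector and a snack vector. The factors are learned from the ratings that are known, and the products then fill in the missing ones. Below is a minimal pure-Python sketch trained with stochastic gradient descent. It assumes explicit (user, item, rating) triples, and the learning rate, regularization strength and number of factors are arbitrary choices. It is not Snacker's actual implementation, which this page does not describe, and it ignores the geolocation component.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating)
    triples by minimizing squared error plus L2 regularization with SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observed ratings in a random order
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on both factors, shrunk toward zero by `reg`.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u (dot product of latent factors)."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))
```

After training, `predict(P, Q, u, i)` can be evaluated for every snack a user has not rated, and the highest-scoring ones recommended.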

Incognito Search - Chrome Extension

Incognito search extension for Chrome. Just right-click and search directly in incognito mode.

A simple Chrome extension that searches for the selected text in an Incognito window. Built using JavaScript.

Movies Ontology - IMDb

Ontology, parser and system modeling in description logic (DL) for movies

Knowledge-based systems are an area of study in artificial intelligence that tries to capture human knowledge (usually that of domain experts) and embed it as rules in a system. In this project, IMDb datasets were parsed and their contents added to an ontology of movies. The ontology was modelled in Protégé, an open-source ontology editor, using the Web Ontology Language (OWL), whose formal semantics are based on Description Logic.
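IMDb publishes its data as tab-separated files such as `title.basics.tsv`. That file's columns include `tconst`, `titleType`, `primaryTitle`, `startYear` and `genres`, with `\N` marking missing values. The page does not show how the project's parser worked or what its ontology vocabulary looked like, so what follows is only a sketch under those assumptions. It turns movie rows into OWL individuals written in Turtle syntax. The `http://example.org/movies#` namespace and the names `:Movie`, `:Genre`, `:title`, `:releaseYear` and `:hasGenre` are made up for illustration.

```python
import csv
import io

# Hypothetical vocabulary: namespace, classes and properties are illustrative only.
HEADER = """@prefix : <http://example.org/movies#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:Movie a owl:Class .
:Genre a owl:Class .
:hasGenre a owl:ObjectProperty .
:title a owl:DatatypeProperty .
:releaseYear a owl:DatatypeProperty .
"""


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def basics_to_turtle(tsv_text):
    """Convert movie rows of IMDb's title.basics.tsv into Turtle statements."""
    # IMDb files do not quote fields, so disable CSV quote handling.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    lines = [HEADER]
    for row in reader:
        if row["titleType"] != "movie":
            continue  # keep only feature films, skip series, shorts, etc.
        lines.append(f":{row['tconst']} a owl:NamedIndividual, :Movie ;")
        lines.append(f'    :title "{escape_literal(row["primaryTitle"])}" ;')
        if row["startYear"] != "\\N":  # IMDb uses \N for missing values
            lines.append(f'    :releaseYear "{row["startYear"]}"^^xsd:gYear ;')
        for genre in row["genres"].split(","):
            if genre and genre != "\\N":
                lines.append(f"    :hasGenre :{genre} ;")
        # Close this subject's statement: replace the trailing ";" with ".".
        lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)
```

The output is an ordinary Turtle file, which Protégé can open for further modelling, such as declaring the genre individuals or adding class restrictions.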

2nd website version

My second website, implemented using Jekyll

This was my second website, where I used Jekyll to write a few blog posts.

Law of Large Numbers Simulator

Bernoulli's Theorem Simulator (Weak Law of Large Numbers)

This project demonstrates the Weak Law of Large Numbers, a very useful theorem in statistics, through graph plotting and real data. As our website says: “In practice, it (the Weak Law of Large Numbers) dictates that, if repeated enough times, the accumulated results of the same experiment will tend to its real mathematical probabilities.”
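Stated formally, in the standard textbook form for independent, identically distributed random variables $X_1, X_2, \dots$ with finite mean $\mu$, the weak law says that the sample mean converges in probability to $\mu$. Bernoulli's theorem is the special case where each $X_i$ is an indicator: $X_i = 1$ if the event (say, "heads" or "the die shows 6") occurs on throw $i$ and $0$ otherwise, so $\bar{X}_n$ is the relative frequency and $\mu = p$ is the event's probability ($1/2$ for a fair coin, $1/6$ for one face of a fair die).

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\lim_{n \to \infty} \Pr\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0
\quad \text{for every } \varepsilon > 0.
```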

The website plots, in real time, the results of throwing a die or a coin as many times as the user specifies, so it is fully interactive. The user has full control over the start number, the number of throws to perform and even the speed of the plot. The charting library we use (CanvasJS) also lets the user watch the probability converge in real time. It was developed in partnership with Pedro Pereira in 2016.
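The computation behind such a plot is simple. Throw a fair coin or die repeatedly and record the running relative frequency of one outcome after each throw. The original simulator ran in the browser with CanvasJS. The Python sketch below only reproduces the underlying numbers, and its function and parameter names are hypothetical. A coin is modelled as a two-sided die.

```python
import random


def running_frequencies(n_throws, faces=2, target=1, seed=None):
    """Simulate n_throws of a fair die with `faces` sides (faces=2 models a coin)
    and return the relative frequency of `target` after each throw."""
    rng = random.Random(seed)
    hits = 0
    freqs = []
    for n in range(1, n_throws + 1):
        if rng.randint(1, faces) == target:  # randint includes both endpoints
            hits += 1
        freqs.append(hits / n)
    return freqs
```

Plotting `freqs` against the throw number shows large swings for the first few throws, then a curve that settles ever closer to $1/\text{faces}$ as the number of throws grows. That settling is the behavior the simulator is meant to make visible.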

1st website version

My first personal website (in PHP)

This was my first website. You can check it out for more projects from my programming work in high school.