Prediction

Bootstrap Statistics with Tidymodels to Compare Bicycle Helmets

Bootstrapping has long been one of my favorite statistical procedures. The nonparametric version requires few assumptions, and it shares attributes with both simulation and common Bayesian models, both of which I love. At the end of the day I wonder, why settle for a point estimate and two values for the confidence intervals when you can create a distribution and visualize your CIs? When my friend Rich first introduced me to bootstrapping, he said that if statistics had been invented in the computer age, this is where most classes would start.

Lessons From My First Kaggle Contest

Kaggle is a forum for interacting with other data scientists and competing to see who can write code that will best predict features of data. It’s a way to test your skills at statistics and machine learning, and to do a lot of human learning in the process (sorry, bad pun). When I entered the contest to categorize crimes that occurred in San Francisco, my initial goal was to do better than random chance.