I learned about the Climatological Database for the World’s Oceans (CLIWOC), a super cool database containing 287,116 days of ship log entries that have been digitized to better understand climate change.
Each record holds the date, the ship’s name, company and nationality, the latitude and longitude and a wealth of meteorological information.
Objective: I want to make an animated map with a bunch of moving ships, because I can.
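One way to sketch that animated map in R is with ggplot2 and gganimate. The data frame and column names below (`cliwoc`, `ship`, `date`, `lon`, `lat`) are placeholders, not the actual CLIWOC schema:

```r
# A minimal sketch, assuming the CLIWOC records have been read into a
# data frame `cliwoc` with one row per ship per day.
library(ggplot2)
library(gganimate)

p <- ggplot(cliwoc, aes(x = lon, y = lat, colour = ship)) +
  borders("world", colour = "grey70") +       # world coastline backdrop
  geom_point(size = 1, show.legend = FALSE) + # one dot per ship position
  coord_quickmap() +
  transition_time(date) +                     # animate along the log dates
  labs(title = "Ship positions: {frame_time}")

animate(p, nframes = 200)
```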

This is a quick blog post to try and figure out which school boards are the “hottest zones” for COVID-19.
In this blog post, I will do the following:

- get the shapefiles for both the school boards (commissions scolaires, or CS) and the smallest possible health districts (réseau local de santé, or RLS)
- find the intersection between school boards and RLS using sf::st_intersection()
- find the population of each intersection using cancensus::get_census() and tongfen::tongfen_estimate()
- assign the number of cases in each health district (RLS) to each intersection proportionally to its population using tongfen::proportional_reaggregate()
- sum the number of cases for each school board (commission scolaire)
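The steps above can be roughly sketched as follows. Only the calls I am sure of are spelled out; `cs`, `rls`, `cs_name` and `cases_estimated` are placeholder object and column names, and the tongfen/cancensus steps are left as comments because their exact call signatures should be checked against the package docs:

```r
# Rough sketch of the pipeline, not a runnable script.
library(sf)
library(dplyr)

# 1. intersect school boards (CS) with health districts (RLS)
pieces <- sf::st_intersection(cs, rls)

# 2. estimate the population of each intersection piece with
#    cancensus::get_census() + tongfen::tongfen_estimate(), then
#    reallocate each RLS case count to its pieces, proportionally
#    to population, with tongfen::proportional_reaggregate()

# 3. sum the reallocated cases back up to the school-board level
cases_by_cs <- pieces %>%
  st_drop_geometry() %>%
  group_by(cs_name) %>%
  summarise(cases = sum(cases_estimated))
```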

The latest scandal came about because Premier @francoislegault blocked journalist @Aaron_Derfel on Twitter, claiming that the latter was tagging him far too often:
When a "journalist" "tags" me more than 10 times on Twitter saying that I’m lying...
— François Legault (@francoislegault) June 6, 2020
@Paul_Laurier had the great idea of looking at the data to see what was really going on:
A little analysis of the Twitter account of @Aaron_Derfel, health reporter.

I’ve stumbled on something... interesting.
To get a prediction from a Tweedie GLM, we take the link value and compute exp(link), but to get a prediction from an xgboost Tweedie model, we take the “link” value and compute exp(link) / 2, dividing the result by 2.
Is this normal? Below is a quick demo showing how I get the predictions for a 3-tree xgboost model and a GLM.
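The gist of the comparison can be sketched like this (this is not the post’s actual demo; `dtrain`, `df`, `y` and `x` are placeholders, and the `/ 2` line is the post’s surprising observation, not documented xgboost behaviour):

```r
library(xgboost)

# 3-tree Tweedie boosted model on a placeholder xgb.DMatrix `dtrain`
bst <- xgboost(data = dtrain, nrounds = 3,
               objective = "reg:tweedie", tweedie_variance_power = 1.5)

link <- predict(bst, dtrain, outputmargin = TRUE)  # raw "link" value
pred <- predict(bst, dtrain)                       # xgboost's own prediction

# The puzzle: exp(link) / 2 matches pred here, rather than exp(link)
all.equal(exp(link) / 2, pred)

# For a Tweedie GLM, the usual relationship holds: response = exp(link)
fit <- glm(y ~ x, data = df,
           family = statmod::tweedie(var.power = 1.5, link.power = 0))
eta <- predict(fit, type = "link")
mu  <- predict(fit, type = "response")
all.equal(exp(eta), unname(mu))
```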

Quick post inspired by the winning / nearly there / need action graphs by @yaneerbaryam at https://www.endcoronavirus.org/countries.
Data: Health regions data is compiled by Isha Berry & friends on GitHub. Montreal borough data is published daily, but only the running total is kept, with no history, so @bouchecl visits the page every day and compiles the data in this Google Sheet.
Code: I went a bit over the top for this one and created an R package you can install to recreate all the graphs and fetch the data.

I made a Twitter survey a couple of months before the apocalypse to help me pick my next blog post topic, and all 3 members of the crowd overwhelmingly agreed that I should use bike GPS data and GraphHopper to find out how far cyclists are willing to go to use safer infrastructure.
This is awesome, because I had been looking for a while for a use for this open dataset, which contains GPS data for ~5,000 bike trips in Montreal.

Canada Federal Election 2019
[Interactive htmlwidget: map of how your neighbours voted, Canada 2019 edition]

I just got my feet wet with Tweedie regression and the recipes package yesterday. The results were underwhelming, as the models didn’t appear all that predictive. I figured I might give it another try, this time using Kaggle’s claim prediction challenge from 2012.
It is no longer possible to submit models, so we will create our own 20% test sample from the Kaggle training data set and see how we fare.
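The 20% holdout described above can be carved out with a simple random split; `train_full` is a placeholder for the Kaggle training data once it has been read in:

```r
# Hold out 20% of the Kaggle training data as our own test set.
set.seed(42)  # make the split reproducible
test_idx <- sample(nrow(train_full), size = floor(0.2 * nrow(train_full)))
test  <- train_full[test_idx, ]
train <- train_full[-test_idx, ]
```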

I’m building my first tweedie model, and I’m finally trying the {recipes} package.
We will try to predict the pure premium of a car insurance policy. This can be done directly with a Tweedie model, or by multiplying two separate models: a frequency (Poisson) model and a severity (Gamma) model. We will be using “lift charts” and “double lift charts” to evaluate model performance.
Here is the plan:
Pre-process the train and test data using recipes.
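The frequency × severity decomposition mentioned above is the classic actuarial setup and can be sketched with two GLMs; `claims` and its columns (`claim_count`, `claim_amount`, `exposure`, `x1`, `x2`) are placeholder names, not the post’s actual data:

```r
# Frequency: Poisson GLM on claim counts, with exposure as an offset.
freq_fit <- glm(claim_count ~ x1 + x2,
                offset = log(exposure),
                family = poisson(),
                data = claims)

# Severity: Gamma GLM on average claim size, fit only on rows with claims,
# weighted by the number of claims behind each average.
sev_fit <- glm(claim_amount / claim_count ~ x1 + x2,
               family = Gamma(link = "log"),
               weights = claim_count,
               data = subset(claims, claim_count > 0))

# Pure premium = expected frequency * expected severity
pure_premium <- predict(freq_fit, claims, type = "response") *
                predict(sev_fit,  claims, type = "response")
```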