The latest scandal erupted because Premier @francoislegault blocked the journalist @Aaron_Derfel on Twitter, claiming that Derfel tagged him far too often:
When a "journalist" "tags" me more than 10 times on Twitter saying that I am lying...
— François Legault (@francoislegault) June 6, 2020
@Paul_Laurier had the great idea of looking at the data to see what was really going on.
A quick analysis of the Twitter account of @Aaron_Derfel, a health journalist.

I’ve stumbled on something... interesting.
To get the prediction for a Tweedie GLM, we take the link value and compute exp(link). To get the prediction from an xgboost Tweedie model, however, we take the “link” value and compute exp(link) / 2.
Is this normal? Below is a quick demo showing how I get the predictions for a 3-tree xgboost model and a GLM.
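The factor of 2 can also be restated on the link scale: dividing exp(link) by 2 is the same as adding log(0.5) to the link before exponentiating. The Python sketch below only checks that arithmetic identity on made-up link values. Whether a constant offset of this kind is what xgboost actually applies is an assumption here, not something the observation above establishes, and the function names are hypothetical.

```python
import math

def glm_style_prediction(link):
    """Prediction with a log link: exp(link)."""
    return math.exp(link)

def halved_prediction(link):
    """The pattern described above for xgboost: exp(link) / 2."""
    return math.exp(link) / 2

def offset_prediction(link, offset=math.log(0.5)):
    """Same value written as a constant offset on the link scale (assumed form)."""
    return math.exp(link + offset)

# Hypothetical link values, for illustration only.
links = [-1.2, 0.0, 0.7]
for link in links:
    # exp(link) / 2 == exp(link + log(0.5)) up to floating-point error
    assert math.isclose(halved_prediction(link), offset_prediction(link))
```

If the offset reading is right, the two models differ only by a constant on the link scale, which would be worth checking against the library's documented defaults.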

Quick post inspired by the winning / nearly there / need action graphs by @yaneerbaryam at https://www.endcoronavirus.org/countries.
Data
Health regions data is compiled by Isha Berry & friends on GitHub. Montreal boroughs data is published daily. The publishers only keep the current total and no history, so @bouchecl visits the page every day and compiles the data in this Google Sheet.
Code
As usual, the code for this post is on GitHub.

I ran a Twitter survey a couple of weeks before the apocalypse to help me pick my next blog post topic, and all 3 members of the crowd overwhelmingly agreed that I should use bike GPS data and graphhopper to find out how far cyclists are willing to go to use safer infrastructure.
This is awesome, because I had been looking for a while for a use for this open data set, which contains GPS data for ~5000 bike trips in Montreal.

Canada Federal Election 2019

I just got my feet wet with Tweedie regression and the recipes package yesterday. The results were underwhelming, as the models didn't appear to be very predictive. I figured I might give it another try, this time using Kaggle’s claim prediction challenge from 2012.
It is no longer possible to submit models, so we will create our own 20% test sample from the Kaggle training data set and see how we fare.
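A minimal sketch of such a holdout split, written in Python for illustration rather than taken from the post. The function name, the seed, and the row ids are all hypothetical; the only thing carried over from the text is the 20% test fraction.

```python
import random

def train_test_split_ids(ids, test_fraction=0.2, seed=42):
    """Shuffle row ids reproducibly and hold out a fraction as a test sample."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    # First n_test shuffled ids form the test sample, the rest the training sample.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical row ids standing in for the Kaggle training data.
train_ids, test_ids = train_test_split_ids(range(1000))
```

Fixing the seed keeps the test sample identical across runs, so different models are compared on the same held-out rows.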

I’m building my first tweedie model, and I’m finally trying the {recipes} package.
We will try to predict the pure premium of a car insurance policy. This can be done directly with a Tweedie model, or by multiplying two separate models: a frequency (Poisson) model and a severity (Gamma) model. We will be using “lift charts” and “double lift charts” to evaluate model performance.
Here is the plan:
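To make the two ideas concrete, here is a small Python sketch under stated assumptions: the pure premium is taken as expected claim frequency times expected claim severity, and the lift table uses one common definition (sort policies by predicted value, cut them into equal-size bins, compare mean predicted and mean actual per bin). The function names and all numbers are made up for illustration, and the post's own charts may be built differently.

```python
def pure_premium(frequency, severity):
    """Expected loss per policy: expected claim count times expected claim size."""
    return frequency * severity

def lift_table(predicted, actual, n_bins=5):
    """Return (mean predicted, mean actual) per bin, bins ordered by predicted value."""
    pairs = sorted(zip(predicted, actual), key=lambda pair: pair[0])
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when there are fewer rows than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_act = sum(a for _, a in chunk) / len(chunk)
        rows.append((mean_pred, mean_act))
    return rows

# Hypothetical frequency and severity predictions for four policies.
freqs = [0.05, 0.10, 0.20, 0.02]
sevs = [2000.0, 1500.0, 3000.0, 1000.0]
predicted = [pure_premium(f, s) for f, s in zip(freqs, sevs)]
actual = [0.0, 0.0, 2500.0, 0.0]  # made-up observed losses
table = lift_table(predicted, actual, n_bins=2)
```

In a well-ordered model the mean actual loss should rise from the first bin to the last, roughly tracking the mean predicted value.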
Pre-process the train and test data using recipes.