This is a quick blog post to figure out which school boards are the “hottest zones” for COVID. In this post, I will:
- get the shapefiles for both the school boards (commissions scolaires, or CS) and the smallest possible health districts (réseau local de santé, or RLS)
- find the intersections between school boards and RLS using sf::st_intersection()
- find the population of each intersection using cancensus::get_census() and tongfen::tongfen_estimate()
- assign the number of cases in each health district (RLS) to its intersections in proportion to population using tongfen::proportional_reaggregate()
- sum the number of cases for each school board (commission scolaire).
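The proportional reaggregation idea can be sketched in plain Python, without any geospatial library. This is only an illustration under assumptions, not the R code from the post: the region names, populations, and case counts are made up, and `reaggregate_cases` is a hypothetical helper. It splits each RLS's case count across its intersections in proportion to population, then sums the pieces by school board.

```python
from collections import defaultdict


def reaggregate_cases(intersections, cases_by_rls):
    """Estimate cases per school board.

    intersections: list of (rls, school_board, population) tuples,
                   one per RLS x school-board intersection.
    cases_by_rls:  dict mapping RLS name to its reported case count.
    """
    # Total population of each RLS, summed over its intersections.
    rls_population = defaultdict(float)
    for rls, _board, population in intersections:
        rls_population[rls] += population

    cases_by_board = defaultdict(float)
    for rls, board, population in intersections:
        total = rls_population[rls]
        if total == 0:
            continue  # nothing to distribute cases over
        share = population / total  # fraction of the RLS living here
        cases_by_board[board] += cases_by_rls.get(rls, 0) * share
    return dict(cases_by_board)


# Made-up example: RLS_A straddles two school boards, RLS_B sits in one.
intersections = [
    ("RLS_A", "CS_1", 3000),
    ("RLS_A", "CS_2", 1000),
    ("RLS_B", "CS_2", 5000),
]
cases_by_rls = {"RLS_A": 40, "RLS_B": 10}

estimated = reaggregate_cases(intersections, cases_by_rls)
print(estimated)
```

The sketch assumes cases are spread evenly over the population within each RLS, which is the same simplifying assumption that any population-weighted split relies on.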

Continue reading

The latest scandal erupted because Premier @francoislegault blocked journalist @Aaron_Derfel on Twitter, claiming that he was tagging him far too often:

"When a 'journalist' 'tags' me more than 10 times on Twitter saying that I am lying..." (François Legault, @francoislegault, June 6, 2020)

@Paul_Laurier had the great idea of looking at the data to see what was really going on: "A short analysis of the Twitter account of @Aaron_Derfel, health journalist."

Continue reading

I’ve stumbled on something... interesting. To get the prediction from a Tweedie GLM, we take the link value and compute exp(link), but to get the prediction from an xgboost Tweedie model, we take the “link” value and compute exp(link) / 2, that is, we divide the result by 2. Is this normal? Below is a quick demo showing how I get the predictions for a 3-tree xgboost model and a GLM.
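One possible explanation, offered here as an assumption rather than something this excerpt confirms: older xgboost releases default to `base_score = 0.5`, and with a log-link objective the raw margin starts at log(base_score). If the “link” value is computed as the sum of leaf values only, the prediction becomes exp(sum of leaves + log(0.5)) = exp(sum of leaves) / 2. The leaf values below are made up purely to show the arithmetic.

```python
import math

# Made-up leaf values picked up by one observation in a 3-tree model.
leaf_values = [0.12, -0.05, 0.30]

# Assumed default starting prediction (not taken from the post).
base_score = 0.5

leaf_sum = sum(leaf_values)

# What you get if you only exponentiate the summed leaf values.
naive_prediction = math.exp(leaf_sum)

# With a log link, the margin starts at log(base_score).
margin = leaf_sum + math.log(base_score)
prediction = math.exp(margin)

print(naive_prediction, prediction)
```

Under this assumption the factor of 2 is just 1 / base_score, so a different starting score would change the divisor accordingly.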

Continue reading

Quick post inspired by the winning / nearly there / need action graphs by @yaneerbaryam at https://www.endcoronavirus.org/countries.

Data: Health regions data is compiled by Isha Berry & friends on GitHub. Montreal boroughs data is published daily. They only keep the latest total and no history, so @bouchecl visits the page every day and compiles the data in this Google Sheet.

Code: As usual, the code for this post is on GitHub.

Continue reading

I ran a Twitter survey a couple of weeks before the apocalypse to help me pick my next blog post topic, and all 3 members of the crowd overwhelmingly agreed that I should use bike GPS data and GraphHopper to find out how far cyclists are willing to go to use safer infrastructure. This is awesome, because I had been looking for a while for a use for this open data set, which contains GPS data for about 5000 bike trips in Montreal.

Continue reading

Canada Federal Election 2019 [interactive map]

Continue reading

I just got my feet wet with Tweedie regression and the recipes package yesterday. The results were underwhelming, as the models didn't appear very predictive. I figured I might give it another try, this time using Kaggle’s claim prediction challenge from 2012. It is no longer possible to submit models, so we will create our own 20% test sample from the Kaggle training data set and see how we fare.

Continue reading

Simon Coulombe

data tinkerer | cloud shoveller

data scientist in the insurance industry

Québec, Canada