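library(tidyverse)
library(sf)
library(janitor)
library(osmdata)
library(purrr)

List of charging stations (sf)

# Degrees of longitude spanned by 1 km at a given latitude (in degrees).
degres_longitude_pour_1km <- function(latitude_degres){
  # Note: 40075 km is the Earth's equatorial circumference, despite the variable name.
  earth_diameter <- 40075
  # Length (km) of the circle of latitude at latitude_degres.
  line_of_longitude_length <- cos(latitude_degres * 2 * pi / 360) * earth_diameter
  degres_longitude_pour_1km = 1* 360 / line_of_longitude_length
  return(degres_longitude_pour_1km)
}

# Degrees of latitude spanned by 1 km (roughly constant everywhere).
degres_latitude_pour_1km = 360 / 40075

# Download the list of charging stations and standardize the column names.
bornes <- read_csv("https://lecircuitelectrique-data.s3.ca-central-1.amazonaws.com/stations/export_sites_fr.csv") %>%
  janitor::clean_names()

# Keep only the fast chargers (BRCC) and convert them to sf points.
brcc <- bornes %>%
  filter(niveau_de_recharge == "BRCC") %>%
  st_as_sf(coords= c("longitude","latitude"), crs = 4326, agr = "constant", remove = FALSE) %>%
  mutate(borne_bbox = purrr::map2(longitude, latitude, ~ matrix( data = c(.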

I learned about the Climatological Database for the World’s Oceans (CLIWOC), a super cool database containing 287 116 days of ship log entries that have been digitized to better understand climate change. Each record holds the date, the ship’s name, company and nationality, the latitude and longitude and a wealth of meteorological information.

Objective

I want to make an animated map with a bunch of moving ships because I can.

This is a quick blog post to try and figure out which school boards are the “hottest zones” for COVID. In this blog post, I will do the following:
- get the shapefiles for both the school boards (commissions scolaires, or CS) and the smallest possible health districts (réseau local de santé, or RLS)
- find the intersection between school boards and RLS using sf::st_intersection()
- find the population of each intersection using cancensus::get_census() and tongfen::tongfen_estimate()
- assign the number of cases in each health district (RLS) to each intersection proportionally to the population using tongfen::proportional_reaggregate()
- sum the number of cases for each school board (CS).
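
The last two steps above (splitting each RLS's cases across its intersections by population, then summing per school board) can be sketched in base R. This is only a toy illustration: the data frames, column names and numbers are made up, and the arithmetic is a hand-rolled stand-in for the idea behind tongfen::proportional_reaggregate(), not a call to that function.

```r
# Hypothetical toy data: one row per intersection of a school board (cs)
# and a health district (rls), with the population living in that piece.
intersections <- data.frame(
  cs         = c("CS A", "CS A", "CS B", "CS B"),
  rls        = c("RLS 1", "RLS 2", "RLS 2", "RLS 3"),
  population = c(1000, 3000, 1000, 5000)
)

# Hypothetical case counts reported per health district.
rls_cases <- data.frame(
  rls   = c("RLS 1", "RLS 2", "RLS 3"),
  cases = c(10, 40, 25)
)

# Share of each RLS's population that falls in each intersection.
rls_population <- ave(intersections$population, intersections$rls, FUN = sum)
intersections$pop_share <- intersections$population / rls_population

# Allocate each RLS's cases to its intersections proportionally to population.
intersections <- merge(intersections, rls_cases, by = "rls")
intersections$cases_allocated <- intersections$cases * intersections$pop_share

# Sum the allocated cases for each school board.
cs_cases <- aggregate(cases_allocated ~ cs, data = intersections, FUN = sum)
print(cs_cases)
```

Because every RLS's cases are fully distributed, the school board totals add up to the total number of cases, which is a handy sanity check on real data.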

The latest scandal broke when Premier @francoislegault blocked journalist @Aaron_Derfel on Twitter, claiming that the latter tagged him far too often: When a "journalist" "tags" me more than 10 times on Twitter saying that I am lying... — François Legault (@francoislegault) June 6, 2020 @Paul_Laurier had the great idea of looking at the data to see what was really going on. A short analysis of the Twitter account of @Aaron_Derfel, health journalist.

I’ve stumbled on something... interesting. To get the prediction for a Tweedie GLM, we take the link value and compute exp(link), but to get the prediction from an xgboost Tweedie model, we take the “link” value and compute exp(link) / 2, dividing the result by 2. Is this normal? Below is a quick demo showing how I get the predictions for a 3-tree xgboost and a GLM.
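
As a point of reference for the GLM half of that comparison, here is a minimal base R sketch showing that, for a log-link GLM, exp(link) reproduces the response-scale prediction. It is not the demo from the post: it uses a Poisson family as a stand-in (base R ships no Tweedie family), and the toy data and variable names are made up.

```r
# Hypothetical toy data; a Poisson log-link GLM stands in for the Tweedie GLM.
set.seed(42)
toy <- data.frame(x = runif(50))
toy$y <- rpois(50, lambda = exp(0.5 + toy$x))

fit <- glm(y ~ x, family = poisson(link = "log"), data = toy)

# Prediction on the link (log) scale and on the response scale.
link_value     <- predict(fit, newdata = toy, type = "link")
response_value <- predict(fit, newdata = toy, type = "response")

# With a log link, exponentiating the link value gives the prediction.
print(all.equal(exp(link_value), response_value))
```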

Quick post inspired by the winning / nearly there / need action graphs by @yaneerbaryam at https://www.endcoronavirus.org/countries.

Data

Health regions data is compiled by Isha Berry & friends on GitHub. Montreal boroughs data is published daily. They only publish the current total and keep no history, so @bouchecl visits the page every day and compiles the data in this Google Sheet.

Code

I went a bit over the top for this one and created an R package you can install to recreate all the graphs and fetch the data.

I ran a Twitter survey a couple of months before the apocalypse to help me pick my next blog post topic, and all 3 members of the crowd overwhelmingly agreed that I should use bike GPS data and GraphHopper to find out how far cyclists are willing to go to use safer infrastructure. This is awesome, because I had been looking for a while for a use for this open data set, which contains GPS data for ~5000 bike trips in Montreal.

Simon Coulombe

data tinkerer | cloud shoveler

data scientist in the insurance industry

Québec, Canada