class: center, middle, inverse, title-slide

# A short introduction to the purrr package
### Guillaume Devailly
### INRA
### 2019/09/17

---
class: center

# I want to know more about Pikachu

--

![:img_scale 80%](img/PokéAPI.png)

---

```r
library(purrr)    # the package I'm going to present
library(jsonlite) # importing JSON data as R list
library(dplyr)    # imported for tibble, pipe, etc.
```

---

```r
pikachu <- jsonlite::fromJSON(
  "https://pokeapi.co/api/v2/pokemon/pikachu"
)
str(pikachu, max.level = 1)
## List of 17
##  $ abilities               :'data.frame': 2 obs. of 3 variables:
##  $ base_experience         : int 112
##  $ forms                   :'data.frame': 1 obs. of 2 variables:
##  $ game_indices            :'data.frame': 20 obs. of 2 variables:
##  $ height                  : int 4
##  $ held_items              :'data.frame': 2 obs. of 2 variables:
##  $ id                      : int 25
##  $ is_default              : logi TRUE
##  $ location_area_encounters: chr "https://pokeapi.co/api/v2/pokemon/25/encounters"
##  $ moves                   :'data.frame': 81 obs. of 2 variables:
##  $ name                    : chr "pikachu"
##  $ order                   : int 35
##  $ species                 :List of 2
##  $ sprites                 :List of 8
##  $ stats                   :'data.frame': 6 obs. of 3 variables:
##  $ types                   :'data.frame': 1 obs. of 2 variables:
##  $ weight                  : int 60
```

---

# What is Pikachu's Body Mass Index?
### (BMI; IMC in French)

```r
with(pikachu, tibble(
  name = "Pikachu",
  weight = weight/10, # API in hectograms, we want kilograms
  height = height/10, # API in decimeters, we want meters
  BMI = weight / (height^2),
  type = paste(types$type$name, collapse = ", "), # Pokemons can have several types, collapsing
  sprite = sprites$front_default
))
## # A tibble: 1 x 6
##   name   weight height   BMI type    sprite
##   <chr>   <dbl>  <dbl> <dbl> <chr>   <chr>
## 1 Pikac~      6    0.4  37.5 electr~ https://raw.githubusercontent.com/Pok~
```

---

# But what if I am also interested in Bulbasaur, Charmander and Squirtle?
---

Solution 1: copy, paste, modify

```r
bulbasaur <- jsonlite::fromJSON(
  "https://pokeapi.co/api/v2/pokemon/bulbasaur"
)
with(bulbasaur, tibble(
  name = "bulbasaur",
  weight = weight/10,
  height = height/10,
  BMI = weight / (height^2),
  type = paste(types$type$name, collapse = ", "),
  sprite = sprites$front_default
))
## # A tibble: 1 x 6
##   name    weight height   BMI type      sprite
##   <chr>    <dbl>  <dbl> <dbl> <chr>     <chr>
## 1 bulbas~    6.9    0.7  14.1 poison, ~ https://raw.githubusercontent.com/~
charmander <- jsonlite::fromJSON(
  "https://pokeapi.co/api/v2/pokemon/charmander"
)
with(charmander, tibble(
  name = "charmander",
  weight = weight/10,
  height = height/10,
  BMI = weight / (height^2),
  type = paste(types$type$name, collapse = ", "),
  sprite = sprites$front_default
))
## # A tibble: 1 x 6
##   name     weight height   BMI type  sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr> <chr>
## 1 charman~    8.5    0.6  23.6 fire  https://raw.githubusercontent.com/Pok~
squirtle <- jsonlite::fromJSON(
  "https://pokeapi.co/api/v2/pokemon/squirtle"
)
with(squirtle, tibble(
  name = "squirtle",
  weight = weight/10,
  height = height/10,
  BMI = weight / (height^2),
  type = paste(types$type$name, collapse = ", "),
  sprite = sprites$front_default
))
## # A tibble: 1 x 6
##   name    weight height   BMI type  sprite
##   <chr>    <dbl>  <dbl> <dbl> <chr> <chr>
## 1 squirt~      9    0.5    36 water https://raw.githubusercontent.com/Poke~
```

---

Solution 1: copy, paste, **modify**

```r
*bulbasaur <- jsonlite::fromJSON(
*  "https://pokeapi.co/api/v2/pokemon/bulbasaur"
)
*with(bulbasaur, tibble(
*  name = "bulbasaur",
  weight = weight/10,
  height = height/10,
  BMI = weight / (height^2),
  type = paste(types$type$name, collapse = ", "),
  sprite = sprites$front_default
))
## # A tibble: 1 x 6
##   name    weight height   BMI type      sprite
##   <chr>    <dbl>  <dbl> <dbl> <chr>     <chr>
## 1 bulbas~    6.9    0.7  14.1 poison, ~ https://raw.githubusercontent.com/~
*charmander <- jsonlite::fromJSON(
*  "https://pokeapi.co/api/v2/pokemon/charmander"
)
*with(charmander, tibble(
*  name =
  "charmander",
  weight = weight/10,
  height = height/10,
  BMI = weight / (height^2),
  type = paste(types$type$name, collapse = ", "),
  sprite = sprites$front_default
))
## # A tibble: 1 x 6
##   name     weight height   BMI type  sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr> <chr>
## 1 charman~    8.5    0.6  23.6 fire  https://raw.githubusercontent.com/Pok~
*squirtle <- jsonlite::fromJSON(
*  "https://pokeapi.co/api/v2/pokemon/squirtle"
)
*with(squirtle, tibble(
*  name = "squirtle",
  weight = weight/10,
  height = height/10,
  BMI = weight / (height^2),
  type = paste(types$type$name, collapse = ", "),
  sprite = sprites$front_default
))
## # A tibble: 1 x 6
##   name    weight height   BMI type  sprite
##   <chr>    <dbl>  <dbl> <dbl> <chr> <chr>
## 1 squirt~      9    0.5    36 water https://raw.githubusercontent.com/Poke~
```

---

* tedious
* error-prone
* every change has to be propagated to each copy-paste

But IDEs can help:

* Find and Replace
* multiple cursors (Alt + click in RStudio)

---

Solution 2: a `for` loop

```r
my_pokemons <- c("pikachu", "bulbasaur", "charmander", "squirtle")
pokemon_bmi <- tibble( # pre-allocation
  name = character(0),
  weight = double(0),
  height = double(0),
  bmi = double(0),
  type = character(0),
  sprite = character(0)
)
for (i in seq_along(my_pokemons)) {
  poke_i <- jsonlite::fromJSON(
    paste0("https://pokeapi.co/api/v2/pokemon/", my_pokemons[i])
  )
  pokemon_bmi[i, ] <- with(poke_i, tibble(
    name = my_pokemons[i],
    weight = weight/10,
    height = height/10,
    BMI = weight / (height^2),
    type = paste(types$type$name, collapse = ", "),
    sprite = sprites$front_default
  ))
}
```

---

```r
pokemon_bmi
## # A tibble: 4 x 6
##   name     weight height   bmi type     sprite
## * <chr>     <dbl>  <dbl> <dbl> <chr>    <chr>
## 1 pikachu     6      0.4  37.5 electric https://raw.githubusercontent.com/~
## 2 bulbasa~    6.9    0.7  14.1 poison,~ https://raw.githubusercontent.com/~
## 3 charman~    8.5    0.6  23.6 fire     https://raw.githubusercontent.com/~
## 4 squirtle    9      0.5  36   water    https://raw.githubusercontent.com/~
```

*
works
* fast

But:

* verbose
* creates variables in the global environment
* can be slow if the object grows at each iteration: `pokemon_bmi <- rbind(pokemon_bmi, new_pokemon)`
* requires some thought to avoid this kind of pitfall
* not immediately parallelizable

---

Solution 3, the idiomatic one: `lapply()`

Step 1: write a function

```r
get_pokemon_bmi <- function(poke_name, sleep = 0) {
  my_pokemon <- jsonlite::fromJSON(
    paste0("https://pokeapi.co/api/v2/pokemon/", poke_name)
  )
  Sys.sleep(sleep) # optional pause between requests
  return(
    with(my_pokemon, tibble(
      name = name,
      weight = weight/10,
      height = height/10,
      BMI = weight / (height^2),
      type = paste(types$type$name, collapse = ", "),
      sprite = sprites$front_default
    ))
  )
}
get_pokemon_bmi("pikachu")
## # A tibble: 1 x 6
##   name   weight height   BMI type    sprite
##   <chr>   <dbl>  <dbl> <dbl> <chr>   <chr>
## 1 pikac~      6    0.4  37.5 electr~ https://raw.githubusercontent.com/Pok~
```

---

Step 2: call the function on each element of a vector

```r
my_pokemons <- c("pikachu", "bulbasaur", "charmander", "squirtle")
pokemon_bmi <- lapply(
  my_pokemons,    # for each element in my_pokemons
  get_pokemon_bmi # run function get_pokemon_bmi
)
str(pokemon_bmi) # a list :-(
## List of 4
##  $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 6 variables:
##   ..$ name  : chr "pikachu"
##   ..$ weight: num 6
##   ..$ height: num 0.4
##   ..$ BMI   : num 37.5
##   ..$ type  : chr "electric"
##   ..$ sprite: chr "https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/25.png"
##  $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 6 variables:
##   ..$ name  : chr "bulbasaur"
##   ..$ weight: num 6.9
##   ..$ height: num 0.7
##   ..$ BMI   : num 14.1
##   ..$ type  : chr "poison, grass"
##   ..$ sprite: chr "https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/1.png"
##  $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs.
of 6 variables:
##   ..$ name  : chr "charmander"
##   ..$ weight: num 8.5
##   ..$ height: num 0.6
##   ..$ BMI   : num 23.6
##   ..$ type  : chr "fire"
##   ..$ sprite: chr "https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/4.png"
##  $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 6 variables:
##   ..$ name  : chr "squirtle"
##   ..$ weight: num 9
##   ..$ height: num 0.5
##   ..$ BMI   : num 36
##   ..$ type  : chr "water"
##   ..$ sprite: chr "https://raw.githubusercontent.com/PokeAPI/sprites/master/sprites/pokemon/7.png"
```

---

Step 3: turn the list into a table

```r
pokemon_bmi <- do.call(
  rbind,
  pokemon_bmi
)
pokemon_bmi
## # A tibble: 4 x 6
##   name     weight height   BMI type     sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr>    <chr>
## 1 pikachu     6      0.4  37.5 electric https://raw.githubusercontent.com/~
## 2 bulbasa~    6.9    0.7  14.1 poison,~ https://raw.githubusercontent.com/~
## 3 charman~    8.5    0.6  23.6 fire     https://raw.githubusercontent.com/~
## 4 squirtle    9      0.5  36   water    https://raw.githubusercontent.com/~
```

---

* in theory, less need to get your hands dirty
* in practice, `x <- do.call(rbind, x)` ...
* forces you to write functions
* easy to parallelize:

```r
pokemon_bmi <- parallel::mclapply(
  my_pokemons,
  get_pokemon_bmi,
  mc.cores = 8
)
## Error in parallel::mclapply(my_pokemons, get_pokemon_bmi, mc.cores = 8): 'mc.cores' > 1 is not supported on Windows
```

---

Solution 4: the tidyverse

Step 1: define a function

Step 2: iterate

```r
my_pokemons <- c("pikachu", "bulbasaur", "charmander", "squirtle")
pokemon_bmi <- map_dfr( # the result is a _D_ata _F_rame, bound by _R_ows
  my_pokemons,
  get_pokemon_bmi
)
pokemon_bmi
## # A tibble: 4 x 6
##   name     weight height   BMI type     sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr>    <chr>
## 1 pikachu     6      0.4  37.5 electric https://raw.githubusercontent.com/~
## 2 bulbasa~    6.9    0.7  14.1 poison,~ https://raw.githubusercontent.com/~
## 3 charman~    8.5    0.6  23.6 fire     https://raw.githubusercontent.com/~
## 4 squirtle    9      0.5  36   water    https://raw.githubusercontent.com/~
```

---

* less need to get your hands dirty
* easy to parallelize:

```r
library(furrr)
# the talk used plan(multiprocess(workers = 8)); `multiprocess` has since been
# deprecated in the future package, `multisession` is the portable replacement
plan(multisession, workers = 8)
future_map_dfr(my_pokemons, get_pokemon_bmi)
## # A tibble: 4 x 6
##   name     weight height   BMI type     sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr>    <chr>
## 1 pikachu     6      0.4  37.5 electric https://raw.githubusercontent.com/~
## 2 bulbasa~    6.9    0.7  14.1 poison,~ https://raw.githubusercontent.com/~
## 3 charman~    8.5    0.6  23.6 fire     https://raw.githubusercontent.com/~
## 4 squirtle    9      0.5  36   water    https://raw.githubusercontent.com/~
```

---

One `map_*` function per desired type of result:

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> function </th>
   <th style="text-align:left;"> result </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> map() </td>
   <td style="text-align:left;"> list </td>
  </tr>
  <tr>
   <td style="text-align:left;"> map_chr() </td>
   <td style="text-align:left;"> character vector </td>
  </tr>
  <tr>
   <td style="text-align:left;"> map_int() </td>
   <td style="text-align:left;"> integer vector </td>
  </tr>
  <tr>
   <td
style="text-align:left;"> map_dbl() </td>
   <td style="text-align:left;"> double vector </td>
  </tr>
  <tr>
   <td style="text-align:left;"> map_lgl() </td>
   <td style="text-align:left;"> logical vector </td>
  </tr>
  <tr>
   <td style="text-align:left;"> map_dfr() </td>
   <td style="text-align:left;"> data frame, row bind </td>
  </tr>
  <tr>
   <td style="text-align:left;"> map_dfc() </td>
   <td style="text-align:left;"> data frame, col bind </td>
  </tr>
  <tr>
   <td style="text-align:left;"> walk() </td>
   <td style="text-align:left;"> input returned invisibly (called for side effects) </td>
  </tr>
</tbody>
</table>

---

Anonymous functions:

```r
map_dbl(my_pokemons, function(x) {
  get_pokemon_bmi(x)[["height"]]
})
## [1] 0.4 0.7 0.6 0.5
map_dbl(my_pokemons, ~get_pokemon_bmi(.x)[["height"]])
## [1] 0.4 0.7 0.6 0.5
```

---

Names are preserved:

```r
my_pokemons
## [1] "pikachu"    "bulbasaur"  "charmander" "squirtle"
set_names(my_pokemons)
##      pikachu    bulbasaur   charmander     squirtle
##    "pikachu"  "bulbasaur" "charmander"   "squirtle"
my_pokemons %>%
  map_chr(~get_pokemon_bmi(.x)[["type"]])
## [1] "electric"      "poison, grass" "fire"          "water"
set_names(my_pokemons) %>%
  map_chr(~get_pokemon_bmi(.x)[["type"]])
##         pikachu       bulbasaur      charmander        squirtle
##      "electric" "poison, grass"          "fire"         "water"
```

---

Error handling:

```r
# "magicarpe" is the French name of Magikarp; the API only knows the English names
my_pokemons <- c("pikachu", "bulbasaur", "charmander", "squirtle", "magicarpe")
map(my_pokemons, get_pokemon_bmi)
## Error in open.connection(con, "rb"): HTTP error 404.
```

**The whole** computation is lost because of an error in **a single** iteration (T__T)

---

```r
safe_get_pokemon_bmi <- safely(get_pokemon_bmi)
safe_get_pokemon_bmi("pikachu")
## $result
## # A tibble: 1 x 6
##   name   weight height   BMI type    sprite
##   <chr>   <dbl>  <dbl> <dbl> <chr>   <chr>
## 1 pikac~      6    0.4  37.5 electr~ https://raw.githubusercontent.com/Pok~
##
## $error
## NULL
safe_get_pokemon_bmi("magicarpe")
## $result
## NULL
##
## $error
## <simpleError in open.connection(con, "rb"): HTTP error 404.>
```

---

```r
prelim_results <- set_names(my_pokemons) %>%
  map(safe_get_pokemon_bmi)
success <- map_lgl(prelim_results, ~is.null(.x$error))
success
##    pikachu  bulbasaur charmander   squirtle  magicarpe
##       TRUE       TRUE       TRUE       TRUE      FALSE
map_dfr(prelim_results[success], "result")
## # A tibble: 4 x 6
##   name     weight height   BMI type     sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr>    <chr>
## 1 pikachu     6      0.4  37.5 electric https://raw.githubusercontent.com/~
## 2 bulbasa~    6.9    0.7  14.1 poison,~ https://raw.githubusercontent.com/~
## 3 charman~    8.5    0.6  23.6 fire     https://raw.githubusercontent.com/~
## 4 squirtle    9      0.5  36   water    https://raw.githubusercontent.com/~
```

---

Alternatively:

```r
possibly_get_pokemon_bmi <- possibly(get_pokemon_bmi, otherwise = NULL)
possibly_get_pokemon_bmi("pikachu")
## # A tibble: 1 x 6
##   name   weight height   BMI type    sprite
##   <chr>   <dbl>  <dbl> <dbl> <chr>   <chr>
## 1 pikac~      6    0.4  37.5 electr~ https://raw.githubusercontent.com/Pok~
possibly_get_pokemon_bmi("magicarpe")
## NULL
my_pokemons
## [1] "pikachu"    "bulbasaur"  "charmander" "squirtle"   "magicarpe"
map_dfr(
  my_pokemons,
  ~possibly(get_pokemon_bmi, otherwise = NULL)(.x)
)
## # A tibble: 4 x 6
##   name     weight height   BMI type     sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr>    <chr>
## 1 pikachu     6      0.4  37.5 electric https://raw.githubusercontent.com/~
## 2 bulbasa~    6.9    0.7  14.1 poison,~ https://raw.githubusercontent.com/~
## 3 charman~    8.5    0.6  23.6 fire     https://raw.githubusercontent.com/~
## 4 squirtle    9      0.5  36   water
https://raw.githubusercontent.com/~
```

---

Warning: the errors have become silent!

```r
get_pokemon_bmi("magicarpe")
## Error in open.connection(con, "rb"): HTTP error 404.
possibly(get_pokemon_bmi, otherwise = NULL)("magicarpe")
## NULL
possibly(get_pokemon_bmi, otherwise = NULL, quiet = FALSE)("magicarpe")
## Error: HTTP error 404.
## NULL
```

---

And for 151 Pokémon?

```r
pokemon_bmi <- map_dfr(1:151, possibly_get_pokemon_bmi, sleep = 1)
library(ggplot2)
library(ggimage)
ggplot(pokemon_bmi, aes(x = weight, y = height, text = name)) +
  # iso-BMI curves: BMI = weight / height^2, hence height = sqrt(weight / BMI)
  stat_function(fun = function(w) sqrt(w/30), color = "darkred", linetype = "dotted") +
  stat_function(fun = function(w) sqrt(w/15), color = "orange", linetype = "dotted") +
  annotate("text", x = rep(350, 2), y = c(3.1, 4.5),
           label = c("BMI = 30", "BMI = 15"), color = c("darkred", "orange")) +
  geom_image(aes(image = sprite), size = 0.12) +
  theme_bw(base_size = 14)
```

---
class: center

![](intro_purrr_files/figure-html/unnamed-chunk-23-1.png)

---

# Bonus

Vectorize your functions, no more need for loops!

```r
get_pokemon_bmi(c("pikachu", "bulbasaur", "charmander", "squirtle"))
## Error: lexical error: invalid char in json text.
##           https://pokeapi.co/api/v2/pokem
##                      (right here) ------^
get_pokemon_bmi_v <- function(poke_names, sleep = 0) {
  map_dfr(poke_names, get_pokemon_bmi, sleep = sleep)
}
get_pokemon_bmi_v(c("pikachu", "bulbasaur", "charmander", "squirtle"))
## # A tibble: 4 x 6
##   name     weight height   BMI type     sprite
##   <chr>     <dbl>  <dbl> <dbl> <chr>    <chr>
## 1 pikachu     6      0.4  37.5 electric https://raw.githubusercontent.com/~
## 2 bulbasa~    6.9    0.7  14.1 poison,~ https://raw.githubusercontent.com/~
## 3 charman~    8.5    0.6  23.6 fire     https://raw.githubusercontent.com/~
## 4 squirtle    9      0.5  36   water    https://raw.githubusercontent.com/~
get_pokemon_bmi_v("pikachu")
## # A tibble: 1 x 6
##   name   weight height   BMI type    sprite
##   <chr>   <dbl>  <dbl> <dbl> <chr>   <chr>
## 1 pikac~      6    0.4  37.5 electr~ https://raw.githubusercontent.com/Pok~
```

---

# Going further

- pass two series of arguments: `map2(x, y, fun)`
- pass n series of arguments: `pmap(list(x, y, z, ...), fun)`
- run on a subset only: `map_at(x, at, fun)`, `map_if(x, cond, fun)`
- compose functions: `compose(fun1, fun2)`
- and much more!

Links:

- [Itération de fonctions avec purrr](http://perso.ens-lyon.fr/lise.vaudor/iterer-des-fonctions-avec-purrr/) (in French)
- [Lesser known purrr tricks](https://www.brodrigues.co/blog/2017-03-24-lesser_known_purrr/)
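---

A minimal sketch of `map2()`, `pmap()`, `map_if()`, `map_at()` and `compose()`, run on a few hard-coded values so that it does not depend on the API. The `weights` and `heights` vectors and the `describe()` helper are illustrative assumptions, not part of the original talk.

```r
library(purrr)

# Illustrative values (kg and m), typed by hand rather than fetched from the API
weights <- c(pikachu = 6, bulbasaur = 6.9, charmander = 8.5)
heights <- c(pikachu = 0.4, bulbasaur = 0.7, charmander = 0.6)

# map2_*(): iterate over two vectors in parallel; the names of the first are kept
bmi <- map2_dbl(weights, heights, ~ .x / .y^2)

# pmap_*(): iterate over n vectors, matched to the arguments by position here
describe <- function(name, w, h) paste0(name, ": ", w, " kg, ", h, " m")
labels <- pmap_chr(list(names(weights), weights, heights), describe)

# map_if(): apply the function only where the predicate is TRUE
rounded <- map_if(as.list(bmi), ~ .x > 20, round)

# map_at(): apply the function only to the elements selected by name
shouted <- map_at(as.list(set_names(names(weights))), "pikachu", toupper)

# compose(): functions are applied from right to left by default,
# so this is round(sqrt(x))
round_sqrt <- compose(round, sqrt)
round_sqrt(10)
```

As with `map()`, the suffix (`_dbl`, `_chr`, ...) picks the type of the result; without a suffix, `map2()`, `pmap()`, `map_if()` and `map_at()` return a list.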