Issue
I am trying to find out the number of moves each Pokemon (first generation) could learn.
I found the following website that contains this information: https://pokemondb.net/pokedex/game/red-blue-yellow
There are 151 Pokemon listed here - and for each of them, their move set is listed on a template page like this: https://pokemondb.net/pokedex/bulbasaur/moves/1
Since I am using R, I tried to build the website address for each Pokemon. My list has 150 names rather than 151, because Nidoran appears in it only once (https://docs.google.com/document/d/1fH_n_BPbIk1bZCrK1hLAJrYPH2d5RTy9IgdR5Ck_lNw/edit#):
names = c("Bulbasaur","Ivysaur","Venusaur","Charmander","Charmeleon","Charizard","Squirtle","Wartortle","Blastoise","Caterpie","Metapod","Butterfree","Weedle","Kakuna","Beedrill",
"Pidgey","Pidgeotto","Pidgeot","Rattata","Raticate","Spearow","Fearow","Ekans","Arbok","Pikachu","Raichu","Sandshrew","Sandslash","Nidoran","Nidorina","Nidoqueen","Nidorino","Nidoking",
"Clefairy","Clefable","Vulpix","Ninetales","Jigglypuff","Wigglytuff","Zubat","Golbat","Oddish","Gloom","Vileplume","Paras","Parasect","Venonat","Venomoth","Diglett","Dugtrio","Meowth","Persian",
"Psyduck","Golduck","Mankey","Primeape","Growlithe","Arcanine","Poliwag","Poliwhirl","Poliwrath","Abra","Kadabra","Alakazam","Machop","Machoke","Machamp","Bellsprout","Weepinbell","Victreebel","Tentacool",
"Tentacruel","Geodude","Graveler","Golem","Ponyta","Rapidash","Slowpoke","Slowbro","Magnemite","Magneton","Farfetch’d","Doduo","Dodrio","Seel","Dewgong","Grimer","Muk","Shellder","Cloyster","Gastly","Haunter",
"Gengar","Onix","Drowzee","Hypno","Krabby","Kingler","Voltorb","Electrode","Exeggcute","Exeggutor","Cubone","Marowak","Hitmonlee","Hitmonchan","Lickitung","Koffing","Weezing","Rhyhorn","Rhydon","Chansey","Tangela",
"Kangaskhan","Horsea","Seadra","Goldeen","Seaking","Staryu","Starmie","Mr.Mime","Scyther","Jynx","Electabuzz","Magmar","Pinsir","Tauros","Magikarp","Gyarados","Lapras","Ditto"
,"Eevee","Vaporeon","Jolteon","Flareon","Porygon","Omanyte","Omastar","Kabuto","Kabutops","Aerodactyl","Snorlax","Articuno","Zapdos","Moltres","Dratini","Dragonair","Dragonite","Mewtwo","Mew")
template_1 = rep("https://pokemondb.net/pokedex/",150)
template_2 = rep("/moves/1",150)
pokemon_websites = data.frame(template_1, names, template_2)
pokemon_websites$full_website = paste(pokemon_websites$template_1, pokemon_websites$names, pokemon_websites$template_2)
Next, I remove all spaces:
library(stringr)
pokemon_websites$full_website = str_remove_all( pokemon_websites$full_website," ")
Now, I have a column with all the website names:
head(pokemon_websites)
template_1 names template_2 full_website
1 https://pokemondb.net/pokedex/ Bulbasaur /moves/1 https://pokemondb.net/pokedex/Bulbasaur/moves/1
2 https://pokemondb.net/pokedex/ Ivysaur /moves/1 https://pokemondb.net/pokedex/Ivysaur/moves/1
3 https://pokemondb.net/pokedex/ Venusaur /moves/1 https://pokemondb.net/pokedex/Venusaur/moves/1
4 https://pokemondb.net/pokedex/ Charmander /moves/1 https://pokemondb.net/pokedex/Charmander/moves/1
5 https://pokemondb.net/pokedex/ Charmeleon /moves/1 https://pokemondb.net/pokedex/Charmeleon/moves/1
6 https://pokemondb.net/pokedex/ Charizard /moves/1 https://pokemondb.net/pokedex/Charizard/moves/1
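The same address column can also be built in one step, without the helper columns or the space removal. The following is a minimal sketch that is not part of the original post: `make_slug` and `build_url` are hypothetical helper names, and the slug rules (lowercase, apostrophes dropped, a period replaced by a hyphen) are an assumption about how the site names its pages that should be checked against a few real URLs.

```r
# Hypothetical helper: turn a display name into a URL slug.
# ASSUMPTION: the site uses lowercase slugs, drops apostrophes, and uses a
# hyphen where the name has a period (e.g. "Mr.Mime" -> "mr-mime").
make_slug <- function(name) {
  slug <- tolower(name)
  slug <- gsub("\u2019", "", slug, fixed = TRUE)  # curly apostrophe
  slug <- gsub("'", "", slug, fixed = TRUE)       # straight apostrophe
  slug <- gsub(".", "-", slug, fixed = TRUE)      # period -> hyphen
  slug
}

# paste0() joins without a separator, so no spaces need removing afterwards
build_url <- function(name) {
  paste0("https://pokemondb.net/pokedex/", make_slug(name), "/moves/1")
}

example_urls <- build_url(c("Bulbasaur", "Mr.Mime"))
print(example_urls)
```

With the vector from the question this would be `pokemon_websites$full_website <- build_url(names)`. The single "Nidoran" entry would still need to be handled by hand, since the game has two separate Nidoran species.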
I would like to count the number of moves each of these 150 Pokemon can learn. For example, the first Pokemon, "Bulbasaur", can learn 24 moves.
In the end, I would like to add a column to the earlier data frame that contains the number of moves each Pokemon can learn. For example, something that looks like this:
> head(pokemon_websites)
template_1 names template_2 full_website number_of_moves
1 https://pokemondb.net/pokedex/ Bulbasaur /moves/1 https://pokemondb.net/pokedex/Bulbasaur/moves/1 24
2 https://pokemondb.net/pokedex/ Ivysaur /moves/1 https://pokemondb.net/pokedex/Ivysaur/moves/1 ???
3 https://pokemondb.net/pokedex/ Venusaur /moves/1 https://pokemondb.net/pokedex/Venusaur/moves/1 ???
4 https://pokemondb.net/pokedex/ Charmander /moves/1 https://pokemondb.net/pokedex/Charmander/moves/1 ???
5 https://pokemondb.net/pokedex/ Charmeleon /moves/1 https://pokemondb.net/pokedex/Charmeleon/moves/1 ???
6 https://pokemondb.net/pokedex/ Charizard /moves/1 https://pokemondb.net/pokedex/Charizard/moves/1 ???
- Is there a way to webscrape this data in R, count the number of moves for each of the 150 Pokemon, and then place this move count into a column?
Right now I am doing this by hand and it is taking a long time! Also, I have heard some websites do not allow for automated webscraping - if this website (https://pokemondb.net/pokedex/game/red-blue-yellow) does not allow webscraping, I can try to find another website that might allow it.
Thank you!
Solution
You can scrape all the tables for each Pokemon with the rvest package, using something like this:
library(rvest)
tables = lapply(pokemon_websites$full_website, function(link) {
  # return NULL for a page that fails to load instead of stopping the loop
  tryCatch(
    read_html(link) %>% html_nodes("table") %>% html_table(),
    error = function(e) NULL, warning = function(w) NULL
  )
})
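However, note that the number of tables returned differs between Pokemon. For example, the first has 6 tables: the first three are for Red/Blue and the last three are for Yellow. A length of 0 (entries 29, 82 and 121, i.e. Nidoran, Farfetch’d and Mr.Mime) means no tables were retrieved, most likely because the name does not map directly to a page address.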
lengths(tables)
[1] 6 6 6 6 6 6 6 6 6 2 4 7 2 4 8 6 6 6 4 4 6 6 6 6 6 8 6 6 0 4 8 4 8 6 8 4 6 6 8 4 4 6 6 8 6 6 5 5 5 5 4 4 6 6 6
[56] 6 4 6 6 6 8 6 6 6 6 6 6 6 6 8 6 6 6 6 6 4 4 6 6 6 6 0 6 6 6 6 4 4 6 8 4 4 6 6 6 6 6 6 6 6 4 8 6 7 6 6 6 4 4 6
[111] 6 6 6 6 6 6 6 6 6 8 0 6 4 6 6 6 6 2 8 6 2 4 8 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
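To go from these tables to a single count per Pokemon, one option is to pool the move names from every table and count the distinct ones. This is only a sketch under assumptions: `count_moves` is a hypothetical helper, and it assumes each move table has a column named `Move`, which should be verified on a real page. Counting distinct names avoids double counting a move that appears in both the Red/Blue and the Yellow tables.

```r
# Count distinct move names across all tables scraped for one Pokemon.
# ASSUMPTION: each move table has a column named "Move"; tables without
# one are skipped. Returns NA when no tables were retrieved for the page.
count_moves <- function(tbls) {
  if (is.null(tbls) || length(tbls) == 0) return(NA_integer_)
  moves <- unlist(lapply(tbls, function(tb) {
    if ("Move" %in% names(tb)) as.character(tb[["Move"]]) else character(0)
  }))
  length(unique(moves))
}

# Toy input standing in for the scraped `tables` object (made-up rows)
toy_tables <- list(
  list(
    data.frame(Lv. = c(1, 7), Move = c("Tackle", "Leech Seed")),
    data.frame(TM = c(3, 6), Move = c("Swords Dance", "Toxic")),
    data.frame(Lv. = 1, Move = "Tackle")  # repeated move, counted once
  ),
  NULL  # a page that failed to load
)

toy_counts <- vapply(toy_tables, count_moves, integer(1))
print(toy_counts)
```

With the real `tables` object, the requested column could then be added with `pokemon_websites$number_of_moves <- vapply(tables, count_moves, integer(1))`; pages that returned no tables show up as `NA`.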
Answered By - langtang
