Learn Excel – Scrape Webpages Using Power Query – Podcast 2056

Learn Excel – Scrape Webpages Using Power Query – Podcast 2056


Learn Excel from MrExcel Podcast, Episode
2056: Power Query Squared Hey, welcome back to the MrExcel netcast,
I’m Bill Jelen. I came across this trick while I was preparing
to do a seminar at a conference in Dallas called Excelapalooza, greatest Excel conference
name ever. You should check it out every September in
Dallas. And credit to Ken Puls and Miguel Escobar
because I had to do an hour on power query, so of course, I pulled out there awesome book,
the world’s greatest book on Power Query. I was flipping through the book and I saw
they had one technique and I said, “Wait a second. I’m going to see if I can adapt this technique.” And this to me is awesome in power query. And here’s what we’re going to do, we’re going
to take one power query and then we’re going to run that power query, that query, dozens
of times, alright? And so, the example that I came up with was
where I wanted to pull data from a web page, alright. And I went out to – I just looked for some
web pages I could- that I could use as an example. I ended up at Weather Underground and here’s
the URL, and you can see that I was in Dallas-Fort Worth, so we’re pulling data for Dallas and
it looks like January 2nd, 2015. So right there in the URL is the parameters,
right? And that is a URL that is just ripe for making
this thing work. We’ll take a quick look at the web page although
it’s not so important. You see there’s a lot of different data out
here on the web page and I decided I was just going to try and get precipitation and high
and low temperature. And here’s that query. And let me tell you right here that this podcast
is not about how to Create this Query, New Query, From Other Sources, From Web, specify
the URL and then a bunch of steps that I’m not going to detail here to get my final answer
of Max Temp, Min Temp, and precipitation. The point is you’re going to take your own
query and make it work for a bunch of things. So, I click Close & Load and this query is
working, it’s returning my one row. Everything is awesome. And I’m going to come back in, I’m going to
Edit this query and I’m going to go to View, Advanced Editor. I’m going to take this query and I’m going
to make it be a function, alright? So right here before the word LET, I press
Enter. And in parentheses, I’m going to give it a
variable (MyURL) and then=>little arrow there, alright? Cool. And then down here, where they have the URL
in quotes, I want to get rid of the entire URL including the quotes and then type my
variable name MyURL, alright. So what we’re saying is, we’re going to pass
it to URL and it’s going to do that same query but with whatever URL we happen to pass it. Now, a couple of disconcerting things here,
when I click Done, oh men! All my applied steps are gone and they want
me to enter a parameter. Just ignore all that. We’re going to rename this; we’re going to
call it fxWeather. FX, of course, being the abbreviation for
function and you have to really remember this name and remember which letters are capitalized,
that’s going to be very important in a couple of minutes. Home, Close & Load, and BAM! Everything is gone. Oh, no! But that’s okay. Alright, so we know it’s there. It’s a connection only. Now, I’m going to come over here and this
is just straight old Excel, alright? So here’s the URL, I broke it out into the
first part of the URL, the ending part of the URL. I know that I need to take the date; I need
to format it in this weird format of year, month, and day so I used the TEXT function
to do that. Put in the starting date here. I can even change the airport, so now I’m
back and forth. Let’s do MCO for Orlando and let’s do some
recent data. So I’ll start at 10/1/2016, alright. So now we have this great little table set
up here. And by the way, it does have to be a table. You have to use format as table or Ctrl+T.
So, you know, this is just grabbing that date and then +1+1+1. I format it, I build the URL. Alright now, we’re going to build a query
from this table. Alright, and there’s my information. I’m going to Add a New Column, Add a Custom
Column, the columns were going to be called Weather, and the formula is going to be=fxWeather. Make sure it’s the exact same case, same upper
and lower case letters, and we’ll insert that field called URL like that, closing parenthesis. No syntax errors, click OK. They want to know about Privacy here, this
is all Public data, click Save, alright. So, there’s our date. It’s funny they’ve actually changed my format
into something that doesn’t look like what I started out with. And then here’s Weather with the expand symbol. So I’m going to click the expand symbol, uncheck
Use original column name as prefix. I want the Max, the Min, the Precipitation,
click OK. Alright, and now all I need is the date and
that information out there. So I’ll right-click and remove this column,
right-click and remove this column. Over here, I don’t need that time so I’ll
say this is just a Date, alright. And see what it’s doing every date that I’m
passing it; it’s returning the high, the low, and the precipitation for Orlando. Every row here is going out to a different
web page. Just imagine, if it wasn’t 15 rows but 5,000
rows, you’d set it up to run overnight. I used to write macros for this. In fact, one of the web pages out at MrExcel
is how to build a macro to scrape web pages from a thousand different web pages at a site,
not necessary anymore with power query. Now, when I Close & Load, it’s funny the preview
here is showing me all the results. When I Close & Load, they’re actually going
to go do each query. And so right now we have preview rows loaded
and it will take a good long time for this information to get updated. So, go to lunch, do something especially if
you’re doing more than 15 rows. And it’s funny the preview is correct but
they’re still going out and could chunk, could chunk, could chunking through each individual
row. And there it’s loaded. Is this amazing or what? Hey, I do a lot of Excel seminars, the Power
Excel seminar in Orlando, Florida. Look at these beautiful temperatures we have
down here on November 4, 2016. My afternoon will be all about Power Query,
Power BI, Power Pivot, Power Map. So, I’m going to invite you down to Orlando
to check out this seminar. There’ll be a link there in the top-right
hand corner. Alright, so recap. Today’s trick is from this book, M is for
(DATA) MONKEY. We built a query to get one web page and then
edit that query to change it into a function. So right before the Let statement variable
name=>and then change the hard-coded URL to whatever that variable name is. Rename the query to fxWeather, Close & Load,
the data disappears. Then, we use some sort of Excel trickery to
create a table of all the URLs we want to crawl, create a query from that table. This has to be a Ctrl+T table, add a new column
of Weather=fxWeather and again it has to match the case there, [URL], Expand that column,
uncheck Prefix. BAM! It is amazing. Thanks to Ken and Miguel for writing this
book. Thanks to you for stopping by. Hope to see you in Orlando on November 4th
2016. See you next time for another netcast from
MrExcel.

Author: Kevin Mason

16 thoughts on “Learn Excel – Scrape Webpages Using Power Query – Podcast 2056

  1. In the words of the great Mike "ExcelIsFun" Girvin – "You have GOT to be kidding me!" This was an amazing video – thank you!

  2. question for you, i have a refresh button on my excel sheet that retrieves quotes from a internet data source, sometimes when i click the button the query box pops up, i have .refresh backgroundquery:=false, it still pops up, do i have the code correctly or is there something i can do to hide the query box from popping up and letting others see my vba

  3. This is very powerful. However I sometimes get a Power Query error "Operation is not valid due to the state of the object" (for whatever that's worth!)

  4. Thanks for this video. When I follow the instructions you provided for adding the column in Power Query, at 4:06, the resulting column is full of "error" values. When I drill down, I see the following error message:

    An error occurred in the ‘’ query. Expression.Error: The column 'Temperature (° F)' of the table wasn't found.
    Details:
    Temperature (° F)

    When I initially pulled in the data from the table from Weather Underground, I went to Power Query and removed the Temperature Column, or so I thought. Maybe I renamed it. In any case, I followed the steps exactly for adding the column into the query created from the daily weather data table. I'm not sure what I'm doing wrong.

  5. Thanks great video as always — have you done any examples with "Scraping" data from Website that are note using tables? Cam you still use M / Power Query or do you have to use R-Script or similar?

  6. Thanks for this its fantastic just subscribed. one question,
    It works perfect when the table header of the original query is exactly the same as the new table in the different URL. My problem is the example I'm trying to do, the second URL sometimes changes the headers slightly depending on the site. Therefore when i run the query it comes up with the "[Expression.Error] The column 'xxxxxx' of the table wasn't found".

    I was wondering if there was was a way to have Power query search for a second header if there is an error on the first.

    It would be great if you could do a vid on this thanks

  7. Olá !

    Gostaria de aprender webScraping de maneira profissional usando Excel?

    E o melhor sem necessidade de usar navegador, usando direto o protocolo HTTP?

    Conheça o curso de WebScraping com Excel VBA.

    Veja o vídeo no link abaixo.

    https://www.youtube.com/watch?v=CpbhAra6xrE

    Abraços

  8. This is great, is there a way to combine 2 different queries into one table? For example there are 2 separate tables on each page that i want to extract data from. i Have been able to suceed using you method to get 1 table extracted over multple pages. (is there a way to join the second table?

  9. Unexptected (,/'_'.)
    https://chrome.google.com/webstore/detail/threelly-ai-for-youtube/dfohlnjmjiipcppekkbhbabjbnikkibo

Leave a Reply

Your email address will not be published. Required fields are marked *