I am Marcel Beugelsdijk from The Netherlands
In this video we will take a closer look at the functionality of importing and combining
files from a folder since the November 2016 upgrade of Power BI Desktop.
It's quite a long video. Below the video you will find links to the various subjects so you can
jump back and forth as you like.
Subjects covered:
Explanation of the used case: so we will combine Excel files from a folder
that includes clock changes for each Windows time zone in the period 2000 to 2027.
So the date/times in any time zone can be converted to date/times in any other time zone.
A quick demo of importing and combining files from a folder.
Explanation of the objects and query steps created automatically when files are combined.
A specific point of attention for which I raised an idea for improvement.
Adjustment of generated code to filter input for a specific year only.
First without parameters and then with parameters.
And there is a specific point of attention as well.
So, let's give it a start.
In this folder, I have a collection of Excel files,
with date and time stamps the clock is changing between the year 2000 and 2027
for each Windows time zone.
Let's take a look at one of the files.
It includes the UTC time stamp starting with January first, 2000
and the local time - "Lokaal" is Dutch for "local" - for this time zone it's also midnight at
January first, 2000.
And then for each change of clocks, so typically that would be Daylight Saving Time
start and end, or just a change in policy to adjust clocks, there will be an entry in the list.
This is created from a Power Query query and the files for each time zone are the same.
Basically what I did was create the first file, have the query run, then save it,
then adjust the time zone on my local computer, then run the query again,
and save it with the name of the new time zone and so forth and so forth for 130 time zones.
Each file has the same structure, which is important,
and all output is in a table named TimeTable
You also should notice that the name of the Windows time zone is not in the file,
not in the data, but it is included in the name of the file, so if we want to combine the files,
with all the time stamps for all the Windows time zones,
then we also need the name of the file.
Montevideo on the third of February, 2020, 5 minutes to 5,
it will be already fourth of February 2020, 5 minutes to 7 in Bougainville,
So with the combined data it is possible to calculate the time in one time zone
based on the time in another time zone.
Now, I've come to the quick demo on importing and combining files from a folder.
So, I have a new file in Power BI, I select Get Data, More...
from a folder.
I use the folder over here, so I copy the path...
...paste it over here.
OK
Edit
I need to do some filtering on Excel files
And they should start with "UTC"
Open parenthesis UTC, to be exact.
So OK, now we have all the files that we need,
And we can now combine those files with the new functionality, if we press the button,
then we need to select which part of the Excel file we need, well we need the TimeTables,
the tables "TimeTable"
When we press OK...
...there we have the result we want, with the file names in the first column,
the UTC time in the second column, and the third column is the local time for that time zone
at the specified UTC time.
So far the quick demo.
We are now at the explanation of the objects and query steps that are created automatically
when files are combined from a folder.
We were over here before we pressed "Combine Binaries".
4 steps are added to the query, but - in order to understand those steps -
we first have to take a look at the objects that were created at the left hand side.
These are all grouped now, so our own query is over here,
In the group "Other Queries".
And a group "Transform Binary from DateTimeTables" was added automatically.
So the functionality of combining files consists of a sample file
with a sample query that is wrapped into a function that can be used for the other files.
So what we see are the objects over here:
First we have the Sample Binary that is the example file.
This query consists of the query steps we already had
and - if we take a look at the advanced editor -
well, this is the new step that is added:
basically it states that we use the Casablanca file as an example file.
In this query - that was also automatically created - we can see the source:
these are the Excel objects in the Casablanca file.
and the navigation is what we specified on the popup screen in the short demo:
this is the navigation to the item "Table" of the kind "Table" in the file for Casablanca.
So this will be used as an example for all the other files as well.
For this goal, this query - if we take a look at the advanced editor again - let's copy, cancel
This part is wrapped into a function for general use.
So if we go into the editor then we will see a message popping up:
stating that this function is automatically updated whenever that sample query is updated,
So that's fine for now (I won't update it, except if I paste the code we had from the sample query)
then we can see it's exactly this part of the function,
so this code is wrapped into a function with the parameter: the file contents.
So the name of that parameter is "Sample Binary Parameter".
If I go back to the objects, then we can see this parameter
which is pointing to the Sample Binary file.
So again: in the example query of the sample file, we can see
the "Source" where the parameter is used.
This parameter is used to point to the Sample Binary file,
which is the file for Casablanca, which is the first file in the folder.
And that is used in the function...
...also as a parameter, but now you should notice, that
this parameter is the parameter of the function,
so input is whatever content is supplied to the function,
That is used to convert the Excel file binary into a table in Power Query.
So this parameter is the LOCAL parameter for the function, whereas in the query...
...the parameter that is used, is not the local parameter,
but that is THIS parameter pointing to the sample file,
So again: the Sample Binary is used as an example;
the parameter is used so the exact contents of this sample query
can be included in this function.
Whenever we adjust something in the sample query, it will also be adjusted in the function.
So these are the objects that are created.
Now we can take a look at the additional steps that were added in our query.
So the first one is that the function is invoked.
That is the custom function that was created by the software
when we pressed "Combine Binaries".
Well, a new column is added to our table.
The query that is called, is this transform query.
And the parameter for that query now is the content
meaning each of the binary files from our folder.
If we go to the right in the table,
then we will see all the tables over here as a result from the "Invoke Custom Function" step
in our query, so this is the file for Casablanca,
this is another file and so on, each table, based on the sample query, using that function.
The next step is a rename of the " Name" column.
It is renamed to "Source.Name" to avoid any conflicts with a column with the name "Name"
in the Excel files, when the table is expanded later on.
So that's the only reason that this step is included.
Then, there is another step that removes all the other columns from the navigation table,
so we are left with the file name and the tables, exactly what we want in our case,
but if you would like to have some more columns,
then you can still choose the additional columns from the navigation table
like the "Date Modified"...
...and it will be included in the results of the combining binaries,
which is a huge improvement with the functionality that was introduced
in November 2016.
But we only need the Source.Name and the contents of the binary.
And the last step of course, is: the table will be expanded.
So again - as a wrap up - when we press "Combine Binaries" -
a sample file is created, with a parameter pointing to that sample file,
a query is created for the conversion of the file into a Power Query table.
That example query is wrapped into a function that is used for all the other files in the folder,
to have that converted to a Power Query table.
And then, there is some rename and selection of columns and expanding the table.
So basically, that's the functionality introduced with Power BI Desktop, November 2016 update.
We are now at the specific point of attention for which I raised an idea for improvement.
So let's take a look.
In the Sample Binary there is a hard coded link to the Casablanca file in the folder.
If that file would no longer exist, for instance we rename it with an additional parenthesis,
If we update, refresh, this query, so our master query,
it will still go fine.
But if we want to do something with this query, and have it refreshed,
then we get an error because the file doesn't exist any longer.
So, in order to solve that, I raised an Idea
bascially to have...
...this hardcoded line, replaced by...
...another code. Basically what is done is: the first row of this table, so the last step before
combining binaries, is taken as an example.
So all we need is...
... that file ...
... and instead of this double key, so the folder path and the file name,
we only use zero, pointing to the first row in the table
and then the content of that table
Adjust this name: SampleBinary.
And now we have a dynamic sample binary pointing to the first file in the folder,
whichever selection we have done.
So we will not be dependent anymore of the existence of a specific file in the folder.
So you can see I raised this Idea
to have the "combine binaries" functionality independent
from the existence of a particular file.
At this moment, the Idea has 5 votes already, but if you want to vote also,
then please go ahead.
We are now at the part of the video
where we are going to adjust the generated code to filter the input for specific years only.
We will adjust this query
and remember: all the adjustments we apply to this query
will be propagated to this function as well, as we will see.
So what we wanna to do, is get all the entries starting with a specific year,
so let's say we want to have the years 2016 and 2017.
We also need the last row before that date, so we should start with the last date on or before
January first, 2016, and then select everything including these entries,
so we have the years 2016 and 2017.
So now we have the desired data, so we can
cover the years 2016 and 2017 from this selection.
If we take a look at the function: it is indeed adjusted
with our adjustments of the sample query.
So these selections will now be applied to all the tables from the folder we selected.
We can see it's refreshed already, so this is the desired output.
Now if we want to use a parameter for that,
so instead of hardcoded start dates and end dates, over here and over here,
we want to have a parameter.
Now we are going to use these parameters in the example query.
2016 is the first year.
LastYear instead of hardcoded 2017.
So this is working fine.
But if we go now to the master query, there is a point of attention:
We get an expression error.
Because: 2 parameters are added to our function.
The function is also adjusted, and now includes those additional parameters
FirstYear and LastYear.
So when we use parameters in our adjusted sample query
these will be propagated to the function as well
but not to the code in our master query
so in this code, we need to add the new parameters FirstYear and LastYear.
We can not do that over here, as you can see, because it expects a decimal number.
So instead of doing it over there, we have to do it over here.
And it's working fine now.
So now we have the years 2016 and 2018.
And when we adjust this year
Last year for instance 2020
then we will get additional data in our table.
So beware of adding parameters in your sample query:
these will be propogated to the function, but not to your master query
and in order to have that corrected, you need to go into the function bar
and add the names of the parameters that were added to the function.
Then everything will be working fine.
As a last example: if we want to adjust the data type for these columns in our sample query,
to datetime...
...and we take a look at
at these columns, then we see that the date type was not propagated to all the files.
in the folder. So we have to reapply that in the final result.
Because the example query is used to create all these tables,
but later on these tables get expanded and then apparently the data type gets lost
so it seems to be pretty useless to adjust the data type in the example query.
That will be the end of the video. Thanks for your attention!
No comments:
Post a Comment