Day to Data Stuff

View Original

Creating Loops in M

Hi everyone!

Today we’ll be looking at how you can make a loop in M in order to have dynamic queries and handle the problems PowerBI will have it. Although this can be pretty easy in other software, in PowerBI loops can be a little tricky. However, once you understand the basic structure of how they work, it’s a super useful tool to have at your disposal!

While there are quite a few blogs out there that have covered this topic, I wanted to cover it in a really simple way with a nice tutorial so you can take this logic and apply it to any challenges you might be facing!


Use Case

So our use case is that we want to get some data from a UK government open API. We can use our web connector in PowerBI to make that connection and draw back the data. Let’s take a look.

If anybody recognises the API and search term being used here, you read as many PowerBI blogs as me, and should potentially consider getting out more!

So you can see from this really simple query we are trying to retrieve a JSON from

https://data.gov.uk/api/3/action/package_search

You’ll also notice that we have appended a little bit extra on the end of the URL, namely a ? followed by q=cows. Those familiar with URLs will know already, that we can append additional criteria and headers to a URL by adding terms after our base, or root URL with the ?.

So basically, the M code we have here hits an open API which retrieves metadata for all government datasets, but we’ve added a specific search term of cows. So now the data returned will be a list of all available datasets containing cows (actually the API only returns 10 rows by default, so just the first 10, but if we wanted more, adding “&rows=100” to the end of our URL would let us view all the data (there are 76), but we don’t need to do that for this exercise).

After the transformations you see in the M code above, this is the result. I dropped everything except the title and some notes, as we don’t actually really need this data! It was upsetting to see so few datasets are actually about cows though. From here we could manipulate the data and do whatever we want with it. Great.

Except, I also want to know what datasets the government has on sheep. I’m looking to collect a whole barnyard of data.

Referring back to our query, the search parameter is hardcoded into the text. So, I wouldn’t be able to change the results or that query to include sheep as well as cows. (Perhaps the API would accept an OR statement for the search, perhaps not - and how many search terms could I feed? As you’ll see, this might be a solution if you only have 1 or 2 terms, but with more, this approach definitely won’t work!)

So one thing I can do is create a second query just like the first but change the search term.

OK, so now I can load that in and the data returned will be the exact same but using the search term sheep instead. I now have 2 queries, which I could merge/append together

The sheep data actually seems a bit grim

And my farmyard grows!

I could do this for all of my desired search terms and just append them all together at the end. The downside to this is I manually have duplicate the query and change the code each time. For 1 or 2 terms this might be OK, but to put this into a real world use case, what if you needed to make 100 API calls, each slightly different? It’s going to be a real pain to setup.

This is why we might want to use a loop


Looping in M

If you navigated to this page by googling because you already have a use case and want a solution - sorry for making you read that first section - this is the good bit!

With this kind of repetitive task, we can actually make the computer do the work, by making the computer run several iterations of a procedure until a condition is satisfied. That is called a loop and we can set up a loop in PowerQuery to automate the authoring of my URLs.

The first step is to create a list of all the search terms I want. In this case, I just manually typed in a list, but you could use any kind of data for this.

So, I now have a table of the five search terms I want to iterate. We’ll come back to this in a moment.
Next what we need to do is create a function. A function in M is defined by Microsoft as;

In the Power Query M formula language, a function is a mapping from a set of input values to a single output value. A function is written by first naming the function parameters, and then providing an expression to compute the result of the function.

In other words, a function is much like a query, except part of the query is a parameter, and we can feed a value to a function and it will produce some output based on the value of the parameter we feed it.

To put this in real world terms, a function could be the M code I wrote initially, but where instead of having a hard-coded term like cows, we instead feed a parameter into the code. Which is exactly what we are going to do.

So we will setup a parameter - it doesn’t actually matter what you put in this, we just need it so we can amend our M code.

Here, I’ve just made a blank parameter, named it SearchTerm.

Now, going back to our initial query, we can substitute the hardcoded text for our new SearchTerm parameter.

I split the important bit out so it should be clear to see, all we’ve done is remove “cows” from our original URL and have instead used the & symbol to append our parameter to the URL.
Then we can transform this query into a function - simply right click your query and choose Create Function

It’ll ask you to name your function and then it will appear as a function in your list of queries on the left.


If you view your function in PowerQuery, you’ll see you have the option to give it an optional search term and then you can invoke it. If you do, it’ll run the function using whatever search term you give it. This is already pretty cool, but it gets better!

We are now ready - Let’s go back to that initial list

This is the magic bit! If you go to Add Column while viewing the list (as per the screenshot above) and go to Add Column > Invoke Custom Function you will get the menu you’ll see above.

What we are doing, is adding a new column to our dataset. The new column will be the output of the function we assign it. In the example above, you can see you must name the new column, and decide what function you want to invoke. In this case, my LoopQuery is the function we just built.

Here’s the cool bit - the third option asks you to, optionally, select what value you would like to use for the SearchTerm parameter. If you recall, the function needs a value for our SearchTerm parameter in order to know what search term to append to the URL.

In this example we will feed it the “Animals” column. What this will do is for each row the function will evaluate using the value in the animal column as input for the SearchTerm parameter (hope this still makes sense!!).

If we apply that function and then expand the column of Table records it returns you can see we get the desired result

As you can see the initial list will iterate over each row generated by the function, which is actually quite handy as a way of being able to filter your data later.

Additionally, the function also contains the M code to transform and shape the data (i.e. dropping all columns except name/notes) so it will apply those steps each time it iterates.


And that’s it - you can now write loops for M!

This was just a small example, but you could feed hundreds of values into that list and have the function iterate as many times as you need (subject to the API allowing you!).

However, there is one more problem you should be aware of.


Hand Authored Queries

I don’t mind too much about refreshing my barnyard data, but in the real world, you probably do. PowerBI is not a big fan of dynamic queries, and if you set up the example I gave and go to your data sources, you’ll see this warning -

The first time I saw this, I had no idea what it meant, nor the potential ramifications (Note the data source in the screenshot is a normal working one - the problem data source isn’t on that list and you can’t interact with it, such as setting authentication etc.).

Based on my research / best guess, PowerBI has a problem with our dynamic URLs. I suspect the issue is something to do with PowerBI trying to validate the URL which it does using the base url feed to it - in other words it doesn’t run the full query, so it doesn’t get the append, but perhaps it does realise it’s missing something. I had a similar issue in a project I worked on, whereby the PowerBI gateway test connection was getting tripped up by the same issue - hence my guess as to the cause of the issue, but if anyone knows better I’d love to hear from you!

So what does this warning actually mean for us? It means PowerBI doesn’t really recognise our data source as being a legitimate one. As a result you can’t set credentials for it and you won’t be able to setup a scheduled refresh in the service. This is not ideal.

Fortunately we have a really helpful function - RelativePath.

Now, Chris Webb covered the RelativePath function on his excellent blog, but even the microsoft documentation gives us a good clue as to why it might be useful.

RelativePath: Specifying this value as text appends it to the base URL before making the request.

What it seems to do in practice is append some text to a base URL before whatever checks PowerBI is running that is creating the error.

I had to tweak my function a bit so instead of “cows”, for example, it is now “?q=cows”. Then we simply remove everything from the base URL in our query and then just wrap the SearchTerm parameter in a RelativePath[].

Although functionally, this doesn’t change our query - it will perform exactly the same, it does mean PowerBI now won’t confuse itself with a dynamic URL.

As I already mentioned, I’m afraid I can’t tell you exactly why this works as I’m not sure myself - I suspect it is to do with how PowerBI is validating the URLs.

I can tell you though, RelativePath does fix this problem as once the dynamic portion of the query is wrapped up in a RelativePath the handauthored warning will disappear and you can use it as a normal datasource, setup your refresh and whatever else you want to do.


As ever, I hope you found this helpful.

If anyone knows more about the hand-authored queries please let me know, otherwise, have fun making some loops!