Power Query (M) pulling multiple pages


#1

This PowerQuery/M code automatically grabs multiple pages of dreamfactory JSON recursively. The column will still need to be expanded, etc, as usual.

Improvements welcome, I’m by no means a powerquery expert, but I wanted to get dreamfactory working with PowerBI. While this is tailored to dreamfactory, it should work with any paged api.

PowerBI is a data visualization application developed by Microsoft: https://powerbi.microsoft.com/en-us/ . The PowerQuery language is also used in Excel.

let getDFUrl = (url as text, optional previousResult) =>
        let
			Res = Json.Document(Web.Contents(url)),
			offset = 
                            try Res[meta][next]
                            otherwise 0,
			combination = if (previousResult <> null) then
				Table.Combine({
					previousResult,
					Table.FromList(Res[resource], Splitter.SplitByNothing(), null, null, ExtraValues.Error)
				})
			else
				Table.FromList(Res[resource], Splitter.SplitByNothing(), null, null, ExtraValues.Error),
			
			Result = 
                            if (offset <> null) then
                                if (offset > 0) then 
					@getDFUrl(url & "&offset=" & Number.ToText(offset), combination) 
				else 
					combination
                            else
                                "nothing..."
        in Result
in getDFUrl("your_dreamfactory_url_with_filters_and_api_key_and_stuff")

This query seems to have issues in Excel at the moment, but PowerBI Desktop is working fine. Note that PowerBI Webservice can’t use recursive functions to build URLs at the moment (April 2017), so you’ll have to “unroll the loop” and hardcode all your URLs if you need to connect to the API via the PowerBI Web Service Scheduled Update.


#2

@jraiche This is awesome! Thanks for sharing!


#3

Here’s some javascript that generates the unrolled loop. Note that for long queries (10s of thousands of records) powerbi may hang or bluescreen your computer. Getting millions of DF rows may take a very long time. This code will work in the web service, but I speculate it is non-optimal. Or the PowerBI web api grabber is terribly inefficient. Where possible, use the recursive code in the original post, but if you need refreshes in the web service, this is the best way I’ve found.

To use, open up Dev Tools in Chrome (F12). Open the console and paste this code in. Then change the count to the number of pages you want (Dreamfactory defaults to 1000 records per page). Change the URL to the url to get the data, with filters, etc. Nothing will appear to happen, but the result will be on your clipboard.

var count = 10; //number of pages
var url = "http://YOUR_DF_URL/?with_filters=and_api_key&etc=etc";
var output = "let \n";
for (var i=0; i<count; i++) {
	output += `
		source${i} = Json.Document(Web.Contents("${url}&offset=${i * 1000}")),
		resource${i} = source${i}[resource],
		table${i} = Table.FromList(resource0, Splitter.SplitByNothing(), null, null, ExtraValues.Error), \n
	`;
}

output += "		FinalSource = Table.Combine({\ntable0\n";
for (var i = 1; i<count; i++) {
	output += ",table" + i + "\n";
}

output += `}) 
in FinalSource
`;
copy(output) // paste this into the advanced query editor (right click your query name to access the advanced editor).

This generates code like this:

let
	source0 = Json.Document(Web.Contents("https://vtransapi.aot.state.vt.us/api/v2/crash/_table/pub.PQT_AccidentHeader?api_key=e17ab0a9946ee4e2921a8c80b4949ba81ef897cb03e84a032edd3ecd2925bbb5&fields=REPORTNUMBER,ContribCircumstances1,Light,RoadType,RoadAlign,RoadDesign&limit=1000&offset=0")),
	resource0 = source0[resource],
	table0 = Table.FromList(resource0, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
	
	source1 = Json.Document(Web.Contents("https://vtransapi.aot.state.vt.us/api/v2/crash/_table/pub.PQT_AccidentHeader?api_key=e17ab0a9946ee4e2921a8c80b4949ba81ef897cb03e84a032edd3ecd2925bbb5&fields=REPORTNUMBER,ContribCircumstances1,Light,RoadType,RoadAlign,RoadDesign&limit=1000&offset=1000")),
	resource1 = source1[resource],
	table1 = Table.FromList(resource1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
	
	source2 = Json.Document(Web.Contents("https://vtransapi.aot.state.vt.us/api/v2/crash/_table/pub.PQT_AccidentHeader?api_key=e17ab0a9946ee4e2921a8c80b4949ba81ef897cb03e84a032edd3ecd2925bbb5&fields=REPORTNUMBER,ContribCircumstances1,Light,RoadType,RoadAlign,RoadDesign&limit=1000&offset=2000")),
	resource2 = source2[resource],
	table2 = Table.FromList(resource2, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
	
	source3 = Json.Document(Web.Contents("https://vtransapi.aot.state.vt.us/api/v2/crash/_table/pub.PQT_AccidentHeader?api_key=e17ab0a9946ee4e2921a8c80b4949ba81ef897cb03e84a032edd3ecd2925bbb5&fields=REPORTNUMBER,ContribCircumstances1,Light,RoadType,RoadAlign,RoadDesign&limit=1000&offset=3000")),
	resource3 = source3[resource],
	table3 = Table.FromList(resource3, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
	
	source4 = Json.Document(Web.Contents("https://vtransapi.aot.state.vt.us/api/v2/crash/_table/pub.PQT_AccidentHeader?api_key=e17ab0a9946ee4e2921a8c80b4949ba81ef897cb03e84a032edd3ecd2925bbb5&fields=REPORTNUMBER,ContribCircumstances1,Light,RoadType,RoadAlign,RoadDesign&limit=1000&offset=4000")),
	resource4 = source4[resource],
	table4 = Table.FromList(resource4, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
[...]
	
FinalSource = Table.Combine({table0,
	table1,
	table2,
	table3,
	table4,
	[...]
})
in
FinalSource

#4

There are ways to change the maximum page size in dreamfactory, so here’s another version of the PowerQuery generation script with a variety of optimizations. I’ve found CSVs to be more performant and made the page size and total number of records more flexible. Again, this is Javascript suitable for running in Chrome that generates valid M code for paged PowerBI/PowerQuery calls to a DreamFactory API as an unrolled loop that can be run in the PowerBI web service.

var i = 0,
	out1= "let ",
	out2 = 'FinalSource = Table.Combine({',
	pageSize = 100000,
	total=1000000,
	url="http://YOUR_DF_URL/?with_filters=and_api_key&etc=etc"
	
for (var i = 0; i<total/pageSize; i++){
	out1 += `\n	Source${i} = Csv.Document(Web.Contents("${url}&accept=text/csv&limit=${pageSize}&offset=${i*pageSize}")),
	PromotedSource${i} = Table.PromoteHeaders(Source${i}, [PromoteAllScalars=true]), 
	`;
	
	out2 += `PromotedSource${i},
	`;
}

out2 = out2.substring(0, out2.length - 3) + "}) \nin FinalSource";
copy(out1 + "\n" + out2);