Importing API data into Databricks to export it to SQL

I need to get our business sales data from third-party APIs and save it into a SQL database for reporting. Unfortunately, the APIs don’t support bulk export, so the data can only be retrieved through nested requests. The APIs are provided by a third party over which we have no control.

I have already imported the customers from an API into a SQL database.

The first API is the orders API. It takes a customer number and returns all the orders for that customer. It is called like this:

https://api.example.com/v3/orders/{customerNumber}/

Using customer number 3052 as an example, the JSON response would look like this:

[
    {
        "order": "66250",
        "customer": "3052"
    },
    {
        "order": "66690",
        "customer": "3052"
    }
]
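Because every order-details request needs the order numbers from this response, the first step inside the loop is to pull them out. A minimal sketch, assuming the body is always a JSON array of objects with an "order" key as in the example above (the function name is hypothetical):

```python
import json


def order_numbers_from_response(body: str) -> list[str]:
    """Return the order numbers contained in an orders API response body."""
    orders = json.loads(body)  # expected: a JSON array of order objects
    return [entry["order"] for entry in orders]


example = '''
[
    {"order": "66250", "customer": "3052"},
    {"order": "66690", "customer": "3052"}
]
'''
print(order_numbers_from_response(example))  # ['66250', '66690']
```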

The second API is the order details API. It takes a customer number and an order number and returns the details of that order. It is called like this:

https://api.example.com/v3/ordersDetails/{customerNumber}/{orderNumber}/
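Both endpoints follow the same path pattern, so the URLs can be built in one place instead of by string concatenation inside the loops. A small sketch; `BASE_URL` and the two helper names are hypothetical, only the paths come from the URLs above:

```python
BASE_URL = "https://api.example.com/v3"


def orders_url(customer_number) -> str:
    # Orders API: one request per customer
    return f"{BASE_URL}/orders/{customer_number}/"


def order_details_url(customer_number, order_number) -> str:
    # Order details API: one request per (customer, order) pair
    return f"{BASE_URL}/ordersDetails/{customer_number}/{order_number}/"


print(orders_url(3052))                  # https://api.example.com/v3/orders/3052/
print(order_details_url(3052, "66250"))  # https://api.example.com/v3/ordersDetails/3052/66250/
```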

Using customer number 3052 and order 66250 as an example, the JSON response looks like this:

{
    "order": "66250",
    "salesDate": "2020-02-03 14:15:44",
    "totalItems": 1,
    "Status": "Pending"
}
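Before loading into SQL it can help to turn each details response into a typed record, so the date is a real timestamp and the item count an integer. A sketch assuming the field names and the `YYYY-MM-DD HH:MM:SS` date format seen in the example response; the `OrderDetails` class is hypothetical:

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class OrderDetails:
    order: str
    sales_date: datetime
    total_items: int
    status: str


def parse_order_details(body: str) -> OrderDetails:
    data = json.loads(body)
    return OrderDetails(
        order=data["order"],
        # Assumes salesDate always looks like "2020-02-03 14:15:44"
        sales_date=datetime.strptime(data["salesDate"], "%Y-%m-%d %H:%M:%S"),
        total_items=int(data["totalItems"]),
        # The example response uses a capitalised "Status" key
        status=data["Status"],
    )


body = '{"order": "66250", "salesDate": "2020-02-03 14:15:44", "totalItems": 1, "Status": "Pending"}'
print(parse_order_details(body))
```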

I can only call the first API by looping through the customer numbers. Likewise, I can only call the second API by looping through every customer number and each of that customer's order numbers. As we have tens of thousands of customers, each with between 1 and 30 orders, that is a lot of looping and potentially hundreds of thousands of API calls. Furthermore, the data has to be refreshed every day.
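A rough estimate shows why the daily refresh is a concern: there is one orders call per customer plus one details call per order. The figures below (20,000 customers, 15 orders on average, 200 ms per request) are illustrative assumptions, not measurements:

```python
def estimate_daily_calls(customers: int, avg_orders: float) -> int:
    # One orders call per customer, plus one details call per order
    return round(customers + customers * avg_orders)


def sequential_hours(calls: int, seconds_per_call: float) -> float:
    # Wall-clock time if each request waits for the previous one to finish
    return calls * seconds_per_call / 3600


calls = estimate_daily_calls(20_000, 15)
print(calls)                                   # 320000
print(round(sequential_hours(calls, 0.2), 1))  # 17.8
```

Under these assumptions a purely sequential run takes most of a day, which is why the number of requests in flight at once matters more than per-request overhead.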

Here is a simple example of how this can be done in Python using nested loops:

import json
import requests

##### Get customers from the SQL database

##### MOCK CUSTOMER LIST AS EXAMPLE
customers = [3051, 3051, 3052]

for customer in customers:
    print('Loading data for customer', str(customer))
    url = 'https://api.example.com/v3/orders/' + str(customer) + '/'
    print ('Calling URL', url)
    response = requests.get(url)
    orders = response.json()

    ##### MOCK API FOR EXAMPLE
    orders = '''
                [
                    {
                        "order": "66250",
                        "customer": "3052"
                    },
                    {
                        "order": "66690",
                        "customer": "3052"
                    }
                ]
    '''

    Jorders = json.loads(orders)

    for order in Jorders:
        print('Loading order details for order', str(order['order']))
        url = 'https://api.example.com/v3/ordersDetails/' + str(customer) + '/' + str(order['order']) + '/'
        print ('Calling URL', url)
        response = requests.get(url)
        orderDetails = response.json()

        ##### MOCK API FOR EXAMPLE
        orderDetails = '''
                            {
                                "order": "66250",
                                "salesDate": "2020-02-03 14:15:44",
                                "totalItems": 1,
                                "Status": "Pending"
                            }
        '''

        JorderDetails = json.loads(orderDetails)

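        # Use a new name so the loop variable `order` is not overwritten
        orderNumber = JorderDetails['order']
        print(orderNumber)
        totalItems = JorderDetails['totalItems']
        print(totalItems)
        salesDate = JorderDetails['salesDate']
        print(salesDate)
        # Note the capitalised key: the API returns "Status", not "status"
        status = JorderDetails['Status']
        print(status)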

        ## INSERT VALUES INTO SQL HERE
  
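For the "INSERT VALUES INTO SQL HERE" step, collecting rows and writing them in one parameterised batch is usually cheaper than one INSERT per order. The sketch below uses Python's built-in `sqlite3` only as a stand-in for the real SQL database; the table name `order_details` and its columns are hypothetical. The customer number is stored alongside each row because the details response does not include it. `INSERT OR REPLACE` is SQLite syntax that keeps the daily reload from creating duplicates; other databases have their own upsert syntax.

```python
import sqlite3

# (customer, order, salesDate, totalItems, Status), collected inside the loops
rows = [
    ("3052", "66250", "2020-02-03 14:15:44", 1, "Pending"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the real SQL database
conn.execute(
    """CREATE TABLE IF NOT EXISTS order_details (
           customer_number TEXT,
           order_number    TEXT,
           sales_date      TEXT,
           total_items     INTEGER,
           status          TEXT,
           PRIMARY KEY (customer_number, order_number)
       )"""
)
# One batched, parameterised statement; re-running it replaces existing rows
conn.executemany(
    "INSERT OR REPLACE INTO order_details VALUES (?, ?, ?, ?, ?)",
    rows,
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM order_details").fetchone()[0])  # 1
```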

Is there a more efficient way of doing this using Databricks (on AWS)?
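Most of the run time in the loops above is spent waiting on HTTP responses, so one generic option is to keep many requests in flight at once. This is a plain-Python sketch using `concurrent.futures.ThreadPoolExecutor`, not a Databricks-specific feature; it could run in a notebook on the driver. The fetch functions are passed in (hypothetical names) so they can wrap `requests.get` or be mocked, and `max_workers=8` is an arbitrary assumption that should be tuned to whatever rate limits the third-party API enforces.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_order_details(customers, fetch_orders, fetch_details, max_workers=8):
    """Fetch order details for many customers with concurrent requests.

    fetch_orders(customer) -> list of order numbers
    fetch_details(customer, order) -> details for that order
    """
    customers = list(customers)  # iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 1: one orders request per customer, run concurrently
        order_lists = list(pool.map(fetch_orders, customers))
        pairs = [
            (customer, order)
            for customer, orders in zip(customers, order_lists)
            for order in orders
        ]
        # Stage 2: one details request per (customer, order) pair, run concurrently
        return list(pool.map(lambda pair: fetch_details(*pair), pairs))


# Mocked fetchers for illustration only
def fake_orders(customer):
    return ["66250", "66690"] if customer == 3052 else []


def fake_details(customer, order):
    return {"customer": customer, "order": order}


print(fetch_all_order_details([3051, 3052], fake_orders, fake_details))
```

`pool.map` returns results in input order, so each details result lines up with its `(customer, order)` pair.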

Source: Python-3x Questions
