API pagination – pulling the same four pages over and over

  python

I’m new to an API that uses scroll_id to paginate. I’m trying to pull all the data from the endpoint, but it seems as though I’m pulling the same four pages over and over again.

This doesn’t make sense to me: I’m getting four pages of unique results, so something is paging, yet pagination never gets past those four. How can I be paginating but not getting all of the data?

```python
import os

import pandas as pd
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    # BearerAuth isn't shown in the question; this is a typical minimal version
    # that sends the key as an "Authorization: Bearer <token>" header.
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers["Authorization"] = "Bearer " + self.token
        return r


SAVEPATH = "."  # output folder for the CSV files (not shown in the question)

count = 1
columns = []
session_url = 'https://export-na.clarabridge.net/api/v2/export/sentences?'
payload = {'auth': '123abcdefg',   # API key
           'since': '2019-01-01',
           'fields': 'all',
           'attributes': 'all'
           }

# First request: returns the first page of sentences plus a scroll_id
r = requests.get(url=session_url, params=payload, auth=BearerAuth(payload['auth']))

# _total_count is not defined in this snippet
# print("Total Documents: ", _total_count)
print("*************************************************************************************")

try:
    print("Pulling records: ", count * 1000)
    json_data = r.json()
    data = json_data['sentences']
    _scroll_id = json_data.get('scroll_id')  # token for the next page
    df = pd.json_normalize(data)             # already returns a DataFrame
    df.to_csv(os.path.join(SAVEPATH, 'Crunchbase_raw_%s.csv' % count), index=False)
    print("Data Shape: ", df.shape)
    columns.append(df.shape[1])
    count = count + 1
except KeyError:
    # No 'sentences' key: the body is an error message
    data = []
    _scroll_id = None
    print('Error: Elastic Search: %s' % str(r.json()))

while data:
    print("Pulling records: ", count * 1000)
    # Scroll to get the next batch; the original search parameters are
    # sent again together with the scroll_id
    scroll_payload = {
        'auth': '123abcdefg',   # API key
        'since': '2019-01-01',
        'fields': 'all',
        'attributes': 'all',
        'scroll_id': _scroll_id
    }
    scroll_res = requests.get(url=session_url, params=scroll_payload,
                              auth=BearerAuth(scroll_payload['auth']))
    try:
        json_data = scroll_res.json()
        data = json_data['sentences']            # empty list ends the loop
        _scroll_id = json_data.get('scroll_id')
        df = pd.json_normalize(data)
        df.to_csv(os.path.join(SAVEPATH, 'Crunchbase_raw_%s.csv' % count), index=False)
        print("Data Shape: ", df.shape)
        columns.append(df.shape[1])
        count = count + 1
    except KeyError:
        data = []
        count = count + 1
        _scroll_id = None
        err_msg = 'Error: Elastic Search Scroll: %s'
        print(err_msg % str(scroll_res.json()))
```
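Before changing the request, it may help to confirm that the pages really repeat rather than just looking similar. A minimal sketch, assuming each page's `sentences` list is kept in memory (for example, appended to a list inside the loop above instead of only being written to CSV): hash each page's records and report the first page whose content was already seen. `page_fingerprint` and `first_repeat` are hypothetical helpers, not part of the API.

```python
import hashlib
import json


def page_fingerprint(records):
    """Stable hash of one page of records (a list of JSON-like dicts).

    sort_keys=True makes the hash independent of dict key order.
    """
    canonical = json.dumps(records, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def first_repeat(pages):
    """Return (first_index, repeat_index) for the first page whose content
    was already seen, or None if every page is distinct."""
    seen = {}
    for i, records in enumerate(pages):
        fp = page_fingerprint(records)
        if fp in seen:
            return seen[fp], i
        seen[fp] = i
    return None


# Example: pages 0-3 are distinct, then page 4 repeats page 0.
pages = [[{"id": 1}], [{"id": 2}], [{"id": 3}], [{"id": 4}], [{"id": 1}]]
print(first_repeat(pages))  # (0, 4)
```

If this reports a repeat at index 4 on the real data, the API is cycling back to the first page, which points at the scroll request rather than at the CSV-writing code.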

Source: Python Questions

One Reply to “API pagination – pulling the same four pages over and over”

  • Not sure if you’re still looking for an answer to this question, but we had a similar issue, and it came down to the scroll_id. It turned out that the scroll_id wasn’t being passed to the API as a string, which caused the API to return the same four pages of results over and over again. So it might be worth checking the type of that variable. It might also be worth checking whether the ‘scroll_id’ parameter has been renamed to simply ‘scroll’.
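Both suggestions in the reply can be tried with a single loop that always sends the most recent token, as a string, under a configurable parameter name. This is a minimal sketch under stated assumptions: the response carries the token under `scroll_id` and the records under `sentences` (as in the question's code); follow-up requests send only the token, which is a guess about this API (Elasticsearch's own scroll API takes just the scroll ID plus a keep-alive on follow-ups); and `scroll_pages`, `fake_fetch`, and `max_pages` are hypothetical names. `fetch` is any function that takes query parameters and returns the decoded JSON, so the real version would wrap `requests.get(...)` and add the API key.

```python
from typing import Callable, Iterator


def scroll_pages(
    fetch: Callable[[dict], dict],
    first_params: dict,
    scroll_param: str = "scroll_id",  # assumption: may be "scroll" instead
    records_key: str = "sentences",
    max_pages: int = 10_000,
) -> Iterator[list]:
    """Yield one page of records at a time from a scroll-style endpoint."""
    body = fetch(first_params)
    for _ in range(max_pages):
        records = body.get(records_key) or []
        if not records:
            return  # empty page: nothing left to read
        yield records
        scroll_id = body.get("scroll_id")
        if scroll_id is None:
            return  # no token, so no further pages
        # Always send the latest token, coerced to a string
        body = fetch({scroll_param: str(scroll_id)})
    # A silent endless loop becomes an explicit error
    raise RuntimeError(f"stopped after {max_pages} pages; pagination may be looping")


# Demo against a fake three-page API (stands in for requests.get(...).json()).
_fake_pages = {None: (["a", "b"], "t1"), "t1": (["c"], "t2"), "t2": ([], None)}


def fake_fetch(params):
    records, next_id = _fake_pages[params.get("scroll_id")]
    return {"sentences": records, "scroll_id": next_id}


print([r for page in scroll_pages(fake_fetch, {"since": "2019-01-01"}) for r in page])
# ['a', 'b', 'c']
```

Note that if the server ignores an unrecognized parameter name, it keeps returning the first page, which is the same symptom as in the question; with `max_pages` that shows up as a `RuntimeError` instead of an endless loop, and switching to the reply's alternative name only means passing `scroll_param="scroll"`.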
