I am having an issue inserting nested data into BigQuery from an AWS Glue job using the Google BigQuery connector. Below is my BigQuery table schema:

    competition   FLOAT     NULLABLE
    categories    RECORD    REPEATED
      id          INTEGER   REQUIRED

In a custom transform in AWS Glue, I am trying to send a list of Python dicts to categories, like below: [{"id":10004},{"id":10009},{"id":10301}] Please ..
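Here is a minimal Python-side sketch of the row shape this schema implies. A REPEATED RECORD column holds a list of dicts, and each dict must carry its REQUIRED id. The SCHEMA tuple format and the validate() helper are hypothetical illustrations. They are not part of AWS Glue or the BigQuery connector, and they don't reproduce anything the connector does internally.

```python
# Hypothetical in-memory copy of the table schema from the question:
# (name, BigQuery type, mode, nested fields or None).
SCHEMA = [
    ("competition", "FLOAT", "NULLABLE", None),
    ("categories", "RECORD", "REPEATED", [("id", "INTEGER", "REQUIRED", None)]),
]

# Python types accepted for each scalar BigQuery type in this sketch.
_PY_TYPES = {"FLOAT": (int, float), "INTEGER": (int,)}


def validate(row: dict, schema=SCHEMA, path: str = "") -> list[str]:
    """Return a list of problems; an empty list means the row matches the schema."""
    problems = []
    for name, bq_type, mode, fields in schema:
        where = f"{path}{name}"
        value = row.get(name)
        if value is None:
            if mode == "REQUIRED":
                problems.append(f"{where}: missing REQUIRED value")
            continue
        if mode == "REPEATED":
            # A REPEATED column must arrive as a list, even with one element.
            if not isinstance(value, list):
                problems.append(f"{where}: REPEATED field must be a list")
                continue
            values = value
        else:
            values = [value]
        for i, item in enumerate(values):
            item_path = f"{where}[{i}]" if mode == "REPEATED" else where
            if bq_type == "RECORD":
                # Each RECORD element is a dict checked against the nested fields.
                if not isinstance(item, dict):
                    problems.append(f"{item_path}: RECORD must be a dict")
                else:
                    problems.extend(validate(item, fields, item_path + "."))
            elif not isinstance(item, _PY_TYPES[bq_type]) or isinstance(item, bool):
                # bool is a subclass of int in Python, so reject it explicitly.
                problems.append(f"{item_path}: expected {bq_type}")
    return problems
```

If a row passes this check but the insert still fails, the mismatch may be in how the categories column is typed inside the Glue/Spark frame (for example, as a string instead of an array of structs). The excerpt doesn't show enough to confirm that.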

I have a big file (2.5 GB, 50 million lines), and from this file I need to extract different types of data. Lines look like these: 'timestamp,id;i1,quantity,***,***;i2,quantity2,***,***\n', 'o1,quantity1'. Every line can have a different length (timestamp and id always appear exactly once, but I don't know how many i or o entries each line contains) ..
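For a file this size, one approach is to stream it line by line and split each record on its delimiters. That way the variable number of i/o groups doesn't matter. The sketch below assumes (hypothetically) the following layout:

- ';' separates groups and ',' separates fields.
- The first group is always timestamp,id.
- Each later group begins with a tag such as i1 or o1, followed by its quantity.

parse_line and iter_records are illustrative names, not from the question.

```python
from typing import Iterable, Iterator


def parse_line(line: str) -> dict:
    """Split one record into its fixed header and a variable number of items.

    Assumed layout: 'timestamp,id;tag,quantity,extra...;tag,quantity,...'
    """
    groups = line.rstrip("\n").split(";")
    timestamp, record_id = groups[0].split(",", 1)
    items = []
    for group in groups[1:]:
        if not group:
            continue  # tolerate a trailing ';'
        fields = group.split(",")
        # fields[0] is the tag (e.g. 'i1' or 'o1'), fields[1] its quantity,
        # and anything after that is kept as-is in 'extra'.
        items.append({"tag": fields[0], "quantity": fields[1], "extra": fields[2:]})
    return {"timestamp": timestamp, "id": record_id, "items": items}


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed records lazily; a file object works as `lines`."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield parse_line(line)
```

Typical use would be `with open(path) as handle:` followed by `for record in iter_records(handle): ...`, where path is whatever the file is called. A file object yields one line at a time, so memory use stays small as long as the records aren't collected into a list.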

I want to learn web scraping in Python, but I don't really know how or where to start. My code runs, but it only returns an empty string:

    import requests
    import urllib
    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    #import pandas as pd
    html = urllib.request.urlopen("https://www.nba.com/games")
    soup = BeautifulSoup(html, "lxml")
    games = soup.find_all("li", class_="w-full flex flex-col ..
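Here is a minimal starting-point sketch using urllib (as in the question) and BeautifulSoup. One frequent reason for empty results is that a page fills in lists like this with JavaScript after loading. In that case the HTML that urlopen receives may not contain those <li> elements at all. That is a possibility to check here, not a confirmed diagnosis for nba.com.

The sketch also uses select() with a CSS selector such as li.w-full.flex.flex-col. That matches <li> tags carrying all of those classes in any order, whereas find_all with a class_ string containing spaces only matches that exact attribute value. fetch_html and extract_list_items are hypothetical helper names. The User-Agent value is an arbitrary assumption, and "html.parser" is used so lxml isn't required.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    # Send a browser-like User-Agent (an assumption); some sites reject or
    # alter responses for the default Python one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_list_items(html: str, css_selector: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    # select() takes a CSS selector: "li.a.b" matches <li> tags that have
    # classes a and b, in any order and alongside other classes.
    return [tag.get_text(strip=True) for tag in soup.select(css_selector)]
```

To test the JavaScript hypothesis, one can print fetch_html("https://www.nba.com/games") and search the output for one of the class names. If they are missing, a static fetch can't see the games list.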

I am trying to create a Spark DataFrame from an Excel file in a Jupyter notebook using the following code:

    pandasDF = pandas.read_excel('DownloadsData.xlsx', sheet_name='Sheet1')
    sparkDF = spark.createDataFrame(pandasDF)
    sparkDF

This works and can display the DataFrame. However, if I run:

    sparkDF.describe().show()

I get the following error:

    Py4JJavaError: An error occurred while calling o226.describe.
    : org.apache.spark.SparkException: Job aborted due to ..
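The excerpt cuts off before the cause, so what follows is only one hypothesis to rule out. In a notebook, evaluating sparkDF by itself typically shows the schema rather than running a job, so describe() may be the first call that actually processes the rows. Excel sheets often produce pandas object columns that mix strings and numbers, which can trip up Spark's type handling at that point. The sketch below uses a hypothetical helper, mixed_type_columns. It lists such columns in the pandas frame before the frame is handed to spark.createDataFrame.

```python
import pandas as pd


def mixed_type_columns(df: pd.DataFrame) -> dict[str, set[str]]:
    """Map each object-dtype column whose non-null cells hold more than one
    Python type to the set of type names found in it."""
    report = {}
    for column in df.columns:
        if df[column].dtype == object:
            # dropna() skips missing cells so NaN/None don't count as a type.
            type_names = {type(value).__name__ for value in df[column].dropna()}
            if len(type_names) > 1:
                report[column] = type_names
    return report
```

If it reports any columns, one option is to convert them to a single type before calling spark.createDataFrame(pandasDF). For example, pandasDF[col] = pandasDF[col].astype(str) would do this, but it also turns missing values into strings such as 'nan'.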

Manipulating signals: generate signals defined over [0, 2π].

Ramp: define the function ramp as a linear function of x, where a is the rate and b is the offset at the origin.

    def ramp(x, a, b):
        return ramp (a*x + b)

    import matplotlib.pyplot as plt
    plt.plot(x, ramp(x,1,0))

    TypeError Traceback (most recent call last) ~AppDataLocalTemp/ipykernel_13992/675862003.py in <module> ..
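As written, ramp calls itself with a single argument, ramp(a*x + b). Python therefore raises a TypeError about the missing a and b arguments. The fix is to return the expression directly. Below is a minimal corrected sketch that assumes NumPy for sampling [0, 2π]; the 500-point resolution is an arbitrary choice.

```python
import numpy as np


def ramp(x, a, b):
    """Linear ramp a*x + b; works elementwise on NumPy arrays."""
    # Return the expression itself. Calling ramp(a*x + b) here would recurse
    # with one argument instead of three and raise TypeError.
    return a * x + b


# Sample the interval [0, 2*pi] so x is defined before plotting.
x = np.linspace(0, 2 * np.pi, 500)
y = ramp(x, 1, 0)
```

With x defined this way, the plt.plot(x, ramp(x, 1, 0)) call from the question should draw a straight line from 0 up to 2π ≈ 6.28. Outside a notebook, plt.show() is also needed to display it.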