Category : apache-spark

So I have a given PySpark DataFrame, say df, that looks like this:

    df.show()
    +----------+-----+
    |    series|value|
    +----------+-----+
    | XXXX-AAAA|    1|
    |   XXXX-BB|    2|
    |XXXX-CCCCC|    3|
    +----------+-----+

In the series column, I would like to get rid of the XXXX- substring (i.e. a length of 5 characters), which ..

I have a dataset which contains multiple columns and rows. Currently, the Date_Time column is of String type, and I want to convert it to a date-time format for further tasks. I tried the code below, which returns null:

    df = df.withColumn('Date_Time', df['Date_Time'].cast(TimestampType()))
    df.show()

I tried some of the solutions from here, but none of them work at all, in ..

Hi, I am new to Spark and I am trying to write my first program. When I run from pyspark.mllib.linalg import Vectors, I get the following error. I am using Anaconda and Jupyter Notebook; I have installed Spark and I am able to run it from the terminal.

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-1-da13502db94b> in ..