PySpark – creating a column as quarter-year from a date column

  apache-spark, pyspark, python, python-3.x

I have a dataframe like this, and the format of the date column is yyyy-mm-dd:

+--------+----------+---------+----------+-----------+--------------------+
|order_id|product_id|seller_id|      date|pieces_sold|       bill_raw_text|
+--------+----------+---------+----------+-----------+--------------------+
|     668|    886059|     3205|2015-01-14|         91|pbdbzvpqzqvtzxone...|
|    6608|    541277|     1917|2012-09-02|         44|cjucgejlqnmfpfcmg...|
|   12962|    613131|     2407|2016-08-26|         90|cgqhggsjmrgkrfevc...|
|   14223|    774215|     1196|2010-03-04|         46|btujmkfntccaewurg...|
|   15131|    769255|     1546|2018-11-28|         13|mrfsamfuhpgyfjgki...|
+--------+----------+---------+----------+-----------+--------------------+

I would like to append a column to this dataframe that combines the letter Q, the quarter, an underscore, and the year taken from the date column. Here’s what I have tried:

from pyspark.sql import functions as F
sales_table.select(
    ("Q"+F.quarter('date')+"_"+F.date_format("date", "y")).alias('quarter_year')
).show(5)

and this is what I got:

+------------+
|quarter_year|
+------------+
|        null|
|        null|
|        null|
|        null|
|        null|
+------------+

My expected output:

+---------------+
|   quarter_year|
+---------------+
|        Q1_2015|
|        Q3_2012|
|        Q3_2016|
|        Q1_2010|
|        Q4_2018|
+---------------+

I am surprised that there are no error messages, just a column of nulls. How do I create this column?
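[Editor's note, not part of the original question: the nulls come from the `+` operator. On a Spark `Column`, `+` means numeric addition, so Spark tries to cast the string "Q" to a number, gets null, and null propagates silently through the expression. String concatenation in Spark is done with `F.concat`, e.g. `F.concat(F.lit("Q"), F.quarter("date").cast("string"), F.lit("_"), F.date_format("date", "yyyy"))`. The per-row formatting logic itself can be sketched in plain Python as:]

```python
from datetime import date

def quarter_year(d: date) -> str:
    """Format a date as Q<quarter>_<year>, e.g. Q1_2015."""
    # Quarter = 1 for Jan-Mar, 2 for Apr-Jun, 3 for Jul-Sep, 4 for Oct-Dec.
    quarter = (d.month - 1) // 3 + 1
    return f"Q{quarter}_{d.year}"

# Sample dates taken from the question's dataframe.
for s in ["2015-01-14", "2012-09-02", "2016-08-26", "2010-03-04", "2018-11-28"]:
    print(quarter_year(date.fromisoformat(s)))
# prints Q1_2015, Q3_2012, Q3_2016, Q1_2010, Q4_2018
```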

Source: Python-3x Questions
