Showing posts from September, 2020

PySpark - hash with MD5 and convert to upper case by converting to hex

This post is a quick code snippet that I thought was worth sharing! The function chain below converts a device id to the MD5 format that Google's Ads Data Hub expects: upper-case the value, hash it with MD5, hex-encode the digest, and upper-case the result. In Ads Data Hub terms, that is:

    UPPER(TO_HEX(MD5(UPPER(device_id)))) as device_id_md5

The code below achieves that in PySpark. One thing to watch: PySpark's md5 already returns the digest as a 32-character lower-case hex string, so no separate hex() call is needed. Wrapping the md5 result in hex() would hex-encode that string a second time and produce a 64-character value that does not match the format above.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, md5, upper

    spark = SparkSession.builder.master("local").getOrCreate()

    df = spark.createDataFrame(
        [
            ["63c94c81-44e4-4e4d-bb26-b95648581a16"],
            ["c5628aa9-92de-4e7d-ac60-e35851e93f22"],
            ["eeb472cb-aa3a-44f1-818c-bcc069d57367"],
        ],
        ["col1"],  # column names must be a list, not a bare string
    )

    # upper-case the id, MD5 it (result is already hex), then upper-case the hex
    df.withColumn("col1", upper(md5(upper(col("col1"))))).show(truncate=False)

NOTE: The above device ids are dummy ones.
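To sanity-check a few values without Spark, the target format UPPER(TO_HEX(MD5(UPPER(device_id)))) can be sketched in plain Python with hashlib. This is only a rough cross-check, not part of the original snippet: the helper name device_id_md5 is made up for illustration, and it assumes the ids are ASCII strings encoded as UTF-8.

```python
import hashlib


def device_id_md5(device_id: str) -> str:
    """Hypothetical helper mirroring UPPER(TO_HEX(MD5(UPPER(device_id))))."""
    # Step 1: upper-case the raw id (assumes plain ASCII ids such as UUIDs).
    normalized = device_id.upper()
    # Step 2: MD5 the UTF-8 bytes; hexdigest() is the hex encoding of the digest.
    digest_hex = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    # Step 3: upper-case the 32-character hex string.
    return digest_hex.upper()


if __name__ == "__main__":
    # Same dummy ids as in the PySpark example.
    for dummy_id in [
        "63c94c81-44e4-4e4d-bb26-b95648581a16",
        "c5628aa9-92de-4e7d-ac60-e35851e93f22",
        "eeb472cb-aa3a-44f1-818c-bcc069d57367",
    ]:
        print(dummy_id, device_id_md5(dummy_id))
```

Each result should be exactly 32 upper-case hex characters. If the Spark column shows 64 characters instead, the digest has most likely been hex-encoded twice.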