PySpark - hash with MD5 and convert to upper-case hex

This post is a quick code snippet that I thought was worth sharing!

The function below converts a device id to its MD5 form: it first upper-cases the value in the column, hashes it with MD5, and then renders the hash as upper-case hex.

Google's Ads Data Hub expects you to convert the data to the below format:

UPPER(TO_HEX(MD5(UPPER(device_id)))) AS device_id_md5

The below code helps us to achieve that in PySpark:

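from pyspark.sql.functions import md5, col, upper
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)

df = spark.createDataFrame(
    [
        ["63c94c81-44e4-4e4d-bb26-b95648581a16"],
        ["c5628aa9-92de-4e7d-ac60-e35851e93f22"],
        ["eeb472cb-aa3a-44f1-818c-bcc069d57367"]
    ],
    ["col1"]  # column names must be a list; ("col1") is just a string
)

# Spark's md5() already returns the digest as a 32-character lowercase hex string,
# i.e. the equivalent of TO_HEX(MD5(...)), so only the outer upper() is needed.
# Wrapping it in hex() would hex-encode that string a second time (64 characters).
df.withColumn("col1", upper(md5(upper(col('col1'))))).show(truncate=False)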


NOTE: The above device ids are dummy ones.
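
To sanity-check the output format without a Spark cluster, the same transformation can be sketched in plain Python with the standard hashlib module. The function name adh_device_id_md5 is made up for this illustration, and the sketch assumes the device id is hashed as UTF-8 bytes.

```python
import hashlib


def adh_device_id_md5(device_id: str) -> str:
    """Mirror UPPER(TO_HEX(MD5(UPPER(device_id)))) in plain Python (hypothetical helper)."""
    # Upper-case the raw id, hash its UTF-8 bytes (an assumption about the encoding),
    # then upper-case the 32-character hex digest.
    return hashlib.md5(device_id.upper().encode("utf-8")).hexdigest().upper()


# Same dummy device id as in the PySpark example.
hashed = adh_device_id_md5("63c94c81-44e4-4e4d-bb26-b95648581a16")
print(hashed, len(hashed))
```

The result should be a 32-character upper-case hex string, which makes it easy to compare a few values by hand against what the PySpark column produces.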
