PySpark - hash with MD5 and convert to upper case by converting to hex
This post is a quick code snippet that I thought was worth sharing.
The function below converts a device id to its MD5 version: it upper-cases the column value, hashes it with MD5, and returns the digest as an upper-case hex string.
Google's Ads Data Hub expects you to convert the data to the below format:
UPPER(TO_HEX(MD5(UPPER(device_id)))) as device_id_md5
The below code helps us to achieve that in PySpark:
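from pyspark.sql.functions import md5, col, upper
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)

df = spark.createDataFrame(
    [
        ["63c94c81-44e4-4e4d-bb26-b95648581a16"],
        ["c5628aa9-92de-4e7d-ac60-e35851e93f22"],
        ["eeb472cb-aa3a-44f1-818c-bcc069d57367"]
    ],
    ["col1"]  # column names must be a list; ("col1") is just the string "col1"
)

# Spark's md5() already returns the digest as a 32-character hex string, so it
# covers both MD5() and TO_HEX(); an extra hex() would hex-encode that string again.
df.withColumn("col1", upper(md5(upper(col('col1'))))).show(truncate=False)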
NOTE: The above device ids are dummy ones.
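To sanity-check a single value without Spark, the same transformation can be sketched in plain Python with hashlib. This is only a minimal sketch: the function name device_id_md5 is my own (hypothetical) choice, and it assumes the device id is encoded as UTF-8 before hashing.

```python
import hashlib


def device_id_md5(device_id: str) -> str:
    """Hypothetical helper mirroring UPPER(TO_HEX(MD5(UPPER(device_id))))."""
    # Upper-case the id first, then hash its UTF-8 bytes (encoding is an assumption).
    digest = hashlib.md5(device_id.upper().encode("utf-8"))
    # hexdigest() returns lower-case hex, so upper-case it at the end.
    return digest.hexdigest().upper()


if __name__ == "__main__":
    # Dummy device id taken from the example data above.
    print(device_id_md5("63c94c81-44e4-4e4d-bb26-b95648581a16"))
```

Because the id is upper-cased before hashing, the lower-case and upper-case spellings of the same device id produce the same hash.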