reconstructing a tree of strings in pyspark

I have a very large (>100 GB) CSV file containing file-system path information that I need to parse with PySpark (v2.1.1). Each line of the file has a unique ID, the unique ID of its parent, and a text string. I'd like to traverse this file, find the parents of each entry, and concatenate the strings together in order. For instance, given rows like:

Unique ID,Parent ID,String
0x3,0x2,bar
0x2,0x1,foo
0x1,0x0,usr

the entry 0x3 would reconstruct to /usr/foo/bar (IDs simplified here for illustration). The file has one master parent, and the IDs are not sequential (they are complex strings like 0x200005a48:0x1:0x0).
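To make the target transformation concrete, here is a toy pure-Python version of the reconstruction I'm after (the IDs and the `rows` mapping are made-up placeholders, not my real data):

```python
# Follow parent links up to the root, prepending each row's string.
# IDs here are simplified stand-ins for strings like 0x200005a48:0x1:0x0.
ROOT = "id_root"

rows = {
    # fid: (pid, name)
    "id_usr": (ROOT, "usr"),
    "id_foo": ("id_usr", "foo"),
    "id_bar": ("id_foo", "bar"),
}

def full_path(fid):
    parts = []
    while fid != ROOT:
        pid, name = rows[fid]
        parts.append(name)
        fid = pid
    return "/" + "/".join(reversed(parts))

print(full_path("id_bar"))  # -> /usr/foo/bar
```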

My PySpark code can traverse the file, but unfortunately it isn't running in parallel and is very slow. Suggestions for how to do this better would be greatly appreciated:

def find_parent(pid, file_path, origfid):
    global biggie
    # Look up the row whose fid matches this parent ID
    newdf = biggie[biggie["fid"] == pid][["pid", "name", "fid"]]
    newname ="name").rdd.flatMap(list).first()
    fid ="fid").rdd.flatMap(list).first()
    file_path = newname + "/" + file_path
    final_parent = "0x200005a48:0x1:0x0"
    if fid == final_parent:
        return origfid, file_path
    pid ="pid").rdd.flatMap(list).first()
    return find_parent(pid, file_path, origfid)

final_parent = "0x200005a48:0x1:0x0"
spark = SparkSession.builder.appName("ScratchFiles").getOrCreate()
# filename is the path to the big CSV; dataschema is the StructType for the three columns
biggie =, schema=dataschema).persist()
children = biggie.filter(biggie["fid"] != final_parent).persist()
count = children.count()
while count > 0:
    first = children.first()
    fid = first["fid"]
    file_path = first["name"]
    pid = first["pid"]
    origfid, file_path = find_parent(pid, file_path, fid)
    children = children.filter(children["fid"] != fid)
    count = children.count()
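For reference, the result I want could also be built level by level: give every row its own name, then repeatedly advance each unfinished row one step up its parent chain, prepending the parent's name, until every row reaches the root. That shape maps onto repeated DataFrame self-joins on pid == fid, which Spark could run in parallel. A toy in-memory sketch of that idea (made-up IDs, same fid/pid/name columns as my code; not a working Spark version):

```python
# Each pass advances every unfinished row one level toward the root,
# which is what an iterative DataFrame self-join on pid == fid would do.
ROOT = "id_root"
table = [
    # (fid, pid, name) -- simplified IDs standing in for 0x... strings
    ("id_usr", ROOT, "usr"),
    ("id_foo", "id_usr", "foo"),
    ("id_bar", "id_foo", "bar"),
]
parent_of = {fid: pid for fid, pid, _ in table}
name_of = {fid: name for fid, _, name in table}

# state: fid -> (current ancestor still to resolve, accumulated path)
state = {fid: (pid, name) for fid, pid, name in table}
while any(anc != ROOT for anc, _ in state.values()):
    state = {
        fid: ((parent_of[anc], name_of[anc] + "/" + path) if anc != ROOT
              else (anc, path))
        for fid, (anc, path) in state.items()
    }

paths = {fid: "/" + path for fid, (_, path) in state.items()}
print(paths["id_bar"])  # -> /usr/foo/bar
```

The number of passes is the depth of the tree, not the number of rows, which is why I hope a join-based version would scale where my per-row loop does not.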