What is the standard way (design pattern) to store an inventory of the filenames contained in compressed archive files?

I have millions of JSON files in AWS S3, each corresponding to an individual object in our application and named {objectid}.json. They are frequently accessed individually, and they will also all be imported for future analysis with Presto, Athena, Redshift, HDFS, etc.

In order to load them into one Hive table efficiently, I have written a script that combines thousands of JSON files into one file and then compresses it (gzip). For example, it concatenates the roughly one thousand files whose objectid begins with 1 into a single file, and so on (similar to what Apache DistCp does). This is useful because Hadoop can read the contents of these gzip-compressed files transparently, as if they were uncompressed, without any extra work on my part.

In other situations, when I want to retrieve a specific file, I decompress the corresponding .gz file and pull out the specific object. The main problem is that I need to know which compressed files contain which object files.

What is the best way to keep track of which files are in which compressed .gz file? Should I store the mapping in a (MySQL) database, and if so, what should the schema be? Is there a standard way to store such an inventory of filenames for compressed archives? Is there a storage pattern that HDFS and other big data software would understand automatically?
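For reference, the simplest schema I can think of is a two-column mapping table (sqlite3 here as a stand-in for MySQL; the table and column names are my own suggestion, not an established standard):

```python
import sqlite3

# One row per object: which .gz archive in S3 contains it.
SCHEMA = """
CREATE TABLE archive_inventory (
    objectid     TEXT PRIMARY KEY,   -- the {objectid} part of the filename
    archive_key  TEXT NOT NULL       -- S3 key of the .gz file containing it
);
CREATE INDEX idx_archive ON archive_inventory (archive_key);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO archive_inventory VALUES (?, ?)",
    [("1001", "bundles/1xxx.json.gz"), ("1002", "bundles/1xxx.json.gz")],
)
row = conn.execute(
    "SELECT archive_key FROM archive_inventory WHERE objectid = ?", ("1001",)
).fetchone()
```

The primary key answers "which archive holds object X", and the secondary index answers the reverse question, "which objects are in archive Y". I am mainly asking whether something like this is the standard approach or whether a more idiomatic pattern exists.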