Memoize a computationally intensive result in a Git repository without potential merge conflicts

Suppose I have the following dependencies:

foo.png -> foo.blend
foo.o   -> foo.png
foo.o   -> foo.cpp

where foo.png is going to be embedded inside foo.o with some GNU assembly hack (https://stackoverflow.com/a/36295692/877329).

Now I want to compile foo.cpp, but foo.png is not up to date. If the machine does not have any compute accelerators, or the scene is an indoor scene with some glossy materials, rendering foo.png will take a very long time.

One solution is to add the target foo.png to Git so that it gets pushed to the server, but this may result in false merge conflicts:

  1. foo.png has been regenerated locally without any change in foo.blend. While the image will look the same, some metadata may differ (Blender saves a timestamp in the tEXt chunk).
  2. Then git pull is used to fetch the latest content.
  3. Now Git reports:

    foo.png will be overwritten by merge

The correct thing here is to let the change state of foo.png follow that of foo.blend. Is it possible to establish this kind of master-slave relationship between two files, or can this problem be solved in a different way?

1 answer

  • answered 2017-06-17 20:01 torek

    As Oliver Charlesworth commented, your best bet is perhaps to sidestep the problem entirely. For instance, instead of the build rule "make foo.png by running blender on foo.blend", use the build rule "make foo.png by obtaining the correct version from an artifact server (IPFS?), or, if there is no such version, by running blender".
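
A minimal sketch of such a build rule, under some assumptions: the cache key is the SHA-256 of foo.blend alone, a plain directory stands in for the artifact server, and `obtain_artifact`, `content_key`, `cache_dir` and `render` are hypothetical names, not part of any existing tool. A real setup would probably also fold the Blender version and any linked assets into the key.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def content_key(source: Path) -> str:
    """Return the SHA-256 hex digest of the source file's bytes."""
    return hashlib.sha256(source.read_bytes()).hexdigest()


def obtain_artifact(source: Path, target: Path, cache_dir: Path,
                    render: Callable[[Path, Path], None]) -> bool:
    """Produce `target` from `source`, reusing a cached copy when one exists.

    Returns True on a cache hit and False when `render` had to run.
    """
    cached = cache_dir / (content_key(source) + target.suffix)
    if cached.exists():
        shutil.copyfile(cached, target)
        return True
    render(source, target)            # the expensive step, e.g. invoking blender
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(target, cached)   # publish the result for the next build
    return False
```

Because the key depends only on the content of foo.blend, foo.png never needs to be committed, so Git has nothing to conflict on.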

    If you don't want to do that, though, read on. (I think the end result is that you will want to do that. :-) )

    There is a theoretical way to handle the full general case, by writing your own merge strategy. The merge strategy would take a list of such "slaved" files and know to copy from the "more-mastery" side of a merge when there's a change on just one side, and fail when there is a change on both sides. I'd suggest this is probably too hard, though, and there is a "dirtier" solution—for which the machinery is already there—using merge drivers.

    You can do anything you want in a merge driver, since you write it yourself. In this particular case, it might be a good idea to treat this as "almost but not quite binary" (i.e., as binary except for the changeable metadata): if the two inputs match except for metadata, merge the metadata, write a new PNG file, and exit successfully. Otherwise, exit with a nonzero status so that Git reports a conflict.
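
One way the "almost but not quite binary" comparison might look, as a rough sketch rather than a finished driver. It assumes that tEXt, zTXt, iTXt and tIME are the only chunks that count as changeable metadata, and the function names are made up for illustration. Instead of truly merging the metadata, it simply keeps our side's copy.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types that hold text or timestamps rather than pixels.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"tIME"}


def image_chunks(data: bytes) -> list:
    """Return the (type, payload) pairs of a PNG, skipping metadata chunks."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks, pos = [], 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        if ctype not in METADATA_CHUNKS:
            chunks.append((ctype, data[pos + 8:pos + 8 + length]))
        pos += 12 + length  # length field + type + payload + CRC
    return chunks


def same_image(a: bytes, b: bytes) -> bool:
    """True if two PNG byte strings differ at most in their metadata chunks."""
    return image_chunks(a) == image_chunks(b)


def merge_png(ancestor: str, ours: str, theirs: str) -> int:
    """Merge-driver body: return 0 (clean) or 1 (conflict).

    Git expects the result in the `ours` file; leaving it untouched keeps
    our metadata, which is a simplification of "merge the metadata".
    The ancestor path is accepted but not used in this sketch.
    """
    with open(ours, "rb") as f:
        a = f.read()
    with open(theirs, "rb") as f:
        b = f.read()
    return 0 if same_image(a, b) else 1
```

To hook this up, a wrapper script (say, a hypothetical png_merge.py) would end with `sys.exit(merge_png(*sys.argv[1:4]))`. It would be registered with `git config merge.pngmeta.driver "python3 png_merge.py %O %A %B"` and selected by a `.gitattributes` line such as `*.png merge=pngmeta`; the driver name `pngmeta` is arbitrary. Note that the sketch compares the compressed IDAT data byte for byte, so identical pixels compressed differently would still be reported as a conflict.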

    You could even treat this more as a "slave result to the chosen or merged foo.blend", but there is a potential problem here (which is one part of why a fully general solution requires writing a merge strategy): you are not guaranteed that foo.blend has been merged yet, when your merge driver gets run.

    You should also note that your own driver is not run if Git does not think there are multiple versions to merge. Suppose, for instance, that you are merging commits L and R with merge-base commit B. Git has, in effect, run:

    git diff --name-status B L
    git diff --name-status B R
    

    to figure out whether there are changes on one or both sides of the merge. Suppose further that in the B-to-L leg, foo.png changes but foo.blend does not, while in the B-to-R leg, foo.blend changes but foo.png does not. In this case, Git thinks all is well and takes the L version of foo.png and the R version of foo.blend.

    (For your particular example, this case represents bad input: if foo.blend changed, foo.png should have changed as well—if nothing else, the metadata should probably have changed. So this merely propagates bad input to bad output.)
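
The per-file decision described above can be modelled in a few lines. This is a simplified sketch, not Git's actual implementation: each commit is reduced to a dictionary from path to content hash, renames and deletions are ignored, and `resolve` and `stale_outputs` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # path -> content hash, a stand-in for a commit's tree


def resolve(path: str, base: Tree, left: Tree, right: Tree) -> str:
    """Say what a three-way merge does with one path."""
    b, l, r = base.get(path), left.get(path), right.get(path)
    if l == r:
        return "same"        # both sides agree; nothing to merge
    if l == b:
        return "take-right"  # only the B-to-R leg changed the file
    if r == b:
        return "take-left"   # only the B-to-L leg changed the file
    return "merge"           # both legs changed it: the merge driver runs


def stale_outputs(deps: Dict[str, str], base: Tree, left: Tree,
                  right: Tree) -> List[Tuple[str, str, str]]:
    """Find legs where a source changed but its derived file did not.

    `deps` maps each derived file to its source. Returns (side, source,
    output) tuples, i.e. the "bad input" case.
    """
    problems = []
    for output, source in deps.items():
        for side, tree in (("L", left), ("R", right)):
            if (tree.get(source) != base.get(source)
                    and tree.get(output) == base.get(output)):
                problems.append((side, source, output))
    return problems
```

With the example above, `resolve` yields "take-left" for foo.png and "take-right" for foo.blend, so the driver never runs, while `stale_outputs` flags the R leg as the one that changed foo.blend without regenerating foo.png.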

    Finally, note that only the artifact-server method solves the actual problem, because all of this combining at the Git level happens in a way that is unaware of the timestamp-based rules that make or similar build systems tend to use. Git may merge (changes in both L and R) or copy (changes in just one side) foo.blend at time-stamp T, but merge or copy foo.png at time-stamp T±k for some nonzero k. Some of these produce unwanted results (i.e., force a new run of blender). You will usually get away with it as most of these things tend to happen within the 1-second granularity of typical file time stamps, so that k=0, and when k>0 you have a chance of winning anyway. (I believe, but have not tested, that Git runs the actual per-file merges in sorted order as a side effect of pairing files for rename detection; blend sorts before png, which means the time stamps always work in your favor. But it's not necessarily wise to depend on it.)
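
To make the timestamp argument concrete, here is a small sketch of the rule that make-like tools typically apply, assuming a single source and a single target; `needs_rebuild` is an illustrative name. A target written at T+k or at exactly T counts as up to date, while one written at T-k triggers a new run of blender.

```python
from pathlib import Path


def needs_rebuild(target: Path, source: Path) -> bool:
    """Rebuild if the target is missing or strictly older than its source."""
    if not target.exists():
        return True
    return target.stat().st_mtime_ns < source.stat().st_mtime_ns
```

This is why the order in which Git writes foo.blend and foo.png during a merge matters to the build, even though Git itself never looks at these timestamps when merging.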