Git merge error - different config files in different branches - using tilde - overwritten master branch history - where did I go wrong?
I recently had a nasty merging problem and was wondering if anyone could help me understand it, so it doesn't happen again.
In the beginning
In my Git repo, the master branch is the production environment and I want to use the dev branch as the staging environment.
My plan was to maintain a config file in different states on each branch:
- master: the config contains connection settings for live third-party APIs etc.
- dev: the config contains settings for sandboxed versions of all the APIs etc.
I created dev by branching from master (1), committing changes to the config (2) and then merging into master using the 'ours' strategy (4), so that the config would remain in different states on each branch but they would be at the same point in their histories (so future changes on dev can be merged to master without altering the config):
1 git checkout -b dev
2 git commit -am "Dev config settings"
3 git checkout master
4 git merge -s ours dev
The problem started
I was OK until I got to the point where I needed to edit the config file again. I'd made several changes on dev, something like this:
commit_1
commit_2
commit_3
commit_to_config_file
commit_5
commit_6
...and searched around to find it's possible to merge different parts of the history using tilde followed by a number.
However, I realise now I'd misunderstood this info. I thought the tilde numbering worked like this:
commit_1               # dev~6
commit_2               # dev~5
commit_3               # dev~4
commit_to_config_file  # dev~3
commit_5               # dev~2
commit_6               # dev~1 = most recent commit
So I did this on master:
git merge dev~4
git merge -s ours dev~3
git merge dev
...and hit a merge conflict.
I didn't realise my numbering mistake at this point, so I resolved the conflict in my mergetool. Afterwards I found there were errors on master and thought the easiest way to deal with them would be to commit corrections directly. Everything now worked fine on production.
Sealing my doom
Having made a commit on master, I now needed to get the histories back in sync between the two branches. I thought I should skip the commit I'd made on master by merging on dev using the ours strategy again:
git checkout dev
git merge master -s ours
Days later I committed more changes on the dev branch and merged into master:
git checkout master
git merge dev
Shortly afterwards I found the production environment was (disastrously) using the sandbox config settings! I had overwritten the master branch history with the dev history. It now said I'd committed the sandbox settings to the config file about 14 days ago on master, which I'd missed because I didn't check that far back in the history before pushing my changes to origin.
Questions at last
1) Why did merging master into dev and dev back into master cause the history to be overwritten?
2) What should I have done instead to preserve the different histories?
3) I now think the tilde numbering on my above example should actually have looked like this:
commit_1               # dev~5
commit_2               # dev~4
commit_3               # dev~3
commit_to_config_file  # dev~2
commit_5               # dev~1
commit_6               # dev = most recent commit
... is that correct?
4) Am I using a good method for maintaining different config files in both branches? (I've just seen this method using .gitattributes which looks much better).
The config file on the master branch was overwritten the first time dev was merged into master, because that merge was a fast-forward (under the default recursive merge strategy).
In your workflow, after you created the dev branch from master and changed the config file on dev, assume the commit history looked like this:
...---A---B   master
           \
            C   dev
When dev is merged into master as a fast-forward, master simply moves forward to point at C as well.
So the config file on the master branch automatically changes to the same version as on the dev branch (the ours merge strategy does not actually take effect).
BTW: since you need to keep the config file different on each branch, you can also ignore the config file on all branches.
First, let's cover this part:
Am I using a good method for maintaining different config files in both branches? (I've just seen this method using .gitattributes which looks much better).
I would not recommend either of these methods, at least in general.
Instead, if you have a configuration file that controls things, don't check it in at all (at least not in this repository; you might check it in to some other repository, and make the file you use here a symbolic link to the "real" configuration file stored in the other repository, for instance). Have as a source-controlled file an example or sample or starter configuration. Have your system copy this file to the real configuration file (which is ignored via .gitignore) if necessary.
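As a rough sketch of that arrangement (the file names config and config.sample, and the sample setting, are made up for illustration and are not from your repository), the tracked sample is copied to an ignored real config only when the real one is missing:

```shell
set -e
work=$(mktemp -d)   # throwaway directory for the demo
cd "$work"
git init -q

# Tracked: a sample configuration plus the ignore rule (hypothetical names).
printf 'api_url=https://sandbox.example.invalid\n' > config.sample
printf '/config\n' > .gitignore

# Untracked: the real configuration, created from the sample only if missing.
[ -f config ] || cp config.sample config

# Git should list .gitignore and config.sample as untracked, but not config.
git status --porcelain
```

Each environment then edits its own config once, and no merge can ever touch it because Git never sees it.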
In some cases, you might split configurations into "system configurations" (which might be tracked) and "user configurations" (which generally would not be tracked and might be in a different directory entirely). Compare this with, for instance, .gitattributes, where you set things like how the files under source control should be treated, vs $HOME/.gitconfig, where you set things like how commits should record your name and email address. The former really is a property of the source, and the latter is not.
The drawback to using a merge driver in .gitattributes is that such merge drivers are only run in the case of a "true merge", which ... well, see the long section below.
The long part
... it's possible to merge different parts of the history using tilde followed by a number.
This is true (at least in various senses) but may be misleading. In fact, you can run git merge with any commit hash ID. When you run git merge branchX, Git will first turn the name branchX into a commit hash ID. That commit hash ID is the one that the name points to, the tip commit of the branch:
             o   <-- branchW
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY
Here each of the round o nodes represents a commit (an object with a big ugly hash ID), and the branch names simply act as moveable pointers to the commit. To grow a branch like branchW, we run git checkout branchW, which "attaches our HEAD" to the branch:
             o   <-- branchW (HEAD)
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY
and fills in Git's index with the tip commit contents, and likewise fills in our work-tree where we do our work. (The index is where you build the next commit, so it starts out matching the work-tree and the current commit.) We then modify files in the work-tree; then we copy the changed files back into the index, so that the next commit will snapshot the updated versions, rather than re-snapshotting the old versions; and then we run git commit. The git commit command makes a new commit * whose parent is the current tip of the branch:
             o--*
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY
and then writes that new commit's hash ID into the branch name to which HEAD is attached, so that now the name points to * instead of its parent:
             o--*   <-- branchW (HEAD)
            /
...--o--o--o--o   <-- branchX
         \
          o--o   <-- branchY
If you run git merge and give it a branch name, git merge locates the tip commit to which the branch name points. If you run git merge and give it anything that identifies some other commit, git merge locates the other commit.
What it does after locating the other commit gets a bit complicated. Let's dive into this part instead for now:
Why did merging master into dev and dev back into master cause the history to be overwritten?
It didn't! You are thinking of history and contents as if they are the same thing, but they are quite different.
In Git, the commits are the history. The graph I drew above shows eight points of history (eight commits) once we've added a new commit that's become the new tip of branchW. (The ... section represents more history, of course, but that's not history we care about at the moment.)
Each commit stores a (single) snapshot, which is the source as of that point in history. As I mentioned above, the content that goes into this snapshot is whatever is in the index, which Git also calls the staging area and sometimes the cache. It has multiple roles, but the main one is that it's the source for all the files that go into each new commit you make, as you make new commits.
Every time you add a commit, you add more history. Each commit has a backwards link, connecting the commit to its parent. Merge commits (here we use the word merge as an adjective, or sometimes as a noun: a merge means a merge commit) have two (or more, but don't worry about this here) of these backwards links. The first one tells you about the normal first parent, as always; the second one tells you, and Git, which commits were brought in by the act of merging and hence no longer need to be considered. This last bit is going to be the key to the problem.
It's important to remember here that every commit is a pure snapshot. There is no notion, at this level, of a commit as a change. It's just a snapshot at this level! But most commits have one parent, and if you compare the snapshot in the parent to the snapshot in the child, you get a change.
If you compare two commits, you will see what happened to existing files, whether any files were deleted entirely, and whether any files were created. In other words, Git can, by using history, turn a snapshot into a change-set. But you don't have to compare parent to child. You can, instead, compare some great-great-great-grand-parent to the child, to get a longer-term view. This is where git merge comes in. I suspect you actually have a fairly good handle on merge-as-a-verb, but we'll come back to that in a bit.
Naming specific commits
As you eventually suspected, the ~ notation counts backwards from zero, not from one. Let's draw a slightly different graph, and give the commits single letter names so that we can talk about them:
...--B--C--D   <-- master
      \
       E--F--G--H--I--J   <-- dev
The name dev identifies commit J, whose parent is commit I. The name (or in fact, almost anything that identifies a commit) can have a ~ or ^ character appended, followed by a number. This is all documented in the gitrevisions manual page, but in short, tilde followed by a number means "count back that many first-parent links". For non-merge commits, there's only one parent, so it's obvious which link is the first parent too. Hence dev~0 counts back no steps and names commit J; dev~1 counts back one step and names commit I; and so on.
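If you want to check the counting for yourself, here is a small sketch that builds a scratch repository with six empty commits on a branch named dev (the commit messages are placeholders standing in for your commit_1 through commit_6) and prints which commit each dev~N names:

```shell
set -e
repo=$(mktemp -d)   # scratch repository, safe to delete afterwards
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/dev   # make the first branch "dev"
git config user.name "Demo"
git config user.email "demo@example.com"

# Six commits in a row, oldest first.
for n in 1 2 3 4 5 6; do
    git commit -q --allow-empty -m "commit_$n"
done

# dev~0 is the tip itself; each extra step follows one first-parent link.
for n in 0 1 2 3 4 5; do
    printf 'dev~%s  %s\n' "$n" "$(git log -1 --format=%s "dev~$n")"
done
```

The fourth commit, which plays the role of your config commit, comes out as dev~2, matching your corrected numbering.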
Note that if we count back six steps, we name commit B, which is also on master. This is a strange feature of Git: commits can be on more than one branch at a time. (Many if not most other version control systems don't behave this way: a commit, once made, is on the branch you made it on, no more and no less.) For this reason, it's sometimes better to think of commits as being "contained within" branches: commit B is contained within both master and dev.
Let's look now at what git merge does, but watch out, it's a bit complicated. There are an unfortunately large number of cases here, but let's look at the merge-as-a-verb, to merge, case that results in a merge. We already mentioned the adjective / noun form of merge. At the end of merging, git merge often makes a merge commit, and this commit, by definition, has two parents: the first parent is the commit that was current (was HEAD) when you ran git merge, and the second one is the one you named as an argument to git merge. But how does this commit come about? Here, we experience the verb form, to merge:
git checkout master; git merge dev
One of the commits is the current commit D (the tip of master). The other is the commit J (the tip of dev).
When Git goes to merge these two commits, it must identify a third commit, which we call the merge base. This is where a history like that of dev comes in, because the merge base is, loosely speaking, the commit where the branches "come together". There may be more than one such commit; in that case, Git takes the one "nearest" the end points. So if we have:
...--B--C--D   <-- master (HEAD)
      \
       E--F--G--H--I--J   <-- dev
B, and everything before B, is a possible candidate, but since B is "closest" to the two branch tip commits D and J, Git will always pick B.
Git now has the necessary three inputs, and can merge D and J with B as the merge base. For the normal case, Git will now run two git diff comparisons, much as if you ran:
git diff --find-renames B D > /tmp/b-vs-d.patch   # what we did
git diff --find-renames B J > /tmp/b-vs-j.patch   # what they did
Git then combines the two sets of changes. Git declares a conflict if any change we made on "our side" (b-vs-d) touches the same line(s) of the same file(s) as any change they made on their side (b-vs-j), except for the case where we made the exact same change that they made. If we both made the same change, Git just takes one copy of that change.
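You can ask Git which commit it would use as the merge base with git merge-base. The following is only a sketch in a scratch repository: the file names ours.txt and theirs.txt and the one-letter commit messages are invented to mirror the B, D and J of the graph, with fewer commits in between.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit B: the shared starting point.
echo base > ours.txt
echo base > theirs.txt
git add ours.txt theirs.txt
git commit -q -m "B"

# Commit J on dev changes one file ...
git checkout -q -b dev
echo changed > theirs.txt
git commit -q -am "J"

# ... and commit D on master changes the other.
git checkout -q master
echo changed > ours.txt
git commit -q -am "D"

base=$(git merge-base master dev)
git log -1 --format='merge base: %s' "$base"
git diff --find-renames --name-only "$base" master   # what we did
git diff --find-renames --name-only "$base" dev      # what they did
```

Because the two diffs touch different files, a git merge dev at this point would combine them without any conflict.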
If all goes well—usually it does—there are no conflicts and Git builds up a work-tree and index that consists of "everything from B, changed according to everything we changed, and changed according to everything they changed". Git is now able to make the new merge commit, so let's draw that in:
...--B--C--D------------K   <-- master (HEAD)
      \                /
       E--F--G--H--I--J   <-- dev
Commit K has two parents, D and J, and that's a normal merge.
But you didn't do that; let's un-draw it, and go back to the original. Instead, you ran:
git merge dev~4
so let's draw the effect of that, knowing that dev~4 moves back across four first-parent links, from J to F:
...--B--C--D--K   <-- master (HEAD)
      \      /
       E----F--G--H--I--J   <-- dev
Git combined the B-to-D changes on master with the B-to-F changes on dev, and made new merge commit K.
Then you ran:
git merge -s ours dev~3
The -s ours option modifies the verb form of to merge, without changing the noun form. The verb-form change includes the fact that Git doesn't bother computing the merge base at all (it doesn't have to). If Git did compute the merge base, though, what would it be? To find out, start at K and work backwards along all possible paths, and start at G (dev~3) and work backwards too. From K we move back to both D and F. From G we move back to F. Commit F is on both branches and is closest to both branch tips, so F is the merge base. The -s ours merge then ignores F entirely, takes the tree (the snapshot) from K, and makes a new merge commit with two parents as usual:
...--B--C--D--K--L   <-- master (HEAD)
      \      /  /
       E----F--G--H--I--J   <-- dev
Now you ran:
git merge dev
Once again, we start by finding the merge base (of L and J). Work backwards as needed: the parents of L are K and G, and the parent of J is I whose parent is H whose parent is G. This means the merge base is G, and Git will now compute two diffs:
git diff --find-renames G L > /tmp/g-vs-l.patch
git diff --find-renames G J > /tmp/g-vs-j.patch
Git now starts with the tree/snapshot from G, applies our changes from g-vs-l, applies their changes from g-vs-j, and hits a merge conflict. At this point Git just stops with the partial merge recorded (along with the conflicts) in the index and work-tree.
[I] resolved the conflict in my mergetool.
This finishes the merge-as-a-verb process, resolving the index and work-tree conflicted files and adding all the final versions to the index. You can now run git merge --continue or git commit to finish the merge to get:
...--B--C--D--K--L--------M   <-- master (HEAD)
      \      /  /        /
       E----F--G--H--I--J   <-- dev
The tree (snapshot) for M, the last merge on master, is the one you constructed when you resolved the conflicts.
Let's see what happens now
Having made a commit on master, I now needed to get the histories back in sync between the two branches. I thought I should skip the commit I'd made on master by merging on dev using the ours strategy again:
git checkout dev
git merge master -s ours
Let's see what this does. The git checkout dev step replaces the index and work-tree contents with the tip commit of dev, and attaches HEAD to the name dev:
...--B--C--D--K--L--------M   <-- master
      \      /  /        /
       E----F--G--H--I--J   <-- dev (HEAD)
The only thing we see in the diagram is the movement of HEAD, but the index and work-tree are of course changed as well. The git merge step is now a bit special. As before, you are using -s ours to invoke the "ours" strategy. This completely ignores the merge base; it just makes a new commit, re-using the current index contents, with two parents, so let's draw that:
...--B--C--D--K--L--------M   <-- master
      \      /  /        / \
       E----F--G--H--I--J---N   <-- dev (HEAD)
The tree for N matches the tree for J; the first parent of N is J, and the second parent of N is M.
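A quick way to convince yourself of this is to compare tree hashes before and after an ours merge. This is a sketch in a scratch repository, with a single invented file file.txt standing in for your whole project:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b dev
echo dev-side > file.txt
git commit -q -am "dev change"

git checkout -q master
echo master-side > file.txt
git commit -q -am "master change"

# Merge master into dev with the ours strategy and compare snapshots.
git checkout -q dev
old_tip=$(git rev-parse HEAD)
before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours -m "ours merge" master
after=$(git rev-parse 'HEAD^{tree}')

echo "tree before: $before"
echo "tree after:  $after"
git rev-parse 'HEAD^1' 'HEAD^2'   # first parent: old dev tip; second: master tip
```

The two tree hashes should be identical even though both sides changed file.txt: nothing from master made it into the snapshot, but master's tip is now recorded as merged.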
Days later I committed more changes on the dev branch ...
OK, let's draw some more:
...--B--C--D--K--L--------M   <-- master
      \      /  /        / \
       E----F--G--H--I--J---N--O--P   <-- dev (HEAD)
and merged into master:
git checkout master
git merge dev
The first step moves HEAD to master, and changes out our index and work-tree contents to match M.
The second step finds the merge base of the current commit M and the named commit P. This is the first commit we can find that is on both branches. This time, start at P and work backwards. Its parent is O. O's parent is N. N has two parents, including M. So the merge base is M itself, the current commit.
At this point, since we are doing a normal (not -s ours) merge, Git does something peculiar: it doesn't bother with either kind of merge at all. It skips the merge-as-a-verb steps, and the merge-as-a-noun steps. Instead, it just immediately makes the index and work-tree match the other commit and makes the current branch name point to the other commit:
...--B--C--D--K--L--------M
      \      /  /        / \
       E----F--G--H--I--J---N--O--P   <-- master (HEAD), dev
Both branch names now point to the same commit! Commit P is now the tip commit of both branches; both branches contain all the same commits.
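The whole sequence can be reproduced in miniature. The sketch below is an assumption-laden stand-in for your repository (one file named config, plus invented hotfix.txt and feature.txt), but it follows the same steps: a direct fix on master, an ours merge on dev, more dev work, then a plain merge into master.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo live > config
git add config
git commit -q -m "live config"

git checkout -q -b dev
echo sandbox > config
git commit -q -am "sandbox config"

# A correction committed directly on master.
git checkout -q master
echo fix > hotfix.txt
git add hotfix.txt
git commit -q -m "fix made directly on master"

# "Skip" that commit on dev with the ours strategy, then keep working.
git checkout -q dev
git merge -q -s ours -m "merge master with ours" master
echo feature > feature.txt
git add feature.txt
git commit -q -m "more dev work"

# Plain merge back: master's tip is an ancestor of dev, so this fast-forwards.
git checkout -q master
git merge -q dev
cat config   # now the dev version of the file
ls
```

After the last merge, master and dev name the same commit, config holds the sandbox setting, and hotfix.txt is gone from master's snapshot as well, since the ours merge never took it.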
There is a way we could have forced Git to make a merge commit anyway, using git merge --no-ff. Let's see what happens if we had used that instead. Git would find the merge base M, compare M to M (no changes), compare M to P (their changes), and build a new commit using their changes:
...--B--C--D--K--L--------M---------Q   <-- master (HEAD)
      \      /  /        / \       /
       E----F--G--H--I--J---N--O--P   <-- dev
The tree for Q would match the tree for P. In other words, because the merge base is M itself, we have set ourselves up so that any merge of dev amounts to taking the final commit of dev as our snapshot, whether we do that directly (as a fast-forward non-merge setting master to point to commit P) or by forcing a real merge commit Q that just re-uses the tree from P.
We now have fairly definite answers to some of the questions, and guidelines for the others:
1) Why did merging master into dev and dev back into master cause the history to be overwritten?
It didn't: it just added new history. The problem is that the new history set you up with a hazard. This is always a potential problem; all merges need some kind of testing and/or inspection because Git is just following a bunch of simple textual rules for combining change-sets.
2) What should I have done instead to preserve the different histories?
Nothing, really, except perhaps use the --no-ff flag to force the merge operation to make a merge commit. Fundamentally, the problem is that you're treating the configuration as if it's a file that corresponds to each snapshot and needs to be merged like any other file in any snapshot. Sometimes that's true, and sometimes it is not. As long as the file is in the tree like this, you must inspect the result of each merge.
3) I now think the tilde numbering on my above example should (have been different by one)
4) Am I using a good method for maintaining different config files in both branches?
This part is difficult to say. But, if you use a merge=ours merge driver, note that it will fire only when the hash ID of the file in all three input commits differs. That is, the contents in the merge base must differ from the contents in the HEAD commit, and both must differ from the contents in the other commit. Of course, if the merge base is the same commit as either input commit, all the files in the merge base necessarily match all the files in at least one of the tip commits. If the two tips have the same file contents, you're OK anyway. So this really goes wrong when:
- the merge base isn't the same as the HEAD commit, but
- the file in the merge base is the same as the file in the HEAD commit,
in which case Git can (and does) just take the file from the other commit without running your merge driver. If that's different from the file in the HEAD commit, you have picked up their changes!
(To see how this works, try running:
git rev-parse HEAD:Makefile
if you have a file named Makefile in your commit, for instance. Every file in every commit has a hash ID. The hash ID depends on the file's contents; if the file is the same in two different commits, those two different commits use the same blob hash in their trees, so that they share the single copy of the underlying file.)
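To see the failure mode described above in action, here is a sketch in a scratch repository. It assumes the commonly used setup for such a driver (a merge.ours.driver setting of true plus a .gitattributes line), and the file names config and other.txt are invented. Only dev changes config, so the blob hash in the merge base equals the one in HEAD and the driver is never consulted:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Assumed driver setup: "keep our version" for the file named config.
git config merge.ours.driver true
echo 'config merge=ours' > .gitattributes
echo live > config
git add .gitattributes config
git commit -q -m "base"

git checkout -q -b dev
echo sandbox > config
git commit -q -am "sandbox config"

# An unrelated commit on master, so the merge below is a true merge.
git checkout -q master
echo note > other.txt
git add other.txt
git commit -q -m "unrelated master change"

base=$(git merge-base master dev)
base_blob=$(git rev-parse "$base:config")
head_blob=$(git rev-parse HEAD:config)
echo "base: $base_blob"
echo "HEAD: $head_blob"   # same hash as the merge base

git merge -q -m "merge dev" dev
cat config   # the other side's version, despite merge=ours
```

Had master also edited config, all three blob hashes would differ and the driver would get its chance to keep the master version.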