Fossil conversion

Three weeks ago I wrote about the fossil tests. Quite a few things have changed inside fossil since then, and I have been working on reimplementing the Python parts of the conversion tools in C as well as improving the performance.

The code doesn't have any fancy build system and a few (Net)BSD features are used, so don't expect it to work out of the box on anything else. Most RCS files should convert without manual intervention. A few limitations are known and not handled automatically:

  1. Commit time inversions, i.e. time going backwards from one commit to the next. The only instance where this is not a bug in the repository is 1.1.1.1, which can legitimately be older than 1.1. An inversion results in the revisions being picked up in the wrong order.
  2. Vendor branches are ignored after trunk conversion has been done. Normally 1.1 is dropped in favor of 1.1.1.1 and the various vendor versions are pulled into trunk as long as they are older than 1.2. Support to do a bare import of a given vendor branch will be added at a later point, e.g. to convert them into a standalone branch.
  3. Tags are completely ignored.
  4. A branch is considered a vendor branch if it uses 1.1.1 as its revision for any file in the tree.
  5. No automatic fixup of missing vendor branches or default branches. The output of 99-warnings can be used with cvs rtag or rcs to fix those up.
  6. Branches must have a consistent parent relation. Currently no diagnostic is given if a branch is dangling because it depends on a vendor branch.
  7. Branch time computation is somewhat simplistic. There is no diagnostic yet for files added incorrectly after the initial branching, or for commits that happened between the global branch time and the branch point of a specific file. There is no support for replaying commits to fix such cases either.
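
To make items 1 and 4 a bit more concrete, here is a minimal sketch in C of the two checks they describe. It is not taken from the conversion tools: the struct rev_entry layout and both function names are made up for illustration, and the sketch assumes the revisions of one file are handed over in the order the converter would apply them, with 1.1 directly followed by 1.1.1.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical record for one revision of one RCS file. */
struct rev_entry {
	const char *rev;	/* RCS revision number, e.g. "1.2" */
	time_t when;		/* commit time of that revision */
};

/*
 * Item 4: a revision lives on the 1.1.1 branch if its number starts
 * with "1.1.1.". The trailing dot keeps "1.1.10.1" from matching.
 */
static bool
on_vendor_branch(const char *rev)
{
	return strncmp(rev, "1.1.1.", 6) == 0;
}

/*
 * Item 1: return the index of the first entry that is older than its
 * predecessor, or -1 if time never goes backwards. The one tolerated
 * case is 1.1.1.1 being older than the 1.1 right before it.
 */
static ptrdiff_t
first_time_inversion(const struct rev_entry *e, size_t n)
{
	assert(e != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		if (e[i].when >= e[i - 1].when)
			continue;
		if (strcmp(e[i - 1].rev, "1.1") == 0 &&
		    strcmp(e[i].rev, "1.1.1.1") == 0)
			continue;
		return (ptrdiff_t)i;
	}
	return -1;
}
```

A caller would run first_time_inversion() over each file's revision list and report the offending revision, so the repository can be fixed by hand before conversion.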

I've been slowly fixing up various issues exposed by this tool in the NetBSD repository. Processing needs around 5h on an AMD Opteron 1389 to get from src in CVS form to the Fossil repository. The majority of the time is spent in the commit building step, which is primarily IO bound and a potential place for further investigation. The ultimate goal is to get a bit-exact import of all major branches, which seems to (almost) be the case now.