Software archeology: Version control with Fossil
Jörg Sonnenberger
Who am I?
- Software engineer from Rostock/Germany
- NetBSD developer
- DragonFly BSD alumnus
- Contributor to a wide range of Open Source projects
What is Fossil?
- A distributed version control system
- A bug tracker
- A wiki engine
- A simple blog
- ...and all in one tight bundle
Getting started
- Fetch a pre-built binary for Windows, Mac OS X or Linux
- Compile from source
- Requires a C compiler and zlib
- Optional: OpenSSL for HTTPS support
- ./configure && make && make install
- A few seconds later...
Getting started (II)
- fossil help
- fossil ui <repo>
The big picture
- Everything is stored in a SQLite database
- Using zlib compression
- Using deltas where possible
- Specially formatted artifacts for commits, tickets, wiki changes
- SQL for heavy lifting of filtering and sorting
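The storage idea above can be sketched in a few lines. This is a minimal illustration, not Fossil's actual schema: the `artifact` table, its columns, and the helpers `open_store`, `put`, and `get` are all hypothetical names, the choice of SHA-1 as the identifier is an assumption, and delta encoding is left out entirely.

```python
import hashlib
import sqlite3
import zlib


def open_store():
    """Create an in-memory database with one hypothetical artifact table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE artifact ("
        " id INTEGER PRIMARY KEY,"
        " hash TEXT UNIQUE NOT NULL,"
        " kind TEXT NOT NULL,"
        " size INTEGER NOT NULL,"      # uncompressed size in bytes
        " content BLOB NOT NULL)"      # zlib-compressed payload
    )
    return db


def put(db, kind, data):
    """Store data under its content hash; identical content is stored once."""
    digest = hashlib.sha1(data).hexdigest()
    db.execute(
        "INSERT OR IGNORE INTO artifact (hash, kind, size, content)"
        " VALUES (?, ?, ?, ?)",
        (digest, kind, len(data), zlib.compress(data)),
    )
    return digest


def get(db, digest):
    """Fetch and decompress the artifact with the given hash."""
    row = db.execute(
        "SELECT content FROM artifact WHERE hash = ?", (digest,)
    ).fetchone()
    return zlib.decompress(row[0])


db = open_store()
wiki_hash = put(db, "wiki", b"hello fossil\n" * 100)
put(db, "ticket", b"crash on startup\n")
put(db, "wiki", b"hello fossil\n" * 100)  # duplicate content, ignored

# SQL does the filtering and sorting: artifacts ordered by size.
largest = db.execute(
    "SELECT kind, size FROM artifact ORDER BY size DESC"
).fetchall()
```

Because every kind of artifact sits in one table, a report is just another query rather than a separate tool.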
Interaction with the outside world
- Import from git via fast-import format
- Export to git via fast-import format
- CVS import with cvs2fossil
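To give a feel for the fast-import format mentioned above, here is a small sketch that builds a stream with one blob and one commit. The function name `fast_import_commit` and its parameters are made up for illustration; only the stream commands (`blob`, `mark`, `data`, `commit`, `committer`, `M`) come from the git fast-import format, and the branch name `refs/heads/master` is an arbitrary choice.

```python
def fast_import_commit(path, content, message, name, email, when, mark=1):
    """Return a fast-import byte stream: one blob plus one commit adding it.

    `when` is a Unix timestamp; the timezone is fixed to +0000 for brevity.
    """
    msg = message.encode()
    out = [
        b"blob\n",
        b"mark :%d\n" % mark,
        b"data %d\n" % len(content),   # exact byte count of the payload
        content + b"\n",               # optional LF after the raw data
        b"commit refs/heads/master\n",
        b"mark :%d\n" % (mark + 1),
        b"committer %s <%s> %d +0000\n" % (name.encode(), email.encode(), when),
        b"data %d\n" % len(msg),
        msg + b"\n",
        b"M 100644 :%d %s\n" % (mark, path.encode()),  # file refers to the blob mark
    ]
    return b"".join(out)


stream = fast_import_commit(
    "README", b"hi\n", "initial", "J. Doe", "jdoe@example.org", 0
)
```

A stream like this is the kind of input an importer reads; real exports contain many blobs and commits, with `from` lines linking each commit to its parent.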
CVS conversion in 2010
- No fast tool with proper RCS keyword expansion
- Too much guessing for branch creation
- No support for assisted repository cleanup
- Only fromcvs has support for incremental conversion
- ...but it has other issues
Design goals for new converter
- Faithful conversion: expand RCS keywords like CVS does
- Smart conversion: magic CVS revisions and handling of vendor branches
- Fast conversion: conversion cycle of less than a day on modern hardware
- Helpful conversion: catch consistency issues and help with cleaning them up
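The "faithful conversion" goal can be illustrated with a rough sketch of RCS keyword expansion. This is not the converter's code: `expand_keywords` and its parameters are hypothetical, only four common keywords are handled, and site-specific keywords (such as a project's own `$Name$`-style additions) and the other expansion modes are ignored.

```python
import re

# Matches both collapsed ($Id$) and already expanded ($Id: ... $) keywords.
KEYWORD_RE = re.compile(r"\$(Id|Revision|Author|Date)(?::[^$\n]*)?\$")


def expand_keywords(text, rcsfile, revision, date, author, state="Exp"):
    """Expand a small subset of RCS keywords the way a CVS checkout would."""
    values = {
        "Id": f"{rcsfile} {revision} {date} {author} {state}",
        "Revision": revision,
        "Author": author,
        "Date": date,
    }

    def replace(match):
        key = match.group(1)
        return f"${key}: {values[key]} $"

    return KEYWORD_RE.sub(replace, text)


source = "/* $Id$ */\n/* $Revision: 1.1 $ */\n"
expanded = expand_keywords(
    source, "main.c,v", "1.4", "2010/02/03 04:05:06", "joerg"
)
```

Getting this step right matters because the expanded text is what ends up in every converted file revision, so any deviation from CVS shows up as spurious differences.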
Implementation overview
- Large data set, for NetBSD src: 190k files, 1M revisions, 240k changesets, 24GB raw undeltified content
- Use of SQLite to provide heavy lifting and indexing: SQL as a high-level language
- Consecutive phases to allow manual intervention
- ...like fixing up branch points or unwanted branches
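As an illustration of using SQL as the high-level language between phases, the sketch below loads a few revisions into SQLite, runs a consistency query, applies a manual fix, and runs the query again. The `revision` table, its columns, and the sample rows are invented for this example and do not reflect the converter's real schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE revision (
    file   TEXT NOT NULL,
    rev    TEXT NOT NULL,
    parent TEXT,               -- NULL for the first revision of a file
    author TEXT NOT NULL,
    PRIMARY KEY (file, rev)    -- doubles as the lookup index
);
""")
db.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?)",
    [
        ("src/main.c", "1.1", None, "alice"),
        ("src/main.c", "1.2", "1.1", "bob"),
        ("src/util.c", "1.3", "1.2", "alice"),  # parent 1.2 was never imported
    ],
)

# Consistency check: revisions whose parent does not exist in the table.
ORPHANS = """
SELECT r.file, r.rev
FROM revision AS r
LEFT JOIN revision AS p ON p.file = r.file AND p.rev = r.parent
WHERE r.parent IS NOT NULL AND p.rev IS NULL
ORDER BY r.file, r.rev
"""
orphans_before = db.execute(ORPHANS).fetchall()

# Manual intervention between phases: treat the orphan as a root revision.
db.execute(
    "UPDATE revision SET parent = NULL WHERE file = ? AND rev = ?",
    ("src/util.c", "1.3"),
)
orphans_after = db.execute(ORPHANS).fetchall()
```

Since each phase leaves its results in the database, a fix like the `UPDATE` above can be applied by hand before the next phase starts.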
Implementation overview: Phases
- Import CVS into SQLite, compute branch points, map revisions onto branches
- Adjust vendor branches, validate consistency of branch points
- Compute creation time of branches
- Compute manifests, copy referenced file content into target repository
- "fossil rebuild" or "fossil pull" to finish conversion
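The first phase, computing branch points and mapping revisions onto branches, relies on how CVS numbers revisions. The sketch below decodes "magic" branch numbers (a zero in the second-to-last position, as in `1.5.0.2`) and vendor-style branch numbers (an odd number of components, as in `1.1.1`). The helper names `decode_symbol` and `branch_of` are hypothetical, and the real converter certainly handles more corner cases than this.

```python
def _parse(rev):
    """Split a dotted revision string into a tuple of integers."""
    return tuple(int(part) for part in rev.split("."))


def _fmt(parts):
    """Join a tuple of integers back into a dotted revision string."""
    return ".".join(str(part) for part in parts)


def decode_symbol(rev):
    """Classify the revision a CVS symbol points at.

    Returns (kind, branch_number, branch_point); branch_number is None
    for plain tags.
    """
    parts = _parse(rev)
    # Magic branch: 1.5.0.2 denotes branch 1.5.2 sprouting from 1.5.
    if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == 0:
        return "branch", _fmt(parts[:-2] + parts[-1:]), _fmt(parts[:-2])
    # Plain branch number with an odd component count, e.g. vendor 1.1.1.
    if len(parts) >= 3 and len(parts) % 2 == 1:
        return "branch", rev, _fmt(parts[:-1])
    return "tag", None, rev


def branch_of(rev):
    """Map a file revision onto its branch number, or 'trunk'."""
    parts = _parse(rev)
    return "trunk" if len(parts) == 2 else _fmt(parts[:-1])
```

For example, `decode_symbol("1.5.0.2")` yields the branch `1.5.2` with branch point `1.5`, and `branch_of("1.5.2.3")` places that revision on the same branch.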
Scalability testing with NetBSD
- 2h for the CVS conversion for NetBSD src
- 2GB working copy size
- Most operations take between 100% and 200% of the Git runtime on the same tree
- One big exception is initial cloning
- Published automatically updated repositories
- Highlighted scalability issues in GitHub too
Why should I consider Fossil?
- All required tools for the typical Open Source project in one piece of software
- Good enough in all areas for many users
- Database schema simplifies generation of reports
- Sane code structure allows playing with VCS algorithms
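To make the point about reports concrete, here is a sketch of a "check-ins per user per year" report as a single query. The `checkin` table and its sample rows are stand-ins invented for this example; a real repository database has its own schema, so the table and column names would need to be adapted.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for a repository's check-in data.
db.execute("CREATE TABLE checkin (user TEXT, day TEXT, comment TEXT)")
db.executemany(
    "INSERT INTO checkin VALUES (?, ?, ?)",
    [
        ("alice", "2010-01-02", "import sources"),
        ("alice", "2010-03-04", "fix build"),
        ("bob", "2010-05-06", "add tests"),
        ("alice", "2011-01-01", "update docs"),
    ],
)

# The whole report is one GROUP BY: no export step, no extra tooling.
report = db.execute(
    """
    SELECT user, strftime('%Y', day) AS year, COUNT(*) AS commits
    FROM checkin
    GROUP BY user, year
    ORDER BY year, user
    """
).fetchall()
```

The same pattern extends to tickets or wiki edits as long as they live in the same database.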
When should I not consider Fossil?
- If you can't stand hashes as identifiers
- If you have a working integrated setup of all the components and you are happy with it
- If you require Copyleft for all software
- If your repository is large enough to run into long clone/rebuild times
- If you need external actions to be triggered on push/pull (in discussion)