This tutorial is for members of Alan Rogers's lab at the University of Utah.
Subversion is a program for keeping track of software, text, and so on. Don't use it for large data files, such as vcf or bcf files. Do use it for README files, scripts, software, and small files of all types. The subversion server should contain everything you need to recreate a project, should the large data files get destroyed.
(We back up large files on the archive storage at the Center for
High-Performance Computing (CHPC) of the University of Utah. See the
tutorial on rclone for details.)
Like other version control systems, subversion lets you reach back into the past and recover a file as it existed long ago. For example,
svn cat -r {2008-03-15} foo.txt
would print foo.txt as it stood at the start of 15 March 2008.
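As a rough sketch of how this might be used, the function below saves a dated copy of an old version next to the current file, so that the two can be compared. The name old_copy and the SVN variable are made up for this example. SVN defaults to svn and is there only so that the command can be swapped for echo in a dry run.

```shell
# Hypothetical helper, not part of subversion.
# SVN names the program to run; setting SVN=echo prints the command instead.
SVN=${SVN:-svn}

# usage: old_copy DATE FILE
# Writes FILE as it stood at the start of DATE (YYYY-MM-DD) to FILE.DATE.
old_copy() {
    "$SVN" cat -r "{$1}" "$2" > "$2.$1"
}
```

With this defined, "old_copy 2008-03-15 foo.txt" would create foo.txt.2008-03-15, which you could then compare with the current file using "diff foo.txt.2008-03-15 foo.txt".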
When you commit files using subversion, they are stored in a repository, or "repo" for short. We use a subversion server maintained by the College of Social and Behavioral Sciences of the University of Utah.
Install subversion on every machine from which you wish to use it. On a Mac using Homebrew, the command is "brew install subversion". On Ubuntu Linux, it is "sudo apt-get install subversion". (You need "sudo" access to do this under Linux.)
On the CHPC server, you don't need to install subversion, but you do need to load the module each time you log on. The easy way to do this is to put the following line into your .bash_profile:
module load svn/1.9.5
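If you share one .bash_profile across several machines, a guard like the following might be convenient. It is only a sketch: it assumes that the "module" command is defined on CHPC and absent elsewhere, and the function name load_svn_module is made up for this example.

```shell
# Load the svn module only where the "module" command exists (e.g. on CHPC).
# On machines without it, this does nothing.
load_svn_module() {
    if command -v module >/dev/null 2>&1; then
        module load svn/1.9.5
    fi
}
load_svn_module
```

On a laptop with no module system, the function returns quietly, so the same file can be used everywhere.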
For convenience, it is also useful to define shell variables that hold the URLs of the lab's directory and of your own directory on the subversion server. So I added this to my .bash_profile on all of the machines I use, including the CHPC server:
export ROGLAB=https://svn.csbs.utah.edu/roglab
export MYRL=https://svn.csbs.utah.edu/roglab/rogers
In the second line, use your own name instead of rogers. This will
define the variables ROGLAB and MYRL each time you log in. (To define
them without logging out and in, type ". ~/.bash_profile".)
Now if you type "echo $ROGLAB", you should see
https://svn.csbs.utah.edu/roglab
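One possible variation, shown only as a sketch, is to build MYRL from ROGLAB so that the server address is written once. The variable name SVN_USER is an assumption made for this example; set it to your own last name.

```shell
# SVN_USER is a hypothetical variable name; set it to your own last name.
SVN_USER=rogers
export ROGLAB=https://svn.csbs.utah.edu/roglab
export MYRL="$ROGLAB/$SVN_USER"
echo "$MYRL"    # prints https://svn.csbs.utah.edu/roglab/rogers
```

If the server address ever changes, only the ROGLAB line needs editing.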
Your user name on the subversion server is your last name, all lower case. Contact Alan Rogers for your password.
Once you have a password, create your own directory on the server:
svn mkdir --username rogers $ROGLAB/rogers -m ""
Here, you would substitute your own last name for "rogers". The last bit (-m "") specifies an empty log message. When you want to add a message to the subversion log, put something between the quotes.
The subversion documentation will tell you to create 3 subdirectories called "branches", "tags", and "trunk". These are useful for software development, but I don't think we need them in roglab. So I'm not going to say anything further about branches, tags, and trunk.
But you will want to create subdirectories to contain the projects that you want subversion to keep track of. For example,
svn mkdir --username rogers $ROGLAB/rogers/vindija -m ""
Move into the directory just above the one you want to work on:
cd ~/group/rogers/data
If the directory tree already exists, move it out of the way:
mv altai altai.bak
Make a directory of the same name on the subversion server:
svn mkdir --username rogers $MYRL/altai -m ""
Check it out:
svn co --username rogers $MYRL/altai altai
You should now have two directories, called "altai.bak" and "altai". Copy everything from the old directory into the new one:
rsync -av altai.bak/ altai
Now altai has all the files that were originally in altai.bak. Check that the two directory trees are really the same:
diff -rq -x .svn altai.bak altai
The -x .svn option skips the .svn directory that subversion created
inside altai when you checked it out. If there are no other
differences, diff won't print anything. Check and then double check to
make sure the new directory has everything you need. Then delete the
old tree:
rm -rf altai.bak
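Because rm -rf cannot be undone, you may prefer to tie the deletion to the comparison. Here is a minimal sketch. The function name remove_if_same is made up, and -x .svn tells diff to skip the .svn directory that "svn co" creates in the working copy.

```shell
# Hypothetical helper: delete the backup only if it matches the working copy.
# usage: remove_if_same BACKUP_DIR WORKING_DIR
remove_if_same() {
    # -x .svn skips the .svn directory that "svn co" creates.
    if diff -rq -x .svn "$1" "$2"; then
        rm -rf "$1"
    else
        echo "$1 and $2 differ; keeping $1" >&2
        return 1
    fi
}
```

You would call it as "remove_if_same altai.bak altai". If the trees differ, diff lists the differences and the backup is left alone.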
Add the files that you want to keep under subversion control. Don't include any large files (vcf, bcf, bed, etc.).
cd altai
svn add README.md
svn add *.legofit *.lgo
You can also add subdirectories, but be careful: the default method
automatically adds the files within the subdirectory. Some of those
may be large files that you don't want to add. Here's the way to add a
subdirectory called orig without adding its contents:
svn add --depth=empty orig
Then cd into the subdirectory and continue adding files.
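To see which files are small enough to be candidates for "svn add", something like the following could help. It is a sketch only: the function name small_files is made up, and the cutoff of about 1 MiB is an arbitrary choice for this example.

```shell
# List regular files under DIR smaller than about 1 MiB, skipping .svn.
# The cutoff is an arbitrary choice for this example.
# usage: small_files DIR
small_files() {
    find "$1" -name .svn -prune -o -type f -size -1024k -print
}
```

Typing "small_files ." prints a list that you can look over before adding anything. It only lists files; it does not add them.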
I keep many of my projects on the subversion server and check them out on all the machines I use. When I begin work for the day, my first step is to update the copy on my current machine, so that it contains any changes I might have made using other machines:
svn up
To add a file to the repo or delete one from it:
svn add foo.lgo
svn rm bar.lgo
These commands schedule a file for addition to or removal from the repo, but nothing happens to the remote repo until you commit:
svn commit -m "add file foo; remove file bar"
If you omit -m and the comment, subversion will open a text editor so
that you can write one. The commit command will also push any changes
to existing files onto the repo. You can shorten "commit" to "ci" to
save typing.
To summarize, begin a work session with svn up. During the session,
you can modify files, add them with svn add, or delete them with
svn rm. When you're done, or whenever you want to save your work,
use svn commit.
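The end of a session could be wrapped in a small function, sketched below. The name save_work and the SVN variable are made up for this example. SVN defaults to svn and exists only so that you can set SVN=echo and see the commands without running them.

```shell
# Hypothetical helper, not part of subversion.
# SVN names the program to run; setting SVN=echo prints the commands instead.
SVN=${SVN:-svn}

# usage: save_work "log message"
# Shows what has changed, then commits it with the given message.
save_work() {
    "$SVN" status &&
    "$SVN" commit -m "$1"
}
```

For example, save_work "add file foo; remove file bar" would list the pending changes and then commit them.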
To check on the status of your repo, type
svn stat
In the output, "A" indicates that a file has been added, "M" that it
has been modified, "D" that it has been deleted, and "?" that it is
not under subversion control. Once you commit your changes (svn ci),
the "A", "M", and "D" lines will disappear, but the "?" lines will
remain.
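When the status output is long, a count of lines by status code can be easier to read. The function below is a sketch that reads status output on standard input; the name count_status is made up for this example.

```shell
# Count the lines of "svn stat" output by their first-column status code.
# Reads the status output on standard input.
count_status() {
    awk '{ n[substr($0, 1, 1)]++ } END { for (c in n) print c, n[c] }' |
        LC_ALL=C sort
}
```

You would use it as "svn stat | count_status". Each output line shows a status code followed by the number of files that have it.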
If you are ignoring a lot of the files in your directory, svn stat
will print many lines that begin with "?", and this can be a
nuisance. It is a good idea to tell subversion to ignore these files.
To do so, create a file called .svnignore in the top-level
directory of your project. For example, the file might look like this:
*.vcf.gz
*.vcf.bgz
*.tbi
Sample_*
Here, *.vcf.gz would match all files that end with .vcf.gz, and
Sample_* would match all files that begin with Sample_. Then cd
into the directory that contains your .svnignore file and type:
svn propset svn:global-ignores -F .svnignore .
From then on, svn stat will ignore the files that match the patterns
in your .svnignore file. If you add lines to .svnignore, you have
to run svn propset again.
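To try out patterns before setting the property, you can test file names against the pattern file in the shell. This is only a sketch: the function name is_ignored is made up, and it uses the shell's own glob matching, which I am assuming is close enough to subversion's matching for simple patterns like the ones above.

```shell
# usage: is_ignored NAME PATTERNFILE
# Succeeds if NAME matches any glob pattern listed in PATTERNFILE.
is_ignored() {
    while IFS= read -r pat || [ -n "$pat" ]; do
        [ -n "$pat" ] || continue      # skip blank lines
        case "$1" in
            $pat) return 0 ;;          # unquoted, so it is used as a glob
        esac
    done < "$2"
    return 1
}
```

For example, "is_ignored chr1.vcf.gz .svnignore && echo ignored" would print "ignored" with the pattern file shown above, because the name matches *.vcf.gz.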
There is lots of online information about subversion. My favorite book is Practical Subversion.