SVNBackup – Incremental Backup and Restore Utilities for SVN

SVNBackup was born of some issues I faced while working on a client’s servers. They had previously been using ‘svnadmin hotcopy’ to make their backups, but that had a few limitations:

  • SVN hotcopy backups needed to be rebuilt when I restored them to a version of SVN built with a different DB back-end. Rebuilding the SVN database is time-consuming and annoying, since svnadmin doesn’t give any indication of progress as it works.
  • Because of the size of the hotcopy backups, they were only being run nightly, which could result in up to 24 hours of lost work if the system had to be recovered from backup.

So, I scripted up a way to use ‘svnadmin dump’ and ‘svnadmin load’ for full and incremental backups of SVN that are back-end DB independent. Not only can it be used as a backup system, but it can also be used to migrate between versions and builds of Subversion. Because of its efficiency, it can be run as often as you want, so that your incremental backups catch every check-in.
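As a rough illustration of the incremental idea (a sketch only, not the actual svnbackup.pl code): each run only needs to dump the revisions newer than the last one already backed up. The helper name next_range is hypothetical, and the START-END.svnz file naming is assumed from the file names that appear in the comments on this page.

```shell
# Print the next revision range to dump as "FROM:TO", or nothing if the
# backup directory is already current. Hypothetical helper, not part of SVNBackup.
#   $1 = backup directory holding START-END.svnz files (assumed naming)
#   $2 = youngest revision in the repository (e.g. from `svnlook youngest REPO`)
next_range() {
    backup_dir=$1
    youngest=$2
    last=-1
    for f in "$backup_dir"/*-*.svnz; do
        [ -e "$f" ] || continue                      # glob matched nothing
        name=${f##*/}                                # strip the directory
        name=${name%.svnz}                           # strip the extension -> START-END
        end=${name#*-}                               # keep END
        case $end in ''|*[!0-9]*) continue ;; esac   # skip unexpected names
        [ "$end" -gt "$last" ] && last=$end
    done
    from=$((last + 1))
    [ "$from" -le "$youngest" ] && echo "$from:$youngest"
    return 0
}

# Usage (commented out; needs svnlook/svnadmin and a real repository):
# range=$(next_range /path/to/backups "$(svnlook youngest /path/to/repo)")
# [ -n "$range" ] && svnadmin dump -r "$range" /path/to/repo | gzip -c > /path/to/backups/FROM-TO.svnz
```

With an empty backup directory this yields a full range starting at revision 0; afterwards it yields only the revisions added since the newest backup file.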

Current version is .16-beta, and can be found here: SVNBackup-.16B.zip

If you would like to be notified when there are updates to SVNBackup, or submit a bug, please visit the Freshmeat page for this project: http://freshmeat.net/projects/svnbackup

Recent changes include:

#############################################################################
#                                                                           #
# Version .16-beta changes                                                  #
# - Fixed a critical issue where the conf/ and hooks/ directories were not  #
#   being restored to the correct path.                                     #
#                                                                           #
# Version .15-beta changes                                                  #
# - Fixed an issue where moving the backup directory would cause            #
#   svnrestore.pl to see the backup as invalid.                             #
#                                                                           #
# Version .14-beta changes                                                  #
# - Fixed bad logic in the utility file path code.                          #
# - Added a set of common path locations to the search path.                #
#                                                                           #
# Version .13-beta changes                                                  #
# - Improved lock file detection to prevent concurrent execution, and added #
#   a message stating the age of the lockfile if one is found.              #
#                                                                           #
# Version .12-beta changes                                                  #
# - Fixed an incorrect file test operator in svnrestore.pl                  #
#                                                                           #
# Version .11-beta changes                                                  #
# - Added backup and restore of the conf/ and hooks/ directories.           #
# - Preserve and restore the user/group ownership of the SVN repository.    #
#                                                                           #
# Version .10-beta changes                                                  #
# - Added locating utilities from within PATH so that this script should    #
#   run without modification on most systems.                               #
#                                                                           #
# Version .9-beta changes                                                   #
# - Added using /tmp/svnbackup-BACKUPDIR.lock as a lock-file to prevent     #
#   concurrent execution of svnbackup.pl or svnrestore.pl which could       #
#   corrupt backups and prevent complete restores.                          #
# - Added error handling in case the external call to svnadmin fails.       #
#                                                                           #
#############################################################################

Please, give it a try and let me know what you think. If there is interest I’ll set up a mailing list for discussion.

The software license doesn’t require this, but if you use this code to back up your SVN system, it would be very nice if you found someplace appropriate to link back to this page. I could use the Google karma. :)

34 thoughts on “SVNBackup – Incremental Backup and Restore Utilities for SVN”

  1. Thanks for sharing the script. Could you provide a user guide? E.g., how to back up and how to restore.

    Thank you very much

    1. In theory, sure. You’ll need a command line install of SVN, including svnadmin. You’ll also need perl. You could probably use either ActivePerl or cygwin.

  2. Hi, I tried svnbackup.pl a few times and it works, copying incremental changes. That is fine. But when I tried svnrestore.pl on a standby SVN server where there is already an SVN repository that is not up to date, svnrestore.pl gave the message “folder is not empty”.

    That makes restore much less useful, since it cannot restore incrementally onto a standby SVN server that already has a repository.

    My intention is to take incremental backups 4 times a day and restore them on the standby server 4 times a day, restoring only the changes rather than the whole repository, as our repository is over 10 GB. With your restore method, we need to recreate the entire 10 GB repository every time!

    Is it possible to provide this incremental restore as well? If I am missing something, let me know.

    1. You are right, the tool doesn’t automate incremental restores into an existing repo. That is on purpose, because I can’t easily verify the state of the repo into which you want to append partial data. If you are 100% sure the repo is consistent, you can easily restore the incremental backups by hand. The structure of the necessary commands is clearly listed in the comments in the code for svnrestore.pl. Unless I can figure out a way to validate the state of a partial repo, I won’t be adding a feature for appending partial incremental data to an existing repo.

  3. I am 100% sure the repo is consistent. I only want to append the differences. If you cannot 100% verify or guarantee an incremental restore, that is OK. You could just print a warning in your recovery script but provide a command line option, something like

    svnrestore.pl --incremental which should restore only changes from the backup to the existing repo

    One more thing… If I happen to lose the perl scripts, or some of the files are lost by accident, while the gzipped SVN data is still available, is the SVN repository still recoverable from the backup folders?

    Thanks

    1. svnbackup.pl and svnrestore.pl are wrappers around existing subversion tools that enhance the functionality. Manual restore, without access to perl or SVNBackup, is as simple as this:


      # Manual Recovery: #
      # - use 'svnadmin create' to create a new repository. #
      # - use 'svnadmin load' to restore all of the backup files, in order. #
      # ie: #
      # svnadmin create /tmp/test #
      # gzcat 0-100.svnz | svnadmin load /tmp/test #
      # gzcat 101-110.svnz | svnadmin load /tmp/test #
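      When there are many backup files, the "in order" part matters: a plain alphabetical listing would put 1000-1200.svnz before 101-999.svnz. A minimal sketch of a helper that lists the files by starting revision is shown here; the function name is hypothetical, and it assumes the START-END.svnz naming shown above and paths without spaces. It uses ‘gzip -dc’ in the usage comment as a stand-in for gzcat on systems that lack it.

```shell
# List backup files in a directory in ascending order of starting revision.
# Hypothetical helper; assumes START-END.svnz names and no spaces in paths.
#   $1 = backup directory
list_backups_in_order() {
    for f in "$1"/*-*.svnz; do
        [ -e "$f" ] || continue                  # glob matched nothing
        name=${f##*/}                            # strip the directory
        printf '%s %s\n' "${name%%-*}" "$f"      # "START path"
    done | sort -n | cut -d' ' -f2-              # numeric sort, then drop START
}

# Usage (commented out; needs svnadmin and a real backup directory):
# svnadmin create /tmp/test
# list_backups_in_order /path/to/backups | while read -r f; do
#     gzip -dc "$f" | svnadmin load /tmp/test || break
# done
```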

  4. I was brought to SVNBackup for the same reasons you mention in the beginning.

    Because of the size of full backups, we can currently only run them every 24h, which means that in the worst case, we could lose work of the last 24h.

    But what about failing to actually create a backup? With full backups, if creating a backup failed for any reason, for example a power failure while the repository was being dumped, it would not be a big problem, because a fresh complete backup would be created the next day.

    With SVNBackup, we can backup more often, but what if a dump fails? What would happen if power fails or the kernel crashes or an nfs share becomes inaccessible during the time SVNBackup is running?

    Is it possible that SVNBackup creates a file such as 226-231.svnz, but because of a power failure the file is empty, or only contains revisions 226 – 228? If such a file is left around, would it be possible that on the next run SVNBackup would start dumping from revision 232 on, so that revisions 229 – 231 never become part of the backup?

    thanks a lot for the tool and your support!

    1. Thimios,

      You raise a very valid concern, and I believe it is one I can easily address. I will modify SVNBackup so that it writes the incremental backup to a temp file, and then only renames the temp file to its final name once the dump finishes and the return code from svnadmin has been verified. In that way, if there is an error or a kernel panic, the temp file will get over-written with the next backup.

      I think at this point I’m going to have to add email notifications to the script, so that the appropriate contact gets a warning email if a backup fails.

      I’ll try to get that done this week.

      -Chris
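      The write-to-a-temp-file-then-rename idea described in this reply could look roughly like the sketch below. This is not SVNBackup's code; the function name and the ".tmp" suffix are made up for illustration. The dump is written uncompressed here so that the exit status being checked is the dump command's own, rather than that of the last command in a pipeline.

```shell
# Run a command that writes its output to stdout, capture it in a temp file,
# and rename it to the final name only if the command succeeded.
# Hypothetical helper, not SVNBackup's actual code.
#   $1 = final file name; remaining arguments = command to run
write_then_rename() {
    final=$1
    shift
    tmp="$final.tmp"
    if "$@" > "$tmp"; then
        mv -f "$tmp" "$final"      # a rename within one filesystem is atomic
    else
        status=$?
        rm -f "$tmp"               # never leave a partial file behind
        echo "backup failed (exit $status): $final not written" >&2
        return "$status"
    fi
}

# Usage (commented out; needs svnadmin and a real repository):
# write_then_rename /path/to/backups/226-231.dump \
#     svnadmin dump -r 226:231 /path/to/repo
```

      A file with the final name then only ever exists if the dump finished cleanly, so an empty or truncated 226-231 file cannot be mistaken for a complete one.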

  5. Hi Chris,

    thank you very much for the prompt response. The solution you describe sounds really good. It would be great to have the possibility of emailing when a faulty backup run is detected, but I could live without it. Please let me know when you have it ready, I would be happy to help you with testing the feature. Thanks a lot.

    1. Come to think of it, the backup log isn’t updated until after the dump finishes, so as long as you only restore backups that are listed in the log file, then you’d be safe. I’m still going to implement this change.

  6. Hi!

    This is just what I was looking for, and seems quite well designed and thought out.
    I do have one small question; In our environment we would like to run incremental backups on working days and then a full backup on weekends, and then start over with a new set of incremental backups.
    Do you have any recommendations on how to do this?
    I am thinking I may create a cron job to simply delete the svnbackup.pl target folder on Friday evenings, and then kick off the svnbackup.pl job after this has completed. But it seems a bit crude…

    1. I’d wrap SVNBackup in a bash script, and use the date command to generate a fixed date relative to a specific day-of-the-week, and then use that to build your SVNBackup destination directory. ie:


      BACKUPDIRBASE="/var/local/svnbackups"
      # Most recent Saturday (today, if today is Saturday)
      LASTSATURDAY=$(date -v-sat "+%Y-%m-%d")

      CURRENTBACKUPPATH="$BACKUPDIRBASE/$LASTSATURDAY"
      mkdir -p "$CURRENTBACKUPPATH"
      svnbackup.pl [REPODIR] "$CURRENTBACKUPPATH"

      That will generate a date string that is always the current or last Saturday. The value won’t change until the next Saturday. The first backup into that directory (on Saturday) will be full, and the rest of the backups that week will be incremental. Then you can prune older backups out of BACKUPDIRBASE.
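      Note that ‘date -v’ is BSD/macOS syntax. On systems with GNU date (most Linux distributions), a similar value could be computed along these lines. This is a sketch assuming GNU ‘date -d’; the function name last_saturday is hypothetical.

```shell
# Print the date (YYYY-MM-DD) of the most recent Saturday on or before the
# given day. Assumes GNU date; last_saturday is a hypothetical helper name.
#   $1 = reference date as YYYY-MM-DD (optional, defaults to today)
last_saturday() {
    ref=${1:-$(date +%Y-%m-%d)}
    dow=$(date -d "$ref" +%u)            # 1 = Monday ... 7 = Sunday
    back=$(( (dow + 1) % 7 ))            # days elapsed since that Saturday
    date -d "$ref $back days ago" +%Y-%m-%d
}

# Usage: LASTSATURDAY=$(last_saturday)
```

      As with the BSD form, the value stays the same from one Saturday until the day before the next.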

      1. Thanks Chris! I’ll post my wrapper script once I have tested it, maybe it will be of some use for someone else.

        However, it seems I may have found an issue with svnbackup.pl. On first run, when the target directory is completely empty the script may output an error similar to below.
        * Dumped revision 73675.
        * Dumped revision 73676.
        * Dumped revision 73677.
        * Dumped revision 73678.
        WARNING 0x0000: Referencing data in revision 73655, which is older than the oldest dumped revision (r73666). Loading this dump into an empty repository will fail.
        * Dumped revision 73679.
        * Dumped revision 73680.
        * Dumped revision 73681.
        * Dumped revision 73682.
        * Dumped revision 73683.
        WARNING 0x0000: The range of revisions dumped contained references to copy sources outside that range.

        It appears the range is not used, possibly because the --incremental parameter is never passed to ‘svnadmin dump’ by the script, or maybe because there are check-ins occurring while the dump is running (the script takes some 120 minutes to complete).
        Checking the processes while the script is running show:
        7491 ? Ss 0:16 SCREEN
        7492 pts/1 Ss 0:00 \_ /bin/bash
        30814 pts/1 S+ 0:00 \_ /usr/bin/perl ./svnbackup.pl /home/svnroot /home/svnbackups/svnroot/
        30822 pts/1 S+ 0:00 \_ sh -c /usr/bin/svnadmin dump -r 0:73665 /home/svnroot | /bin/gzip -c > /home/svnbackups/svnroot//0-73665 .svnz
        30823 pts/1 D+ 0:29 \_ /usr/bin/svnadmin dump -r 0:73665 /home/svnroot
        30824 pts/1 S+ 0:53 \_ /bin/gzip -c

        A plain dump (svnadmin dump /home/svnroot | gzip > /home/backup/svnroot_dump.gz) produces no errors.

        1. Something doesn’t make sense here. According to your output, the running svnadmin command is ‘/usr/bin/svnadmin dump -r 0:73665’, but the dumped revision that is throwing the error is outside of the range of 0-73665. What version of svn are you running?

          What happens if you run this by hand:

          /usr/bin/svnadmin dump -r 0:73665 /home/svnroot | gzip > /home/backup/svnroot_dump.gz

          The reason I ask is that the error you listed above is not a svnbackup error, it is a ‘svnadmin dump’ error. I believe it is being caused by explicitly referencing the revisions to be backed up, rather than going with the default. If you get the same error when you use the command above, your repo may need maintenance.

          And, no, I never pass the --incremental argument because I explicitly specify the revisions to be dumped via ‘-r X:Y’.

  7. I am using subversion-1.7.1-0.1 on RHEL 6.2.
    Running the svnadmin dump by hand produced no errors, so I believe my repo is ok. I’ll be doing some more testing to figure out if this was a temporary glitch or maybe a typo on my part.
    I am looking forward to you implementing e-mail notifications for errors, that would be pretty useful.
    Thank you,

    Otto

      1. Chris,
        I ran it with the -r flag when running by hand.

        Later I deleted the entire backup and ran svnbackup.pl again, which produced no errors.
        However, running it again a few hours later produced a warning saying “WARNING 0x0000: Referencing data in revision 73725, which is older than the oldest dumped revision (r73764). Loading this dump into an empty repository will fail.”
        The range for this backup was 73764-73796.
        I re-ran it again this morning, only to get a different warning (several times actually): “WARNING 0x0001: Mergeinfo referencing revision(s) prior to the oldest dumped revision (r73797). Loading this dump may result in invalid mergeinfo.”
        Range for this backup was 73797-73828.

        These warnings seem a bit worrying; I imagined they would at least be the same and not differ from day to day.
        I am currently testing a restore of the entire repo, will know if it worked or not tomorrow.

        1. OK, so the first warning isn’t an issue. It just means that you could not restore that ‘dump segment’ alone, because it has references to an earlier revision. The second one concerns me a bit more, but when I get to the office today I’ll consult our resident DevOps VC Expert and see if that is just informational as well. I’ll also see if he can give me merge steps I can use to create a similar situation, so I can duplicate the warnings.

          Have you tried restoring the full backup to a new directory? Did you get any warnings, and was the repository in the expected state afterwards with the correct merge info intact?

          1. Chris,
            I ran the backup again this morning, and got several warnings of both type 0x0001 and 0x0000:
            WARNING 0x0001: Mergeinfo referencing revision(s) prior to the oldest dumped revision (r73829). Loading this dump may result in invalid mergeinfo.
            WARNING 0x0000: Referencing data in revision 73768, which is older than the oldest dumped revision (r73829). Loading this dump into an empty repository will fail.
            The range was 73829-73937 this time.
            While I can see that getting warning 0x0001 is logical, I don’t understand why I am also getting 0x0000.

            I did restore the entire backup to a new empty repository, and it produced no errors or warnings. And running ‘svnadmin verify’ on the new repo produced no errors so I believe it is ok.

  8. Thanks for this handy tool!

    A nice addition would be to add a “maximum number of revisions per file” option.
    I’m just thinking that new users of this tool with already big subversion repositories might run into problems with files getting bigger than the supported file size the first time they run the tool (creating 0-824575.svnz…)

    Keep up the good work!

    1. Howdy Hans,

      It’s been a while since I’ve worked on a machine with a 2G file limit, so I had missed that particular weakness. Do you think it would meet your needs if I added support for split?
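      For anyone who needs this before it is built in, piping the compressed dump through ‘split’ by hand might look like the following. This is only a sketch of the idea, not a feature of SVNBackup; the function names are hypothetical, and the piece names (PREFIX.aa, PREFIX.ab, …) are simply split's default suffixes.

```shell
# Compress stdin and write it as pieces of at most $2 bytes each, named
# PREFIX.aa, PREFIX.ab, ... Hypothetical helper, not part of SVNBackup.
#   $1 = output prefix, $2 = maximum piece size (any value `split -b` accepts)
split_compressed() {
    gzip -c | split -b "$2" - "$1."
}

# Concatenate the pieces (the shell expands the glob in sorted order) and
# decompress the result to stdout.
#   $1 = the same prefix that was given to split_compressed
join_compressed() {
    cat "$1".* | gzip -dc
}

# Usage (commented out; needs svnadmin and a real repository):
# svnadmin dump -r 0:824575 /path/to/repo | split_compressed /path/to/backups/0-824575.svnz 1000m
# join_compressed /path/to/backups/0-824575.svnz | svnadmin load /tmp/test
```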

  9. Hi Chris,
    Thanks for sharing this script. I was looking for exactly this, but unfortunately I couldn’t use it, as I am using Visual SVN on Windows 2003 and the backup script gives me this error:

    'which' is not recognized as an internal or external command,
    operable program or batch file.
    Unable to find svnlook in the current PATH.

    I tried to change the $ENV{'PATH'} . to look like
    $ENV{'PATH'} .='C:\SVN\Bin';

    but I saw no change in the error message.

    Any tips that you can suggest to get this working on a Windows Server would be helpful.

    Thanks

    VC

    1. The issue is that since you are running in a Windows environment, you don’t have the command ‘which’. You could fix this by replacing this block of code:


      ## Locate the following utilities for use by the script
      @Utils = ('svnlook', 'svnadmin', 'gzip', 'gunzip', 'tar', 'chown');
      foreach $Util (@Utils)
      {
          ## Populate $UtilLocation{$Util} if it isn't set manually
          if ( !(defined($UtilLocation{$Util})) )
          {
              ($UtilLocation{$Util} = `which $Util`) =~ s/[\n\r]*//g;
          }

          ## If $UtilLocation{$Util} is still not set, we have to abort.
          if ( !(defined($UtilLocation{$Util})) || $UtilLocation{$Util} eq "" )
          {
              die("Unable to find $Util in the current PATH.\n");
          }
          elsif ( !(-f $UtilLocation{$Util}) )
          {
              die("$UtilLocation{$Util} is not valid.\n");
          }
      }

      With something like this:


      ## Set the path for utilities used by the script
      $UtilLocation{'svnlook'} = 'C:/SVN/Bin/svnlook.exe'; #I am pretty sure you still have to use / since \ is a special character in perl.
      $UtilLocation{'svnadmin'} = 'C:/SVN/Bin/svnadmin.exe';
      $UtilLocation{'gzip'} = ''; ### Set this to wherever your gzip utility lives
      $UtilLocation{'gunzip'} = ''; ### Set this to wherever your gunzip utility lives
      $UtilLocation{'tar'} = ''; ### Set this to wherever your tar utility lives
      #$UtilLocation{'chown'} = ''; ##unused

      Mind you, I don’t have a winblows box at the moment, so I can’t test that this is ALL you need. But, this will get you past the problem of execing out to run the ‘which’ command.

  10. Hello!

    I would like to thank you for your script. It is more than useful for me.

    I have a question though: while svnbackup.pl is running, are users able to update/commit to the repos?

  11. Hi there,

    I am trying to take the backup of the Visual SVN server using the svnadmin hotcopy command, but I am UNABLE to use it. I have done all the googling, but it did not work at all. Please help me on this.
    Client: TortoiseSVN
    Server: Visual SVN server

    Thanks very much!

    Regards,

    Ashish David

  12. How about SVN repositories with multiple projects? I have:

    /repos/proj1/trunk , /repos/proj1/tags , /repos/proj1/branches
    /repos/proj2/trunk , /repos/proj2/tags , /repos/proj2/branches
    /repos/proj3/trunk , /repos/proj3/tags , /repos/proj3/branches

    all of them under the same repository. Is it safe to make a for loop and run svnbackup.pl on each one? All of them have the same revision, because they are in the same repository.

    Thanks

    1. What you are describing is how your repository looks to the client, which is way different than how the data is stored in the repo on the server side. If you look at the repo storage on the server, you’ll see that those proj directories don’t exist because that client-side structure is encapsulated in the SVN database.

      SVNBackup is basically a wrapper for the “svnadmin dump” command set. It needs to be run on the server side, against the repo directory.

  13. Hi Chris,

    Thanks again for a great script, it has now worked flawlessly in production for a little more than a year :-)
    I do have a question though:
    What return codes does svnrestore.pl return when run? I am calling it as part of a larger restore script, and I need to detect if a restore was successful or not.

    Thanks,

    Otto

  14. Thanks for sharing these scripts, they work quite well.

    I’m using Cygwin to get this working on Windows. Simply install the perl and subversion modules in Cygwin and run the svnbackup/svnrestore Perl scripts.
    I also use Visual SVN server.
