I recently worked with Ray Palumbo at Camelback. He was experiencing problems with his differential backup. Every time the backup executed, throughput on the server got terrible. On busy days, this caused super long calls and the sales points would go off-line. To prevent this from happening, Ray was forced to suspend the differential backups. This meant he was relying solely on his nightly full backups that occur at 2:00 AM. Naturally, Ray found this to be an uncomfortable situation.
Another interesting symptom of the problem was that the first differential backup of the day took 3 to 4 minutes and was ~ 8GB in size! A Differential backup is supposed to backup what has changed in a database since the last full backup - so I was expecting something that took only seconds to complete and was much less than 10 megabytes (since not much changes in the database between 3:00am and 8:00am). As the day progresses, the size of the differential grows - but even the largest I have seen have been < 100 MB, typically.
So - first things first - even if the differential backup took 3-4 minutes, why was it bumping all the sales points off? Well, it turned out that Ray's LDF file was on the same drive as the destination of the backups. That drive happened to be a standard IDE drive, and throughput was not stellar. What I learned here is that even though the SiriusSQL database was using the Simple Recovery model (meaning the LDF file never really grows), the LDF file gets used anyway during the time that inserts, updates and deletes happen. The transactions get committed immediately (that is why the file doesn't grow), but the file does get used - for every transaction from every sales point. Add an 8 GB backup to that same drive, and it was enough to cause problems.
So, to solve this issue, Ray moved the LDF file onto his speedy SCSI mirrored data drive. We left the backups pointed to the slower IDE drive.
Hopefully the sales points getting knocked off was solved - but still - the 8 GB file was a nagging problem. The "Ah-Ha" moment occurred when we looked at the maintenance plan. The maintenance plan did the integrity check a 1am, the backup at 2:am and the reorganize indexes at 3am. Since the differential only backs up what changes since the last full backup - the 8 GB of changes must be coming from the reorganize indexes part of the maintenance plan!
So - the solution was simply to reverse the order of the backup and the reorganize indexes. Now the plan does the backup as the last step at 3am. We tested this by performing a full backup, and then doing a differential backup a few minutes later. The differential backup finished in .51 seconds (half a second) and was about 8 megabytes. Now THAT'S what I'm talkin' about.
So the big two lessons here: 1. Put your backups on a different drive than any of your DB files if possible, and 2. Make sure your Full backup is the last operation in your maintenance plan.
Ray has happily bumped up the frequency of his differentials to 30 minutes, and Camelback's sales points can stay online. WaHoo!
Thanks to Ray and Camelback for helping to figure this one out.
- Sean
Friday, March 2, 2007
Subscribe to:
Post Comments (Atom)
3 comments:
Your sticktuitiveness on this issue is what is really impressive. Most techs would just refer the person to Microsoft, who would open an incident and never figure it out.
Hey there,
Just wanted to drop a quick thanks. Without your post, it would have taken me much longer to figure out. Backup and Indexing schedule swap ho!
Thanks so much! Doing a Google search left me with this problem for weeks. Somehow, I never thought to put the Re-Indexing and so on before the Full Backup, and was getting frustrated by the huge Transaction Log backups and Differential Backups. Also, honorable mention to the more experienced SQL Server DBA that set this up wrong for us in the first place.
Post a Comment