July 25, 2006

Are Your DBAs Working for You?

OK, for a while now I've been on this kick about how companies ought to look out for their people: recognize the talent they have, keep it, and nurture it. So, this one is for the IT managers out there, but DBAs should read it too, just to know what they're being graded against.

Here's the problem:
You have a DBA, or even a team of DBAs, on staff. Your main production system goes down, and it's now time for them to spring into action. After assessing the problem, they decide that the DB is completely shot and needs to be restored from backup.

Here's your quiz:
Are they telling the truth, or do they just not really know what they're doing? Could the system have been brought back online much faster? Was it really necessary to restore and possibly lose the latest transactions? Did you lose any transactions, and if so, was it because you just couldn't help it, or because they didn't set the system up correctly to recover from disaster?

There are other questions to ask yourself, but you get the idea.

Now, most of you probably don't know the answers to any of these questions, and it's partially your fault. More on that in a minute.

In the example I gave above, it could have really gone either way. Here are just some of the things that could have happened.

1. DB could be in suspect mode. The cause could be as simple as a drive outage, but it could just as easily be that the DB outgrew the drive in the middle of a huge transaction and there wasn't enough space to roll back. Does this require a restore? No.

2. DB could have corruption. In this case, it really depends on the level and cause of the corruption. It also depends on how long it's been corrupt. There are ways to fix the corruption both with and without data loss. However, in this case restoring could actually make the problem worse because you could restore the corruption instead of fixing it.

3. If you restore for whatever reason, did your DBAs think ahead and test the backups? Did they leave you in a position to back up the tail of the log so you don't lose transactions?

I think I've made my point.

So how is it partially your fault that you're in a bad position?
The hows and whys are really important, and it's important for you to not just accept your DBA's word. When there's a situation, ask for details. Ask them what the root cause of the problem was, and how it could have been avoided. Get them to explain everything to you, and if you don't understand, don't stop until you do. I do a post mortem after every outage. I've got a form I use that outlines the players, when it happened, when the DBAs were informed, when it got fixed, what the cause was, how it can be avoided, etc. This is very important in understanding your environment and can help you recognize major shortcomings in your plan.
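
For anyone who wants to start keeping a form like that, here is a minimal sketch of one as a Python record. The field names, the two helper methods, and the sample outage are all made up for illustration; they are not my actual form, just one way to capture the items listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutagePostMortem:
    """One post-mortem record. Field names are illustrative assumptions."""
    players: list[str]           # who was involved
    started_at: datetime         # when it happened
    dbas_informed_at: datetime   # when the DBAs were informed
    fixed_at: datetime           # when it got fixed
    root_cause: str              # what the cause was
    how_to_avoid: str            # how it can be avoided next time

    def time_to_notify(self) -> timedelta:
        """How long the outage ran before the DBAs heard about it."""
        return self.dbas_informed_at - self.started_at

    def time_to_fix(self) -> timedelta:
        """Total downtime, from start of outage to fix."""
        return self.fixed_at - self.started_at


# Hypothetical outage, invented purely to show the form in use.
example = OutagePostMortem(
    players=["on-call DBA", "IT manager"],
    started_at=datetime(2006, 7, 25, 14, 0),
    dbas_informed_at=datetime(2006, 7, 25, 14, 20),
    fixed_at=datetime(2006, 7, 25, 16, 5),
    root_cause="data drive filled during a large transaction",
    how_to_avoid="alert on free disk space before the drive fills",
)
print(example.time_to_notify())  # 0:20:00
print(example.time_to_fix())     # 2:05:00
```

Keeping the timestamps as real dates rather than free text means you can compare notification and repair times across outages, which is where the shortcomings in a plan tend to show up.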

You should also insist that your DBAs go through practice runs of recovering from a disaster. Make them restore the system from the ground up. Make them test their backups and their restore scripts. It's not only necessary for seeing where you stand as a company, but it's also to see where your DBAs stand in their professional development.
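
To give a feel for what a practice run checks, here is a rough sketch of a restore drill in Python. It uses SQLite from the standard library purely as a stand-in, since it runs anywhere; it is not how you would drive SQL Server or Oracle, and the function names, file names, and the row-count check are assumptions for illustration. The idea is the same on any platform: restore the backup somewhere safe, then prove the restored copy is intact and complete.

```python
import os
import sqlite3
import tempfile


def backup_database(source_path: str, backup_path: str) -> None:
    """Copy a SQLite database file using sqlite3's online backup API."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def restore_drill(backup_path: str, expected_rows: dict[str, int]) -> list[str]:
    """Restore a backup to a scratch file and return any problems found."""
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        restored_path = os.path.join(scratch, "restored.db")
        backup_database(backup_path, restored_path)  # the "restore" step
        conn = sqlite3.connect(restored_path)
        try:
            status = conn.execute("PRAGMA integrity_check").fetchone()[0]
            if status != "ok":
                problems.append(f"integrity check failed: {status}")
            for table, expected in expected_rows.items():
                # Table names come from our own trusted dict, not user input.
                actual = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
                if actual != expected:
                    problems.append(f"{table}: expected {expected} rows, found {actual}")
        finally:
            conn.close()
    return problems


# Build a tiny throwaway "production" database, back it up, and run the drill.
with tempfile.TemporaryDirectory() as workdir:
    prod = os.path.join(workdir, "prod.db")
    bak = os.path.join(workdir, "prod.bak")
    conn = sqlite3.connect(prod)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (25.5,), (7.25,)])
    conn.commit()
    conn.close()

    backup_database(prod, bak)
    drill_result = restore_drill(bak, {"orders": 3})
    stale_result = restore_drill(bak, {"orders": 4})  # simulate a backup missing a row

print(drill_result or "restore drill passed")  # restore drill passed
print(stale_result)  # ['orders: expected 4 rows, found 3']
```

A drill that returns a list of problems instead of just passing or failing gives you something concrete to ask your DBAs about afterward.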

I find that most DBAs don't push themselves to study or practice to help prepare for their next disaster. It's important to familiarize yourself with the techniques involved in recovering systems, and if you don't know them, ask someone. There are plenty of forums, MVPs, books, etc. out there to get this knowledge from.

Hell, one of my favorite methods of all time for learning techniques I don't know is to surf the newsgroups. There are newsgroups for every DB out there, and if you just cruise them a couple of times a week, or for 30 minutes a day, you'll be amazed at what you'll come up with. You can't think of every situation yourself, so the newsgroups come in handy for giving you ideas. When somebody posts a question you've never heard, just pretend it's happening in your environment, or that someone asked you at work, and research it and try to find the answer yourself. You can also see what the other users are saying about it. You'll be surprised how helpful that is.

OK, that's all I've got this time. I hope I helped some of you managers realize where your risk lies and how to assess how good a job your DBAs are doing. Push them to become better, and make them explain their actions. I find that quite often DBAs make decisions without clear goals in mind. You have to know what your goal is before you can know whether you've achieved it.

OK, this time I'm really done... honest.

Posted by Sean McCown on July 25, 2006 08:15 PM


COMMENTS





"I find that most DBAs don't push themselves to study or practice to help prepare for their next disaster. It's important to familiarize yourself with the techniques involved in recovering systems, and if you don't know them, ask someone. There are plenty of forums, MVPs, books, etc. out there to get this knowledge from."

Really? Where? How many DBAs have you known? And which DB are we talking about? SQL Server? Recovering Oracle can indeed be a daunting task if you don't have a proper backup plan. But the good thing is that Oracle databases are managed by real DBAs (well, most of the time).

Posted by: Tarry Singh at July 27, 2006 10:37 AM

It's clear that you're not and have never been a DBA. That's not a criticism, just an important fact.

Modern DBMSs are unique animals in the world of software. Oracle, for example, has been called "a very sophisticated cache manager that speaks SQL." They don't work like any other software and how they do work is extremely complex. Microsoft has moved mountains to make its SQL Server almost as easy to administer as any other software. They've done it by hardwiring the software such that it works okay for operational systems (lots of reading and writing individual records) as well as decision support (mostly reading large numbers of records at a time). Oracle, DB2, etc. however have always allowed far more fine-grained tuning in order to optimize a system for the kind of loads it is used for. With such fine-grained control, diagnosing a failure and what to do about it becomes far more complex.

Microsoft databases, however, have more limited failure recovery options due to the aforementioned administration simplification. A full restore is going to be the option of choice more often with SQL Server than with Oracle, etc.

Then there are political factors.

Managers always want the system up ASAP. There is an inverse relationship between speed of recovery and minimization of data loss. The fastest way to get a corrupted database back online is usually a full restore initiated quickly, with little if any diagnosis of the underlying cause. The more time spent diagnosing and determining the minimal restore strategy needed, the longer it will take to be back in service.

There are ways to mitigate these realities, but they are all expensive, and it's hard to convince frugal decision makers of their merit until a catastrophic failure occurs that threatens the business in a major way.

Posted by: Robert Watson at July 27, 2006 09:24 PM

Thinking, researching 'what if' questions, studying for the worst case scenario, all great advice, and for any IT specialist, even the generalist technician I've been most of my career.

But in the real world, I'm tasked with being 'productive' continually. From being 120 miles north at 0800 to finishing at 6:30pm an hour from home, a field technician has little spare time. And of those techs who work for 'one' client, many have a solid 60 hours of work to accomplish in their 40+ hour week. I suspect many DBAs also have full plates.

How do you find time to practice restoration, resolving corruption problems as quickly as possible, and testing all this if your boss has you busy 5X8?

You do give great advice, but how about which way to dodge the frying pan when your boss finds out you've never actually tested a full restore...?

rick

Posted by: rick at August 9, 2006 02:30 PM

Holy smokes! Finally, somebody explicitly, deliberately addressed the elephant-in-the-living-room-that-defecated, which most people studiously avoid discussing!

You have identified a symptom of the deeper problem: antipathy towards mathematics and scientific competence that pervades US culture in general, and corporate management culture in the US in particular. (See Edsger Dijkstra's essay, "There is Still a War Going On," at the website which compiled his essays, at http://www.cs.utexas.edu/users/EWD.)

Posted by: Richard S. Stewart at August 31, 2006 11:23 AM
