Wednesday, February 24, 2010

All in a day's work

This is going to be another long story, but I promised (ok, threatened) someone I'd put it up. The follow-on story about our activities on our day off will be even longer when I get around to posting it. It's a really interesting tale about the "Mystery Spot" and getting caught up in a landslide. But I digress. So without further ado, here goes.

I’m in California with a couple of technical guys from JJWILD for the first phase of a very large conversion project. This is a project that had been almost two years in the planning. The customer was ripping out and replacing the heart and lungs of their Information Technology infrastructure. This project has overcome countless hurdles to get to this point. There were technical obstacles. Not only were we replacing everything, but we were fundamentally changing the way in which this customer delivered Information Technology services. We were leapfrogging two generations of technology. There were financial considerations. After all, how and where does a hospital come up with almost two million dollars to spend on a project that doesn’t generate any revenue in return? And most importantly, there were political obstacles. Not everyone in the IT department was happy to see us out there doing this project. Some folks were annoyed because they felt they should be doing this project, not us. Some folks didn’t want this project to happen at all. It was reported that one senior person had vowed that they would do everything in their power to see that we failed.

To this point we had successfully conquered all obstacles that stood before us. We made Board level presentations to get approvals. We justified the costs to the financial team. We addressed all the technical concerns. We covered all the bases. We developed designs, put together plans, assigned roles, assigned responsibilities, developed a timeline. We put together a bulletproof project plan. And most importantly, we rolled over anyone that stood in our way. We got the support of the CIO, CFO, CEO and the Board of Directors early on. Those folks that wanted us to fail, well they were helpless. All they could do was lurk in the darkness on the fringes of this project and grumble.

The day of the actual conversion arrived. We had scheduled 48 hours of downtime. Downtime for a hospital is a major problem. It meant that the clinical departments had no access to the computers. This had the potential to impact patient care. As a result, scheduling downtime was difficult. Typically, downtime is kept to a minimum and when you do get it, it’s got to be as short a time as you can possibly get away with. We got 48 hours, which was unheard of. We had to make a commitment that once we started no one was going anywhere until the hospital was back up and running. So we got cots, and food and lots and lots of technical support for not only the Hospital staff and JJWILD, but also all the vendors involved.

At midnight on the chosen day we took down the systems and began our conversion. We had conducted several dry runs so we knew the processes and procedures we needed to do and had some practice doing them. We pre-cabled and preconfigured everything we possibly could prior to shutting down the first system. When that first system came down, everyone jumped into action. By 4:00 AM we were finished. We were standing around looking at each other saying things like, “I wonder what we should do for the next 44 hours?” I kept saying things like “See, it’s just like painting. The magic is in the preparation.” By 5:00 AM we were all sitting at Denny’s finishing our breakfast.

Eventually the JJWILD team went back to the hotel to sleep. When we got up, the great “well what are we going to do now” debate started. We had a free day and a half. We had nothing to do until Monday morning. So we went sightseeing. The adventure that was our day and a half off will have to be addressed in a separate story so I’ll skip ahead a little.

We got back to the hotel at about 11:00 PM. When I got to my room the message waiting light was flashing on the phone. When I checked the message, it was the JJWILD on-call tech support person. He’s having a meltdown. He’s freaking out. Seems he got a call about 4:00 PM that the hospital was down. He’s been trying to reach us since 4:01 PM. It’s now 11:00 PM. He’s been taking angry calls from the customer for 7 hours. He can’t fix anything remotely and he can’t get us. I called him back and got the full story. Now I’m freaking out. I called my work phone. I have about a dozen messages from the customer, each one a little more irate than the one before. After a few minutes the cell phone started going off. One of the things we did on our day off was take a cruise down the Pacific Coast Highway. Apparently there is no phone coverage on the PCH. We started getting the queued-up voice messages about 30 minutes after we got back into range. Now, I’m way past freaking out. This hospital has been down for almost 24 hours. I’ve gotten voice messages from the technical team, Network Manager, Systems Manager, IT Director, and CIO. The CIO has gotten to the point where he’s threatening to have his CEO contact my CEO. The CIO is beyond mad.

I grabbed my technical guy (I know I’m not supposed to use names, but can I just call him Mike?) and headed back. Mike had been driving all day. I took the keys for this ride. I knew we’d be going fast. I volunteered to take the ticket(s). I jumped behind the wheel and headed out. I put the pedal to the metal and went for it. Normally it was about a 40 minute drive to the hospital. We made it that night in considerably less than 30. At one point as we crested a slight rise in the road, all four tires came off the ground. We pulled into the parking lot on two wheels. We stopped with a screech of tires and a cloud of smoke.

When we get into the data center there are about a dozen people there. We walked through the door and immediately started taking heat. Where were we? Why didn’t we answer the phone? How could we leave with the hospital down? etc etc etc. We asked what was going on. Everybody started talking at once. I looked around and saw the CIO standing with the IT Director. Neither one is talking and neither one is smiling. I asked Mike to see what was going on and I drifted over to talk to the CIO. The CIO started telling me a very angry story. Apparently, the hospital IT team knew the system was down at 8:00 AM. At 10:00 they had called him. He told his people to call us immediately. He got an update at 11:00 and was told at that time that they had not called JJWILD yet. Again he instructed them to call us. This went on all day up to the point that he showed up in the data center and MADE them call us. That was 4:00 PM. And once they did place the call, we couldn’t be reached. He was also looking for his own lead technical person. When they found him, his only response was “That’s JJWILD’s problem, let them fix it”. The hospital’s number one, most senior technical resource didn’t even come in. He just dumped everything on JJWILD and stopped answering the phone. I don’t know who the CIO was madder at, us or his own people.

So Mike’s working with the tech folks. After a little confusion Mike finally gets the story. The whole hospital is not down, it’s just one system. (There are well over 100 by the way). A little less pressure, but not much. Then Mike found out that the system that was down was the one system the JJWILD team hadn’t touched. The hospital IT team had done this one on their own. At this point, we’ve been in the Data Center about 5 minutes. Mike is sitting on the floor, behind a rack of computer gear trying to figure out what’s wrong. He’s got about a dozen people yapping at him. And they are yapping all kinds of crazy and conflicting things. I’m trying to defuse things with the CIO. All of a sudden from behind the rack I heard Mike yell “SHUUUTTTTTTTT UUUUUUPPPPPPPPPPPPP!!!!!!!” The whole place went quiet. There’s a slight murmur going on about being told to shut up, but most people are now being quiet. The CIO looked at me and I can see his blood pressure rising. A line of red started rising from his collar and was progressing to the top of his head. I could hear his teeth breaking. The next thing I heard was Mike very loudly asking “There are two ports on the back of this system and only one cable. Did anyone try the other port?” Have you ever been in a data center? They are loud. There’re fans, and UPSs and air handlers all running, making all kinds of noise. Even devoid of people, data centers are loud. Now all of a sudden in this data center you could hear a pin drop. Even the equipment seemed to stop making a sound. People started pawing at the floor with their toes and looking at the ceiling. No one said a word. Mike asked again, “Did anyone even TRY the second port?” No one answered. So Mike moved the cable to the second port. When he did, the system immediately popped up and started to work. Now the hospital’s IT team was looking for a place to hide. They are scattering like cockroaches from the light.

The CIO looked at me and asked “What did he just do?” Mike said all he did was move the cable to the other port. The CIO asked how Mike knew there even was a second port. Mike responded that it was right beside the first one. The CIO is now talking through his teeth. He thanked us for our help and told us we could leave, that he’d talk to us on Monday. We were in the data center less than 10 minutes. As we were leaving, the CIO was gathering up his folks for a little chat. As we were walking out the door we could hear him start. It was something along the lines of "We were down for 20 hours and all you had to do is move one damn cable?..."

On Monday, we got a tiny little slap on the wrist about being hard to reach on Saturday. The bulk of the conversation was centered on what we could do to help him address issues with his team. He couldn’t understand why his folks were so reluctant to call us or why none of his own people thought to try the other port.

Now here’s the funny part. About a month later I’m in the Data Center with the IT Director. We’re discussing what had happened during the cutover. As we talked we walked over to the system that had been the problem. As we were looking at it, the IT Director noticed that the console was flashing an alarm. She pulled up the alarm and it said that network connectivity had been lost and to check the cabling. It even gave the port number.

Oh yeah, I almost forgot: the lead technical resource for the hospital, the one that wouldn’t help on Saturday? That’s the guy that was telling people he would do everything in his power to see us fail. You’ll never guess how long he kept his job.
