Tuesday, September 25, 2007

Life without email

Because a Dell engineer managed to corrupt our Exchange database yesterday, we have been without email all day, and it's not looking good for getting it back any time soon.

We have tried restoring from a backup, but that appears to be corrupt as well, as it won't mount, so we are running Eseutil in repair mode. The database is about 60 GB. We set the repair running at about 10 this morning, and at 5:30 I don't think it's finished yet. Then we have to run a defrag as well, so that's going to take another 7 hours or so. People are going to be pissed.
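
For a rough feel of the numbers, here is a back-of-envelope sketch in Python. It only uses the figures above (about 60 GB, started around 10:00, still running at 17:30, roughly 7 hours of defrag to follow) and assumes the repair has to work through the whole database; the variable names are made up for illustration.

```python
from datetime import datetime

def elapsed_hours(start: str, now: str) -> float:
    """Hours between two same-day clock times given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(now, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600

DB_SIZE_GB = 60      # approximate database size from the post
DEFRAG_HOURS = 7     # rough defrag estimate from the post

repair_so_far = elapsed_hours("10:00", "17:30")
# The repair has not finished yet, so this is an upper bound on its average rate.
max_rate_gb_per_hour = DB_SIZE_GB / repair_so_far
# Best case from here: the repair ends right now and the defrag takes 7 hours.
min_total_hours = repair_so_far + DEFRAG_HOURS

print(f"repair running for {repair_so_far:.1f} h")
print(f"average rate is below {max_rate_gb_per_hour:.1f} GB/h")
print(f"at least {min_total_hours:.1f} h before mail is back")
```

In other words, even in the best case we are looking at well over half a day of downtime from when the repair started.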

It got me thinking that I might start to research collaboration software. It's amazing how much of the business has been affected by having no email. The number of people who distribute documents, use shared calendars, plan meetings etc. with Exchange means it's no longer a case of email/Exchange being classed as a non-business-critical system, which we kind of do now.

Using something like SharePoint might be the way to go. It has calendars, blogs, wikis and file stores, and is fully updatable by the users.

Problem is going to be weaning them off the Outlook/Exchange way of working.

Monday, September 24, 2007

I can't believe we pay these people

We have had a problem with our mail server losing disks, even after new ones have been put in. The disks just went to error as soon as we put them into the server, so we had a Dell engineer out to replace the backplane.

He turned up, replaced it and put a new disk in, but the disk didn't come up and start to re-build right away. Instead of thinking about why it didn't start to re-build the array onto it, he seemed to take a wild guess and click the online option. The disk then said it was re-initialising, which didn't seem right, but he insisted it was. He then left.

We came to bring all the mail stores back online, and lo and behold we got tons of errors about checksums, invalid data etc., so we rang Dell tech support again. The bloke told my boss that the engineer should not have on-lined the disk, as that just brings it straight into the RAID 5 array without putting the data on it first. There is a re-build option he should have used.
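
To see why that matters, here is a toy Python sketch of a single RAID 5 stripe. It is not how the Dell controller works internally, just the XOR parity idea: a re-build recomputes the missing block from the surviving disks, while forcing a blank disk online trusts whatever happens to be on it. The four-disk layout and block contents are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(stripe):
    """In a healthy stripe, XOR of all blocks (data + parity) is all zeros."""
    return xor_blocks(stripe) == bytes(len(stripe[0]))

# One stripe on a hypothetical 4-disk array: three data blocks plus parity.
data = [b"MAIL", b"STOR", b"DATA"]
parity = xor_blocks(data)
stripe = data + [parity]

FAILED = 1  # index of the disk that was replaced

# Re-build: recompute the missing block from the surviving disks.
survivors = [block for i, block in enumerate(stripe) if i != FAILED]
rebuilt = stripe.copy()
rebuilt[FAILED] = xor_blocks(survivors)

# Forced online: the blank replacement disk is trusted as it is.
forced = stripe.copy()
forced[FAILED] = bytes(4)

print(rebuilt[FAILED])                 # b'STOR'
print(stripe_is_consistent(rebuilt))   # True
print(stripe_is_consistent(forced))    # False
```

The forced stripe no longer agrees with its parity, which is the same kind of inconsistency that shows up as checksum and invalid data errors once the mail stores try to read it.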

So, we are now going to have to blow the whole virtual disk away and restore the data from the backup we took this afternoon.

If I went round blowing people's data away I would get sacked, so I hope the bloke who came on site and did this gets a good ass kicking!!

Monday, September 17, 2007

Melting server room

It's been a bit of a shit day today. The people who look after our building decided to do some work on the air conditioning systems today. Normally this wouldn't be a problem, except they didn't tell us what they were doing, and the only air con that seemed to be affected was in our main server room!!

It started to get mightily hot in there, and at one point it got up to 38C. We had various fans in there, but nothing seemed to be getting rid of the hot air. The best thing seemed to be getting a long foil tube, putting one end over a large fan situated at the back of the racks, and the other end out of the door. This got rid of a lot of heat until the portable air con units turned up.

We managed to get away with very few initial problems: the WatchGuard firewall overheated, switched itself off and lost its config, various servers lost disks and everything was reporting heat warnings.

The only complete failure was a Dell PowerEdge 750 server that was running websites for our editorial system. This went down, and refused to power up again while it was still in the rack. We moved its services onto another server, and when it got a bit quieter we took the server out of the rack. When we got inside it, we found a section of the main fan had broken off completely, and fan blades had been tossed all over the server!

So we got the fan out of the housing, and it looked as though the bearing had gone, so the fan was loose in the casing. Presumably the blades had hit something and this is how they snapped off.

We then looked into Dell IT Assistant and one of its last temperature warnings was at 88C. No wonder a couple of hours later the CPU was still too hot to touch.

So let this be a lesson, always make sure your air con and its backup units are in working order!

Monday, September 10, 2007

Auditors (why would you do that job)

Joy of joys, last week we had the auditors in, for 3 days no less. Now call me uneducated, but why would anyone want to do that job? One of them said to me (while looking for paperwork about a user account that never existed) that he 'didn't enjoy doing this'. Problem was, he had a big grin on his face when he said it. Hmmm, I'm sure he wasn't lying at all.

The good thing was we had covered most of the things they wanted, so they started to get desperate (they aren't doing their jobs if they can't find anything wrong). One of the 'problems' was apparently that the windows on the server rooms were a security risk, though we're not sure why. We found the best thing to do was to give them something small to get their teeth into and then leave them alone. They were chewing over 'missing documentation' for a small software version change for about 4 hours!

Because we knew they were coming, we tried to go round as many of the servers as we could and make sure they were fully service packed and patched. Just for a change, I managed to hit a couple of problems. This time it was with SQL Reporting Services not starting when I rebooted the server.

The first thing I noticed was a shed load of errors in the system event log, looking something like this:

The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID {GUID-Numb} to the user \mossService SID (S-1-5-21-). This security permission can be modified using the Component Services administrative tool.

I hunted the net, and this article solved this problem:
http://geekswithblogs.net/mhamilton/archive/2006/12/19/101568.aspx

After doing as suggested in this article, the DCOM errors stopped, but the SQL Reporting Services service still wouldn't start. All I was getting was the usual 'did not respond in a timely fashion' type of error in the system log.

So another search led me to this article:
http://geekswithblogs.net/etiennetremblay/archive/2005/11/03/58989.aspx

Applying this reg key and rebooting then brought the service back to life. Reading more about this, it seems that it's just a timing issue. Why should applying a couple of patches cause a server without any problems to start its services more slowly than it used to? Most of the things I found said this should only happen on slow servers, not after dodgy patches.
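
I haven't quoted the key here, so treat the following as an assumption rather than a record of what I applied: the usual fix for services that 'did not respond in a timely fashion' is the ServicesPipeTimeout value, a DWORD in milliseconds under HKLM\SYSTEM\CurrentControlSet\Control, which raises the default 30 second start-up limit. A small Python sketch that writes out a matching .reg file might look like this (the function name and the 60 second value are made up for illustration):

```python
def services_pipe_timeout_reg(timeout_ms: int) -> str:
    """Return .reg file text that sets ServicesPipeTimeout (a DWORD, in ms)."""
    if not 0 < timeout_ms <= 0xFFFFFFFF:
        raise ValueError("timeout must fit in a positive 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control]\n"
        f'"ServicesPipeTimeout"=dword:{timeout_ms:08x}\n'
    )

# 60 seconds is an illustrative value, not one taken from the article.
reg_text = services_pipe_timeout_reg(60_000)
print(reg_text)
```

A change like this only takes effect after a reboot, which fits with the service coming back once the server had been restarted.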

Saturday, September 01, 2007

Fun With Faxes

Been doing some work this week trying to get a faxing solution working using our VOIP (voice over IP) systems. We were trying to come up with a solution we could use as a group, one that would allow users to have their own number for clients to send to, with the fax delivered to the user as an email.

We read up about fax over IP. It does exist, but apparently it's not very good: the line has to be clean with no noise, otherwise the fax machines can't handle the call. Because we would be using the VOIP network, we decided not to pursue this route, and instead to attach an ATA device to the VOIP network and then to a normal analogue modem. The ATA just lets you use normal analogue phones/faxes/modems on the VOIP network.

What we then wanted to test was whether we could get caller ID information out of the modem. The idea was that if caller ID was working, we should also be able to get the number the customer has called, so we know where to route the email.

Using a HyperTerminal session to the modem, we called the modem. If the caller ID is being passed, the caller ID info should pop up between the first and second ring. This is where the real fun began.
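
For anyone wanting to script this rather than stare at a terminal, here is a minimal Python sketch that picks the caller ID fields out of the text a modem prints between the first and second RING. It assumes the common formatted layout of `DATE = ...`, `TIME = ...`, `NMBR = ...` lines; not every modem prints it this way, and the sample text and phone number are made up rather than captured from our kit.

```python
import re

FIELD = re.compile(r"^(DATE|TIME|NMBR|NAME)\s*=\s*(.*)$")

def caller_id_between_rings(lines):
    """Collect caller ID fields seen after the first RING and before the second."""
    fields = {}
    rings = 0
    for raw in lines:
        line = raw.strip()
        if line == "RING":
            rings += 1
            if rings == 2:
                break
            continue
        match = FIELD.match(line)
        if rings == 1 and match:
            fields[match.group(1)] = match.group(2).strip()
    return fields

# Made-up session text, not a capture from the modems in this post.
sample = """RING

DATE = 0901
TIME = 1432
NMBR = 01234567890

RING
""".splitlines()

with_caller_id = caller_id_between_rings(sample)
without_caller_id = caller_id_between_rings(["RING", "", "RING"])
print(with_caller_id)
print(without_caller_id)
```

An empty result is what "nothing came through" looks like from the script's point of view.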

The first few times we rang it, nothing came through. A bit of research showed that quite a lot of modems don't support caller ID in the UK, and most of them don't have the feature turned on by default. Looking round the web, we found a few commands to try on the modem to turn on caller ID:

AT#CID=1
AT%CCID=1
AT+VCID=1
AT#CC1
AT*ID1
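
Trying these one at a time by hand gets old quickly, so here is a small Python sketch of the same loop. The `send` function is hypothetical: it stands for whatever writes a command to the modem and returns its reply as text (a thin wrapper round a serial port, say). The fake modem below is only there so the example runs without hardware.

```python
CALLER_ID_COMMANDS = ("AT#CID=1", "AT%CCID=1", "AT+VCID=1", "AT#CC1", "AT*ID1")

def accepted_commands(send, commands=CALLER_ID_COMMANDS):
    """Return the commands the modem answered with OK.

    `send` is a hypothetical callable that writes one command to the modem
    and returns its reply as text.
    """
    return [cmd for cmd in commands if "OK" in send(cmd).upper().split()]

def fake_modem(command):
    """Stand-in modem that only understands AT+VCID=1."""
    return "\r\nOK\r\n" if command == "AT+VCID=1" else "\r\nERROR\r\n"

accepted = accepted_commands(fake_modem)
print(accepted)
```

A modem answering OK only means it accepted the command; whether caller ID then actually shows up still has to be checked by ringing it.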

Having managed to turn it on, we rang the modem, and hey presto, the caller ID came through! Just as we thought we were getting somewhere, we found another problem. If we rang the modem, hung up and rang it again within about 5-6 seconds, the caller ID didn't come up. Wait about 15 seconds and it worked again. We tried this with two different modems and the same thing happened.

So the caller ID wasn't very reliable going over the VOIP network and through the ATA box. We then got the VOIP guy to try and get it to send the called number anyway. He turned it on and we sent a fax, but the info didn't come through to Windows Fax services. At this point we started to realise our idea might not work very well.

Next week we will probably play about with it a bit more, but it looks like we are going to have to do it another way. The other idea we have is to try Fax over IP to see how well it actually works. You can get some modules for Hylafax etc., so we might try that. The most likely option, I think, is to use an ISDN30, attach the lines to that and put a card in the server. We know this option works, as one of our other centres currently uses this method.