Friday, October 13, 2006

Riddle me this:

All week our net team of 4 has been working on a stumper of a problem. In fact, I don't think Microsoft knows where to test and focus their attention. We have been talking to Microsoft for approximately 4 hours for the last 4 days, and the only things we have determined are that our registry is getting corrupted, our domain is working correctly, and there doesn't appear to be a virus or other malware installed on the computer. The problem started with 1-2 issues on Monday morning and escalated to 7 identical computers by Thursday, out of a total of 577 workstations, of which 250 are identical hardware and all are RISed with similar images. The most distinctive group has 6 computers, and only 1 of them was affected.

Here are the details:

After approving September's patches on Saturday for an install on Sunday at 9am, I got my first call about a netlogon service problem. The user was not able to log on because the netlogon service was not able to start. A second, blank error message then pops up with a big red "X" and an Okay button before sending you back to the Ctrl-Alt-Del screen. Logging in as that user, as my domain account with local admin rights, or as the local renamed administrator account all produce the same result.

Safe Mode, Safe Mode with Networking, and Safe Mode with Command Prompt all cause a hard reboot right when the graphics card should take over. Last Known Good Configuration gives us the same results as starting Windows normally.

My first diagnosis was a roached OS, so I re-imaged the machine. I found out later that someone else ran into the same scenario on Friday (before the updates were approved) and solved it by re-imaging.

We started to get suspicious when we saw our third, then fourth, bad machine on Monday. This time we were able to keep a couple for study (that's when someone figured out that Debug mode works), and we started our call with Microsoft.

Booting into Debug mode allows normal logins for local administrators and domain accounts.

I spent an hour running memtest and the Dell utilities to make sure that the hardware was okay.

To be continued...