The Deep End | Paul Venezia » TAG: Systems

March 27, 2008

The 45nm Xeon 5400 series in the lab

Yesterday, Intel announced the newest incarnation of their quad-core Xeon CPUs. The 5400 series is a low-voltage chip designed for tight spaces such as blades and 1U servers. On the very day of the announcement, I was finally firing up a test box running a pair of 5420s. These are 2.5GHz quad-core CPUs with 12MB of cache, built on the 45nm process, and I'm running them in an Intel server chassis on the S5000PSL mainboard. These chips aren't designed to be speed demons -- rather, they're designed to be lighter on the power budget while still offering decent performance.

I haven't had a lot of time for testing and benchmarking yet, but I do have some results from basic tests. Some of these tests are threaded and make use of all eight cores in the box, while others are single-threaded and show the performance to expect from that type of application. Most of these tests also reflect disk I/O performance. Essentially, these are real-world tests, not just CPU tests.

All tests were run on my test system: two quad-core Xeon 5420LV chips at 2.5GHz, 4GB of RAM in two 2GB FB-DIMMs, and two Seagate SATA II drives in a hardware RAID1 array built with the Intel embedded RAID controller (an LSI chipset) on the S5000PSL mainboard. The OS was a fully updated CentOS 5 build.

MySQL
I ran all tests in the sql-bench suite against the local host using sockets. All tests completed in 1435 seconds (23.9 minutes).

LAME
I used LAME to encode an 838MB WAV file to MP3 at a 256k bitrate, VBR 2. This is a single-threaded task, and it completed in 404 seconds (6m 44s).

MD5 Sums
Another single-threaded task, but a common one. Calculating the MD5 sum of the same 838MB WAV file took 2.6 seconds.
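An MD5 pass is just a sequential read feeding a running hash; 838MB in 2.6 seconds works out to roughly 320MB/s. The post doesn't say which tool produced that number, so the following is only a minimal Python sketch of the same measurement using the standard hashlib module, with an assumed 1MB read size.

```python
import hashlib
import sys
import time

def md5_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (1MB by default)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel stops at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def timed_md5(path):
    """Return (digest, elapsed seconds) for one MD5 pass over the file."""
    start = time.perf_counter()
    digest = md5_file(path)
    return digest, time.perf_counter() - start

if __name__ == "__main__" and len(sys.argv) > 1:
    digest, secs = timed_md5(sys.argv[1])
    print(f"{digest}  {sys.argv[1]}  ({secs:.1f}s)")
```

Reading in fixed-size chunks keeps memory flat no matter how large the WAV file is.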

Compression
I ran bzip2 on the same 838MB WAV file. The times were 182s (3m2s) to compress, and 77s (1m17s) to decompress.
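The compress/decompress split can be measured the same way. This is a sketch using Python's bz2 module (a wrapper around libbzip2) rather than the bzip2 command line the post used; level 9 matches bzip2's default block size, and the decompression pass here discards its output instead of writing a file, so its timing will not match `bzip2 -d` exactly.

```python
import bz2
import shutil
import time

def bzip2_roundtrip(src, dst, level=9, bufsize=1 << 20):
    """Compress src into dst with libbzip2, then read dst back.

    Returns (compress_seconds, decompress_seconds). The decompression pass
    discards its output rather than writing a file.
    """
    t0 = time.perf_counter()
    with open(src, "rb") as fin, bz2.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout, bufsize)
    t1 = time.perf_counter()
    with bz2.open(dst, "rb") as fin:
        while fin.read(bufsize):
            pass
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```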

Posted by Paul Venezia on March 27, 2008 10:49 AM



March 17, 2008

Full circle: How Microsoft is trying to eradicate email

After all this time, all this spam, and all the complaints from all over the globe, I can only come to one conclusion: Microsoft is trying to kill email.

Let's take a look at some facts. Spam levels are as high as or higher than they've ever been. From my own personal experience, I can say without a shadow of a doubt that 99.9 percent of all email coming to my mail server is spam. That's tragic all by itself, but it's been that way for quite some time now. I have written and documented the severe steps that I've taken to reduce the problem, but the fact remains that hundreds of thousands of connections are made to my mailserver every day, trying to sell me v1@gr@!, inform me of my incredible good fortune in some foreign lottery, or tell me that I really need to buy stock in some company nobody's ever heard of.

Hundreds of thousands of connections, coming from thousands of hosts. What are those hosts anyway? The vast majority of those hosts are exploited Windows systems. They're zombies run by botnet operators. Their owners are probably completely clueless to the maelstrom that has engulfed their little Dell desktop. It's just "slow".

There are millions of these systems out there, according to an article from USA Today. Millions.

The mainstream media consistently use the term "computers" when they make forays into this realm. Yes, they are computers, but they're not just any computer -- they are all running Windows. All of them. Let's not mince words here: Botnets are composed of compromised Windows systems. Thus, Microsoft's massive security failures are at the very core of the spam problem.

Yes, there are still spammers out there that use specific servers and subnets to send their trash, but they're relatively easy to identify and stop, either by the ISP or through filtering at the client side. Connections from millions of unique systems from all over the globe are much harder to stop. One way spam filters try to stem this tide is by identifying subnets assigned to residential cable and DSL providers and blocking those IP ranges. That's like bringing a sledgehammer into surgery, but it can be effective -- so effective that it blocks legitimate communications from people running their own servers, and from hundreds of companies using cable and DSL connections for their business. The subnet allocations caught up in these traps aren't necessarily accurate, and they can cause email to simply disappear at worst, or consistently be marked as spam at best.
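The residential-range blocking described above boils down to a CIDR membership test, which is exactly why it is so blunt: every address in a listed range gets the same treatment, spammer or not. A minimal Python sketch using the standard ipaddress module; the ranges below are RFC 5737 documentation prefixes standing in for a real dynamic/residential list, which would be far larger.

```python
import ipaddress

# RFC 5737 documentation prefixes standing in for a "residential" blocklist.
RESIDENTIAL_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blocked(ip, blocks=RESIDENTIAL_BLOCKS):
    """True if the connecting address falls inside any listed range.

    Note the bluntness: a legitimate mail server inside a listed range is
    rejected exactly like a zombie desktop.
    """
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so this simply returns False.
    return any(addr in net for net in blocks)
```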

Speaking of email simply disappearing, this brings me to my next point about Microsoft's apparent attempt to kill email: Hotmail.

I've had a Hotmail/MSN/Live.com email account for a while now, and it's been relatively spam-free. Of course, that address is not published anywhere, and I hardly ever use it, so I would expect that to some degree. However, some tests I ran over the weekend shed some light on some of the ways that Hotmail/MSN/Live.com handle spam: They apparently are simply deleting inbound email with no bounce messages, no flags, no notification -- nothing.

I can replicate this at will. When I send an email from my mailserver (located on a commercial circuit) to my gmail.com account, live.com account, and other personal accounts, they all arrive -- except to my live.com/Hotmail account. It simply never appears, and no bounce message is ever seen. If I send myself an email from my live.com account, it arrives speedily, and my reply is delivered back to the live.com account almost instantly. But if I then write a new message to the live.com account, it never appears, even though it came from an address that I just emailed.

Thus, Microsoft is simply deleting legitimate emails. Why would I bother using such a service? It's like buying a car that will only start once in a while, or a refrigerator that keeps the soda cold but lets the milk go bad. It's useless.

I'm not alone here, either. This thread at MozillaZine goes back to 2006, and describes these exact problems in excruciating detail, among others. Ian Gregory has also been cataloguing this problem for a few years now.

The temerity of Microsoft to simply never deliver these emails is shocking to me. Taken in concert with my original point that Microsoft software forms the very core of the spam problem to begin with, the only conclusion I can draw is that they are waging a war -- not against spammers, but against email.

Perhaps they're going to unleash some hidden features in Exchange 2008 that will ensure that email sent from one Exchange server to another is always passed through (and always reaches hotmail.com, msn.com, and live.com addresses), leaving everybody else out in the cold -- a Frankenstein thought if there ever was one.

Their motive may be unclear, but their actions are transparent -- they are complicit in the generation and distribution of spam, and are summarily deleting emails addressed to their users under the guise of fighting spam.

Until they remedy this egregious activity, I've instructed my mailservers to discard any inbound email from hotmail.com, msn.com, or live.com.
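The post doesn't say how the discard rule is implemented; in practice it would live in the MTA's own configuration. The matching logic itself is simple, and this hypothetical Python helper illustrates it: it checks the envelope sender's domain, including any subdomains, against the three domains named above.

```python
# Domains named in the post; the matching logic is a hypothetical sketch.
DISCARD_DOMAINS = {"hotmail.com", "msn.com", "live.com"}

def sender_domain(envelope_from):
    """Lowercased domain of an envelope sender such as '<user@Example.com>'."""
    return envelope_from.strip().strip("<>").rsplit("@", 1)[-1].lower()

def should_discard(envelope_from, domains=DISCARD_DOMAINS):
    """True for senders in a listed domain or any of its subdomains."""
    domain = sender_domain(envelope_from)
    return domain in domains or any(domain.endswith("." + d) for d in domains)
```

Matching on `"." + d` rather than a bare suffix keeps a domain like notlive.com from being caught by the live.com rule.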

In a few days, I probably won't be able to reply to them anyway.

Posted by Paul Venezia on March 17, 2008 03:56 PM



March 12, 2008

The Air, a month later

It's been just over a month since I first unboxed my MacBook Air. I wrote a review for InfoWorld that garnered some attention, and a sidebar that far too many people seemed to think was the actual review -- a statement on their own preconceived notions and lack of reading comprehension more than anything else, perhaps.

In any event, I've subjected my MacBook Air to daily use, dropped it once, had it sat upon by a careless individual not once, but twice, and have travelled with it via plane, train, and automobile. I've used it for email, Web browsing, and Linux, Windows, and FreeBSD server administration. I've written thousands of lines of code and thousands of words on it. I've used it on a plane, on a desk, in a chair -- and I still dig my Air.

I've used it on WiFi hotspots, with 802.11b, g, and n networks. I've used it with my Nokia N95 acting as a Bluetooth modem. I've plugged into a wired Ethernet network using the USB adapter. I've done photo editing and some audio processing with the Air, watched movies and listened to music. I've used it with a USB serial adaptor to configure Cisco switches. I've done everything that I normally do on any computer, laptop or not, except use CDs or DVDs -- I haven't needed that function even once. I only used the Remote Disc feature to install Xcode from the Leopard DVD the first day. I used Apple's Migration Assistant to move over all my settings, email, and applications from my MacBook Pro (running Tiger at the time) and haven't had any issues with those apps either, except for having to reinstall Microsoft Office.

As with any piece of technology, your mileage may vary, but the miles I've put on my MacBook Air have been straight and true so far. I've only rebooted it once in that month, after installing some drivers, yet I use it every day. That's the key to usability for me. I loathe waiting for laptops or workstations to boot or dealing with OS issues. I have work to do. Open it up, log in, and launch another xterm, all within five seconds.

To be honest, I've grown somewhat weary of the attention it receives in public settings. I can't take it to a coffee shop without at least two or three people interrupting me to talk about it. But if that's the biggest problem I have with the Air, I'm in good shape.

Posted by Paul Venezia on March 12, 2008 03:33 PM



March 08, 2008

/etc/hosts.deny, hackers, and automation run amok

3AM. It's always 3AM when these things happen.

Last night, my cellphone started beeping, and after it finally woke me up, I cracked open an eye and checked the screen. Text messages from Nagios, telling me that my main FreeBSD mail/Web server was incommunicado. Lovely.

I crawled out of bed and logged into my MacBook Pro. I had an open SSH session to that box, but it was all but unusable, echoing back a character every few seconds. An eventual 'uptime' showed the 5-minute load average at over 300. Three hundred processes in the run queue basically means the box is thrashing wildly... but why?
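For context: the load average is the run-queue length averaged over 1, 5, and 15 minutes, so a 5-minute figure above 300 on a two-CPU box means well over a hundred runnable processes per CPU. Nagios's standard check_load plugin is the usual way to alert on this (and Python exposes the same numbers via `os.getloadavg()`); the sketch below is only a hypothetical illustration of pulling the figures out of `uptime` output and applying an assumed per-CPU threshold.

```python
import os
import re

# FreeBSD/macOS print "load averages:", Linux prints "load average:";
# commas between the values are optional (macOS omits them).
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")

def parse_uptime(text):
    """Return the (1-, 5-, 15-minute) load averages from uptime(1) output."""
    m = _LOAD_RE.search(text)
    if m is None:
        raise ValueError("no load average found in: %r" % text)
    return tuple(float(v) for v in m.groups())

def overloaded(load, cpus=None, per_cpu=4.0):
    """Hypothetical alert rule: more than `per_cpu` runnable processes per CPU."""
    cpus = cpus or os.cpu_count() or 1
    return load > cpus * per_cpu
```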

The Nagios client had respawned a hundred or so times, and sshd, snmpd, and inetd were each running at 60-70% CPU utilization, completely consuming both CPUs. Everything had come to a standstill. I killed the offending processes from the console (hooray for Raritan KVM-over-IP!) and the box settled back down.

I first started sshd back up, and didn't see the load rise, but as soon as I attempted to SSH back into the box, it spiked to 100% utilization. I killed it, and rebuilt openssh-portable from ports, wondering if I'd been hacked, or the sshd binary had somehow become corrupt. I ran the newly-built sshd manually in debug mode, and watched the same problems occur. Obviously, this wasn't good. Checks of dmesg and /var/log/messages showed literally no problems whatsoever. The I/O subsystem seemed fine, as did all normal server operations -- I could SSH out, Apache, MySQL and sendmail were working, but there was obviously something very wrong.

The uptime on this server was 525 days. Generally speaking, I refrain from rebooting a box unless absolutely necessary, but in this case, I felt that I had to start with a clean slate. For the first time since September of 2006, I rebooted my main workhorse server.

It came back up without issue, other than the same sshd, snmpd, and inetd problems. The reboot was ultimately unnecessary. But what could be causing this problem? As I was making a cup of coffee, I thought that I might try removing hosts.deny to see if that made a difference. That did the trick -- all was well without it. But what caused that?

A while ago, I wrote a quick script to scan /var/log/auth.log for brute-force SSH login attempts and add the offending IP addresses to /etc/hosts.deny for sshd. This worked extremely well, reducing the effectiveness of these attacks to nearly zero. The problem, as it turned out, was that the script eventually wrote over 140 IPs to /etc/hosts.deny, which either triggered a bug or exceeded a line-length limit that I'm unaware of. Removing that line caused all previously misbehaving services to return to normal, and after some time to settle down, the server was back to handling a few hundred thousand emails a day, alongside Web and DNS services. I rewrote the brute-force detection script to add IPs to a pf table instead of /etc/hosts.deny, and parsed the previous hosts.deny list into the table to retain that information. Of course, this is how I should have done it to begin with. It took two cups of coffee, but I was out of the woods.
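The original script isn't published, so here is a minimal Python sketch of the same idea under stated assumptions: count failed and invalid-user SSH logins per source address in auth.log-style lines, keep the addresses over a threshold, pull the IPs out of an old `sshd:` line in hosts.deny, and build a `pfctl -t <table> -T add` command for them. The table name `bruteforce`, the threshold of 5, and the log regex are all assumptions; OpenSSH's message wording varies between versions and PAM setups.

```python
import ipaddress
import re
from collections import Counter

# Assumed pf.conf setup (table name is hypothetical):
#   table <bruteforce> persist
#   block in quick from <bruteforce>

# OpenSSH failure messages; exact wording varies by version and PAM setup.
FAIL_RE = re.compile(
    r"sshd\[\d+\]: (?:Failed password for (?:invalid user )?\S+|Invalid user \S*)"
    r" from (\S+)"
)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def offenders(log_lines, threshold=5):
    """Source addresses with at least `threshold` failed SSH logins, sorted."""
    counts = Counter()
    for line in log_lines:
        m = FAIL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)

def hosts_deny_ips(lines, daemon="sshd"):
    """IPv4 addresses listed on `daemon:` lines of an old hosts.deny file."""
    ips = []
    for line in lines:
        line = line.split("#", 1)[0].strip()      # drop comments
        if not line.startswith(daemon):
            continue
        for token in IPV4_RE.findall(line):
            try:
                ips.append(str(ipaddress.ip_address(token)))
            except ValueError:                    # e.g. 999.1.1.1
                pass
    return ips

def pfctl_add_command(ips, table="bruteforce"):
    """Argument list for subprocess.run(); pfctl -T add accepts many addresses."""
    return ["pfctl", "-t", table, "-T", "add", *ips]
```

Returning the command as an argument list instead of running it keeps the sketch testable, and a pf table grows by entries rather than by lengthening a single configuration line.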

This was a decidedly non-obvious solution to a decidedly bizarre problem. I'd still like to know if I hit a bug in the BSD stack, or what the hosts.deny line-length limits are. Anyone? Bueller?

Posted by Paul Venezia on March 8, 2008 02:14 PM



February 15, 2008

Finally, Leopard

Summary: Make sure you uninstall SideTrack 1.5 before doing a Tiger-to-Leopard upgrade.

Tolstoy:
I'm not the kind of guy that leaps on new operating systems before the shrink wrap has shrunk. I like to let others take the lumps of a .0 release before I subject my core laptops and workstations to the latest and greatest. Thus, I kept my 17" MacBook Pro on Tiger until this evening.

I probably would have stayed there for a while longer if I hadn't picked up a MacBook Air. I've been using it daily since I got it, switching back to the 17" when I needed the screen space (heavy coding, lots of RDP connections, etc.), and I found that several of the features in Leopard were too good to pass up, especially Spaces and the spring-loaded Dock folders. So I mounted an NFS share from my trusty Adaptec Snap 650 filer, and backed up all 140GB of data from my MacBook Pro. Feeling relatively safe, I dropped the Leopard DVD into the drive (making that the second time I've used the DVD drive in at least six months, maybe longer), and let the installer do its thing.

The system updated successfully and rebooted. Happy that things looked like they had gone well, I started to log in -- but had no keyboard. The trackpad worked fine, but the keyboard was as dead as a doornail. No capslock lights, nothing. Obviously, this was a big problem.

I grabbed the Air and checked some of the sites I'd seen a month or so ago discussing intermittent keyboard problems with MacBooks and MacBook Pros on Leopard. Apple had released a fix for 10.5.1, and I had nothing better to try, so I plugged in an external USB keyboard, logged in, and fired up Software Update. The 10.5.2 update came down along with a passel of other updates. The subsequent reboot... did nothing to fix the problem.

Of course, the Apple update (MacBook/MacBook Pro Software Update 1.1) was rolled into 10.5.2, so it wasn't that... and this is my main machine. My workstations are great when I need the dual 24" LCDs, but when I need to find a comfy chair and get some serious writing or coding done, I'll grab the 17" and never look back -- except that without a keyboard, it's obviously useless. I was worried.

Then I remembered that I'd installed SideTrack. SideTrack is a trackpad helper app from Raging Menace. I've used it for years to enable single-finger scrolling on Mac trackpads, along with a few other nice additions not provided with the standard Apple trackpad driver. There's currently a Leopard update for SideTrack, version 1.6, but I hadn't checked that before the upgrade. So, I did the uninstallation and the necessary reboot, and voila, all is now well.

So if you're using SideTrack 1.5 on a Tiger system, save yourself a headache and uninstall it before you do the upgrade. Now my only decision is whether or not to reinstall the Leopard-compatible version of SideTrack. I doubt it supports the newfangled touchpad on the Air, and switching scrolling reflexes from laptop to laptop will drive me nuts. Decisions, decisions.

Posted by Paul Venezia on February 15, 2008 08:52 PM



February 12, 2008

Let the games begin

It seems that Nick Farrell over at The Inquirer isn't so thrilled by my MacBook Air review. Actually, he doesn't really mention the review, opting instead to summarize the sidebar with additional commentary. To clarify a few of his points:

o- Yep, it took five hours to do the whole migration. The first 30 minutes were problematic, but the rest of the time was the two systems transferring 50GB of files via 100Mbit Ethernet without supervision.
o- The Air didn't crash -- the Migration Assistant application crashed.
o- I bought the Air myself.
o- "Fanboy" seems to be a favorite expression of someone who doesn't like to see positive comments about something they don't like. I gave the Air a "Very Good" rating, and it earned it. If it had integrated 3G and a realistic 5 hours of battery life, it might have made it to "Excellent".
o- Isn't it odd that although I'm apparently a "hack" trying to put positive spin on Apple's products, I decided to write an entire sidebar about a negative experience?

I suggest that Nick read the whole review as well as my blog comments. I'd be delighted to see him run that through his fun-house mirror.

UPDATE: Interesting. All the comments on the Inquirer post just disappeared right after I submitted one.
UPDATE: They're back, sans my comment. Curious.

Yet another UPDATE: I might suggest that anyone interested in this topic read the actual review, and my companion blog post, not just the sidebar. I wouldn't want anyone to be embarrassingly misinformed -- it's bad for the knees.

Posted by Paul Venezia on February 12, 2008 11:22 AM



February 07, 2008

Dear Apple, Never mind, someone already did

There are times that I need to get my hands dirty fixing problems all by myself, then telling everyone else how to do it. Then, there are times that I post a blog entry complaining about something or other, and the solution finds me. This, thankfully, is one of those times.

While I'm definitely an OS X guy for the moment (don't screw it up, Apple... please), I was relatively upset about the essentially useless X11 code in Leopard, and posted about it this morning. This afternoon, I received an email from Tevor Zylstra with a pointer to the Leopard and X11 blog, which contains links to the Xquartz Project. There, I found an unofficial X11 2.1.3 release that fixes all the bugs I've come across with Leopard's X11, from the window focus to the Option-click pasting. Huzzah!

Either I need to do more research or less -- I'm not sure which. All I know right now is that I'm a happy guy. Many, many thanks to Ben Byer and his group. Hopefully, 10.5.1 will incorporate this update and make it official.

Posted by Paul Venezia on February 7, 2008 11:07 PM



February 07, 2008

Dear Apple, please fix X11. Again.

I'll admit -- I've been reluctant to migrate my MacBook Pro to Leopard. The Tiger installation on that system has been stable and reliable, and I didn't want to visit any unnecessary demons on it if I could avoid it. However, I just received my MacBook Air that came with Leopard. This has given me a few more reasons not to upgrade, unfortunately. The biggest issue is X11.

Tiger's version of X11 was cranky right out of the box: X11 windows didn't gain focus when Command-Tabbing to them, middle-clicks to paste the clipboard buffer didn't work, and there were a few other inconsistencies. After a few months, an update was released that fixed these problems. It seems that the Leopard version of X11 again suffers from a nearly identical set of problems. The version included with Leopard is v2.0, whereas the version running on my Tiger MacBook Pro is v1.1.3 -- and it seems that I may need to downgrade to Tiger's version to resolve these issues, though obviously, I'd rather not do that.

The vast majority of Leopard users will never know about or need X11, but there's a large contingent of deep geeks that use Mac OS X and X11 constantly -- and this is the audience that greatly influences purchasing decisions made by others. It's certainly in Apple's best interests to fix these problems, especially since we've been down this road before. So please, Apple, please fix Leopard's X11. Pretty please? With sugar on top?

Posted by Paul Venezia on February 7, 2008 08:59 AM



December 04, 2007

Wait... what?

So this one had me pulling my hair out.

My main Windows XP box is a VM running under VMware Workstation 6. It began life running under VMware Workstation 5, and I never bothered to upgrade the VM to Workstation 6 compatibility until I needed USB2 support to handle a Zune that Microsoft so nicely sent me for Christmas. The Zune install was unfortunately problematic, involving a few bluescreens, but was eventually successful following a few reboots, uninstalling and reinstalling. All was well until I fired up the VMware Virtual Infrastructure Client 2.0 to run some tests in the lab. The app wouldn't display all objects nor any VM labels, and the layout was all over the place -- essentially unusable. Uninstalling and reinstalling resulted in no change, so I assumed that the Zune installation had borked the .NET installation that VIC relies on. Uninstalling and reinstalling .NET 1.1, 2.0, and 3.0 had no positive effect either. This was a major problem, since of course I didn't take a VM snapshot before installing the Zune software.

I had a slightly older version of that same VM on another system, so I moved that over to my workstation and tried again. Booting that VM had me back into a stable version of VIC, so I upgraded that system to VMware Workstation 6 compatibility and rebooted it... only to find that VIC was again broken. I reverted back to Workstation 5 compatibility and VIC started working properly again. After hours of frustration and ill thoughts towards the Zune, it seems that this was actually the problem(?! ... *&#^$*%&^!%@!).

This is completely reproducible, at least on these VMs, and really has me wondering... why would changing the hardware compatibility level for the VM break VIC 2.0? I don't know, but it certainly does.

Posted by Paul Venezia on December 4, 2007 10:07 AM



November 12, 2007

Stoked With Stoakley

I'll come right out and say this: I'm an AMD kinda guy. My main workstation runs Opteron 2220s, I prefer Opterons in my servers, and I've been looking forward to Barcelona for, well, far too long.

My attraction to the Opteron has been based solely on performance. In the past few years, the Opteron has consistently outperformed Intel's offerings, at least with my workloads. It seems that this may no longer be the case. My Stoakley reference system running two 45nm quad-core 3.0GHz CPUs simply screams. I've had it running VMware ESX 3.0.2 for the past month, running a LAMP-based Web app load simulator I wrote in PHP. Even though, on the face of it, Intel's design seems inferior, placing far too much emphasis on shared buses, the reality of the Stoakley platform is that it delivers.

Today's announcement from Intel regarding the new 45nm chips marks a reaffirmation of the next generation of processors. In reflecting on the fact that the new 45nm chips have transistor densities that place 30 million transistors on a space the size of a pinhead, I realize that the first Intel processors had around 2,300 transistors total. That was only thirty years ago, or so.

But enough reminiscing. Penryn is here, and it's going to make a splash if the performance of my reference system is any indication. The 6MB cache per die is definitely helping the numbers here, along with the 1.6GHz FSB. If you want an exhaustive account of the ins and outs of the Stoakley platform and the new 45nm chips, check out Scott Wasson's Tech Report entry from September. I was at Intel in Oregon, standing a few feet from him when he took that beautiful photo of that wafer with a video camera of all things. I think I was laughing at the sight of three geeks desperately trying to photograph a largely reflective surface with digital cameras, and even took a photo of them trying to take photos.

On the outside of the box, I'm more concerned with what Stoakley can do for me. My Web app test hasn't changed much since I used it to test blade servers and VMware VI3 late last year. It's a relatively simple PHP script and a 500,000-record MySQL database. When a Web load generator is pointed at the script, it either serves up a static page, or makes a database call to generate a dynamic page. Tuning the script's parameters can shift the load toward either Web server or SQL server performance, but in normal operation the predominant load shifts between the two as the test runs.
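The actual load script is PHP and isn't shown here; the following hypothetical Python sketch only illustrates the tunable mix described above. Each simulated request is either static or a lookup against a 500,000-row table, and a single assumed parameter, `p_dynamic`, shifts the load between the Web tier and the database. Integer row ids from 1 to 500,000 are also an assumption.

```python
import random

TABLE_ROWS = 500_000   # size of the test database in the post

def plan_requests(n, p_dynamic, seed=None):
    """Hypothetical request mix: ('static', None) or ('dynamic', row_id).

    p_dynamic = 0.0 stresses only the Web tier; 1.0 sends every request
    to the database. Row ids are assumed to be 1..TABLE_ROWS.
    """
    if not 0.0 <= p_dynamic <= 1.0:
        raise ValueError("p_dynamic must be between 0 and 1")
    rng = random.Random(seed)
    plan = []
    for _ in range(n):
        if rng.random() < p_dynamic:
            plan.append(("dynamic", rng.randint(1, TABLE_ROWS)))
        else:
            plan.append(("static", None))
    return plan
```

Seeding the generator makes a given mix reproducible from run to run, which matters when comparing hardware.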

So I built a VMware ESX 3.0.2 server on the Stoakley box. The box itself had 16GB RAM, two 136GB U320 SCSI drives, and two Intel gigabit NICs. The onboard NICs weren't supported by VMware until just a week ago or so, and the SuperMicro case was low-profile, so I was limited to the dual-port LP Intel NIC I had. I dedicated one NIC to VM storage, and the other to front-end traffic for all servers. I then built out six CentOS 5-based VMs: A two-CPU MySQL server, four single-CPU Apache servers, and a single-CPU load-balancer. Using LVS for load balancing, I nailed the box with HTTP requests for the Web test app installed on all four Apache servers. The load on all boxes grew substantially, delivering up to 4,000 requests per second, a number that's very dependent on the parameters of the test script, but certainly showed that the box was running on all eight cores, and running them hard. Over 200 million requests and nearly 15 hours later, the box showed no signs of distress, and each VM was still running like mad.
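Two quick checks on the numbers above. LVS supports several scheduling algorithms and the post doesn't say which one was used; this Python sketch assumes plain round-robin (LVS's `rr`) to show how requests spread across the four Apache VMs, whose names here are hypothetical. The sustained rate also follows from the totals: 200,000,000 requests / (15 × 3,600 s) is about 3,700 requests per second, so with "over 200 million" in "nearly 15 hours" the true average was at least that, consistent with peaks of up to 4,000.

```python
from collections import Counter
from itertools import cycle, islice

# Hypothetical names for the four Apache VMs behind the LVS director.
APACHE_VMS = ["web1", "web2", "web3", "web4"]

def round_robin(backends, n_requests):
    """Count how many of n_requests each backend gets under plain rotation."""
    return Counter(islice(cycle(backends), n_requests))

def average_rps(total_requests, hours):
    """Mean requests per second over a run of the given length."""
    return total_requests / (hours * 3600)
```

For example, `round_robin(APACHE_VMS, 10)` gives web1 and web2 three requests each and web3 and web4 two each, and `average_rps(200_000_000, 15)` is about 3,703.7.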

I kept the tests running over the next week, simulating a scenario that should never happen in real life, but occasionally does. After finally stopping the attack, I let the system quiesce, and then proceeded to load it up with more VMs for other tests I needed to do in the lab. I've wanted to run other benchmarks on the server, like compression and encoding tests, but that means that I have to power it down. I won't be able to realistically do that for another few days -- at this point, it's already indispensable.

So for now, I'm lacking raw numbers on standard tests, but I can tell you that my love affair with AMD is on shaky ground. I don't yet have a Barcelona system to run against Stoakley, and I really need to run those tests before I make any hasty moves. From what I've seen of Stoakley and Penryn this past month, my AMD honeymoon may be over.

Posted by Paul Venezia on November 12, 2007 07:12 PM



November 07, 2007

Grinding my gears

I've had an annoying day. It's one thing when the technology refuses to cooperate, it's another when it seems that human incompetence plays the key role. My woes today revolved around the fact that Ubuntu Server 7.10 seems to be terribly broken on the initial install. I was actually installing Ubuntu Server on a new SPARC system, and was amazed that after the first reboot, the initial account created during the installation did not have sudo permissions, and the root account was locked out. Essentially, the installation was useless without rebooting via rescue mode and manually modifying /etc/sudoers. Of course, the rescue boot hung, and at that point, I just turned the server off and grabbed some dinner. This isn't the first time I've crossed swords with Ubuntu and come away feeling that it just isn't worth the effort.

Otherwise, I've been getting riled up with net neutrality issues, such as Verizon's recent experiments on breaking DNS. Earthlink is also playing this game, apparently. I fail to understand the logic behind these attempts to hijack their own users and subvert a core Internet service. It's small potatoes compared to the other shenanigans that major ISPs play, perhaps, but didn't everyone learn a lesson from Verisign's attempted coup a few years ago?
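This kind of DNS tampering typically means answering lookups for nonexistent names with the address of an ad or search page instead of returning NXDOMAIN. One simple way to spot it is to look up a random name that can't exist. A minimal Python sketch; the `nx-` prefix and `.com` suffix are arbitrary choices, and the resolver function is injectable so the logic can be exercised without a network.

```python
import secrets
import socket

def resolver_rewrites_nxdomain(resolve=socket.gethostbyname, suffix="com"):
    """True if a lookup for a random, almost certainly unregistered name succeeds.

    An honest resolver fails the lookup (NXDOMAIN); a rewriting one returns
    an address anyway. Any lookup failure, including having no network at
    all, is reported as False.
    """
    name = "nx-%s.%s" % (secrets.token_hex(12), suffix)
    try:
        resolve(name)
    except OSError:          # socket.gaierror and socket.herror subclass OSError
        return False
    return True
```

Calling `resolver_rewrites_nxdomain()` with no arguments checks whatever resolver the host is configured to use.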

And coming in third, perfectly framing my disaffection with human incompetence is NaviSite. 165,000 to 200,000 sites offline for days following a failed datacenter migration? How is it possible that a large, publicly-traded company can fail so miserably at a fairly straightforward task? I cannot fathom undertaking such an effort without proper planning and the necessary expertise, but then, I'm kinda fond of not causing epic disasters. Not only have they taken all these customers off the Internet for days and days, but they're apparently also berating them on the phone and no longer participating in conference bridges they themselves set up. It's gone from a tragedy to a farce and back again. Cynthia Brumfield is one of those customers, and after five days is heading to Andover, MA to get her data back. She's also bringing a video camera to document her experience.

If it happens, that should be interesting. What might be more interesting is NaviSite's declining stock price, and what I can only assume are some cold feet on the part of Sapotek, who just last week announced a partnership with NaviSite to deliver SaaS via Sun's Startup Essentials program. If I were Sapotek -- or even Sun -- I wouldn't want to be anywhere near this trainwreck. They want to deliver SaaS to thousands of customers, anchored by a company that can't even get a mature business like Web hosting right?

Brave.

PS: Check out NaviSite's page discussing the outage. Doesn't it look like the guy in the image at the top is leaning over to throttle the other guy in the foreground? Maybe it's one of their customers.

Posted by Paul Venezia on November 7, 2007 07:22 PM



November 07, 2007

Catharsis via hypervisor

So for all the virtual infrastructures that I've built in the lab and in the field, for every time I've PXE booted a new VMware ESX server into a production environment, I hadn't virtualized my own core systems... until yesterday.

It was a spur of the moment kinda thing, and I went with it. I started the day with five physical core servers of varying age, and ended the day with two, having collapsed the others onto a single Dell PowerEdge running VMware ESX 3.0.2. These were a few Windows boxes, three Linux boxes, and a FreeBSD system, all running on relatively ancient hardware.

It might seem odd, but just feet away from an Intel reference system running the new Stoakley platform and several 8- and 16-core servers from HP and Sun sat a 6-year-old HP Kayak XU800: a Fedora Core 3 system with a Pentium III-866 and two 40GB PATA drives in software RAID1, running Cyrus IMAPD for my 5GB mailbox, as well as primary DNS, DHCP, and NTP for the entire lab on all VLANs. Around the corner from that were several other servers running various backup tasks, public Web apps, and so forth. I decided that everything had to go, and by 5pm, I'd rebuilt all the systems on the VMware box, including a somewhat annoying Berkeley DB 4.2->4.3 migration for Cyrus, since the new server was built on CentOS 5. Essentially, this was a 5-hour server consolidation project from conception to reality, and I don't seem to be the worse for it. In fact, I've lightened the power and heat loads in the lab, and am making far better use of the Dell PowerEdge 2800 that's holding down these new systems.

The PE2800 isn't the highest-spec box, especially for VM tasks. It has two HyperThreaded single-core 3.6GHz Xeons with 4GB RAM and a bunch of disk in a RAID 5, but it's now running 6 VMs like a champ, including my Asterisk PBX build. I plan on moving over a few more boxes today, but the bulk of the work is done.

One of the only issues that I have with the new environment is that all the VMware management tools are Windows-based. I really wish that VMware had continued their practice of producing Windows and Linux management tools, since my only Windows XP system is a VM running on my workstation and I now have a slightly disconcerting dependency there. Add to that the fact that the VirtualCenter server is running on a VM of its own on another physical system under VMware Server, and I probably should build a standalone Windows server to handle those tasks. It would be ideal if VMware could bring simple VM management for ESX hosts into the VMware Server Console. I don't need all the bells and whistles there, but I do need to be able to power servers up and down and access their consoles. Since there's been consternation regarding the management split between VMware Server and ESX, this might actually happen, but I'm not holding my breath.

On the other side of the coin, migrating a VMware Server 1.0.3 VM to ESX 3.0.2 was very simple -- move the files into place on the ESX host, run vmkfstools on the vmdk, and import the VM via VirtualCenter. I found that it's best to delete the NIC from the VM and re-add it since there's some issue with variable syntax in the .vmx file between VMware Server and ESX, but all told, it was quick and easy, just the way it should be.
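If you'd rather clean up the NIC settings before import than delete and re-add the adapter by hand, here's a minimal sketch. It assumes the NIC settings are the usual ethernetN.* keys in the .vmx file and simply drops them so a fresh adapter can be added in VirtualCenter afterwards. The helper is hypothetical (not a VMware tool), and keeping a backup of the original .vmx is a good idea.

```python
import re

# NIC settings in a .vmx file live under keys like ethernet0.present,
# ethernet0.connectionType, ethernet1.virtualDev, and so on.
NIC_KEY = re.compile(r"^\s*ethernet\d+\.", re.IGNORECASE)


def strip_nic_entries(vmx_text: str) -> str:
    """Return the .vmx contents with every ethernetN.* line removed."""
    kept = [line for line in vmx_text.splitlines() if not NIC_KEY.match(line)]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        'config.version = "8"\n'
        'displayName = "imap01"\n'
        'memsize = "512"\n'
        'ethernet0.present = "TRUE"\n'
        'ethernet0.connectionType = "bridged"\n'
    )
    print(strip_nic_entries(sample), end="")
```

Every other key is left exactly as it was, so disk and memory settings survive the cleanup.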

Posted by Paul Venezia on November 7, 2007 10:42 AM



October 18, 2007

Computing on the road details

So it occurs to me (and more than a few readers) that I neglected to give any details about the car in my last entry. Well, let's remedy that.

I bought a 2004 Audi A8L with 52,000 miles, which was within my price range and has the same layout and gear as the newer models. The downside is that the hardware revisions in this car are 3 years old, and apparently they made lots of changes in the interim. The other downside is the complete lack of warranty, even though Audi recertified the car less than a year ago. Also, the nearest Audi dealer is around 50 miles away, and local shops won't touch this car for anything other than brakes or tires. In fact, the aforementioned problems with the garage door opener and light sensor were caused by a local tire shop letting the battery run down completely while putting on a new set of tires. After the jumpstart, those components stopped working.

So that's what led me to inspect the coding on the roof electronics module. This coding is standard base-8 additive, with specific values assigned to specific components. For instance, if the car has a solar sunroof, add 6; if it's a steel sunroof, add 2. If it has a garage door opener, add 256, and so on (the full list can be seen here). So I came up with a value for the roof electronics coding -- 4014. Inspection of this module with the VAG-COM software showed the coding was 4013, which is basically an impossible number. I changed that to 4014, and... nothing happened. I'm still feeling my way around the VAG-COM interface and capabilities, so I wasn't sure whether an individual module reset was possible. I'd also noticed that even though I'd trickle-charged the battery overnight, it was still showing only 50% capacity. Time for a reboot. I pulled the negative lead from the battery and replaced it a few minutes later. This power cycle fixed the battery level indication, and the roof electronics started working again.
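Additive coding just means the module value is the sum of the values of whatever options are installed. A minimal sketch using only the three values quoted above; the rest of the roof module's table isn't reproduced here, so this won't rebuild the full 4014 value on its own, and the option names are made up for illustration.

```python
# Option values quoted in the post; the remaining entries of the roof
# electronics coding table are deliberately left out.
ROOF_OPTIONS = {
    "solar_sunroof": 6,
    "steel_sunroof": 2,
    "garage_door_opener": 256,
}


def roof_coding(installed):
    """Additive coding: the module value is the sum of each installed option's value."""
    unknown = set(installed) - set(ROOF_OPTIONS)
    if unknown:
        raise KeyError(f"no coding value known for: {sorted(unknown)}")
    return sum(ROOF_OPTIONS[name] for name in installed)


if __name__ == "__main__":
    print(roof_coding(["solar_sunroof", "garage_door_opener"]))  # 262
```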

When I was telling a non-geek friend about this, he was quiet for a while, then asked how "a normal person" would have fixed this. The answer? I have no idea, but I'm willing to bet the dealer would have replaced the whole roof electronics module.

The other side of that coin was brought into sharp relief by Jon Udell, who emailed me this morning:

Here's hoping this query continues to produce zero results:

http://www.google.com/search?q=%22bricked+my+audi%22

Then again, it would confer a certain sort of celebrity...

Posted by Paul Venezia on October 18, 2007 11:16 AM



October 17, 2007

Computing on the road

It's not exactly breaking news that computers are integral components in cars these days. They've been used since the '80s to handle various tasks such as air-fuel ratios, performance tuning, transmission control, and so forth. What's far more interesting today is that they're in the cockpit, handling nearly every aspect of the car's direct control, navigation, and entertainment systems.

Much like embedded medical computing systems, the code running in newer cars goes through a more stringent QA regimen than, say, Microsoft Excel 2007, but it isn't perfect. The bothersome issue is that when problems occur, many times they "can't be fixed" by the dealer or private service centers.

It's one thing to bring a car to the dealer with a window that won't roll down due to a failed switch or motor. It's another thing entirely if the problem with the window is a bug in the car's computer. Given my recent experiences, it's all but impossible for even a certified dealer to handle problems of a firmware variety, rather than a physical issue. As car computing gets deeper, these problems will likely get worse, until some critical mass is reached and there are Linux geeks wearing coveralls and dragging laptops around in the garage.

To a geek like me, the design and control of these systems, especially in high-end cars, is quite interesting. Audi, for instance, has arguably the best in-car computing platform, called the MMI (MultiMedia Interface). It's present in the A6, A8, and Q7 series vehicles, and controls everything from the climate control to XM/Sirius, iPod, radio, video, navigation, and all associated preferences. The main control is a jogwheel with four softkeys, and a handful of function buttons. Like any embedded system, it has a bootloader, applications, several hardware components, and a facility to handle firmware updates. Curiously, this update path isn't based on a proprietary connection from a dealer service computer; it runs through the car's CD player. Essentially, it's possible to update the code on the car by loading an update CD-ROM into the CD player, and accessing a series of hidden menus to perform the update.

Unfortunately, it seems that many of these systems are relatively fragile. I've been told by dealers that not only are updates to the car's various control modules not covered by warranties, it's not unlikely that some updates will irreparably harm some modules, requiring that they be replaced for hundreds or thousands of dollars. Essentially, you aren't covered for anything regarding the software in the car, and maybe not even some of the hardware if it fails during maintenance. This won't affect the car's nominal function, but the radio, navigation, climate control systems, and so forth may not function at all. This isn't unique to Audi either, as BMW has essentially the same policies. So in essence, although the mechanical/computer ratio on some of the newer cars is nearly 1:1, some of those computer parts aren't covered.

Another problem is with the techs themselves. I've heard time and again "Some of these cars are like your home computer. They break, and nobody knows how or why, or how to fix them." Rip and replace becomes the norm, rather than debugging.

So I'm on my own, armed with a VAG-COM USB CAN interface, a laptop, and my brain. So far, I've managed to dip into the car's computer to change a bevy of hidden settings, adjust the service interval timings, and reset the coding on the roof electronics module since it managed to forget that it had a light sensor and a garage door opener. The last part required rebooting the whole car by pulling the negative lead on the battery. Making changes to the car's computers in this way is highly dangerous in many cases. Setting the wrong value on a specific component can disable the component completely, or even cause the car to become inoperable. It's not quite the same as inadvertently dropping a washer into a cylinder, but the outcome can be similar.

On older cars, I've done radiator and manifold replacements, brakes, oil changes, and the like, but I'm on a quest to see what it's like to be a modern-day shadetree mechanic. My tools aren't grease guns, ratchet sets and floorjacks, but rather firmware, CD burners, and a laptop. My next tasks are to look into altering the transmission shift timing and hopefully fix the route that's stuck in the navigation system. It's a whole new world.

Posted by Paul Venezia on October 17, 2007 03:01 PM



September 27, 2007

VMware Workstation VMM issues resolved in 2.6.22?

For those unaware, VMware Workstation 6 has been exhibiting some significant stability issues under some 2.6.19 and 2.6.20 kernels. VMware Workstation would run and behave normally for some period of time, then crash with a VMM (VM Monitor) error, and take down the X11 subsystem, requiring a reboot of the whole system. I saw clear evidence of this under Fedora Core 6 x86_64 on these kernels, but not on the 2.6.18 kernel. There were several threads dedicated to this problem on VMTN, and a few possible fixes offered, but none worked. I resigned myself to running the VMware Server Console on my workstation, and housing my VMs on a different VMware Server system, which definitely has some benefits over running them locally, but I was bothered by this mysterious issue. My main workstation is a Sun Ultra 40 M2 with two dual-core Opteron 2220s and 8GB of RAM, so using that to drive VMware Workstation for my Windows needs is really the best option from a performance and functionality standpoint.

Since then, a FC6 kernel update to 2.6.22 was released, and rather than step back to 2.6.18 on that system, I tried going up to 2.6.22. That was about a week ago, and the VMware Workstation instance on my main workstation has been stable since. I haven't found any direct reason for the problem, or its resolution, but there you have it. Also, for those curious on getting the VMware Workstation components to compile and run on Fedora or other distributions, you'll likely need the VMware any-to-any patches found here.

Posted by Paul Venezia on September 27, 2007 10:42 AM



September 04, 2007

Quality time with hold music

To make a very long story short, an Exchange server experienced catastrophic hardware failure, scrambled the mailstore, and was rendered completely inoperable. An hour on the phone with HP resulted in a promise that the warranty replacement parts would arrive within 24-48 hours, no guarantee. However, an "uplift" service was available for the low, low price of $2,500 to expedite parts for next day. The whole server cost less than $2,500. Instead, I found a refurbished 1U server locally for around $1k with twice the resources of the original. We rebuilt the Exchange server, only to run headlong into the mailstore corruption. 11 hours later, the mailstore repair tools had finished and the mailstore was finally remounted... with errors.

Some (not all) users could access their mailbox, but were not receiving new mail. The errors in the event log elicited only two hits on Google, and none on Microsoft's site. The two hits were related to the same forum post with no resolution -- useless. Microsoft's own tools for locating resources pertaining to this error went nowhere. The actual error is from source MSExchangeIS Mailbox, ID 1025, error data "An error occurred on database "First Storage Group/Mailbox Store (MAIL). Function name or description of the problem: SLINK::EcUpdate Error: 0xfffffa84". Since I could find references to part of this error, but nothing about it in its entirety, I had to call Microsoft. That's where the fun started.
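The hex value in that event is a 32-bit two's-complement number. Errors from Exchange's ESE database engine are usually quoted as negative decimals (-1018, for instance), so if this is an ESE code, converting it may give a more searchable string. A small sketch; it doesn't attempt to say which error the resulting number corresponds to.

```python
def to_signed32(code: int) -> int:
    """Interpret a 32-bit unsigned value as a two's-complement signed integer."""
    code &= 0xFFFFFFFF
    return code - 0x1_0000_0000 if code & 0x8000_0000 else code


if __name__ == "__main__":
    print(to_signed32(0xFFFFFA84))  # -1404
```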

After calling in and navigating half a dozen voice prompts, I spoke with a customer service rep who kindly took my credit card information for a $245 charge, gave me a case number, and told me that the tech queue wait time would be 5 minutes or so. I waited on hold for 62 minutes before I got a tech with a very heavy accent. He asked for the case number. I was surprised that he didn't have it already. How could he not have it? I was on the same call. I looked around and couldn't find the paper I'd jotted it down on over an hour earlier. I asked if he could look it up, and he brusquely transferred me back to the CSR pool. While he did this, I found the number. I checked it with the new CSR, and discovered that I'd been given the wrong number to begin with. Then, I was told that I'd be put back in the queue, with a hold time of only 45 minutes. I spoke with a supervisor who was very sorry that this was the case, but there was nothing he could do. Back I went into the land of Nod.

I was two hours and $245 into a service call to Microsoft to find more information on their own error code that isn't referenced anywhere on their site that I could find. I finally got a tech for the second time, who asked for the case ID, interrupted me twice while I was trying to give it to her, then abruptly hung up on me. Livid doesn't begin to describe my state of mind. I've worked with Microsoft enterprise products for a decade, and I've never had to call them before. That's truly a blessing.

What a racket. I'd love to get $245 to abuse my customers like this. I'd be a billionaire.

Posted by Paul Venezia on September 4, 2007 02:43 PM



August 10, 2007

Wait. What?

Okay, maybe it's just me, but I swear that the default "new email" tone on Profimail is the Super Mario Bros. "new life" tone. Seriously. I'm not sure how I feel about that.

Posted by Paul Venezia on August 10, 2007 11:20 PM



July 25, 2007

Intel's new heavy hitter

Intel had a fairly big announcement today, highlighting their work in the server MP space with a new multi-processor framework dubbed Caneland. This is a new one on Intel, and definitely new in the industry, marking the first time a four-socket quad-core offering has reached this level. The basis of the system is the Tigerton CPU, running four cores at up to 2.93GHz per core. This is married to the Clarksboro chipset, and Intel is claiming a 2x speed improvement over existing Xeon-based chips. I had a chance to play with the Tigerton/Clarksboro framework in Intel's lab today, and it really is rather odd to see 16 cores in a 4U, four-socket Windows box. I ran a few benchmarks, but I can't publish any of the data -- yet. Suffice it to say that the official launch of Caneland later this year will be quite interesting. Manufacturers have been getting shipments for over a month now, so Intel isn't worried about quantities.

I happened to be in Intel's Oregon location to attend a workshop centered around some new products that will be announced in the coming months. It was a very enlightening few days, and left me truly wondering about AMD's delayed quad-core, Barcelona. It's clear to me that Intel's technology isn't quite as good as AMD's Opteron and Barcelona, but then again, they've had their version of a quad-core x86_64 CPU for quite some time, while AMD's still waiting on the official launch of their quad.

The differences in CPU design are significant. Where Intel basically bolts two dual-cores together to make a quad-core, AMD is placing all four cores on a single chip, in all its HyperTransport and NUMA glory. I've found the Opteron to be the better choice for lots of workloads, especially RAM-intensive applications, and found Intel's new Xeons to be speedy, but challenged in key areas, such as bus performance and memory access. Those points are moot, however, if AMD delays Barcelona too much longer. I know I'm eager to set these new chips against each other, but it will be a bit of a wait on both fronts.

Whatever else is happening, it's certain that on every level, CPU development is full steam ahead... just in time for everyone to start spec'ing gear for their virtualization rollouts.

Posted by Paul Venezia on July 25, 2007 12:56 AM



June 08, 2007

Making APC network cards play nice with Active Directory

Since I'm flush with APC gear at the moment, I've been working with the network management devices (obviously, since I wrote Cacti and Nagios plugins for most of them). Most of the hardware can use either local or RADIUS authentication. Obviously, with a large number of devices, using centralized authentication is not just a good idea, it's the only way to fly. Unfortunately, none of the NMCs have straight-up LDAP auth, but they will do RADIUS. Thus, using Microsoft's IAS (Internet Authentication Service) -- their interpretation of a RADIUS server -- it's possible to authenticate against AD via RADIUS, though it's not exactly straightforward.

APC has a knowledgebase document that describes a RADIUS implementation using FreeRADIUS. I'm a big FreeRADIUS fan, having used it in countless UNIX and dialup scenarios. However, it won't authenticate to AD without some serious gyrations, and it's simpler to use IAS. In order to use IAS, however, custom attributes need to be defined, otherwise logins may work but the admin bit will not be set and administration of the devices will be impossible. Here's the remedy.

Install IAS on a Windows server and register it with AD. Then, define a client. The client should be configured with any friendly name, and the IP of the NMC card in the APC device. Set the vendor to RADIUS Standard, and define the shared secret you'll be using. Next, define a new Remote Access Policy. Add some conditions, such as Windows-Groups matches "DOMAIN\Domain Admins" AND Client-IP-Address matches "172.16.1.*", which would cause this policy to only grant access if the user account is in the Domain Admins group, and the client IP falls within the 172.16.1 network. It's a good idea to restrict access in this way, since IAS RADIUS policies stop on the first match -- if you have multiple policies, you want them to be specific to their task.
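To make the first-match behavior concrete, here's a small model of how an ordered policy list gets evaluated. It's a simplified illustration, not IAS's implementation: group membership is a plain set lookup, and the client IP condition is treated as a shell-style glob, whereas IAS's own pattern-matching syntax may differ. Each policy also carries the APC access level it would hand back (1 = admin, 3 = read-only).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, Optional


@dataclass
class Policy:
    name: str
    group: str             # e.g. r"DOMAIN\Domain Admins"
    client_ip_glob: str    # e.g. "172.16.1.*" (shell-style glob in this sketch)
    apc_service_type: int  # value for APC vendor attribute 1: 1 = admin, 3 = read-only


def first_match(policies: Iterable[Policy], user_groups: set, client_ip: str) -> Optional[Policy]:
    """Evaluate policies in order and stop at the first one whose conditions all match."""
    for policy in policies:
        if policy.group in user_groups and fnmatchcase(client_ip, policy.client_ip_glob):
            return policy
    return None  # no policy matched: the request is rejected


if __name__ == "__main__":
    policies = [
        Policy("APC admins", r"DOMAIN\Domain Admins", "172.16.1.*", 1),
        Policy("APC read-only", r"DOMAIN\Domain Users", "172.16.1.*", 3),
    ]
    groups = {r"DOMAIN\Domain Users", r"DOMAIN\Domain Admins"}
    hit = first_match(policies, groups, "172.16.1.20")
    print(hit.name if hit else "rejected")  # APC admins
```

Because evaluation stops at the first hit, a user in both groups gets the admin policy only because it is listed first; swap the order and the same user would get read-only access.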

Once this is done, some custom attributes have to be defined to grant admin privs to the user logging into the APC gear. Click Edit Profile within the policy properties page. Click the Advanced tab, and Add an attribute. Select Vendor-Specific, and then click Add. Click Enter Vendor Code, and enter 318 into the text field to the right. Then, click Yes, it conforms, and then click Configure Attribute. Set the Vendor Assigned Attribute Number to 1, Attribute Format to "Decimal", and the Value to 1. This specifies that the value of attribute number 1, with Vendor Code 318 should be a decimal (integer) and be set to 1. The other possible values are 2, which denotes a device login, or 3, which is a read-only login, and is the default if this attribute is not defined. Note that defining another policy that matches on a different set of groups and setting this value to 3 will give those users read-only access to the same devices, which might be handy.
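When a policy carrying that attribute matches, IAS returns it in the Access-Accept as RADIUS attribute 26 (Vendor-Specific). The sketch below shows the bytes on the wire, assuming the sub-attribute layout recommended in RFC 2865 (vendor type, vendor length, value), which is presumably what answering "Yes, it conforms" selects, and a 4-byte integer for the "Decimal" format. It's mainly handy when you're staring at a packet capture, trying to work out why the admin bit isn't being set.

```python
import struct

VENDOR_SPECIFIC = 26  # RADIUS attribute type for Vendor-Specific (RFC 2865)
APC_VENDOR_ID = 318   # vendor code entered in IAS

# Values described above for APC vendor attribute number 1
ADMIN, DEVICE, READ_ONLY = 1, 2, 3


def apc_vsa(value: int, vendor_attr: int = 1) -> bytes:
    """Encode a Vendor-Specific attribute carrying one 32-bit integer."""
    # Sub-attribute: vendor type (1 byte), vendor length (1 byte), value (4 bytes)
    sub = struct.pack("!BBI", vendor_attr, 2 + 4, value)
    # Outer attribute: type 26, total length, 4-byte vendor ID, then the sub-attribute
    return struct.pack("!BBI", VENDOR_SPECIFIC, 2 + 4 + len(sub), APC_VENDOR_ID) + sub


if __name__ == "__main__":
    print(apc_vsa(ADMIN).hex())  # 1a0c0000013e010600000001
```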

Click OK a few times to get back to the Profile settings dialog, and select Authentication. Make sure CHAP and PAP are selected, then click Apply and OK. Make sure that "Grant remote access permission" is selected, and click OK again. At this point, the server should be configured properly for the one RADIUS client originally specified. I usually stop and restart the IAS server at this point, since I've seen it act oddly when this isn't done.

Now, log into the APC device and configure RADIUS authentication. This is usually found on the Administration tab, under Security/Remote Users. Note that this has been tested with APC app module v3.3.1 and AOS v3.3.4. If you haven't updated to this revision of the firmware on your devices, it's well worth the time to do so before continuing. Prior revisions may not even support RADIUS auth, for instance.

Fill in the server IP address and shared secret, then test the authentication. Assuming it works, set the authentication selection to "RADIUS, then Local Authentication", log out and then log back in with an account that matches the Remote Access policy defined on the IAS server. You should be in like Flynn.

I've run into some issues with using Network PowerChute with RADIUS authentication -- namely, the Network PowerChute service will not properly authenticate to a UPS NMC card that's configured for RADIUS auth, even with valid RADIUS credentials. I'm working with APC on a solution to that issue now, but I've tested this configuration with APC managed rack PDUs and air units without issue.

Posted by Paul Venezia on June 8, 2007 12:33 PM



June 01, 2007

Introducing ASAP - Automated Switchport Access Provisioning

...or something like that.

ASAP is a PHP/Perl application that automates switchport VLAN assignments for Cisco switches.

The good stuff:

o- Web and telnet interface
o- Can handle multiple switches across multiple sites
o- All switch interaction is via SNMP
o- Forces selected switchport down/up, causing Windows systems to automatically release/renew a DHCP lease
o- Prunes the ARP table following a VLAN modification
o- Can prevent certain subnets and IP addresses from being modified
o- Can be easily used by end-users
o- Admin interface provides instant system location by hostname, IP or MAC address
o- LDAP authentication for admin interface
o- MySQL logging
o- Basic reporting tools
o- Has been tested on CatOS and IOS with Cisco 6500-series and 3500-series switches

The bad stuff:

o- Rudimentary reporting needs work
o- Unsure of scalability. Sites with dozens of switches may require code tweaks
o- Hasn't been tested on several switch classes
o- Configuration could be more straightforward

Overview:

Once a network has been built and is fully operational, the vast majority of configuration tasks are simple VLAN assignments. Usually, these assignments happen only once, when a workstation is first introduced into the network, but in lab environments, VLAN assignments can occur constantly. ASAP was designed to remove the burden of system switchport location and VLAN modification from IT, and allow general users to easily perform these changes. Alternatively, ASAP can be configured to only allow admin access, and given a MAC address, IP address, or hostname, a specific system's current switchport can be located and modified without telnetting to a switch, and with an audit trail.

I originally wrote this right before I moved two large sites from one building to another. Each site had over 800 switchports and I was lazy enough to not want to deal with VLAN assignments. I wrote the ASAP Web and telnet applications, and placed every switchport into a VLAN with ACLs preventing access to any internal resources other than the Linux server running the apps, a dhcpd and a wildcard DNS server. Thus, whenever a client is plugged into an unknown switchport, they're given an IP in the "deadzone" range, and any Web site they try to visit brings up the ASAP app. They can then select the appropriate VLAN for their system, and 45 seconds later, they're fully up and running, without rebooting. *nix systems that don't run a GUI interface can also do their own VLAN assignments via the telnet application. Telnetting to the IP/hostname of the ASAP server brings up a CLI version of the Web application.

Both apps are autonomous and can be configured independently. This permits greater modularity as well as security, since it's likely that the Web app will be used in a general corporate setting, whereas the telnet app will probably be used more in a lab setting. Both the Web and telnet apps log to a common MySQL database.

The apps work by determining the IP address of the connecting client, then polling each switch in turn until the IP, MAC address, switch, and switchport information is determined. Then, if the IP doesn't match a denysubnets definition, and all necessary info has been gathered, the user can select from a list of VLANs, and the app will change that specific port to the new VLAN, disable and re-enable the port (Web app only), and remove the original ARP entry from the router. With most browsers, the user will be sent to a "Please wait..." page that will refresh after 45 seconds showing that all is well. In the background, the switchport has been changed to the right VLAN, and the port disable/enable action forces Windows and Mac systems to release/renew their DHCP lease. This forces the system into the correct VLAN without requiring any user interaction or reboots. Note that the telnet app does not perform the disable/enable action though it could certainly be coded to do so.
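The per-request flow described above can be sketched like this. It isn't ASAP's code: the table shapes, switch names, and denysubnets values are hypothetical, and the OIDs are standard objects commonly used for this job on IOS (CISCO-VLAN-MEMBERSHIP-MIB vmVlan, IF-MIB ifAdminStatus, RFC1213-MIB ipNetToMediaType), assumed here rather than taken from the scripts. CatOS may need a different object for port VLAN membership. The sketch only builds the ordered list of SNMP sets; actually sending them is left to snmpset or an SNMP library.

```python
import ipaddress

# Standard MIB objects that can do this job (assumed, not quoted from ASAP)
VM_VLAN = "1.3.6.1.4.1.9.9.68.1.2.2.1.2"       # vmVlan (IOS access-port VLAN)
IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7"        # ifAdminStatus: 1 = up, 2 = down
IP_NET_TO_MEDIA_TYPE = "1.3.6.1.2.1.4.22.1.4"  # ipNetToMediaType: 2 = invalid (delete)

# Hypothetical equivalent of the denysubnets setting
DENY_SUBNETS = [ipaddress.ip_network(n) for n in ("10.1.0.0/24", "10.1.5.10/32")]


def is_denied(client_ip):
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in DENY_SUBNETS)


def locate(client_ip, arp_table, switch_fdbs):
    """Map IP -> MAC from the router's ARP table, then poll each switch's
    forwarding table in turn until one reports that MAC.
    The sample tables hold access ports only; real code must skip uplinks/trunks."""
    mac = arp_table.get(client_ip)
    if mac is None:
        return None
    for switch, fdb in switch_fdbs.items():
        if mac in fdb:
            return mac, switch, fdb[mac]
    return None


def vlan_change_plan(client_ip, port_ifindex, new_vlan, router_ifindex):
    """Ordered SNMP sets for one move: new VLAN, bounce the port, purge the old ARP entry."""
    if is_denied(client_ip):
        raise PermissionError(f"{client_ip} is in a protected subnet")
    return [
        ("switch", f"{VM_VLAN}.{port_ifindex}", new_vlan),
        ("switch", f"{IF_ADMIN_STATUS}.{port_ifindex}", 2),  # down (pause before bringing it back)
        ("switch", f"{IF_ADMIN_STATUS}.{port_ifindex}", 1),  # up: client releases/renews DHCP
        ("router", f"{IP_NET_TO_MEDIA_TYPE}.{router_ifindex}.{client_ip}", 2),
    ]


if __name__ == "__main__":
    arp = {"10.9.9.50": "00:11:22:33:44:55"}
    fdbs = {"bldg1-sw1": {}, "bldg1-sw2": {"00:11:22:33:44:55": 10105}}
    mac, switch, ifindex = locate("10.9.9.50", arp, fdbs)
    for target, oid, value in vlan_change_plan("10.9.9.50", ifindex, 20, router_ifindex=3):
        print(target, oid, "INTEGER", value)
```

The ARP purge is indexed by the router interface that held the old entry plus the client's IP, which is how the ipNetToMediaTable is keyed.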

To date, this code has been used by several hundred people to change several hundred switchports, but needs testing in lots of other settings. There are probably bugs that will be triggered by older IOS/CatOS revisions, among other things.

Configuration:

Configuration is relatively manual for now. Read through the asapd.pl and index.php files to configure the application. The most important bits are obviously the switch IP/SNMP settings, denysubnet definitions, and other site information. Note that you'll have to pull the VLAN index numbers from your switches manually; info on how to do this is in the script comments. Import the included asap.sql file into a new database (mysql -u root -p < ./asap.sql), then grant access to the username/password pair found in the db.inc and asapd.pl files (for example, grant all privileges on asap.* to asap@localhost identified by 'passwd'). The help.html file can be modified to show whatever help info you wish on the main app page. login.php needs to be modified with the appropriate LDAP/AD configuration for your site; it's currently built for non-anonymous binding to a normal Windows 2003 AD server.

So there's a bit of work to do to configure the app, but if you're at all familiar with Perl, PHP, and MySQL, it shouldn't take more than a few minutes.

Troubleshooting:

There are no debugging facilities to speak of. Since real men debug with print statements, that's what you'll find. Enjoy.

If there's enough interest in this tool, I'll put more time into tightening up the configuration and reporting, and work on any bugs that might get dug up. Either way, if you're using ASAP in your network, I'd love to hear about it. You can find the code linked below.

Download asap-0.0.1.tgz

Posted by Paul Venezia on June 1, 2007 02:44 PM



May 30, 2007

3Ware's 9650SE and the Sun Ultra 40 M2

For the past few months, I've been running a Sun Ultra 40 M2 coupled with a 3Ware 9650SE SATA RAID controller. It would seem that this is a marriage made in heaven.

As I've remarked before, the Ultra 40 M2 is simply the most powerful workstation available from a mainstream vendor today. Armed with two AMD Opteron 2218 dual-core CPUs, up to 16GB RAM, eight hot-swap SAS or SATA drive bays, two PCI-X slots, built-in 7.1 sound, S/PDIF optical input and output, a dual-layer DVD burner, and (in my case) an nVidia Quadro 5500 graphics card, this system is the creme de la creme of the workstation world. The only downside is the relatively anemic nVidia SATA RAID controller built into the mainboard. The performance of this controller isn't terrible, but the Linux driver support simply isn't there. Enter the 3Ware 9650SE.

The 3Ware 9650SE-8LPML I have running in this system is a full-on 8-port SATA RAID controller with 256MB RAM and an optional battery-backup unit. There are two four-port SATA multilane connectors on the card, which can be used to marry the 9650SE to a multilane-capable disk array, or to individual SATA drives with multi-lane to discrete cabling. In the case of the Ultra 40 M2, however, multilane to SAS cabling is needed. Fortunately, the built-in nVidia controller uses multilane connectors to feed the disk backplanes within the Ultra 40 chassis, but the included cables aren't long enough to reach the 9650SE. Sun can supply cables of appropriate length to reach the card, however.

Once I had the right cables, it was simply a matter of cable routing to each backplane connector and then back into the 9650SE. The fan tray that sits to the left of the disk bays can get in the way here, but some creative cable management within the case made everything fit and look like it was meant to be there. I placed eight 250GB SATA drives in the disk cages, and powered the system on. The 9650SE posted, found all the drives, and all was well.

I configured the eight drives into a RAID50 set, giving me high throughput on 1.36TB of usable space while providing significant fault-tolerance. The configuration through the 3Ware BIOS tools is quick and easy. Unfortunately, installing and running Fedora Core 6 (or any reasonably recent distro) on the 3Ware 9650SE isn't as simple. The 9650SE and the more recent cards from 3Ware aren't supported in the included 3w-9xxx driver found in stock 2.6 kernels. Historically, 3Ware has been extremely good at providing support for Linux and FreeBSD, so I would think that this problem will be rectified shortly, but in the interim, there are a few steps involved in getting everything working right on Fedora and RedHat. The first is to download the right install disk from 3Ware. You can find the files for just about every major distro on their site. These are just zipfiles with driver sets. Format a floppy with mformat (mformat a:), and unzip the installdisk file to the floppy. Then, boot the system as you would for a normal installation. At the boot: prompt, enter linux dd and the installer will prompt for a driver disk. Select the floppy drive, and it should load the appropriate driver. Continue the installation normally. On the Ultra 40 M2, I had to use a USB floppy drive, which appears as /dev/sda.
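The 1.36TB usable figure for the RAID50 set checks out if the eight drives are split into two four-drive RAID5 spans (an assumption; the layout isn't spelled out above) and the result is expressed in binary units. Each span gives up one drive's worth of space to parity, leaving six drives' worth. A quick sketch of the arithmetic, taking the drives as 250 x 10^9 bytes, as marketed:

```python
def raid50_usable_bytes(drives: int, drive_bytes: int, spans: int) -> int:
    """RAID50 stripes across `spans` RAID5 sets; each set loses one drive to parity."""
    if drives % spans:
        raise ValueError("drives must divide evenly into spans")
    per_span = drives // spans
    if per_span < 3:
        raise ValueError("each RAID5 span needs at least three drives")
    return spans * (per_span - 1) * drive_bytes


if __name__ == "__main__":
    usable = raid50_usable_bytes(8, 250 * 10**9, 2)
    print(usable / 10**12, "TB (decimal)")  # 1.5
    print(round(usable / 2**40, 2), "TiB")  # 1.36
```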

Following the initial boot, the system needs to be updated. Be aware that updating the kernel may result in a non-bootable system, since the new kernel will not have the right driver for the 9650SE. Fortunately, it's easy to remedy this problem. Run yum update to pull in all the new packages, including a new kernel and kernel-devel package. Then, download the upstream driver for the 2.6.19+ kernels from 3Ware's download site. Extract the driver source into a new directory, such as /usr/local/src/3w-9xxx (mkdir -p /usr/local/src/3w-9xxx; cd /usr/local/src/3w-9xxx; tar zxf /path/to/source.tgz; tar zxf ./3w-9xxx.tgz), move into the driver directory, and edit the Makefile to pull in the right kernel path. In my case, the SRC := line at the top of the Makefile should be modified to SRC := /lib/modules/2.6.20-1.2948.fc6/source/. This tells the compiler to build the driver against the source tree of the new kernel, not the running kernel. Then, simply run make, and you should have a brand-spankin'-new 3w-9xxx.ko module for the new kernel. Copy this module into /lib/modules/2.6.20-1.2948.fc6/updates and you should be all set.
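If you rebuild the module for every kernel update, the Makefile edit is easy to script. A minimal sketch, assuming the Makefile has a single SRC := assignment as described above; the sample Makefile text in the demo is hypothetical, and 3Ware's actual file may look different.

```python
import re

SRC_LINE = re.compile(r"^SRC\s*:=.*$", re.MULTILINE)


def retarget_makefile(makefile_text: str, kernel_version: str) -> str:
    """Rewrite the first 'SRC :=' assignment to point at another kernel's source tree."""
    new_line = f"SRC := /lib/modules/{kernel_version}/source/"
    text, count = SRC_LINE.subn(lambda _m: new_line, makefile_text, count=1)
    if count == 0:
        raise ValueError("no SRC := line found in Makefile")
    return text


def module_update_dir(kernel_version: str) -> str:
    """Directory the rebuilt 3w-9xxx.ko gets copied into for that kernel."""
    return f"/lib/modules/{kernel_version}/updates"


if __name__ == "__main__":
    # Hypothetical Makefile excerpt; the real one from 3Ware may differ.
    sample = "SRC:=/lib/modules/$(shell uname -r)/source\nobj-m := 3w-9xxx.o\n"
    print(retarget_makefile(sample, "2.6.20-1.2948.fc6"), end="")
    print(module_update_dir("2.6.20-1.2948.fc6"))
```

With GNU make, passing the variable on the command line (make SRC=/lib/modules/2.6.20-1.2948.fc6/source/) should also override a plain assignment in the Makefile, which avoids editing the file at all.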

Once this was done, I rsync'd 190GB to the fresh install (yes, my home directory is 190GB), and saw write throughput to the RAID50 set at around 100MB/s. Reads were slightly higher than that at 110MB/s. I've been beating up the 9650SE and the Ultra 40 M2 with my normal brand of workstation torture -- cyclic MD5 sums on multi-gigabyte files, kernel recompilations, DVD ripping, MP3 encoding, and two virtual systems running under VMware Workstation 6, all while playing movies from NFS shares and running Beryl with all the widgets enabled. Between the stellar performance of the 9650SE and the calm and steady power of the Ultra 40 M2, all of these tasks were handled with aplomb. Suffice it to say, you'd be hard-pressed to equal or surpass the performance of this box with any computing hardware available today.

As far as longevity and survivability goes, the 9650SE has been running for a few months without a problem, and my several-year-old 9500 8-port SATA RAID controller has been driving a four-disk RAID5 set without a hiccup. If history is any indicator, reliability isn't an issue with 3Ware cards. I'll be posting more on this power duo as time and events warrant, but for right now, I'm a very happy guy.

Posted by Paul Venezia on May 30, 2007 10:17 AM



May 10, 2007 | Comments: (0)

More on broadband banditry

Yesterday, I posted about six things that need to change. One of them was entitled "Broadband Bandits", where I basically denounced broadband companies' artificially limited bandwidth options. After re-reading it, I think I need to clarify a few things.

Certainly, these companies aren't in this business for wholly altruistic purposes -- they're in it to make money. That's the whole idea. The problem that I have with most broadband offerings is that they're specifically designed to limit end-user options without any reasonable alternative. Most areas with broadband access have one or two options, and they're generally both playing this game.

One of the major issues is the ridiculously limited upstream bandwidth provided in most residential packages. For $50 a month, I would expect to get better than 39KB/s when uploading images to Flickr, videos to YouTube, or pictures for my eBay auctions, and when sending email attachments. Unfortunately, this is rarely the case, since upstream bandwidth has been squeezed as low as possible.

Even RoadRunner, a company that does not generally throttle users' bandwidth or block well-known ports, delivers 5Mb/384k service with its standard package. I tested a freshly installed RoadRunner line the other day and found that it's just barely possible to get 5Mb down, with the 384k upstream completely maxed out by TCP ACKs. Other companies do the same thing, offering service with a 15:1 down/up ratio that can just barely reach its rated speeds, hampered by the upstream caps.

The DOCSIS cable standard isn't symmetric, but it isn't anywhere near this lopsided either. Current DOCSIS installations based on the 2.0 standard are capable of delivering 38Mbit/s downstream and 27Mbit/s upstream to a group of modems. A small neighborhood would have this bandwidth split between any number of modems, and by the law of averages, most users will get their rated download speeds. But notice that the 2.0 standard's down/up ratio is roughly 1.4:1 (38/27), even more balanced than 5:3. That doesn't coincide with the 15:1 ratio found in most broadband plans, and some offerings in the US and Canada are nearly 20:1. This doesn't jibe with the capabilities of DOCSIS, so there's no technical reason for these plans to exist. Upstream data is subjected to higher noise levels across a cable plant, but that doesn't justify the low caps found nearly everywhere.

The DOCSIS 3.0 standard is very new and hasn't been widely adopted yet, but it was designed to give FIOS a run for its money, offering 160Mbit/s downstream and 120Mbit/s upstream to the same number of modems. Again, we see a similar down/up ratio in play.
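The gap between the standards and the retail plans is easy to quantify. Here is a quick Python sketch that computes the down/up ratio from the rates quoted in this post. The dictionary and function names are illustrative only, decimal units (1Mb = 1000k) are assumed, and the DOCSIS figures are shared per-channel capacity rather than per-subscriber rates.

```python
# Down/up ratios for the links discussed above, with rates in kbit/s.
LINKS = {
    "RoadRunner standard, 5Mb/384k": (5_000, 384),
    "DOCSIS 2.0 channel, 38/27 Mbit/s": (38_000, 27_000),
    "DOCSIS 3.0, 160/120 Mbit/s": (160_000, 120_000),
}


def down_up_ratio(down_kbps: float, up_kbps: float) -> float:
    """How many times faster downstream is than upstream."""
    return down_kbps / up_kbps


if __name__ == "__main__":
    for name, (down, up) in LINKS.items():
        print(f"{name}: {down_up_ratio(down, up):.1f}:1")
    # RoadRunner standard, 5Mb/384k: 13.0:1
    # DOCSIS 2.0 channel, 38/27 Mbit/s: 1.4:1
    # DOCSIS 3.0, 160/120 Mbit/s: 1.3:1
```

Even RoadRunner's standard package, one of the better ones, is about nine times more lopsided than the DOCSIS 2.0 channel it rides on.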

I've seen many commercials for broadband service showing a fellow sitting in his kitchen with a laptop, telling his wife he can't go to the mall because he has to finish some work. Suddenly, we see a screenshot showing a "Done" dialog box, and voila, due to the power of XYZ's broadband service, the lucky fellow can go to the mall and relax on the hard wooden benches outside Bed Bath and Beyond. The problem here is that the ad specifically targets people who can telecommute, without mentioning that if he were uploading a PowerPoint presentation, he'd be sitting there for a long, long time... assuming that the provider hasn't blocked IPSec and he can actually connect to the corporate network in the first place.

Consumer broadband needs to change. It needs to provide at least a 5:3 down/up ratio as part of the standard package for a reasonable price. I know dozens of broadband users that would gladly trade a few Mb downstream for a few Mb upstream, and this trend is only going to grow. Fears of illicit filesharing and copyright infringement be damned -- you can't penalize a captive audience for something they might do.

Posted by Paul Venezia on May 10, 2007 11:42 AM



May 09, 2007 | Comments: (0)

Six things that need to change

Although I'm generally able to see both sides of an argument, there exists a short list of issues that I just can't comprehend. These are those issues.

1) The RIAA's war on its customers
This one has been going on so long as to almost be accepted. Of course, that's their plan. The vast amount of money being poured into lawyers, lobbyists, and scare tactics by the RIAA would have been more than enough to rework their long-deceased business model into something for the next generation. For an industry that was built upon pushing the envelope, they certainly can't seem to think outside the CD case. The heavy lobbying in Florida that has resulted in the used CD market there receiving stricter controls than the gun market is just one tiny example.

The RIAA is certainly under attack from every angle -- piracy, slowing CD sales, a massive increase in self-produced music, and flagging interest in marquee acts -- but nearly all of that is their own fault. Instead of embracing the new market, they've been trying to kill it by shipping CDs with rootkits masquerading as DRM schemes, producing lawsuits by the bushel, apparently destroying Internet radio, and projecting an overall public persona that falls somewhere between Al Capone and Stalin. It's just ludicrous.

But then, this is the industry best described by a misquote of Hunter S. Thompson: "The music business is a cruel and shallow money trench, a long plastic hallway where thieves and pimps run free, and good men die like dogs. There's also a negative side." His original words were actually describing TV broadcasting, but the sentiment holds.

2) Broadband Bandits (Update: More on this topic can be found in my May 10 follow-up, "More on broadband banditry")
Comcast is the easy target on this one, but there are many perpetrators of this travesty. You know who you are. More importantly, your customers know who you are, and will jump ship in an instant if given the chance. With most of the competition buried in the backyard, and a weakened FCC sitting idly by, Comcast, Verizon, and many other providers are ramping up prices and dropping service levels. They're also applying voodoo AUP interpretations to cut off paying customers that go over some amorphous limit. Many of these companies come from a delivery-only background, such as cable TV, where they deliver the signal and the customer passively accepts it. Back in the day, this was largely true of the Internet -- Web servers existed in datacenters, ISPs, and universities, and most content was text and the occasional picture. With Flickr, YouTube, MySpace, and the advent of simple videoconferencing, end users are much more apt to be sending nearly as much as they receive, yet most broadband connections are still ridiculously asymmetric. I just ordered Verizon DSL to provide a backup circuit: $30/mo for a 3Mb/768k circuit. This means that uploading a few 5-megapixel photos will take me roughly 3 minutes, and completely obliterate that 3Mb/s download rate due to upstream congestion, even though I'm not downloading anything.
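Here is a rough Python check of that upload estimate. The 2.5MB per-photo size and the 80% protocol-efficiency factor are assumptions chosen for illustration, not measurements, and the function name is hypothetical.

```python
# Rough upload time for photos over the 768k upstream of the DSL backup line.
# Photo size and efficiency factor below are assumed values.


def upload_seconds(total_bytes: int, upstream_kbps: float, efficiency: float = 1.0) -> float:
    """Seconds to push total_bytes through an upstream of upstream_kbps (1k = 1000 bit/s)."""
    return total_bytes * 8 / (upstream_kbps * 1000 * efficiency)


if __name__ == "__main__":
    photo_bytes = 2_500_000  # assumed size of one 5-megapixel JPEG
    batch = 5 * photo_bytes
    print(f"at line rate:   {upload_seconds(batch, 768) / 60:.1f} min")  # 2.2 min
    print(f"at 80% goodput: {upload_seconds(batch, 768, 0.8) / 60:.1f} min")  # 2.7 min
```

Five photos, then, land in the two-to-three-minute range, and the upstream is pinned for that entire time.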

There are a few reasons that most of these wildly unbalanced plans exist. Contracts with peering partners generally dictate up/down ratios to be maintained (i.e., saving the ISP money). The plans also prevent customers from using videoconferencing and VoIP technologies to their full potential, resulting in poor performance. This forces the customer to use only approved methods of communication (i.e., paying the carrier more per month). And lastly, they've always been that way, right?

As a sideline to all this nonsense, many carriers go so far as to block well-known ports, such as Web, IPSec, and SMTP ports, on residential lines. True, most people aren't running Web servers from their houses, but lots of them are just trying to connect to the corporate VPN. To do that, you need a business-level contract for way more money per month and usually lower bandwidth. What a bargain.

Certainly, not all carriers act this way. Comcast and Verizon DSL are famous for it, but Time Warner's RoadRunner seems to be above this chicanery, at least so far. If AT&T hadn't been dismantled nearly 25 years ago, we'd still be renting our phones from Ma Bell for $20 a month, and our telecommunications infrastructure would be the best the third world had to offer. At least Verizon is offering FIOS in some areas, yet I know of entire communities that have no broadband whatsoever. Wasn't there a Universal Service initiative started over a decade ago? Note the notice currently posted on the Universal Service site: "The Federal-State Joint Board on Universal Service recommended that the Federal Communications Commission take immediate action to rein in explosive growth in high-cost universal service support disbursements. The Joint Board is also seeking comment on proposals for long-term, comprehensive reform of the high-cost program. 5/1/07." This is because we've gotten nothing for a whole lot of something.

Just ask a South Korean how much they spend on the 100Mbit Internet circuit in their house. CNet was talking about 20Mbit links, universal video-on-demand on the cheap back in 2004. Not much has changed in three years, except their average bandwidth has increased fivefold. Heck, just ask them about the Internet service to their mobile phones -- it beats anything in the US by far. This brings me to number three.

3) The US is a mobile communications wasteland
Crazy, indecipherable "plans", "anytime minutes", $0.10 per text message, $0.003 per KB (read that any way you want), and current phones that were cutting edge in Europe when John Paul II was still wandering around the Vatican. That's the state of mobile connectivity in the US today. I've heard more than a few foreigners describe a trip to a T-Mobile store as "like visiting a cellphone museum". Given what they're used to in Europe and Asia, I have little doubt this is true. Wireless carriers in the US have been raking in money hand over fist for the past five years, riding the cellphone boom as high as it will go. During all this, they've been slowly doling out features to their users like cake to the starving, while the rest of the world runs circles around us.

The pending release of Apple's iPhone may spark something here, just as the iPod blew the portable MP3 player market apart. Hey, has Steve Jobs ever made a mistake?

4) Airport Wifi
This one's personal. I understand that fleecing business travelers for $10 or so during a flight delay is part of the business model, but even crack dealers give away the first few tastes. Can't we get 30 minutes free, and a reasonable hourly rate thereafter? I can't believe that any airport Wifi installation hasn't already paid for itself a hundred times over. I'll continue to hold Manchester Airport up as a shining example -- wide coverage, free service, no splash page. It's just beautiful.

5) Spam and the Windows Protection Racket
This one will never disappear, but it can be marginalized. If thousands and thousands of compromised Windows systems were to be patched, replaced, or burned in effigy, the volume of spam worldwide would be drastically reduced. Couple this with viruses, adware, malware, and so on, and there's very little that your PC can't do -- your taxes, spreadsheets, Web surfing, and spamming the bejeezus out of thousands of people. I think we may be near the top of a bell curve on that one. Vista is more secure than XP (which isn't saying much), but the sheer number of wide-open Windows systems on the Internet will necessarily begin to decline due to hardware failure, if nothing else. If the replacements are tougher to compromise, then the spam levels will abate somewhat, as will other nefarious afflictions of the digital age, and we'll all be a little safer and saner.

Of course, if Windows were suddenly secure, it would directly affect the revenue of hundreds of smaller software vendors hawking Windows protection applications, but I can't feel too bad for Symantec or McAfee -- they'll survive.

6) Oops! I lost your ID. My bad.
Every week or so, we hear about the theft of another million identities from a laptop or network intrusion. Sometimes it's a corporation, sometimes a university, or sometimes the federal government. Sometimes it's your ID, sometimes it's mine. Pretty soon, it'll be nearly everyone that's ever had a credit card, applied for a loan, opened a bank account, or was simply assigned a social security number.

There are no formal penalties for this invasive personal intrusion, and some companies simply don't tell anyone that the event occurred. If a company doesn't have adequate security and lets a few hundred thousand database records flap in the wind, the victim will at best spend days straightening out a credit mess and changing all their accounts to new numbers. At worst, they'll lose money, their credit rating, and maybe even their job through no fault of their own. If a department store chain's physical security were so lax that its customers were violently mugged en masse simply for being in one of its stores, you can bet it wouldn't be in business any more. Worse still would be the poor people who got mugged because a different store told the muggers they had been there. Identity theft isn't much different, since your ID is bought and sold to whomever, without your approval.

We need accountability for data security lapses of this magnitude, plain and simple. We only get one identity, and when it has been dragged through the mud it can take years to recover, and sometimes it's impossible. Unfortunately, it will take new laws and stiff penalties to see any change here, since it's apparently more cost-effective to throw your customers under the bus (see number one, above).

It's obvious that the US is going through a period of massive change, largely related to the presence of the Internet and the forces that can exert some influence on it. Some of these issues may be just growing pains, but some of them may be cancer. Thus, it's very important that we not shortchange our technological future for short-term economic and bureaucratic issues. We've sold our society to the electron, and we'll be beholden to anyone who wields it better than we do.

Posted by Paul Venezia on May 9, 2007 02:58 PM



May 02, 2007 | Comments: (0)

Check it out: Deep into APC hardware management

I just barely finished turning up two new datacenters in two different states within two weeks. Exhausting? Definitely. On the plus side, however, I wrote several new tools and plugins to manage all of the APC gear that went into both sites with Nagios and Cacti.

First, a little background. Both datacenters were built to be nearly identical to each other -- from rack layout to equipment, to color-coded patch cabling. The major difference is that one site is cooled with APC ACSC100 In-row air units, and the other cooled with ACRC100 In-row water-cooling units. Both sites are powered from APC Symmetra PX UPSes and PDUs, and use APC racks and 3-phase zero-U rackmount PDUs. In addition, several NetBotz WallBotz 500 units were implemented to provide external environmental monitoring and surveillance of the rooms. Basically, it's all APC gear. I'll be posting more on the build process over the next few weeks, but I wanted to get some of the code out there first.

I wrote two main plugins for Nagios and Cacti to assist in monitoring all this new stuff. The Nagios plugin checks the most pertinent data on the ACRC and ACSC units, as well as the main sensors on the NetBotz units and the load on each phase of the PDUs. It's come in very handy since the sites were turned up, because I have an easily digested central view of all PDUs, or all AC units, on one page. Tweaking parameters on the AC units becomes very simple when you have all the data in one place, versus having to log into each unit to get status info, or even using APC's InfraStruXure Central Console.

I've released the Nagios plugin, check_apcext, on NagiosExchange, and will be posting the Cacti templates soon. Here's an overview of the plugin. Enjoy.


Usage: ./check_apcext.pl -H <hostip> -C <community> -p <parameter> -w <warnval> -c <critval>

Parameters:
APC NetBotz
nbmstemp         NetBotz main sensor temp
nbmshum          NetBotz main sensor humidity
nbmsairflow      NetBotz main sensor airflow

APC Metered Rack PDU (3 phase)
rpduamps         Amps on each phase

APC ACSC In-Row
acscstatus       System status (on/standby)
acscload         Cooling load
acscoutput       Cooling output
acscsupair       Supply air
acscairflow      Air flow
acscracktemp     Rack inlet temp
acsccondin       Condenser input temp
acsccondout      Condenser outlet temp

APC ACRC In-Row
acrcstatus       System status (on/standby)
acrcload         Cooling load
acrcoutput       Cooling output
acrcairflow      Air flow
acrcracktemp     Rack inlet temp
acrcsupair       Supply air
acrcretair       Return air
acrcfanspeed     Fan speed
acrcfluidflow    Fluid flow
acrcflenttemp    Fluid entering temp
acrcflrettemp    Fluid return temp

To use the plugin, place the following in checkcommands.cfg:

define command{
    command_name    check_apcext
    command_line    $USER1$/check_apcext.pl -H $HOSTADDRESS$ -C $ARG1$ -p $ARG2$ -w $ARG3$ -c $ARG4$
}

and in services.cfg, you'll have something similar to the following:

define service{
    use                     generic-service
    hostgroup_name          acsc
    service_description     ACSC Status
    is_volatile             0
    contact_groups          admins
    check_command           check_apcext!public!acscstatus
}

define service{
    use                     generic-service
    hostgroup_name          acsc
    service_description     ACSC Rack Temps
    is_volatile             0
    contact_groups          admins
    check_command           check_apcext!public!acscracktemp!90!95
}

... and so on, for all parameters you wish to inspect. There are two special cases:

1) ACSC and ACRC status checks have no warning/critical values -- they return OK if the unit is operating, and WARNING if it's on standby.
2) Rack PDUs will flag as WARNING or CRITICAL if any of the three phases is beyond the threshold.
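The two special cases above come down to a small amount of decision logic. Here is a Python sketch of that logic. It is not a port of check_apcext.pl. The function names are hypothetical, the 0/1/2 values follow the usual Nagios plugin exit-code convention, and treating a reading equal to a threshold as "beyond" it is an assumption.

```python
# Sketch of the OK/WARNING/CRITICAL decisions described above (not the real
# plugin code). Values follow the standard Nagios exit-code convention.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_status(is_on: bool) -> tuple[int, str]:
    """ACSC/ACRC status: no thresholds, standby is a WARNING."""
    if is_on:
        return OK, "OK - unit operating"
    return WARNING, "WARNING - unit on standby"


def check_value(value: float, warn: float, crit: float) -> tuple[int, str]:
    """One higher-is-worse reading, e.g. rack inlet temp with -w 90 -c 95."""
    if value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    return state, f"{LABELS[state]} - {value:g}"


def check_phases(amps: list[float], warn: float, crit: float) -> tuple[int, str]:
    """3-phase PDU: the worst phase decides the state for the whole unit."""
    detail = ", ".join(f"L{n}={a:g}A" for n, a in enumerate(amps, start=1))
    state, _ = check_value(max(amps), warn, crit)
    return state, f"{LABELS[state]} - {detail}"


if __name__ == "__main__":
    state, message = check_phases([12.1, 14.8, 9.3], warn=14, crit=16)
    print(message)  # WARNING - L1=12.1A, L2=14.8A, L3=9.3A
    # A real plugin would print the message and then exit with sys.exit(state).
```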

TODO:
1) NetBotz external sensor monitoring
2) Other rack PDUs (although I don't have any to test)
3) Bugfixes?
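Writing one service block per parameter by hand gets tedious across two sites' worth of gear. Here is a small Python generator for services.cfg entries in the same shape as the examples above. It is a hypothetical helper, not something shipped with check_apcext. The hostgroup names and thresholds in the example are placeholders carried over from the configuration shown earlier.

```python
# Generate services.cfg stanzas like the hand-written ones above, one per
# parameter. Hostgroups and thresholds below are placeholder values.


def service_block(hostgroup: str, description: str, parameter: str,
                  community: str = "public", warn=None, crit=None) -> str:
    """Return one Nagios 'define service' block that calls check_apcext."""
    args = [community, parameter]
    if warn is not None and crit is not None:
        # Status checks (acscstatus/acrcstatus) take no thresholds.
        args += [str(warn), str(crit)]
    lines = [
        "define service{",
        "    use                     generic-service",
        f"    hostgroup_name          {hostgroup}",
        f"    service_description     {description}",
        "    is_volatile             0",
        "    contact_groups          admins",
        f"    check_command           check_apcext!{'!'.join(args)}",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    checks = [
        ("acsc", "ACSC Status", "acscstatus", None, None),
        ("acsc", "ACSC Rack Temps", "acscracktemp", 90, 95),
    ]
    for hostgroup, desc, param, warn, crit in checks:
        print(service_block(hostgroup, desc, param, warn=warn, crit=crit))
        print()
```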

Posted by Paul Venezia on May 2, 2007 11:29 AM



April 14, 2007 | Comments: (0)

More on Apple's Insecurities

A few days ago, I posited a few simple reasons that Apple's Mac OS X was inherently more secure than Windows. It appears that this touched off a firestorm, with a summarization of that post garnering over 3500 diggs, and trackbacks coming in from all over the globe. It was even summarized in Portuguese.

I've been reading a few of the thousands of comments on links to that post on various sites, and have seen more than a few folks take issue with my observations and Apple in general. These statements seem to fall into a few common themes:

  • Real hackers are in it for the money, not the glory. That's why there aren't any widespread OS X viruses.
  • Due to that fact, the installed base is too small to garner attention.
  • Apple users are generally young hipsters that use Macs due to the counter-culture marketing.
  • Apple hardware costs too much.
  • Mac OS X is a "toy" OS.
  • Microsoft actually did reinvent Windows with NT/2000.
  • No "real" admins use Macs

    There are more, but let's look at these seven for now.

  • Real hackers are in it for the money, not the glory. That's why there aren't any widespread OS X viruses.
    There's definitely truth to this statement. Botnets are moneymakers, and all botnets are made up of Windows systems. Writing code that would attempt to hijack Macs wouldn't be worth the time. But then, that's not the only way to make money from malicious code. Tons of spyware and malware are written simply to advertise to the user. Bonzi Buddy et al. are just vehicles to land ads on users' desktops, and there's plenty of money to be made there.

    Now, let's combine this claim with the "Apple users are hipsters" and "Apple hardware costs too much". If virus writers are in it for the money, and all that money comes from advertising in one form or another, then landing malware on OS X would deliver the perfect demographic to many advertisers. If you could guarantee that young hipster, counter-culture computer users with too much money would be seeing these ads, you'd have advertisers at your door with wheelbarrows full of hundred-dollar bills. Given that fact, it must not be worth the effort required to compromise OS X, at least for now.

    On the flipside of this argument, there are thousands of examples of malicious code targeting Windows systems that cannot be monetized. I'd love to know how anyone besides the anti-virus companies is making any money from the ANI vulnerabilities flying around.

  • The installed base is too small to garner attention.
    I started off that post remarking on the new "virus" for iPods running Linux. Enough said.

  • Mac OS X is a "toy" OS.
    I never really understood this one. Can someone please enlighten me?

  • Microsoft actually did reinvent Windows with NT/2000.
    Indeed, the base of NT/2000/XP is light-years ahead of the Win95 base, and by officially killing off the older codebase, they've made huge strides in security. However, the code sharing between the two is deep in order to ensure backwards-compatibility. This is how we wound up with kernel-level printer drivers, no concept of privilege escalation, and arbitrary code execution vulnerabilities on Windows 2000/XP. This is mitigated somewhat in Vista with UAC, since it prompts for everything, but that's closing the barn doors well after the horse is gone. Enough people will disable this annoyance to render it mostly toothless.

    Don't be fooled -- wowexec will be with us for a long time, and with it, the ghosts of hackers past.

  • No "real" admins use Macs
    I've been seeing "real" admins flocking to OS X for the past few years, myself included. Over at NOTN, I posted about a recent skirmish I had with a corrupt bootflash on a redundant Cisco 6509 supervisor blade. Note the screenshot is of my MacBook Pro. I write tons of code on my MacBook, administer Windows, Linux, FreeBSD, and Solaris systems, do high-level network construction and configuration, and constantly run lab tests from this system. This week I engineered a datacenter relocation to a new building armed only with my MacBook Pro and a Dell D800 running Fedora Core 6. If that's not "real" geekery, I don't know what is.

    My reasons for using OS X have nothing to do with marketing. As soon as it stops meeting my needs, I will move on to something that does. My reasons are more substantial than "it's just so cool and refined": Instant-wake from sleep, a native POSIX OS, native X11, vim, perl, php, MySQL, Apache, high performance, minimal security worries, a plethora of OSS applications, all running seamlessly with Photoshop and Microsoft Office, all without a sizable performance penalty from anti-virus software. Why wouldn't I use it? My big workstations are Linux, my laptops and DAWs are OS X. It's a mix I find to be constantly available, reliable and powerful enough to handle what I can throw at it. Computers are tools, after all.

    I'll be getting into this debate more in the coming weeks, so stay tuned.

    Posted by Paul Venezia on April 14, 2007 01:35 PM



    April 09, 2007 | Comments: (0)

    The Myth of Apple's Insecurities

    In case you missed it, there's a virus for the iPod. Yep, that's right, your MP3 player is a veritable hotbed of virus activity -- but only if you're running the iPod Linux distribution, and only if you take great pains to make the virus function, since it doesn't really work. We can argue about whether or not this code actually constitutes a virus, but that's not the point I'm trying to make.

    The point here is that if it has a CPU, hackers will try to break it, and virus writers will try to write a virus for it. Given that there are probably only a few hundred -- maybe a thousand -- iPods running Linux out there, the fact that someone took the time to write this virus (or malicious code) shows why the Apple detractors claiming that Macs aren't a target due to their lower market share are all wet. I ranted on MOAB two weeks ago, pointing out that most of their bugs were either local exploits or issues within third-party applications, and that there has never been a virus in the wild for OS X, much like there's never been one for Linux. The difference isn't market share, it's the foundation of the operating systems. Given that most virus authors and hackers are in it for the ego, don't you think there would be a huge incentive to be the first one to write a widespread OS X, Linux, or FreeBSD virus?

    If an OS is built on shaky ground, everything layered on top will suffer. This is the position that Microsoft is in now. Apple was in this very position at the end of the last century. They decided to start over, providing a clear upgrade path and supporting legacy applications on the new platform. OS X was developed from BSD and NeXT, built on a foundation that dates back twenty years o