Following the FUBAR with the T-Mobile Sidekick and the data loss associated with the device, Microsoft recently issued a press release that said: "Beginning today, log into the My T-Mobile website, where there will be a recovery tool to restore contacts you may have lost during the recent service outage. This tool will enable you to view the contacts you had on your device as of October 1."
The press release went on to explain: "With a few clicks and a confirmation, you will be able to restore these contacts to your Sidekick. If you have recreated some of the same contacts on your Sidekick since October 1, you can choose to keep both sets of contacts, merge them, or just keep the set of contacts now on your device. You may also edit any partial or complete duplicates on your Sidekick after restoration."
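The three choices described in the press release can be illustrated with a small sketch. The press release does not say how the tool matches or merges entries, so everything here is an assumption made for illustration: the `RestoreMode` names, the `restore_contacts` function, the contact fields, and the rule of matching duplicates by name are all hypothetical.

```python
from enum import Enum


class RestoreMode(Enum):
    """Hypothetical names for the three options in the press release."""
    KEEP_BOTH = "keep_both"      # keep backup and device sets side by side
    MERGE = "merge"              # combine entries that look like duplicates
    KEEP_DEVICE = "keep_device"  # ignore the backup entirely


def restore_contacts(backup, device, mode):
    """Return the contact list the device would hold after restoration."""
    if mode is RestoreMode.KEEP_DEVICE:
        return list(device)
    if mode is RestoreMode.KEEP_BOTH:
        return list(device) + list(backup)
    # MERGE (assumed rule): one entry per name, compared case-insensitively.
    # Non-empty device fields win; the backup fills in whatever is missing.
    merged = {}
    for contact in list(backup) + list(device):
        key = contact["name"].strip().lower()
        filled = {field: value for field, value in contact.items() if value}
        merged.setdefault(key, {}).update(filled)
    return list(merged.values())


# Made-up sample data: the October 1 backup and the contacts re-entered since.
backup = [{"name": "Ann", "phone": "555-0100", "email": ""}]
device = [
    {"name": "Ann", "phone": "", "email": "ann@example.com"},
    {"name": "Bob", "phone": "555-0101", "email": ""},
]

merged = restore_contacts(backup, device, RestoreMode.MERGE)
both = restore_contacts(backup, device, RestoreMode.KEEP_BOTH)
device_only = restore_contacts(backup, device, RestoreMode.KEEP_DEVICE)
```

In this sketch the merge option leaves two contacts, with Ann's entry carrying both the phone number from the backup and the email re-entered on the device, while keeping both sets leaves a duplicate Ann to be edited by hand afterward.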
The rumors are still flying about what caused Sidekick users’ data to disappear. The most repeated rumor is that the Sidekick data totaled 800TB. Supposedly Sidekick was running on 20 or so CentOS Linux servers and more than 8 Sun servers, both SPARC and x86-based, running Solaris. They also run an Oracle RAC [Real Application Clusters] system and NFS file servers. The back-end storage is not known, although a Sun SAN has been mentioned.
Another rumor was that Roz Ho, VP of Premium Mobile Experiences, decided to halt the last backup before it completed. The idea was to save money (a rumored lack of SAN space), and an upgrade to the back-end hardware was to be done by Hitachi Data Systems. The staff was said to have complained loudly about this, and at least four places claim to have emails from staff describing parts of the failures.
There is also a rumor that T-Mobile had an SLA [Service Level Agreement] in place with Microsoft with a penalty clause of $700k per day, or about $10 million for two weeks.
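The arithmetic behind that rumored figure is easy to check. The sketch below assumes a flat per-day penalty with no cap or grace period, which the rumor does not confirm; the constant and function names are made up for illustration.

```python
# Rumored figures from the article; none of this is confirmed.
DAILY_PENALTY_USD = 700_000
OUTAGE_DAYS = 14  # "two weeks"


def total_penalty(daily_penalty_usd: int, days: int) -> int:
    """Flat per-day penalty with no cap (an assumption)."""
    return daily_penalty_usd * days


total = total_penalty(DAILY_PENALTY_USD, OUTAGE_DAYS)
print(f"${total:,}")  # $9,800,000
```

Fourteen days at $700k comes to $9.8 million, which matches the "about $10 million" in the rumor.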
Tuesday morning, Microsoft CEO Steve Ballmer characterized the recent Sidekick data loss episode as "not good". Ballmer was the keynote speaker at the SharePoint Conference in Las Vegas. Network World asked Ballmer how Microsoft was going to earn back the trust of Sidekick users. He shifted the answer into a marketing pitch for SharePoint, the company’s unified collaboration, content management, and enterprise search system, saying in part: "People will want to know, is our approach different for SharePoint Online, is our approach different for the enterprise infrastructure. I think we have good answers, but I know we are going to continue to upgrade our processes and have to upgrade how we talk about this stuff, because we are going to get more questions."
Ballmer was not answering the bigger questions about so-called "cloud" computing. When average mid-tier server hardware is capable of such high transaction rates, and costs less than 25 percent of what a comparable server did five years ago, why would a small to mid-sized business move its IT operations to the "cloud"?
An example of who would ask that question is a school district running 40 to 100 school buses on a pickup and drop-off schedule. If it had a Sidekick-style failure, what would happen to the schedules of the kids, their parents, and the teaching staff?
We asked a friend who is an IT manager for a small city what he thought of the "cloud". He said that the cloud is an interesting concept that will require a great deal of cooperation and trust to pull off and continue to support. We asked if he would move their city billing services over to the "cloud". He said: "At this time I do not trust the cloud, and no one has been able to answer my questions about security, controls, communications, authorization, authentication, and a lot more."
When we look at the SideKick management model, we see a lot of places where mistakes can be made. Each layer is dependent on the staff below to not make a mistake or at least be able to quickly recover from errors.
The list of parties in the Sidekick chain who trust everybody else not to make a mistake is significant. T-Mobile’s staff is trusting each layer below not to make a mistake. They are expecting Microsoft Corporate and the Microsoft Entertainment Division, along with Danger, to handle the daily data management side of Sidekick applications and hardware without errors. Hitachi Data Systems was supposedly doing a data upgrade without errors. Next in the domino line was EMC, which was supposedly maintaining the backup hardware for Danger and T-Mobile. Last in this odd daisy chain is rumored to be the Verizon Data Center, which was maintaining the data for another wireless carrier’s clients. A whole lot of people thought the other guys were in charge of data safekeeping.
Contributing to the mix is the additional legal baggage of SLAs and their contractual promises, which carry performance clauses and penalties for failure to perform. T-Mobile stands to collect millions from Microsoft, and possibly others, for Sidekick being offline for nearly two weeks.
Sidekick’s problem is not just the customers’ data, hardware, or applications. The problem is managing people. Everybody in the daisy chain of responsibility thought the other guy was doing their job better than they were.
Microsoft clearly has a leadership problem. It tried at first to offload responsibility for the failures with a weak excuse that amounted to ‘not my stuff’. But the questions users ask are much more sophisticated than they were ten years ago. This week’s answers from Microsoft, however, were not much different from the ones it gave when the US Justice Department took it to task over ten years ago.