Saturday, June 23, 2012

Crashing Solved

I explain how I fixed the crash below. I described the crash problem and how I narrowed it down here.
While I didn't have access to another computer, what I did have was another phone, a trusty Nokia feature phone with another carrier (with a good voice plan). I popped its sim out and put it in my Android. The Sony worked ok and the error message didn't pop up. That sim didn't have a data plan, so I couldn't do anything on the Internet without it costing an arm and a leg. To be sure, I set the phone to GSM mode to nip any temptation. Then I tried the Android sim in the Nokia and it couldn't find a network there either. So the culprit was the sim card or something related to it.
By now, one part of me was saying to just go and get it replaced. Enjoy the day in the park. I was already getting stares with two phones lying around me in pieces. But I knew I was this close to solving this.
I put the Android sim back into the Sony and booted it up. This time, there was no error message. I checked the Mobile Network list and it found the networks but listed only the 2G/GSM networks. It was enough to get past the McAfee SMS verification.
I now knew what was happening. The process was trying to connect to a 3G/WCDMA network. In order to do that, it must use APN information from the sim card. The APN list on the sim card was corrupted. It tried to use the corrupted values stored in the APN fields and crashed. Since this is a core Android process, a watchdog process saw it crash and just restarted it again, creating a loop.
I called the carrier's support line from the Nokia and they re-sent the APN list. This was a Command/Configuration Message sent via SMS / text message and didn't use the data connection. Once the correct APNs were added, the phone connected to the network fine. The other casualties were a few phonebook entries on the Nokia's sim card. I discovered later that a couple of the most recent entries had gone missing, which probably reinforces my suspicion that there is a bug in the routine that handles reads and writes to the sim card.

In retrospect, the problem would have been solved by just requesting the APN list. I can also request the configuration through a phone menu system. If you are doing that, be sure to request both the phone and the data or Internet configuration information. So what caused it? I am not sure, but I think it had to do with the long draft text/SMS message.
After answering a call, I put the phone in my pocket without turning on the screen lock. As I walked in the park, the phone moving around in my pocket probably created the long text message. This wasn't sent, so the phone saved it as a draft. It must have saved it on the sim card and the program simply wrote it verbatim. SMS messages have a maximum number of characters but most phones allow you to enter long messages. The phone will split the message when it is sent out. Since the draft message was very long, the program writing it to the sim card probably didn't split it into smaller pieces (and why should it? It was still a draft, right? :) ). It kept on writing beyond the space a normal text message would occupy and into where the data for the APNs was stored. It was basically a buffer overflow.
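To make the overflow idea concrete, here is a toy sketch in Python. It is only a model of my guess above, not of how a real sim card lays out its files: the slot size, the flat layout and the APN value are all made up for illustration.

```python
# Toy model: one flat block of storage, with a fixed-size draft slot
# followed directly by the APN data. All sizes and values are hypothetical.
SLOT_SIZE = 16                    # made-up size of one stored-message slot
APN_DATA = b"internet.example"    # made-up APN value


def make_storage():
    """Return fresh storage: an empty draft slot, then the APN data."""
    return bytearray(b"\x00" * SLOT_SIZE + APN_DATA)


def read_apn(storage):
    """Read back whatever currently sits in the APN area."""
    return bytes(storage[SLOT_SIZE:])


def write_draft_unchecked(storage, text):
    """Write the draft byte by byte with no length check (the suspected bug)."""
    for i, byte in enumerate(text.encode("ascii")):
        storage[i] = byte         # happily runs past SLOT_SIZE into the APN area


def write_draft_checked(storage, text):
    """Refuse any draft that does not fit in its slot."""
    data = text.encode("ascii")
    if len(data) > SLOT_SIZE:
        raise ValueError("draft does not fit in one message slot")
    storage[:len(data)] = data


if __name__ == "__main__":
    storage = make_storage()
    write_draft_unchecked(storage, "a" * 20)   # 4 bytes too long
    print(read_apn(storage))                   # APN area now starts with draft bytes
```

With the checked version the long draft is rejected and the APN area is left alone, which is the bounds test I talk about next.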
Should this have been caught in software testing? Yes and no. Bounds testing exists and should have been in play. But splitting a long draft message has its own problems. How do you reassemble it for further editing? If it is stored according to the standard text format, there would be no way to link one stored text message with another. At least not in a standard way. And that is the point. Sim cards are fairly primitive storage and rely on the programs writing to them to comply with standards. So while long draft messages should be split, they can't.
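When a long message is actually sent, each part carries a small header with a reference number, the part count and the part's position, and that leaves 153 characters of text per part instead of the usual 160. The sketch below shows the same bookkeeping applied to a split draft, just to illustrate what "linking" the pieces would involve. The `Part` type and its field names are hypothetical, and I am not claiming the sim card's stored-message format has room for any of this.

```python
from dataclasses import dataclass

SEGMENT_CHARS = 153  # text characters per part in a multi-part 7-bit SMS


@dataclass
class Part:
    ref: int     # same number on every part of one draft (hypothetical field)
    index: int   # 1-based position of this part
    total: int   # how many parts the draft was split into
    text: str


def split_draft(text, ref, size=SEGMENT_CHARS):
    """Cut a long draft into linked parts of at most `size` characters."""
    chunks = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    return [Part(ref, i + 1, len(chunks), chunk) for i, chunk in enumerate(chunks)]


def reassemble(parts, ref):
    """Rebuild the draft for editing from parts stored in any order."""
    mine = sorted((p for p in parts if p.ref == ref), key=lambda p: p.index)
    if not mine or len(mine) != mine[0].total:
        raise ValueError("some parts of the draft are missing")
    return "".join(p.text for p in mine)


if __name__ == "__main__":
    draft = "x" * 400
    parts = split_draft(draft, ref=7)
    print(len(parts), reassemble(list(reversed(parts)), ref=7) == draft)
```

Without the `ref`, `index` and `total` fields, the stored pieces would just look like unrelated messages, which is the problem described above.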
The alternative is to impose the standard message length limit on draft messages. But that is 'insane' because we have allowed long messages to be sent (and split) because consumers are sending long messages. And no carrier wants to be known as the one that puts a limit on text messages.
The cost of a few phones crashing is apparently acceptable. A thread on a fix for Sprint phones with similar problems, although it describes a different fix, is likely dealing with the same underlying problem: data corruption on the sim card. Other users mistakenly pin the problem on the sim itself rather than the data on it. I don't blame them really.
I am reminded of another method I learned some time ago: an error is fixable when it can be replicated. If it can be replicated, replicate it somewhere else that is similar but not identical. If the problem goes away, the problem (and solution) lies in whatever was different between the two.
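The swap test can be written down as a small sketch. Each observation is a combination of parts plus whether it worked; a part is a suspect if it appears in every failing combination and in no working one. The labels below are my own shorthand for the first three tests I ran, and the function name is just for illustration.

```python
def suspects(observations):
    """Return the (kind, name) parts present in every failure and in no success."""
    failing = [set(combo.items()) for combo, ok in observations if not ok]
    passing = [set(combo.items()) for combo, ok in observations if ok]
    if not failing:
        return set()
    common = set.intersection(*failing)   # parts shared by all failures
    cleared = set().union(*passing)       # parts seen working at least once
    return common - cleared


if __name__ == "__main__":
    tests = [
        ({"sim": "android-sim", "phone": "sony"}, False),   # the original crash
        ({"sim": "nokia-sim", "phone": "sony"}, True),      # Sony worked ok
        ({"sim": "android-sim", "phone": "nokia"}, False),  # no network found
    ]
    print(suspects(tests))
```

It only narrows things down to the sim; it took the later steps to see that the data on the sim, not the card itself, was at fault.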

Please share this with others if it solves your problem. Everyone needs help some time or the other.
