When all you have is a hammer, you can never find a bloody nail


For a while now I’ve been wanting to do some tutorials based around Shark and some other tools my group publishes. Not for work as such, but, obviously I have a bias and stock in the idea. The more I work on & with Shark the more I’m surprised by what it can do. But it’s not one of those apps that will, nor can, lead you through the steps and give you clear answers – if only it were that easy. (well, sometimes it is; it’s often worth paying good attention to those [ ! ]’s)

Anyway, from working with developers here and there with it, I can see that people really need some good tutorials; you know, something which takes a real program with a relatively common sort of problem and shows you exactly what sort of symptoms to look for, how to determine and pinpoint the issue, and resolve it.

So, I’ve been looking about for suitable open source projects. I use Shark just about every day, but not on source I can reveal to the world. I’ve actually been looking, on and off, since WWDC earlier this year. Yet I can’t find anything that’s a good example case. Heck, I can barely find anything that’s even vaguely applicable. I’ve been scouring through Sourceforge, generally googling, and looking over the apps I’ve helped developers with over the last few months. Nothing. [[ the saddest thing there is that of all the Mac-specific projects that claim to be at least in beta, on average about one in five or six actually even compile… sigh ]]

The sad thing is, there was one dev’s app I had seen that was a really fantastic System Trace example case. (that’s not the sad thing; bear with me…) It was intricate, tricky, yet demonstrated some very common problems – in particular some really subtle multi-threading issues that just about everyone hits. And we were able to boil it down to a really simple test app which was easy to understand.

But of course I don’t have the source and of course [presumably] couldn’t use it anyway. I don’t even know what the app was called, to be honest. So – here’s the sad bit, thanks for waiting – I sat down a few weeks back and thought, hey, I can rewrite those examples, reproduce the issues we saw, and do a cool demo on that.

I then thought, heck, I’ll do a full multi-threading intro, cover all the standard patterns and problems, etc. (yeah, I know, I’m a glutton) So I wrote The World’s Worst Multithreading Code Ever. Seriously, it was bad; I intentionally put in every kind of multithreading problem I could think of. There was serialisation, there were work quanta way too small, there were too many worker threads, there was an imbalance between consumers and producers, no queuing, etc… I stacked everything against my poor machine.

And it worked, and as expected the performance was crap. Nice. System Trace showed up so much lock contention I didn’t know where to start. Sweet as.
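For a flavour of two of those problems (serialisation on a single lock, and work quanta that are far too small), here is a minimal sketch. It is not the original test app, whose source isn't shown here; it assumes Python's standard `threading` module, and the function names are made up for illustration.

```python
import threading


def count_fine_grained(n_threads, n_items):
    """Anti-pattern: every single increment takes the one shared lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_items):
            # The work quantum is one addition, so the threads spend
            # nearly all their time queuing on the lock (serialisation).
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_batched(n_threads, n_items):
    """Same result, but each worker takes the shared lock only once."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        local = 0
        for _ in range(n_items):
            local += 1  # private to this thread, no lock needed
        with lock:
            total += local  # one lock acquisition per worker

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Both functions return `n_threads * n_items`; the only difference is how often the lock is taken. Whether the second is actually faster on a given machine is something to measure, not assume.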

So then I duped that and fixed one issue – I can’t remember which, something to do with lock usage. Something which obviously was going to alleviate some of the more serious issues; reduce contention a bit and reduce the serialisation.

And then I ran it.

You can see where this is going, can’t you?

Stop me if you heard this one before.

It’s really quite funny.

You know, I didn’t want to do this…

I wanted to be… a lumb-

Okay, stop right there, we’ll have no Monty Python quoted in this journal, thankyouverymuch.

*ahem*

So yeah, of course my “fixed” version ran slower. @#%!

So I’m like, hey, this could be a blessing in disguise – here’s a case where an obvious fix isn’t, and I can turn this around and use Shark to show why.

Except I can’t, because I can’t figure out WTF is going on. Seriously, it’s weird as. I mean, I learnt very quickly that no one is actually “good” at multithreading; you can get the guy who wrote the BSD pthreads library and he’ll probably still stuff up his threading occasionally… but still, this case was completely counter-intuitive. I didn’t investigate it too deeply – I got sidetracked thinking of new ways to debug threading issues – but I started to get a feel for it… it’s something about how lock acquisition latency differs between high contention and low contention; it looks like it’s much higher in the latter case… I guess that’s not surprising in a way, but really, I would not have expected it to be so pronounced.

Anyway, that’s kinda neither here nor there. The point is, in the real world people use Shark all the time to very good ends, and solve all manner of performance and correctness problems. But when you actually sit down and try to find just one such problem, they all hide, like cockroaches from the light.

(did you see the subtle pun there? damn straight)

So I’m rethinking my approach a bit, and I’m now thinking it probably wouldn’t be a bad idea to start out with the standard “how to use Time Profiling” spiel… but it’s oh so boring… and really, as I said, people get time profiling and (partly) System Trace; they don’t need yet another anaemic intro.

Yes, since I’m clearly – and ironically! – not going anywhere with this, I’ll leave it at that, my point doubly made.
