Wednesday, June 13, 2012

How do you save a datetime?

The indeterminacy of local time, given UTC time, is a tricky problem to solve if you have not given yourself enough information to work with. Use datetimeoffsets.

Somewhere along the line on the MVC3 project I'm currently working on, we decided that datetimes should be stored in UTC format. I hadn't seen that practice before, but the other guys thought it was a good idea - the idea being that an absolute time reference is better than a local one. It's not parochial, it's universal, and it's based on Atomic time which makes it cool. And when we want to display the date anywhere to the user, we would simply convert back to local time. Best practice, all good, keep going.


And then I never thought about that again until I had to convert back to local time and I realised I couldn't. Not given a simple datetime, albeit in UTC format, I couldn't. Because of course how could I know what time it had been locally at the time of saving the original event's datetime? You can work it out easily enough in somewhere like Brisbane, where I am, because we don't have daylight saving time. Same with China, India, and Japan. But it had been a fluke, the fact that the app will be mainly based somewhere where DST doesn't apply.

I guess I had thought to myself: "My datetime values are easily expressible in local time; their meaning is perfectly clear to me". But then I read this, on Bart Duncan's SQL Weblog:
Of course, you might be thinking, My datetime values are all local server time; their meaning is perfectly clear to me. Well, one day your company may expand and your little homegrown system might need to handle data from more than one region. Or you might need to import the data into a new system when your solution is thrown out for being too provincial :). Or the data in your local database might turn out to be needed in some central data warehouse that consolidates data from a variety of sources. You can save yourself and your successors some grief by using the more robust datetimeoffset data type from the start.

If in doubt, read a book: that's my advice

This seemingly innocuous problem led me to re-read "The Quark and the Jaguar" by Murray Gell-Mann, whose curious title comes from the fact that quarks are simple and jaguars complex. If you haven't heard of MG-M, he's a physicist, one of the co-discoverers of the quark. In fact he gave the particle its name (with inspiration from James Joyce).

In the book he talks about the difficulty of quantifying complexity. The Algorithmic Information Content (AIC) of a piece of information is an important concept, one that you're aware of at a deep level, if only implicitly, if you're a programmer. In terms of a 'message string' it's the shortest program that will produce that message. In my case, I guess it's the most efficient way of getting the local time to the user. Another way of measuring complexity is to measure the length of the description of the problem. Look at this StackOverflow answer to get a rough feel for how complex this problem can get.

['Daylight saving time and Timezone best practices' StackOverflow question]

But maybe you're thinking "What's the big deal, like? It all fits on one StackOverflow page. That's not much of a description, ergo it can't be that complex." Well, consider Einstein's equation for general relativity - not the E=mc2 one, but it's similar. From Quark:
"It is a remarkably powerful schema, which has compressed into a brief message the general properties of gravitation everywhere...the schema is remarkably short, its complexity low. Hence, Einstein's general-relativistic theory of gravitation is simple."
I doubt if most people would call it simple, but they'd be wrong. And I doubt if most people would imagine it's quite so complex to deal with dates in a scalable and reliable way. Also wrong.

Complexity isn't intrinsic

If complexity is measured by the length of a description of a system, then local time determination (from UTC alone) turned out to be twice as hard as we (in our team) originally thought, because you need two pieces of information rather than the single piece we were storing. By the way, I'm not considering here how much actual information is required to store a datetime, or a timezone, etc. at a bit level. Rather, I'm taking a more coarse grained view, where each piece of information - local time, timezone, UTC time - is treated equally as a 'piece'.

But one vital thing to realise about complexity is that it's not an intrinsic property of a system - it depends on who's describing it, how much they already know, what assumptions they're allowed to make, etc. In other words, complexity is context-dependent. What is the smallest amount of information you need to save the time of an event? Could you just save one piece of information - UTC time - and reconstruct the local time from that?

Of course, you could just save events as local time and resign yourself to the fact that the very rare cases that happen outside your preferred timezone will have a wrong time associated with any event they trigger. In that case you would not even be trying to solve this problem: maybe you have that luxury.

Or, if you think you're smart enough, save the event time as UTC, but compute what the local time must have been at that time, using some sort of look up service. But how can you do that without knowing which timezone the user was in in the first place? You can't. Which is the problem in the first place.

Everything is obvious, once you know the answer

Or, to paraphrase Duncan J. Watts, some solutions are straightforward, but only when you recognise there's a problem in the first place. That's the hard part. So this may all seem really obvious - it certainly feels like that, writing it out like this. Each step on its own is easy to understand, but the overall problem at hand is a really thorny one, and if you don't see the inherent difficulty, you haven't taken the time (adjusted for daylight saving, of course) to pay attention. It crept up on us unexpectedly, so my assumption is that it might just creep up on at least some other teams out there too.

Steps to create a datetimeoffset

So, avoid saving only local time because you simply cannot determine the absolute time, and vice versa. But in using UTC make sure to include the local offset. Or, the other way round, it doesn't matter. You need those two bits of information, and datetimeoffset gives them to you in one field. The good news if you use Entity Framework is that it couldn't be easier. Note the "+10:00". That's because I'm in Brisbane, which is 10 hours ahead of UTC.