> I think that's the whole point: it's a different way of approaching the
> problem, and given that many 'scientific discoveries' were essentially
> mistakes, why not? If we can dispense with your "Don't do it because"s
> (i.e., you can always say that something doesn't tell you where to look for
> it -- so keep looking! ... and well, what if it's not as expensive as you
> think? And on ...), then I don't see it as being a particularly _bad_
> course of action. You certainly can't stop "old school" science, but
> there's plenty of room for brute-force attacks on problems that haven't
> gotten much attention recently with this kind of thing.
Well, maybe he's saying "don't just use models/theory, use some brute force." Ok, I did that today. That's why I was mucking with the signal analysis toolbox. There's not much theory behind the spectra of the differences in sea surface temperature fronts between satellite data and a predictive ocean model, so I did it to see what I could see. If that's what he means, um, whatever.
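For the curious, that brute-force look can be sketched in a few lines. Everything here is a made-up stand-in -- the sample spacing, amplitudes, and noise levels are assumptions, not the real satellite or model fields -- but it shows the shape of the exercise: difference the two SST tracks, window, and look at the spectrum of the disagreement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for along-track SST (deg C): a satellite swath
# and the model's prediction interpolated to the same points.
n = 1024
dx = 1.0  # sample spacing in km (assumed)
x = np.arange(n) * dx
sst_satellite = 15 + 2 * np.sin(2 * np.pi * x / 200) + 0.3 * rng.standard_normal(n)
sst_model = 15 + 2 * np.sin(2 * np.pi * x / 200 + 0.1) + 0.2 * rng.standard_normal(n)

# The difference field: where model and data disagree about the front.
diff = sst_satellite - sst_model
diff = diff - diff.mean()      # remove the mean offset
window = np.hanning(n)         # taper to reduce spectral leakage

# One-sided power spectrum of the disagreement.
spectrum = np.abs(np.fft.rfft(diff * window)) ** 2
freqs = np.fft.rfftfreq(n, d=dx)  # cycles per km

peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the zero-frequency bin
print(f"dominant wavelength of disagreement: {1 / peak:.0f} km")
```

No theory required, just "see what you can see": if the spectrum has a bump at some wavelength, model and data are disagreeing about structure at that scale.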
But I think what he's saying is: with all this data, who needs theory?
> Steven Wright has a joke about 'I have a map of the world, where 1" = 1"'
> and that's essentially the Google Ahah! Moment: why subsample when you have
> essentially "all" the data?
Because you don't have that map and you don't have all the data. You don't even have "all" of it. Not essentially, not figuratively, not remotely. Ok, so maybe you have it in the stock market because it went straight to disk. Except for the pre-computer data that some poor bastard had to enter, but never mind that.
For a lot of the data I work with, somebody has to go out and get cold, wet, and seasick to get it. Either that, or it requires a satellite and the attendant military-industrial-complex brouhaha. It's expensive to get, and you'd better have a little theory to figure out where to go for it and how, because you probably don't otherwise know much about what you're looking at -- otherwise you wouldn't be bothering. There's even a field called targeted observation: figuring out which measurements, and where, have the most "leverage" on downstream results, so you can concentrate your resources there.
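A toy version of that targeted-observation idea, in case it sounds exotic: use the spread of an ensemble of forecasts as a crude proxy for leverage, and send the ship where the forecasts disagree most. The grid, ensemble size, and the spread-as-leverage proxy are all assumptions for illustration -- nobody's operational targeting method, just the gist.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: an ensemble of model forecasts on a small grid.
# (All names and numbers are made up for illustration.)
n_members, ny, nx = 20, 10, 12
ensemble = rng.standard_normal((n_members, ny, nx))

# Inject a patch of genuine forecast disagreement: each member gets its
# own offset over a 2x2 block of grid cells.
ensemble[:, 3:5, 6:8] += rng.standard_normal((n_members, 1, 1)) * 3

# Ensemble spread as a (crude) proxy for where an observation would have
# the most leverage: observe where the forecasts disagree most.
spread = ensemble.std(axis=0)

# Take the top-3 candidate observation locations.
flat = np.argsort(spread.ravel())[::-1][:3]
targets = [np.unravel_index(i, spread.shape) for i in flat]
print("send the ship to grid cells:", targets)
```

The real methods (adjoint sensitivity, ensemble transform techniques) are far more sophisticated, but the economics are the same: theory tells you where the expensive measurement is worth making.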
What's more, it's never enough. Satellites can get a lot of surface data and some atmospheric data, but you still need somebody to haul a big latex balloon out of the garage every six hours to get what the satellites can't. Let's just say that the density of spatial coverage varies between North America and the South Pacific. It's worse underneath the ocean. Remember a couple of years ago when that submarine ran into an undersea mountain because the chart misplaced it by a mile or so? And that's just the stuff that *doesn't* vary with time. What does vary with time is the sort of thing that makes a hurricane more or less intense when it makes landfall. Nobody can afford floating instruments to cover the whole ocean all the time, and satellites can't see the right light through clouds and can't see below the surface anyway, so somebody gets to fly into the damn thing to drop the instruments. Which you need to tweak your model, which you need because you don't ever have all the data.
It's a muddy, corrosive world out there, and it doesn't parse regular expressions.
-- Andy