21st Century Education System

Preparing for the 21st century education system.

Monday, July 27, 2009

Data Crunching

When I was growing up, and until recently, people talked about what it takes for a child to start being able to read a newspaper, newspaper reading being the standard of literacy: mostly, enough knowledge of the written and spoken language, plus some general knowledge to identify cultural references. Today, children have TV and the Internet available, with a barrage of information, pseudo-information, disinformation, drama and raw data. The bare ability to read is now a good start, but not enough, and the necessary general knowledge has also changed. The full range of capabilities that would make everybody a professional information hacker is beyond the minimum requirements for a graduate, or at least beyond the expected consensus. My own opinion is that in the developed world such abilities will be of considerable benefit to anyone, and the lack of them a definite disadvantage.

It’s easy to come up with many examples - in any given week - where our common data sources require thorough filtering. Children of a very young age need the skills to manage all this data. The first step is critical thinking about claims made about data. To achieve that, we need to understand the claims, and the ways they can - and can’t - be proven or disproven. With masses of data, this translates into a field with the frightening name Statistical Literacy - not to be confused with statistics as a mathematical expertise, with its associated Greek-letter formulas. The level of statistical understanding a layman needs is basically statistical intuition: understanding the advantage of large amounts of data over singular anecdotes; understanding the critical importance of data sources; understanding the importance of whom we ask a survey’s questions, how we ask, who answers, and so on.
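The intuition that many observations beat a singular anecdote can be shown with a tiny simulation (a sketch in Python; the biased coin and the sample sizes are my own illustration, not anything from the post):

```python
import random

random.seed(42)

def estimate_bias(n_flips, true_p=0.6):
    """Estimate a coin's heads probability from n_flips observed flips."""
    heads = sum(1 for _ in range(n_flips) if random.random() < true_p)
    return heads / n_flips

# A single flip - the anecdote - says the probability is 0 or 1.
# Larger samples home in on the true value of 0.6.
for n in (1, 10, 100, 10_000):
    print(n, round(estimate_bias(n), 3))
```

The larger the sample, the closer the estimate tends to land to the truth - which is exactly the intuition a layman needs when weighing a pile of data against one vivid story.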

In a world steeped in data, there are two main characteristics of the data that we need to be able to determine quickly: "How relevant is it?" and "How reliable is it?" - in that order. First we need to identify the relevant results, and only then check which of them are reliable. There are many possible strategies, and these strategies have to change as the sources vary and as the Internet search engines change; different types of data call for different data-sifting strategies. Once we identify some relevant results, we need to check their reliability - again, with many possible strategies.

For example, when looking up the weather, a possible strategy is to use Google as a general-purpose search engine. One could guess a good search phrase, such as "weather Lapland", if we want to travel there and need to know whether to bring our swimsuits. That's an easy one: many of the results are relevant, and it's easy to determine that a result concerns the weather forecast for Lapland rather than, say, long-term weather patterns. Now we need to check the reliability of the results. In the weather case, we could look at a reliable-looking website, such as the BBC, and believe it. Or we can look at several results and make sure we get a near-consensus before we believe them: if the results are within 1-2 degrees of each other, we can average them; if a certain result is 10 degrees off, we can probably just ignore it - or the website in general.
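That last "near-consensus" step can be sketched in a few lines (a minimal sketch; the hypothetical forecast numbers and the agreement threshold are illustrative assumptions, not a standard algorithm):

```python
def consensus_temperature(forecasts, max_spread=3.0):
    """Average the forecasts that roughly agree, ignoring any far from the median."""
    ordered = sorted(forecasts)
    median = ordered[len(ordered) // 2]
    agreeing = [t for t in forecasts if abs(t - median) <= max_spread]
    return sum(agreeing) / len(agreeing)

# Hypothetical forecasts for Lapland, in Celsius: three sites agree within
# a degree or two, while one is about 10 degrees off and gets ignored.
print(consensus_temperature([14.0, 15.0, 13.5, 25.0]))  # about 14.17
```

Comparing each result to the median rather than to the mean is a deliberate choice: a single wildly wrong website would drag the mean toward itself, but barely moves the median.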

Even this very easy data-handling exercise involved some non-trivial skills and knowledge: an intuitive understanding of how the search engine works, needed to guess a good search phrase; general knowledge - familiarity with different sources - needed to decide that certain websites are likely to contain reliable information; a layman's scientific understanding that even relatively reliable sources should be verified against other sources; a layman's statistical understanding of how to deal with outlying results - results that are very far from the rest (maybe ignore them) - as opposed to results that agree with each other (maybe average them). In slightly more complex searches, such as for historical facts, the searcher also needs a layman's psychological understanding of semi-verbal communication: reading between the lines, watching for choices of words that indicate the writer has an agenda to push, which would put the reliability in doubt.

The 21st century human needs to have many multidisciplinary skills just to qualify as a layman.

Now, if you think that I am exaggerating the complexity of looking for information, and that the example shows that searching for info is now trivial, please try to find out how long a stork can stand on one leg. Really, my daughter wants to know that.

3 comments:

  1. Back in the 18th century, Denis Diderot foresaw this state of affairs: "The number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes."

  2. A point for statistics literacy: Correlation doesn't imply causation
    http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation

  3. A point for statistics literacy: Law of Large Numbers
    http://en.wikipedia.org/wiki/Law_of_large_numbers
