Discouraging data: women in CS and IT

In making my mark in the realm of data and information visualization, it will probably do me good to become a better and more knowledgeable coder. I am now looking into pursuing a little more CS education, and am excited about diving into edX MOOCs in computer science (remember when edX was OCW?).

I’ve never shied away from things technical. I enjoy every opportunity I get to learn new software and programming languages, and nothing sucks me into an all-absorbing work cave as effectively as a new JavaScript, HTML or CSS coding challenge. I’m even considering diving much deeper into CS than just the basics. After all, the entry level pay for a computer scientist or software engineer is at least 1/3 higher than the entry level pay for people in my current line of work.

However, these data give me pause:

Looking at the BLS numbers, it is interesting that these professions attract more women (as a percentage) than software engineers (20.2%):

  • Bailiffs, correctional officers, jailers (26.9%)
  • Chief executives (25.0%)
  • Database administrators (35.3%)
  • Biological scientists (45.1%)
  • Chemists and materials scientists (30.0%)
  • Technical writers (50.4%)

Even the professions that are said to have a glass ceiling (such as CEO) have more women in them than software development. Based on the number of science positions listed in the BLS data with substantial numbers of women in them, it is clear that the myth that women are afraid of math or science is just plain wrong (even if less than 1% of mathematicians are women). And given the bizarre outlier of DBAs at 35.3%, and technical writers at 50.4%, we can see that women certainly do not dislike computing fields in general.

IT gender gap: Where are the female programmers? by Justin James

Now I remember why I wasn’t attracted to CS at university. I would try to strike up conversations with computer geeks, and then get shut out of the weirdly intense technobabble tournament that every computer geek conversation eventually turned into. My work is now and was then a huge part of my life; but I learned very early that the people I surround myself with are at least as important as the work that I do. At the time, a choice of major seemed like a choice to surround myself with people like the people in that major for the better part of my adult life.

I can’t be the only woman who looked at the majority culture of computer programmers and thought, is this it?

MOOCs are not the end of education; they are a game changer

Some teachers are brilliant and talented coaches and mentors. But there’s still way too much focus on information delivery in higher education that just doesn’t make sense in an era when so much information is already freely available. The PowerPoint lecture generation is long overdue to move on and acquire different skills, and MOOCs are an important disruptive innovation that I hope will hammer that point home.

Many traditional professors could easily be replaced by a recording, a la Real Genius. As the author points out, MOOCs are good for information delivery, and many professors do little or no more than deliver information – the “real” learning happens in the problem sets and the projects and essay drafts, in the performance, feedback, revision cycle, which many universities and professors relegate to lowly paid graduate or even senior undergraduate students and to which they devote token attention.

MOOCs, then, are cost-effective substitutes for lecturers. Why pay someone every year to deliver the same lecture to a limited-capacity room when you can simply pay the same person once to record the lecture and distribute it to a zillion more paying students almost anywhere in the world?

With the cost of higher education rising as rapidly as it has of late, students and parents are demanding more value from the education services provided by universities and colleges. Many educators are going to have to change the way they teach to demonstrate their relevance and value in the rapidly changing education marketplace, just like folks trained decades ago in other professions are now facing the need to retrain and modernize their skills to stay gainfully employed.

In the Internet era, when information is so cheap and easy to get, many teachers still maintain an antiquated focus on information delivery. Think of information like open source software. The raw materials are free, but you need a lot of training or practice or both to get them to work for you. So teaching with the Internet ought to be more like delivering a value-added service for free and abundant information resources.

I, like many others, would expect education to shift away from information delivery and more toward coaching students in finding, evaluating, using and generating information.

Is the attrition rate for online courses appallingly high? Yes. But that’s not the point. If you take the savings you reap from not having to pay the lecturer to deliver the same lecture every semester and invest it in a rich layer of coaching and mentorship on top of that online course, you might find that that course does more of what we in the real world need and expect it to do: prepare students for a world where information is cheap, and judgment, creative insight, analytical and collaborative skills are the real prizes.

How to lie, with or without statistics

Bank robberies surge in Boulder, Longmont

Both Boulder and Longmont have noted a marked increase in bank robberies so far this year over 2010, a surge FBI officials say they can’t explain.

Boulder has had two bank robberies so far this year, up from none during the same period last year.

Longmont has had four bank robberies — twice as many as the city had in all of last year.

“When you only have two (bank robberies last year), that’s a huge increase,” said Longmont police Cmdr. Jeff Satur. “We’re hoping they’ll die off and slow down.”

The explanation is that it’s not a “marked increase” at all. When your average annual bank robbery numbers are in the single digits, and they’re still in the single digits a quarter of the way into the year, it’s not a big deal.

How about some context? For starters, here’s five years of Boulder crime statistics: 2009 fact sheet: five years of crime statistics at a glance

Let’s look at the “robbery” row (since bank robberies aren’t called out) and apply some basic statistics:

          2006  2007  2008  2009  2010
Robbery     29    27    33    51    29

No year is going to be exactly like the previous one. It’s normal to see numbers like this go up and down with natural variability. If you started keeping track in 2006, when there were 29 robberies, the 27 in 2007 wouldn’t surprise you, and the 33 in 2008 would prompt you to expect about 30 or so robberies in 2009. But the 51 in 2009 would throw you. Is this the beginning of a multi-year crime wave? Well, with 29 robberies recorded in 2010, it looks like we’re back to normal.

If you could only see 2008 and 2009 data, you would think, whoa! 51 is a pretty big jump from 33. Crime is on the rise! What could be at fault? But if you can see all the data from 2006-2010, you realize that the 51 in 2009 is just a blip in a local robbery rate that hovers around 30 per year.
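Here is a quick sketch of that "hovers around 30" claim, using only the five robbery counts from the table. Using the median as the "typical" year, and averaging the other four years with 2009 left out, are my own choices for illustration, not anything from the Boulder fact sheet.

```python
from statistics import mean, median

# Boulder robbery counts by year, copied from the table above.
robberies = {2006: 29, 2007: 27, 2008: 33, 2009: 51, 2010: 29}

counts = list(robberies.values())

# The median is not dragged upward by a single unusual year.
typical = median(counts)

# Average of the four years other than 2009 (an illustrative choice).
without_2009 = mean(c for year, c in robberies.items() if year != 2009)

print(f"median of all five years: {typical}")
print(f"mean without 2009:        {without_2009}")
print(f"2009 above the median by: {robberies[2009] - typical}")
```

Either way you slice it, the typical year lands at about 29 or 30 robberies, and 2009 is the only year that strays far from that.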

And one cluster of bank robberies does not mean bank robberies are on the rise in Boulder County or that they even deserve a specific explanation. It’s called natural variability. Stochasticity, if you will. One hot summer does not prove global warming, nor does one cold winter disprove it.

So the numbers aren’t backing you up if you say “robbery on the rise in Boulder.” But you can make it look that way if you present a table of 3-year moving averages:

                               2007  2008  2009
Robbery (3-yr moving average)    30    37    38

These numbers are basically true, but they can be used to tell a lie. To get a 3-year moving average, you average each year with its two neighboring years. This is a common statistical practice to smooth out some of the year-to-year fluctuations that you typically see in real data. But it’s nearly meaningless with this tiny amount of data. Five years gives you only three points, which is nowhere near enough information to infer a trend. The last two points are both skewed by that 51, and the moving average effectively erases that little 29-robbery blip in 2010. If you wanted to use real numbers to make the point that the tough economy is driving Boulder folks to desperate measures, you’d do it this way.
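For anyone who wants to check the arithmetic, this is a minimal sketch of a centered 3-year moving average applied to the five robbery counts. The function name centered_moving_average is just something I made up for this example.

```python
# Boulder robbery counts for 2006-2010, copied from the table above.
years = [2006, 2007, 2008, 2009, 2010]
robberies = [29, 27, 33, 51, 29]


def centered_moving_average(values, window=3):
    """Average each value with its neighbors; the end points are dropped."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


smoothed = centered_moving_average(robberies)

# Only the middle years have two neighbors, so five years yield three points.
for year, avg in zip(years[1:-1], smoothed):
    print(year, round(avg))
```

Five inputs, three outputs, and two of those three contain the 51. That is the whole trick.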

We expect robberies to grow with population, too:

                      2006     2007     2008     2009     2010
Robbery                 29       27       33       51       29
Population (est.)  102,659  102,659  103,100  102,800  103,600

The population changed by less than 1% per year, so you wouldn’t expect a big effect from population growth. But wait, this table says the population didn’t change at all between 2006 and 2007. And the population after 2007 is always a round number? Ah yes, the red highlighted abbreviation means “estimate.”
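The "less than 1% per year" figure is easy to verify from the population row. A small sketch, with variable names of my own choosing:

```python
# Estimated Boulder population by year, copied from the table above.
population = {
    2006: 102_659,
    2007: 102_659,
    2008: 103_100,
    2009: 102_800,
    2010: 103_600,
}

years = sorted(population)

# Percent change from each year to the next, keyed by the later year.
pct_change = {
    curr: 100 * (population[curr] - population[prev]) / population[prev]
    for prev, curr in zip(years, years[1:])
}

for year, change in pct_change.items():
    print(f"{year}: {change:+.2f}%")
```

The largest swing in either direction is well under one percent, which is tiny next to a jump from 33 robberies to 51.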

Numbers like the population counts in this table should set off a quiet alarm in your head. It’s very, very unlikely for an honest census to turn up round multiples of 100 three years in a row. A one in a million chance, really.
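The one in a million figure falls out of a simple assumption: if the last two digits of a true head count are equally likely to be anything from 00 to 99, and each year is independent of the others, then each year has a 1 in 100 chance of ending in 00. That uniformity and independence are my assumptions for the sake of the estimate.

```python
from fractions import Fraction

# Assumed chance that one honest count happens to end in "00".
p_round_once = Fraction(1, 100)

# Three independent years in a row.
p_three_in_a_row = p_round_once ** 3

print(p_three_in_a_row)          # 1/1000000
print(float(p_three_in_a_row))   # 1e-06
```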

Anyway, what does this say about bank robberies? Is it a sign of our financially straitened times that Longmont had four bank robberies all piled up in the first three months of the year, when it only had two bank robberies over the twelve months of 2010? Here’s our table of all the data we have about bank robberies:

                2010  2011
Bank robberies     2     4 (first 3 months)

This is a pretty feeble-looking table. Any statistical inferences you can make from this table would be feeble, too. The second number doesn’t even represent a whole year of bank robberies, so we can’t average it with the first one, which does. There could be zero bank robberies for the rest of 2011, to give us a total of 4 bank robberies for the year. And you could make the alarming statement that “bank robberies are up 200%” or “100%,” however you like to mislead your public.
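To see where two different scary percentages can come from, here is a rough sketch. My guess (and it is only a guess) is that "200%" comes from quoting the new count as a percentage of the old one, while "100%" is the actual percent change. The helper names are hypothetical.

```python
def percent_change(old, new):
    """How much the count grew, as a percentage of the old count."""
    return 100 * (new - old) / old


def percent_of(old, new):
    """The new count expressed as a percentage of the old count."""
    return 100 * new / old


robberies_2010 = 2       # all twelve months of 2010
robberies_2011_q1 = 4    # first three months of 2011 only

change = percent_change(robberies_2010, robberies_2011_q1)
ratio = percent_of(robberies_2010, robberies_2011_q1)

print(f"percent change:      {change:.0f}%")   # 100%
print(f"new as a % of old:   {ratio:.0f}%")    # 200%
```

Both numbers are computed from the same two tiny counts, and neither one tells you anything a reader couldn't get from "two, then four."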

All we can really say from this table is: our numbers are far too small, and we have far too few of them, to make any statements about bank robbery in Boulder County other than that it seems to be quite rare.

Apples to apples in interracial marriage

This is a follow-up on a NYTimes story, Black Women See Fewer Black Men at the Altar. The range and extremes in this cursory analysis of interracial marriage rates are pretty striking:

Of all 3.8 million adults who married in 2008, 31 percent of Asians, 26 percent of Hispanic people, 16 percent of blacks and 9 percent of whites married a person whose race or ethnicity was different from their own. Those were all record highs.

Well, since Asians are the smallest ethnic group of the four, just by sheer odds we should be marrying outside our race (‘marrying out’) more often than the other groups. But how many of the interracial marriages are due to preference and how many of them are what we would expect just by the numbers?

If your choice of whom to marry were completely independent of race, the chance that you’d marry someone from another race would be about the same as the chance that you’d run into someone from that race on the street. If we take the U.S. Census Bureau data from 2004 (the most contemporary survey covering all four of the largest races), the single (not married or separated) population over age 15 is:

3.6% Asian
13.3% Hispanic
16.5% black, and
66.5% white.

So since 96 of every 100 single people (13.3+16.5+66.5 = about 96) in the States are not Asian, you’d expect about 96 of every 100 Asians to marry out.

But only 31 of each 100 did, so it looks like Asians show some tendency to marry each other (‘marry in’) more than they marry out. Let’s look at some ratios to see how the same-race preferences compare across the races:

intermarriage rates

race       expected / actual = ratio
Asian          96.4 / 31     = 3.11
Hispanic       86.7 / 26     = 3.33
black          83.5 / 16     = 5.22
white          33.5 / 9      = 3.72

The higher the ratio, the more likely that race is to ‘marry in’. Asians are, again, the biggest miscegenators, but not by a lot. Blacks, on the other hand, are far more likely than the other racial groups to marry each other based on what we would expect from race-independent marriage. We can speculate on the contributing factors – prejudice, prison, education, age structure, other socioeconomics – but I don’t have any data to support or refute any of them for now.
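The ratios can be recomputed from the two sets of percentages quoted earlier: each group's share of the single population, and the share of each group's 2008 newlyweds who married out. This is a minimal sketch under the same race-independent assumption, where the expected marry-out rate is simply 100 minus the group's own share. The variable names are mine.

```python
# Share of the single population over 15, in percent (2004 figures above).
single_share = {"Asian": 3.6, "Hispanic": 13.3, "black": 16.5, "white": 66.5}

# Percent of each group's 2008 newlyweds who married out (from the quote).
married_out = {"Asian": 31, "Hispanic": 26, "black": 16, "white": 9}

ratios = {}
for group, share in single_share.items():
    # If marriage ignored race, you would marry out as often as you
    # would meet someone outside your own group.
    expected = 100 - share
    ratios[group] = round(expected / married_out[group], 2)

# Print from the lowest ratio to the highest.
for group, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{group:<9}{ratio}")
```

Sorted this way, Asians come out lowest and blacks highest, with Hispanics and whites in between.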

By the way, in 2004, the percentage of each racial group married without separation:

Asians 61%
Hispanics 50%
blacks 34%
whites 57%.

Asians are almost twice as likely to be married as blacks.

[In a New York Jewish accent] Talk amongst yourselves.

In addition to the referenced NYT story, I pulled March 2004 Census Bureau Community Survey data from these sources:


From those tables I added the numbers of widowed, divorced, and never married to get the following numbers:

3,623,000 single Asians >15 both sexes in March 2004
13,294,000 single Hispanics >15 both sexes in March 2004
16,499,000 single blacks >15 both sexes in March 2004
66,408,000 single whites >15 both sexes in March 2004
99,824,000 single people total in March 2004
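As a sanity check, the counts above do add up to the stated total, and they reproduce the percentages I used for the single population. A short sketch:

```python
# Single people over 15, both sexes, March 2004 (counts listed above).
singles = {
    "Asian": 3_623_000,
    "Hispanic": 13_294_000,
    "black": 16_499_000,
    "white": 66_408_000,
}

total = sum(singles.values())

# Each group's share of the four-group total, rounded to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in singles.items()}

print(f"total: {total:,}")
for group, share in shares.items():
    print(f"{share}% {group}")
```

Note that the total here covers only these four groups, so the shares are percentages of that four-group population rather than of all single people in the country.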

The percentages of each racial group that are married are taken directly from the linked tables. Yes, I am leaving out Inuit/American Indian, Hawaiians and other Pacific Islanders, as well as mixed-race. They account for, respectively, 0.8%, 0.14%, and 2.3% of the population, too small a percentage for the CB to have useful data on them.