What! Machines are outperforming humans on reading comprehension

AI online edtech

Why bother reading when you can get a robot to do it for you?

Holy cow! I'm not sure if I'm overselling this (yet), but the world of education may have just changed forever...

On March 30 of this year (2019), a Chinese AI company, iFLYTEK, built an algorithm that outperformed humans on reading comprehension. Let me write that again: a machine beat a top-performing human being on a reading comprehension challenge.

SQuAD (Stanford Question Answering Dataset) is an AI reading comprehension test designed to see whether machines can read better, not just faster, than humans. The questions are based on a set of Wikipedia articles, "where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable."

"SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones... To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering... We are optimistic that this new dataset will encourage the development of reading comprehension systems that know what they don't know." (In other words, the officials are trying to make sure the algorithms actually understand the text and can't simply guess.)
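To make "abstain from answering" concrete, here is a minimal sketch of exact-match scoring for SQuAD2.0-style questions. It is loosely modelled on the normalisation the official SQuAD evaluation script applies (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace), but the function names and the toy questions and answers are hypothetical, invented for illustration. An unanswerable question has no gold answers, so the only way to score on it is to answer with nothing at all.

```python
import re
import string

# Punctuation characters to strip before comparing answers.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the prediction matches any gold answer after normalisation.

    An unanswerable question has an empty gold list, so only an empty
    prediction (i.e. abstaining) counts as correct.
    """
    if not gold_answers:
        return normalize_answer(prediction) == ""
    return any(normalize_answer(prediction) == normalize_answer(g) for g in gold_answers)


def exact_match_score(predictions: dict[str, str], gold: dict[str, list[str]]) -> float:
    """Percentage of questions answered exactly right (missing predictions count as abstaining)."""
    hits = sum(exact_match(predictions.get(qid, ""), answers) for qid, answers in gold.items())
    return 100.0 * hits / len(gold)


# Hypothetical toy data: q1 is answerable, q2 is unanswerable.
gold = {"q1": ["Paris"], "q2": []}
print(exact_match_score({"q1": "paris.", "q2": ""}, gold))        # 100.0
print(exact_match_score({"q1": "paris.", "q2": "London"}, gold))  # 50.0
```

A system that confidently "guesses" an answer to q2 loses the point, which is exactly the behaviour SQuAD2.0's unanswerable questions are meant to punish.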

Teams enter by building a machine-learning algorithm and "training" it on a training set of questions and answers drawn from the Wikipedia articles. "Training" an AI usually means showing it examples and telling it whether it got the answer right or wrong, either through a human or another algorithm. Once a team is ready, it submits its model to the officials, who run the test and report the result.

Think that this is an easy test? Check out some of their questions here about an "easy" topic like the Intergovernmental Panel on Climate Change:

What does the UN want to stabilize? Who was President of the IPCC before Hoesung Lee? (Watch out for the trick question, machines!) What substantial error put the IPCC research in doubt?

To make sure challengers do not cheat, each team lends the examiners its "model" (the program), and the examiners run the program and provide the official scores. The test set itself is never made public.


Why is this a big deal?

I'm not sure about you, but I was comfortable when machines could fill in a form for me, not now that they can do my English homework for me.

The implications of this are jaw-dropping. Exactly 9 months ago, the top score for a human taking the test was 86.8%, versus 63.3% for the best AI.

It took just 7 1/2 months for teams to take that 63.3% score to 87.14% and outperform the best human candidate. That is a 37.7% improvement on the old score (almost 24 percentage points)!
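The 37.7% figure is a relative improvement (the gain divided by the old score), not a gain of 37.7 percentage points. A quick sketch of the arithmetic, using only the scores quoted above; the helper name is hypothetical:

```python
def relative_improvement(old: float, new: float) -> float:
    """Percentage change from old to new, measured relative to old."""
    return 100.0 * (new - old) / old


old_best_ai, new_best_ai, best_human = 63.3, 87.14, 86.8

print(f"absolute gain: {new_best_ai - old_best_ai:.2f} points")                     # absolute gain: 23.84 points
print(f"relative gain: {relative_improvement(old_best_ai, new_best_ai):.1f}%")      # relative gain: 37.7%
print(f"margin over best human: {new_best_ai - best_human:.2f} points")             # margin over best human: 0.34 points
```

Note how narrow the final margin over the best human is: a fraction of a point, after a jump of nearly 24 points.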

But wait, I hear you say, it took 3 days and the world's best engineers, programs and algorithms to finish this test. So what?

Well, if you're happy enough with a B+ (75%) and can wait 56 minutes, it will now cost you only $0.57. Yes, less than $1 for a 75% score. Oh, and in three months that machine-training time was cut to about a third.

What does this mean for education?

If you were one of the hundreds of millions of students around the world trying to get into university, and you could either work every day to get an A in your entrance exam or pay $0.57 and have a machine do it for you, what would you choose?

Now, plenty of closed-book exams would prevent this kind of blatant cheating, but it still calls the world of academic assessment into question. How are examiners going to keep ahead of this kind of technology?

Before electronic calculators, there were "human calculators" (remember that amazing movie Hidden Figures?). Did universities ban calculators in maths exams? No. So why should human students try to pass tests that a machine could complete better and in mere minutes?

What does this mean for you?

It's never been easier to teach online!