Since the early 2000s, websites have been using Captcha to filter spam.
Captcha stands for:
Completely
Automated
Public
Turing test to tell
Computers and
Humans
Apart
Why do we need it?
In 2000, Yahoo had a problem: spammers were writing programs to sign up for millions of email accounts, and the company couldn't figure out how to stop them. So Luis von Ahn and his colleagues at Carnegie Mellon University devised a test that could tell humans and computers apart, one that any human, regardless of age, gender, education or language, would be able to pass. The test needed to be graded by a computer, but the computer shouldn't be able to pass the test itself.
How does it work?
Humans are really good at reading (the task computers attempt through Optical Character Recognition, or OCR), and we've been doing it since we were young. We're used to reading text in all sorts of conditions: at different angles, in different lighting and through distortions such as bad handwriting. That made reading the perfect test, because computers of the era weren't very good at it.
What is it?
The idea was to have the computer generate the text itself, so it already knew the answer, and then stretch and warp that text so the computer could still grade the response but a bot without the answer wouldn't be able to read it. For the human, every Captcha character maps to a key on the keyboard, so solving it is basic pattern matching with no need to know how to spell. The test helped Yahoo stop bots, but all the letters and numbers that humans typed were doing something else in the background: making computers smarter.
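The grading idea above can be sketched in Python. This is a minimal illustration, not how Yahoo or any real service implemented it: the function names `new_challenge` and `grade`, the character set and the in-memory `_pending` store are all hypothetical, and the step that actually stretches and warps the text into an image is left out.

```python
import secrets
import string

# Hypothetical character set for challenges (upper-case letters and digits).
ALPHABET = string.ascii_uppercase + string.digits

# Hypothetical in-memory store: challenge token -> expected answer.
_pending: dict[str, str] = {}


def new_challenge(length: int = 6) -> tuple[str, str]:
    """Create a challenge and remember its answer.

    Returns (token, text). A real system would render the text as a warped
    image before showing it; that step is omitted here.
    """
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = secrets.token_urlsafe(16)
    _pending[token] = text
    return token, text


def grade(token: str, response: str) -> bool:
    """Check a response against the stored answer; each token works only once."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False  # unknown or already-used token
    # Normalise case and whitespace, then compare in constant time.
    return secrets.compare_digest(response.strip().upper().encode(), expected.encode())


token, text = new_challenge()
print(grade(token, text.lower()))  # True
print(grade(token, text))          # False: token already used
```

Because the computer created the text, grading is a simple comparison; the hard part for a bot is recovering the text from the distorted image, which this sketch does not model.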
Teaching computers to read
reCAPTCHA came out in 2007 and used two words. One was generated, so the computer knew the answer; the other was taken from a scanned book or an old, distorted piece of text that the computer couldn't read. The programme assumed that if a human typed the first, known word correctly, their answer for the second word was also likely to be correct. To be certain, the same unknown word was shown to other humans, and once enough of them agreed, the programme accepted the word. So many of these tests were completed that a year's worth of New York Times articles was digitised every 4 days.
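The two-word consensus scheme can be sketched the same way. This is a hypothetical illustration: the `WordDigitiser` class, the default threshold of three agreeing answers and the lower-case normalisation are invented for the example, since the real reCAPTCHA rules aren't described here.

```python
from collections import Counter, defaultdict


class WordDigitiser:
    """Hypothetical sketch of two-word consensus digitising."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # assumed number of agreeing humans required
        self.votes: dict[str, Counter] = defaultdict(Counter)
        self.accepted: dict[str, str] = {}

    def submit(self, control_answer: str, control_typed: str,
               unknown_id: str, unknown_typed: str) -> bool:
        """Grade one attempt; return True if the human passed the known word."""
        if control_typed.strip().lower() != control_answer.lower():
            return False  # failed the known word, so the other guess is ignored
        if unknown_id not in self.accepted:
            guess = unknown_typed.strip().lower()
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= self.threshold:
                self.accepted[unknown_id] = guess  # consensus reached
        return True


d = WordDigitiser(threshold=2)
d.submit("river", "River", "scan-17", "overlooks")
d.submit("apple", "apple", "scan-17", "Overlooks")
print(d.accepted)  # {'scan-17': 'overlooks'}
```

Only guesses from people who got the known word right are counted, which is the assumption the article describes; requiring several matching answers guards against a single careless or malicious user.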
In 2009, Google acquired reCAPTCHA to help digitise its news archive and scanned books, and in the process built up a large library of images of distorted characters. Computers then used all of this data to learn to recognise letters and words in new images. Through Captchas, humans ended up teaching computers to read extremely warped text.
Computers outsmarting humans
In a 2014 Google study, humans could solve the most distorted Captchas only 33% of the time, while Google's AI solved them with 99.8% accuracy. Once computers were better at the test than humans, the test had to change.
reCAPTCHA v2 used images for the same purpose: telling humans and computers apart to stop spammers. This time, though, Google had humans teach machines to recognise objects from the real world. Most v2 tests used street scenes (cars, traffic lights, zebra crossings), and Google used the data to improve Google Maps and to train its self-driving cars to see these objects. Computers got better than us at these picture puzzles in the same way they had learned to read warped text better than humans, so the test had to change again, and so did the way computers graded it.
Behaving like a human
No CAPTCHA reCAPTCHA and reCAPTCHA v3 look at behaviour to tell humans and computers apart. These checks are nearly invisible because they run a hidden test continuously in the background. If they detect bot-like behaviour, such as typing paragraphs of text in seconds or clicking around too quickly, the user is asked to complete a standard picture test or two-factor authentication. That is a lot better from a usability point of view than deciphering warped text or solving puzzles.
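A very simplified version of a behaviour check might look like this. The signals and limits (`MAX_CHARS_PER_SECOND`, `MIN_SECONDS_BETWEEN_CLICKS` and the `Session` fields) are hypothetical; real services keep their signals secret and combine far more of them, so treat this only as an illustration of flagging bot-like speed.

```python
from dataclasses import dataclass, field

# Hypothetical limits; real services keep their signals and thresholds secret.
MAX_CHARS_PER_SECOND = 15.0
MIN_SECONDS_BETWEEN_CLICKS = 0.1


@dataclass
class Session:
    chars_typed: int
    typing_seconds: float
    click_times: list[float] = field(default_factory=list)  # click timestamps in seconds


def looks_like_bot(s: Session) -> bool:
    """Flag sessions that type or click faster than the assumed human limits."""
    if s.typing_seconds > 0 and s.chars_typed / s.typing_seconds > MAX_CHARS_PER_SECOND:
        return True
    gaps = [b - a for a, b in zip(s.click_times, s.click_times[1:])]
    return any(gap < MIN_SECONDS_BETWEEN_CLICKS for gap in gaps)


def next_step(s: Session) -> str:
    """Let normal sessions through; send suspicious ones to a picture test or 2FA."""
    return "challenge" if looks_like_bot(s) else "allow"


print(next_step(Session(200, 60.0, [0.0, 1.5, 3.2])))  # allow
print(next_step(Session(2000, 3.0)))                    # challenge
```

Most people never see a challenge under this approach; only sessions that trip a rule are asked for extra proof.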
What’s next?
There's no public answer to what our clicks are teaching computers to do, and no knowing how long behaviour-tracking Captchas will last before computers outsmart them again. If your website contact forms are receiving a lot of spam, get in touch and we'll put filters in place that reduce spam without disrupting your customers' journey with frustrating puzzles.