A captchas tale
Did you ever think about implementing captchas for your website? Did you consider using one of the big providers, like Google and co.?
Before I continue, let me clarify that I'm not going to explain how to implement your own captcha. Instead, I want to think a bit more about captchas in general and how effective they are at their main task: preventing automation.
Let's start with a really strange idea. It is kind of funny, but it might really be in production somewhere. As you might have noticed, the headline is intended to sound like a known service that offers to circumvent captchas. Such services use things like advanced OCR and probably some people typing in captchas all day. However, what if I told you that just about anyone can start their own captcha solving service on the interwebz, without any knowledge of OCR and all that other stuff?
Let's imagine we build a service like this.
Your first own Captcha Solving Service
OK, so what do we need to solve the simplest captchas available? Note that I will leave out some other kinds of captchas, like those fancy puzzles you probably know. Just the simple ones: "I have some cats for you" or "I have some words or letters with noise for you".
- An API to submit a captcha
- An API to report whether the solved captcha was accepted
- Analytics and some simple techniques to detect whether someone is abusing that success-reporting API endpoint
- The ability to take a screenshot of the displayed captcha
- The ability to fill out and send the solved captcha
- Maybe also simulating mouse clicks
Now that we have that short list of things we need, let's go into a bit more detail. I will not waste too many words on the frontend, as it is really straightforward, but here comes the backend.
First of all, when we talk about the backend, the customer clearly does not want to pay for a captcha that has not been solved, so we provide an API that accepts success and failure messages for each solved captcha. Now we have a more satisfied customer, but hey, isn't that just too easy to manipulate? In fact, yes. It is also easy to fix, and I will come back to this later.
Next, we have a customer-submitted captcha in the backend, but how do we solve it?
We have some possibilities, let's count the traditional ones:
- Solve it on your own
- Employ some people to solve them
- Use advanced OCR
This is pretty much what nearly all of the solving services do, but imagine doing the following:
Use the captchas your customers submit as the captchas that users of your other services have to solve.
Yes, you heard right. I'm talking about solving captchas by requiring your own users to solve the captchas that your customers asked you to solve.
- No employees
- You really do nothing yourself
- You don't need any OCR
All you need is a huge service with a huge user base that requires its users to pass some captchas, for example to register for some kind of event. In short: enough traffic to handle all incoming requests.
OK, but how do we know whether a captcha was really solved? Let's go back to the API endpoint that marks a captcha as solved or not. It really doesn't get any more complicated than that.
The customer says yes, it was solved, so the user is told yes, it was solved. To avoid abuse by the captcha solvers, you simply let many of your users solve the same captcha. To guard against a user entering something wrong, we compare all submitted answers and predict the solution by taking the one that was entered most often.
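As a rough illustration of that "most often entered" check, here is a minimal sketch in Python. The function name `predict_solution`, the `min_votes` threshold and the case-insensitive normalization are my own assumptions for the example, not part of any real service.

```python
from collections import Counter


def predict_solution(answers, min_votes=3):
    """Return the answer entered most often, or None if too few users agree.

    `min_votes` is an assumed threshold; tune it to your traffic.
    """
    # Assumption: captcha answers are compared case-insensitively
    # and surrounding whitespace is ignored.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer if votes >= min_votes else None


# Four users solved the same captcha; one of them mistyped it.
guess = predict_solution(["XyZ7", "xyz7", "xyz1", "xyz7 "])
```

Users whose answer differs from the predicted one would simply be told they got it wrong, and only the predicted answer is sent back to the customer.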
If the customer still marks the captcha as not solved, all the users get told the captcha was wrong, even if they were right. This unsolved captcha now goes into an analysis DB, which you can look up later to identify abuse by the customer. While this DB helps after the fact, you could reduce the damage up front with the following rule: if the customer marks 5 captchas in a row as unsolved, he still pays for 1 of them, and if the customer keeps marking 20% of all captchas as unsolved, he is blocked until you have checked the mentioned DB. This way you would have at least some kind of insurance, but as this is just an example, we'll leave it at that without thinking it through any further.
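The rule above could look roughly like this in Python. Everything here is a sketch under assumptions: the class and function names are made up, and `MIN_SAMPLE` (how many results we want to see before the 20% rule kicks in) is something I added so that a single early failure does not block a customer.

```python
STREAK_LIMIT = 5             # 5 unsolved in a row: customer still pays for 1
UNSOLVED_RATIO_LIMIT = 0.20  # 20% unsolved overall: block until reviewed
MIN_SAMPLE = 20              # assumption: minimum results before blocking


class CustomerStats:
    """Per-customer counters (hypothetical, kept in memory for the sketch)."""

    def __init__(self):
        self.total = 0
        self.unsolved = 0
        self.unsolved_streak = 0
        self.billed = 0
        self.blocked = False


def record_result(stats, solved):
    """Update billing and blocking state after the customer reports a result."""
    stats.total += 1
    if solved:
        stats.billed += 1
        stats.unsolved_streak = 0
    else:
        stats.unsolved += 1
        stats.unsolved_streak += 1
        if stats.unsolved_streak == STREAK_LIMIT:
            # Bill one captcha out of every streak of five "unsolved" reports.
            stats.billed += 1
            stats.unsolved_streak = 0
    if stats.total >= MIN_SAMPLE and stats.unsolved / stats.total >= UNSOLVED_RATIO_LIMIT:
        # Stays blocked until someone has checked the analysis DB by hand.
        stats.blocked = True
    return stats
```

Unblocking is deliberately left out, because in the scheme above that only happens after a manual look at the analysis DB.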
That's basically it, you're done. To be fair, this is a really unusual way to think about it, and probably everyone who has enough users to do this obviously has other priorities. But the idea was funny enough to me to talk and think about, and it also gives an impression of how (easy) hard it is to circumvent a captcha.
If you asked me whether I use any of the known captcha services, I would answer yes. While they might not be the ultimate weapon against spammers, bots and other forms of unwanted automation, they are, as of today, quite effective at preventing or at least slowing them down.
But if you asked me whether I think captchas have any future, I would clearly state: no, they do not. I might be wrong, but it is not only the strange kind of idea above that causes problems for captchas. OCR is also getting better and better, and as soon as we reach the point where OCR is better at solving captchas than we humans are, something has gone terribly wrong.
Proof of Work
I think a more effective way is to use Proof of Work (PoW) algorithms, combined with some kind of rate limiting. We're not really limiting the requests the user makes, but we let the user do more work if he asks for an unusual number of PoW requests. This will not only slow down any automation drastically, it also increases its cost. The cost would still need to stay low enough, otherwise big organizations, where many people share the same IP, would run into problems. Some may argue that IPv6 would help here. Well, it certainly won't; it just makes us watch a /64 subnet instead of a single IP. Otherwise it would be too easy to circumvent the rise in difficulty, as just about everyone gets a /64 block.
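To make this a bit more concrete, here is a small hashcash-style sketch in Python. It is only one possible way to do it: the SHA-256 "leading zero bits" puzzle, the constants, and the names `client_key`, `difficulty_for`, `solve` and `verify` are all assumptions for the example. The counters also never expire here, which a real setup would need.

```python
import hashlib
import ipaddress
from collections import defaultdict

BASE_BITS = 12      # assumed baseline difficulty (leading zero bits)
MAX_BITS = 24       # assumed cap, so shared networks stay usable
FREE_REQUESTS = 10  # assumed number of requests before difficulty rises

request_counts = defaultdict(int)  # in-memory only, never reset in this sketch


def client_key(ip):
    """Track IPv4 clients per address and IPv6 clients per /64 subnet."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(addr)


def difficulty_for(ip):
    """Count the request and return the number of zero bits to demand."""
    key = client_key(ip)
    request_counts[key] += 1
    extra = max(0, request_counts[key] - FREE_REQUESTS)
    # bit_length() grows logarithmically, so the cost rises slowly.
    return min(MAX_BITS, BASE_BITS + extra.bit_length())


def verify(challenge, nonce, bits):
    """Check that SHA-256(challenge + nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge, bits):
    """Client side: brute-force a nonce that satisfies verify()."""
    nonce = 0
    while not verify(challenge, nonce, bits):
        nonce += 1
    return nonce
```

The server would hand out a random challenge together with the result of `difficulty_for(ip)`, the client runs `solve`, and the server only needs a single hash in `verify` to check the answer. Each extra bit doubles the expected work for the client, which is why the cap matters for large networks behind one address.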
There are of course many more edge cases if one wants to use this strategy, like what to do with computers from the stone age that are not able to solve even the fastest of all PoWs, but it would be one way to solve this problem.
While the PoW described above would still keep the user anonymous, another strategy would be to exchange data with the client and verify through cryptographic signatures that we're really communicating with user x. In practice this would be very effective, as users could simply be banned in case of abuse, but it is not an idea I would ever suggest using. It makes the user completely traceable and lays the foundation for abuse by governments and corrupt spy agencies, like the NSA.
Are they any good?
Yes, totally. Let's talk about...
what captchas are really useful for
Captchas do something!
They force the user to take some action before a requested operation is processed. While many think that a checkbox is enough, a captcha really verifies that the user has noticed that something on this site is forcing him to solve this captcha. He actually needs to think about the whole thing and maybe, though I doubt it, he won't make a decision he does not want to make.
One might also ask: wouldn't it be more fitting to let the user type in the sentence "Yes I really want to give you all my money!"? Maybe, but in the end the user probably does not think about this any more than about checking a checkbox.