The end is nigh: generic solving of text-based captchas

Elie Bursztein; Jonathan Aigrain; Angelika Mosciki; John Mitchell

Over the last decade, it has become well established that a captcha's ability to withstand automated solving lies in the difficulty of segmenting the image into individual characters. The standard approach to solving captchas automatically has been a sequential process in which a segmentation algorithm splits the image into segments that contain individual characters, followed by a character recognition step that uses machine learning. While this approach has been effective against particular captcha schemes, its generality is limited by the segmentation step, which is hand-crafted to defeat the distortion at hand. No general algorithm is known for the character collapsing anti-segmentation technique used by most prominent real-world captcha schemes. This paper introduces a novel approach to solving captchas in a single step that uses machine learning to attack the segmentation and the recognition problems simultaneously. Performing both operations jointly allows our algorithm to exploit information and context that is not available when they are done sequentially. At the same time, it removes the need for any hand-crafted component, making our approach generalize to new captcha schemes where the previous approach cannot. We were able to solve all the real-world captcha schemes we evaluated accurately enough to consider each scheme insecure in practice, including Yahoo (5.33%) and ReCaptcha (33.34%), without any adjustments to the algorithm or its parameters. Our success against the Baidu (38.68%) and CNN (51.09%) schemes that use occluding lines as well as character collapsing leads us to believe that our approach is able to defeat occluding lines in an equally general manner. The effectiveness and universality of our results suggest that combining segmentation and recognition is the next evolution of captcha solving, and that it supersedes the sequential approach used in earlier works. More generally, our approach raises questions about how to develop sufficiently secure captchas in the future.

Conference	Workshop on Offensive Technology
Authors	Elie Bursztein, Jonathan Aigrain, Angelika Mosciki, John Mitchell
Citation	BibTeX
@inproceedings{BURSZTEIN2014THE,
  title = {The end is nigh: generic solving of text-based captchas},
  author = {Bursztein, Elie and Aigrain, Jonathan and Mosciki, Angelika and Mitchell, John},
  booktitle = {Workshop on Offensive Technology},
  year = {2014},
  organization = {Usenix}
}
