Blog Cyberjustice - CAPTCHA vs Internet Giants: a final match, arbitrated by Artificial Intelligence

At its Worldwide Developers Conference (10 June 2024), Apple unveiled a video showcasing new features powered by its Artificial Intelligence ecosystem in the upcoming iOS 18, iPadOS 18 and macOS Sequoia updates. The tech giant emphasizes that daily tasks will be significantly enhanced by Machine Learning (ML) algorithms and Large Language Models (LLMs).

Thanks to a new interface offering a truly personalized experience, powered by Apple silicon chips built on the ARM (Advanced RISC Machines) architecture, a close collaboration with OpenAI’s ChatGPT, and enhanced privacy measures, querying external servers will only be necessary for tasks requiring computational power beyond the user’s device capabilities.

However, many of us will remember memorable scenes from The Imitation Game, the film adaptation of Alan Turing’s biography. Bletchley Park, the main decryption site in the United Kingdom during World War II and now home to The National Museum of Computing, is where Turing worked on deciphering the German Enigma machine codes. A founder of digital information processing, cryptologist, mathematician and pioneer of Artificial Intelligence, Turing laid the groundwork for what we often encounter today as CAPTCHA tests on Google, Baidu, or Microsoft websites.

Although these text- or image-based security mechanisms are widely used by digital platforms, advances in ML are making them increasingly vulnerable. Originally, CAPTCHAs were valuable assets for website owners: they made it possible to distinguish real traffic from fake traffic and effectively reduced spam.

CAPTCHA’s origins

In 1950, Turing published a paper on Artificial Intelligence, ‘Computing Machinery and Intelligence’, which introduced ‘The Imitation Game’ and laid the foundation for what is now known as the Turing Test. The test was inspired by a simple game in which an interrogator put questions to a man and a woman he could not see; the man’s goal was to imitate the woman’s behavior, while the interrogator tried to determine which of the two was actually the woman. Today, the applications of this concept have greatly expanded.

In his attempt to determine whether a ‘machine could reason’ and exhibit a form of ‘human intelligence’, Turing proposed letting a computer take the place of one of the players in this game. Today, the test involves a conversation between three parties: two humans and a computer. If the judge, whose task is to identify the computer, fails to do so, the program behind the computer is considered to have passed the test. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) tests represent a reversed variation of the Turing test, where humans must demonstrate to a computer that they are not robots.

Apple’s Automatic Verification, a challenger to CAPTCHA

Apple defines CAPTCHAs as a ‘random sequence of letters, numbers, or images that asks you to identify distorted characters or select specific images to verify that you are a human and not a robot (or bot) when you log in’.

In this context, enabling Automatic Verification on a device signed in with an Apple ID bypasses the traditional CAPTCHA procedure while still ensuring secure authentication for participating apps.

An Apple server then validates the device and Apple ID concerned, generating a third-party token that serves as a private access token and allows the user to access an app or a website. However, Apple is never aware of specific login activity, browsing history, or any device- or Apple ID-related information. The token issuer server only knows whether the procedure was successful.
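How can an issuer vouch for a device without learning which token it signed? One common answer is a blind signature. The sketch below is a toy illustration of that idea in Python, not Apple's implementation: it assumes textbook RSA with small, insecure primes, and every name in it (`client_blind`, `issuer_sign`, `client_unblind`, `site_verify`, the `device_ok` flag) is hypothetical.

```python
import hashlib
import secrets
from math import gcd
from typing import Optional, Tuple

# Toy RSA key for a hypothetical token issuer. The primes are two
# well-known Mersenne primes; this key is far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def client_blind(nonce: bytes) -> Tuple[int, int, int]:
    """Client: hash a nonce into a message and hide it with a random factor."""
    m = int.from_bytes(hashlib.sha256(nonce).digest(), "big") % N
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            break
    blinded = (m * pow(r, E, N)) % N
    return m, r, blinded


def issuer_sign(blinded: int, device_ok: bool) -> Optional[int]:
    """Issuer: sign only if the device check passed; it never sees m itself."""
    if not device_ok:
        return None
    return pow(blinded, D, N)


def client_unblind(signed_blinded: int, r: int) -> int:
    """Client: remove the blinding factor to obtain a signature on m."""
    return (signed_blinded * pow(r, -1, N)) % N


def site_verify(m: int, token: int) -> bool:
    """Website: check the token against the issuer's public key (N, E)."""
    return pow(token, E, N) == m
```

In this sketch the issuer only learns that it signed something for a device that passed its check, while the website only learns that the token carries a valid issuer signature. A real deployment would also need proper padding, key sizes, and replay protection, all of which are omitted here.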

Using Generative Adversarial Networks (GANs) as CAPTCHA Solvers

GANs, introduced in June 2014 by Ian Goodfellow and his collaborators at the University of Montreal, were originally conceived as a type of generative model for unsupervised learning. They belong to a class of ML frameworks aimed at advancing generative AI by creating new data that resembles existing data.

But how do these GANs actually work?

They rely on two antagonistic neural networks: a generator and a discriminator. The generator, as its name implies, creates new data modeled on samples from the input data, while the discriminator’s role is to determine whether the generated data can be distinguished from the original data set. Put simply, the discriminator acts like an authentication mechanism.

This process is iterative. The generator keeps improving its fake data, while the discriminator keeps improving at spotting it, until the discriminator can no longer tell the two data sets apart.
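The generator/discriminator loop can be sketched in a few lines of Python. This is a deliberately tiny illustration under stated assumptions, not the architecture used in any CAPTCHA solver: the "networks" are a linear generator and a logistic discriminator over single numbers, the "real" data is a hypothetical Gaussian centered on 4.0, and all function names and hyperparameters are invented for the example.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def generate(g, z):
    """Generator: turn random noise z into a fake sample."""
    a, b = g
    return a * z + b


def discriminate(d, x):
    """Discriminator: estimated probability that sample x is real."""
    w, c = d
    return sigmoid(w * x + c)


def discriminator_step(d, real, fake, lr):
    """One gradient step on -[log D(real) + log(1 - D(fake))]."""
    w, c = d
    grad_w = grad_c = 0.0
    for x in real:
        err = 1.0 - discriminate(d, x)  # pushes D(real) toward 1
        grad_w -= err * x
        grad_c -= err
    for x in fake:
        p = discriminate(d, x)          # pushes D(fake) toward 0
        grad_w += p * x
        grad_c += p
    n = len(real) + len(fake)
    return (w - lr * grad_w / n, c - lr * grad_c / n)


def generator_step(g, d, noise, lr):
    """One gradient step on -log D(G(z)), so fakes look more real to D."""
    a, b = g
    w, _ = d
    grad_a = grad_b = 0.0
    for z in noise:
        x = generate(g, z)
        dx = -(1.0 - discriminate(d, x)) * w  # d(-log D(x)) / dx
        grad_a += dx * z
        grad_b += dx
    n = len(noise)
    return (a - lr * grad_a / n, b - lr * grad_b / n)


def train(steps=2000, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates, as described above."""
    rng = random.Random(seed)
    g, d = (1.0, 0.0), (0.0, 0.0)
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # hypothetical real data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generate(g, z) for z in noise]
        d = discriminator_step(d, real, fake, lr)
        g = generator_step(g, d, noise, lr)
    return g, d
```

Calling `train()` returns the final generator and discriminator parameters. With such a simple discriminator, the generator can at best be expected to shift its samples toward the region where the real data lives; real GANs replace both functions with deep neural networks and rely on automatic differentiation rather than hand-written gradients.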

Among the most common applications of GANs are artistic creation and multimodal generation (including images, videos, and sounds); restoration of images and their conversion to text, and vice versa; and the creation of deepfakes, which are ‘realistic manipulated images or videos that imitate real people or situations’.

In a 2018 study titled ‘Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach’, researchers found that GAN-based approaches represent a significant improvement over traditional methods for solving text CAPTCHAs.

Dr. Zheng Wang, one of the study’s authors and a lecturer at the School of Computing and Communications at Lancaster University, considers this development ‘alarming because it means that this primary defense of security for many websites is no longer reliable’.

In response, website owners should focus on alternative bot-detection methods, such as implementing ‘multiple layers of security including behavioral pattern analysis of users, device location, or biometric data’.
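To make the idea of layered signals concrete, here is a minimal Python sketch that combines the three kinds of evidence mentioned above into a single risk score. It is only an illustration: the signal names, point values, and threshold are all hypothetical, and a production system would calibrate them on real traffic.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    """Hypothetical signals collected for one login or form submission."""
    mouse_events: int           # behavioral: pointer events seen before submit
    seconds_on_page: float      # behavioral: time between page load and submit
    country: str                # device location reported for this request
    usual_countries: frozenset  # locations previously seen for this account
    biometric_ok: bool          # a platform biometric check succeeded


def bot_risk(s: RequestSignals) -> int:
    """Add up risk points from each independent layer (0 = no red flags)."""
    score = 0
    if s.mouse_events == 0:
        score += 40  # no pointer activity at all
    if s.seconds_on_page < 2.0:
        score += 30  # submitted faster than a person typically could
    if s.country not in s.usual_countries:
        score += 20  # unfamiliar location for this account
    if not s.biometric_ok:
        score += 10  # no biometric confirmation available
    return score


def decide(s: RequestSignals, threshold: int = 50) -> str:
    """Allow low-risk requests; ask for an extra challenge otherwise."""
    return "challenge" if bot_risk(s) >= threshold else "allow"
```

The design choice worth noting is that no single layer decides the outcome: a traveler logging in from a new country without biometrics stays below the threshold, whereas a request with no pointer activity submitted almost instantly does not.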

AI algorithms continue to excel at tasks often seen as trivial or time-consuming. It is therefore vital to grasp what is at stake: website administrators, operational security teams, and users must collaborate to enhance app and website security and efficiency, including by addressing evolving threats from malicious uses of AI.

Roxana Vener

Master 2 Cyberjustice – Promotion 2023/2024

Sources

Alan Turing

Alan Turing Test

Apple – Automatic Verification

Apple WWDC (10 June 2024)

ARM Architecture

Bletchley Park

GANs

Generative adversarial networks

Guixin Ye, Zhanyong Tang, Dingyi Fang, Zhanxing Zhu, Yansong Feng, Pengfei Xu, Xiaojiang Chen, and Zheng Wang. 2018. Yet Another Text Captcha Solver: A Generative Adversarial Network Based Approach. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18). Association for Computing Machinery, New York, NY, USA, 332–348