Unpicking the rules shaping generative AI

Laws like Europe's GDPR are already being enforced on OpenAI's ChatGPT. But plenty more AI governance is coming down the pipe...

In the past couple of weeks, already an ice age ago in AI hype terms, a number of very well known names in tech (Elon Musk! Woz!) signed an open letter calling for a halt on the development of AI models more powerful than OpenAI’s GPT-4 — arguing humanity needs time for planning and management around the application of powerful automation technologies which, they implied, could be on the cusp of toppling man from his pinnacle atop the proverbial food chain.

Whether their call for human civilization to buy time to adapt to “ever more powerful digital minds”, as they put it — and beat the digital genii back into the bottle by somehow collectively agreeing “shared safety protocols” and “robust AI governance systems” — was really a self-interested bid by a subsection of technologists to throw a spanner in the works of more advanced competitors (so they can try to catch up) is one overarching question. But the letter’s implication that no laws apply to AI is just obviously wrong. Although we can certainly argue over how existing laws apply (or should be applied).

Below, we’ve compiled an overview of some of the main areas where laws are already being flexed and tested in response to generative AI’s fast-scaling automated outputs — and for sure it’s a patchwork, not the kind of aligned (global?) governance the letter claims to be agitating for. We also discuss incoming rules that are set to put bespoke guardrails around applications of AI in the coming months and years.

When it comes to dedicated legislation for AI, U.S. lawmakers are still debating what to do. It’s China and Europe that are fastest to the punch here. But while the Chinese state is moving to control and censor how AI can be used, EU lawmakers see risk-based regulation as key to fostering consumer trust and uptake of the tech — via a claim of safeguards for fundamental rights. On the flip side, countries like the U.K. and India are running in the opposite direction, with no plan to apply fixed rules to AI at this stage — apparently planning to let the tech rip in the hopes it sparks a homegrown high tech boom they can ride to economic glory. So quite the spread of high-level bets is being made. And the prospect of a unified/universal approach looks fanciful, to put it politely.

Data protection and privacy: EU’s GDPR and beyond

Last month, Italy’s data protection authority grabbed the world’s attention by ordering OpenAI to stop processing locals’ data for its ChatGPT generative AI chatbot. It said it was concerned the company is breaching the European Union’s General Data Protection Regulation (GDPR) — in areas like the lawfulness and transparency of the processing, provision of data access controls and protections for minors.

OpenAI responded almost immediately by geoblocking its ChatGPT service to Italian IP addresses. Although the action does not address the regulator’s key concern that its AI model was trained on the personal information of Italians without a proper legal basis. To stop processing that data OpenAI would have to retrain its model with the offending personal data of Italians stripped out (meaning it would likely need to switch off its whole service while it did that). None of which it has done. Instead, it said it believes it complies with all privacy laws — setting itself up for a fight with the European Union’s data protection regulators, of which there are more than the EU has Member States.
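For context on the mechanics (OpenAI hasn’t detailed its actual implementation), here’s a minimal sketch of how IP-based geoblocking of this kind is typically done — assuming the open `geoip2` library, a MaxMind GeoLite2 country database at a hypothetical file path, and a hypothetical `is_blocked` helper:

```python
# Minimal sketch of IP-based geoblocking (hypothetical; not OpenAI's implementation).
# Assumes MaxMind's GeoLite2 country database and the `geoip2` package are available.
import geoip2.database
import geoip2.errors

BLOCKED_COUNTRIES = {"IT"}  # ISO country codes to refuse service to, e.g. Italy

reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def is_blocked(client_ip: str) -> bool:
    """Return True if requests from this IP should be refused."""
    try:
        country = reader.country(client_ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return False  # unknown IPs pass through in this sketch
    return country in BLOCKED_COUNTRIES
```

The obvious limitation — and part of why the regulator wasn’t satisfied — is that refusing requests by IP says nothing about what happens to data already ingested into the model.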

The Italian DPA followed up this week with a fresh order telling OpenAI what it needs to do to get the suspension lifted by the end of this month. But the to-do list is a long one: Including opening up about its processing; applying age gating and, later, age verification tech; rethinking the legal basis it claims for processing; and providing ways for Europeans to ask for corrections or have their data deleted if ChatGPT generates false information about them, as the bot is prone to do.

How feasible some of these asks are given the nature of machine learning technology is up for debate. And OpenAI has just a couple of weeks to make most of the changes. (Reminder: Penalties under the GDPR can scale up to 4% of global annual turnover.) So the GDPR does indeed look like a legal nightmare for chatbot/large language model (LLM) makers like OpenAI which have relied upon hoovering up vast amounts of public (but sometimes personal) data to train their models without, apparently, a thought for EU data protection laws.

There is already a raft of instructive GDPR enforcements when it comes to scraping personal data off the public Internet to train AI models: Controversial facial recognition firm, Clearview AI, has been issued with multi-million dollar fines and orders to delete data right across the EU — from Italy and France to Greece and the U.K. — which, even if local DPAs have a hard time collecting fines from an uncooperative US-based company, at least limits its ability to operate a service in the region, since regulators can (and have) chased local users in order to shut that business down.

Basically, if you’re touching personal data in the EU you need a valid legal basis for that processing. The GDPR allows for a number of possible grounds but most aren’t going to be applicable to commercial generative AI — per the Italian DPA just two are theoretically possible in the case of ChatGPT (consent or legitimate interests).

But asking users to consent to use of their data for training AI models etc seems unlikely to scale, given the vast amounts of data being sucked in. (And few, if any, AI model makers have been asking permission to grab people’s info as training data so far.)

The makers of generative AI services, perhaps especially LLMs, thus face a particular challenge to ensure they’re not in breach of GDPR lawfulness of processing requirements. At the very least they need to be upfront about how they’re processing people’s data. But explaining how AI works is itself a challenge. So whether they can even do that to the required level of detail is an open question.

The GDPR requires not only a legal basis for any personal data ingested but that the data processor informs people what they’re doing with it — to avoid breaching the regulation’s transparency, fairness and accountability principles. (And let’s just say OpenAI’s recent public missive, re its “approach to AI safety”, does not provide a comprehensive view on how it’s using people’s data; a section in it, entitled “respecting privacy”, sums to a claim of ‘trust us with your info’ — which certainly won’t pass muster with EU regulators.)

If generative AI model makers want to avoid many more stop-processing orders flowing in the EU (other DPAs are reportedly eyeing action after the Italians stepped in) — not to mention the risk of millions of dollars in fines — they’re going to need to be a lot more open than OpenAI has been so far.

Speaking to TechCrunch ahead of the Italian DPA’s intervention, Dr. Gabriela Zanfir-Fortuna, VP of global privacy for the Future of Privacy Forum, essentially predicted what was about to unfold — suggesting DPAs would play an active role in ordering AI developers to open up about how they’re using personal data.

EU watchdogs could act on their duty to enforce the GDPR by issuing orders to providers of LLMs like ChatGPT to “be transparent about the exact data that they use about a certain uniquely identifiable person on the internet”, she said, adding “let’s see” — just a few days ahead of the prediction coming true.

“You have to provide transparency into what data you’re collecting. And if those obligations are not met, you are not complying with one of the fundamental principles in Article 5(1)(a) — lawfulness, fairness and transparency,” she also told us. “If anything that you do touches personal data, and especially if it’s automated, you have to comply with [the GDPR]. And if it’s a lot of sensitive data — or you have a new method to process data — you need those data protection impact assessments that have specific requirements if you want to do this right. You have the data protection by design obligations. But it’s also a matter of needing to be serious about complying with it. But it doesn’t have thresholds in application — it’s just immediately triggered, whenever personal data is touched, is made available.”

The GDPR also gives individuals a suite of data access rights, meaning people in the EU can ask for a copy of info about them that’s been processed; ask for corrections of erroneous data; and even request their data is deleted. But at the time of writing OpenAI has not set up an established way to make such asks — such as a formal reporting system for requesting that data be deleted from ChatGPT.

The lack of a formal tool should not, in itself, be a block to users asking to exercise their rights (they could, for example, send a subject access request (SAR) by email, provided they can find an email address for OpenAI). But the company does not appear to be prepared to deal with an influx of personal data requests. Plus, evidently, the Italian DPA takes the view that it should be facilitating these asks — by providing systems for people to exercise their data access rights.

In its blog post, OpenAI claims it responds to “requests from individuals to delete their personal information from our systems”. However there is no confirmation it has responded to any SARs as yet — and it is certainly receiving such requests in the EU. So this issue won’t be going away.

As the Italian Garante’s intervention suggests, the lack of formal reporting systems attached to generative AI technologies like ChatGPT is likely to need to change — at least for any AI services that operate in the EU — but one question is how the developers of AI models can respond to these kinds of individual requests. And whether they will really commit to removing personal data from their training pool — so, essentially, commit to retraining their models on demand, every time an EU user asks for their info to be deleted or rectified. (Or will they instead seek to avoid being subject to that level of individual control by bringing in a bunch of lawyers to argue against such an interpretation and application of the law, as Big Tech historically has vis-à-vis GDPR enforcement?)

The EU’s privacy regulation takes a broad view on what personal data is. But machine learning systems certainly complicate the picture for regulators by using information as a training resource to build models that can then generate new types of data — drawing on inputs in ways AI makers can’t always explain. Additionally, generative AI models have, at times, been shown reconstructing training data in their outputs — raising the spectre of personal data literally being breached in the process. (And data breaches are certainly regulated under the GDPR.)

At the same time, there is a question mark over how the EU’s comprehensive data protection framework can be applied to technologies like LLMs in practice — and even whether, as one legal expert we talked to recently suggested, the technology might have broken EU law in a more fundamental way by undoing the data controller/processor and data subject model it established for protecting people’s data.

The framework predates the kind of massive data-mining involved in training general purpose AIs such as LLMs (NB: The GDPR has only been in force since 2018 but principles it builds on were set out far earlier, in the Data Protection Directive — a piece of legislation the EU passed in the mid nineties). Given the complexity inherent to AI, there may very well be caution among some EU DPAs about how they approach regulating the tech — to avoid being accused of overstepping the mark by applying the GDPR in ways the lawmakers who drafted the framework did not intend.

One thing to consider on the lawfulness-of-processing point OpenAI is now having to wrangle with in Italy: if Google and other search engines can get away with scraping people’s data without up-front permission, under their claim of (usefully) organizing the world’s information, why can’t generative AI model makers? The answer to that will likely hinge on nuanced detail — like how much transparency is being provided into what’s being done with the data. (And, indeed, on Google bowing to the EU’s right to be forgotten ruling by providing systems for private individuals to request irrelevant data be de-indexed — something that looks a lot more complex for LLMs to pull off.)

Elsewhere, where the GDPR is concerned, child protection is emerging as a strong motivating force for the bloc’s privacy regulators to step in.

Following the Italian DPA’s intervention, OpenAI noted that it is “looking into verification options” to address safety concerns attached to minors’ information and access to the tech — so some form of age verification looks set to be one concrete early result of GDPR enforcement in the EU. It remains to be seen whether OpenAI will limit any age gating to just Italy or push it out across the bloc or indeed elsewhere. (Another instructive intervention here is the Garante’s recent stop-data-processing order to AI chatbot maker Replika — also citing risks to children.)

That’s not all where the GDPR is concerned; there’s also the question of whether LLMs may be taking solely automated decisions with significant impacts on individuals — which will really depend on how generative AI is being applied (and probably isn’t an issue for general purpose platforms). But, if so, it would trigger a regional right for individuals to obtain information on the logic of those decisions. And, potentially, the right to ask for a human review.

“All of these safeguards [in the GDPR] that are around automated decision making might also be an avenue where data protection authorities can look and that will require increased transparency,” suggests Zanfir-Fortuna. “That could require an avenue for the individual to challenge the outcome of this automated process.”

With DPAs across the bloc faced with tricky decisions on whether/how to enforce the GDPR atop technologies which are being adopted at lightning speed, the European Data Protection Board (EDPB), a steering body for applying the GDPR consistently, may well play a key role in shaping how regulators approach generative AI.

This week Spain’s data protection regulator, the AEPD, told TechCrunch it has asked for ChatGPT to be included in the next plenary meeting — “so that harmonised actions can be implemented within the framework of the application of the GDPR”. And today Reuters reported that the Board has formed a task force to support regulators on possible enforcements. So moves are afoot to try and harmonize how EU regulators respond.

The Board has been playing an increasingly important role in the enforcement of the GDPR in cross-border cases against tech giants — settling disputes between DPAs on a number of cases against Meta and others. So it may well need to take the final word on any generative AI disputes. Although, in the case of OpenAI, since the company has not established a base in the EU, it’s facing local enforcement of the GDPR — wherever and whenever regulators have concerns — rather than being subject to a (typically far slower) cross-border enforcement process in which complaints are funnelled via a single lead regulator and final enforcement must wait until agreement is reached between all interested regulators. Which can take years vs Italy’s Garante spinning up an intervention within a few months of ChatGPT hype going crazy.

Beyond the EU, Canada’s privacy watchdog also recently stepped in to announce a probe of ChatGPT — acting on a complaint alleging the collection, use and disclosure of personal information without consent. And it’s worth noting that while the EU is the most advanced region in regulating privacy and data protection, it is not alone in having an established system of legal protections for people’s information. Indeed, the GDPR has proven to be an inspiring template for lawmakers around the world (see, e.g., Brazil). Nor, therefore, is Europe alone in considering how privacy rules should be applied to generative AI right now.

The EU’s next trick: Regulating high risk AIs

If you thought the EU has enough digital regulations already, think again! At the time of writing, EU lawmakers are at an advanced stage of three-way negotiations to hammer out agreement on a risk-based framework for regulating applications of artificial intelligence. This follows a draft proposal by the European Commission back in April 2021.

The basic approach for the EU’s AI Act is to categorize use-cases of artificial intelligence as prohibited (in a very few cases), high risk or low risk — with the ‘high risk’ category (examples include AI being used in fields like employment, law enforcement, justice, education, healthcare etc) being subject to regulatory requirements in areas like data quality, security and avoiding bias, both ex ante (before) and ex post (after) launching into the market. AI apps that are classed as low risk (the vast majority of use-cases) are merely encouraged to apply best practice codes of conduct — so essentially left to regulate themselves. Very few use-cases are out-and-out prohibited under the plan (Chinese-style social credit scoring is one). Although discussions to agree all the details of the planned law continue.

The initial Commission proposal also contained some transparency requirements for technologies such as chatbots and deepfakes — requiring that users of such services are clearly informed they’re interacting with a machine. However the draft proposal did not explicitly tackle general purpose AIs (aka GPAIs) or what are sometimes couched as “foundational models”. So there’s been concern about a loophole based on makers of GPAIs technically falling outside the listed high risk categories, given their technology is framed as multi-purpose, which could let them claim the law doesn’t apply to them — pushing all the regulatory risk onto downstream AI deployers (such as businesses plugging into their APIs).

It’s certainly notable OpenAI has moved so quickly to platformize ChatGPT — opening up the technology to business users to plug directly into their apps — so it’s clear that any risks associated with usage of generative AI technologies will quickly be massively distributed and embedded wherever the technology is being used. And that looks set to be a truly huge surface area. Which in turn means AI regulations that don’t tackle GPAIs head on risk being out-of-date before they’ve even hit the books.

Luckily for the EU, peak (?) hype around generative AI developments has coincided with negotiations towards agreeing a final text for the AI Act — giving lawmakers a chance to propose amendments aimed at plugging the GPAI gap and ensuring the likes of ChatGPT will be covered.

Last month, Euractiv reported on proposals put forward by the European Parliament’s co-rapporteurs on the AI file to apply significant obligations to GPAI makers like OpenAI — akin to those the draft applied to high risk systems; including risk-management obligations around system design and testing; and data governance requirements for training data-sets. Additionally, it reported the suggested amendment would require technologies like ChatGPT to undergo external audits — to “test performance, predictability, interpretability, corrigibility, safety and cybersecurity in line with the AI Act’s strictest requirements”.

It’s still not clear how exactly the EU will seek to regulate general purpose AIs because we won’t know that until the Parliament and Council reach final agreement in closed-door trilogue discussions. (And AI firms like Microsoft, a major investor in OpenAI, have — in the meanwhile — been lobbying for a GPAI carve out.) But it seems highly likely the framework will be amended to at least take account of ChatGPT and its ilk, given how blisteringly fast these technologies are making a mark.

“I think it’s going to have to be… a bit of a pick and mix,” suggests internet law expert Lilian Edwards, a professor at Newcastle University, discussing where the AI Act amendments might end up on GPAIs. “They can’t just say all general purpose AI top level is high risk. It just won’t play, which is what the European Parliament has said. And they can’t just try and throw it on to the downstream developers, deployers, which is pretty much what the Council has said, because that is obviously unfair. It obviously lets OpenAI off the hook, etc.”

“It’s quite likely… that we’re going to have some interim regime,” she goes on, sketching a possible compromise whereby responsibilities for different risk-related issues are shared or jointly applied — such as by requiring GPAI model makers to tackle data quality (i.e. fix issues like bias in the training data), since the deployer has no access to the training data, with other responsibilities perhaps pushed on to the deployer since they’re attached to a specific use of the technology which the maker of the foundational model is not directly involved with.

“It might be that we distribute the risks and the responsibilities along the AI value chain… [but] keep the data quality requirements at top level for the providers,” she suggests. “There’s various axes of argument. There’s not just are they high risk and what responsibilities would that have with it? And when are they high risk? And is it shared with downstream deployers? So that’s kind of like your GDPR situation — where we have this development in adtech having joint controllers who have responsibilities for the personal data at different points in the horrible process of ad personalization.

“So I think we’re maybe heading towards that kind of world. Which everyone will hate. Everyone. Because it’s so unclear. And it will lead to lots of court cases, I would have thought — which again you wouldn’t have thought was what anybody wants. So I guess [joining the dots]… this is obviously going to be controlled by contract. You know, when you buy the upstream GPT-7, whatever it is, to put it into your system one of the terms of the contract will be either that you indemnify or you take on all the risks.”

Edwards notes other recent proposed amendments to the AI Act targeting unfair contractual terms which she suggests are motivated by this line of thought. If correct, future regulation of GPAIs under the EU’s AI Act could boil down to what’s contained in small print terms. (And we all know how users tend to gloss over T&Cs to more quickly get their hands on ‘free’ tech. So it’s not a great prospect.)

Reached last week, the office of co-rapporteur and MEP, Dragos Tudorache, confirmed they are “still working on the compromises that have to do with GPAI”. Asked if the likely outcome will be responsibility for risks being split between GPAI model makers and deployers of their technology, perhaps combined with rules to guard against unfair contract terms being imposed by the former on the latter to try to outsource regulatory risk, the spokesperson told us: “We’re discussing the compromise on GPAI as we speak — we will have a more stable text in a couple of weeks.

“As of now, indeed we are trying to make sure that the value chain is adequately and fairly covered — that is, providers of foundational models do have some responsibilities towards downstream providers and providers in turn do have some responsibilities towards deployers. We do indeed have a section on unfair contractual terms — modelled after the discussions in the Data Act — to prevent the scenario you are mentioning.”

As it happens, Edwards has been working on a research project looking at T&Cs of LLMs and generative AI platforms. The project isn’t finished yet but her early observations are that most terms aren’t tailored to the specific service/uses of the AI tech, with gaps for instance around areas like training data.

So the suggestion is there’s a lot of copypasta lurking down there on generative AI apps and the business end of this ‘AI value chain’ may not sum to the kind of meaningful disclosures lawmakers are hoping for. (So, basically, more compliance theatre. Lawyers rejoice!)

“I would bet my bottom dollar actually that we’ll find that throughout — that these are very boilerplate terms and conditions,” she suggests. “Except for the ones from [major players] like OpenAI, where, you know, they’ll be paying lawyers the big bucks. But for all the small downstream deployers I bet the terms and conditions are really, really boilerplate and don’t refer to any issues you might like to be clarified.”

It’s worth noting that, from a consumer perspective, T&Cs are already regulated in the EU, via the Unfair Commercial Practices Directive. So EU lawmakers may be anticipating those rules will apply to generative AI services which are providing a service to consumers — putting some limits on the conditions they can unilaterally impose on users. (But, again, that’s assuming users even read T&Cs and/or know to call out unfair terms.)

The enforcement question certainly looms large over the EU AI Act, even while the detail is still being debated. And, clearly, paper rules are only as good as action that backs them up so having a legal framework is just a first step. It’ll be how — or whether — it gets enforced that really counts. And that remains a challenge for future years.

That said, the EU does have a draft plan on the table to extend liability rules to cover AI and software — so it’s fashioning both fixed rules for riskier applications of AI combined with the threat of lawsuits for model makers that fail to ensure their products don’t scale harms. (And not just physical harms either: The liability framework proposal aims to encompass breaches of fundamental rights, too.) So, again, litigation may have a key role in shaping practical guardrails for AIs.

State censorship: China’s surveillance of ‘deep synthesis’

Generative AI services operating in China are already subject to an expanding set of rules that put restrictions on the use of the tech for activities that could endanger national security, damage public interest or are otherwise illegal — relying on the Chinese Internet’s real-name verification apparatus to enforce surveillance and censorship on AI users.

The Chinese state has sought to impose restrictions on generative algorithms since passing a law in December. Under the regime, AI services are required to verify the identity of users and report violations — enabling a system of state-mandated control on the tech’s inputs and outputs. Service providers must also audit AI-generated content and user prompts — which has led to early generative AI offerings that filter politically sensitive content. 

The state’s AI regulation also bakes in a requirement that platforms seek permission to alter others’ faces and voices via deep synthesis; and watermark AI-generated content if there’s a risk it could be misconstrued by the public, with — also — a general ban on using generative AI to generate disinformation. Add to that, China passed its own comprehensive data protection regulation, back in August 2021, and AI developers and deployers are required to comply with that rulebook if they’re processing personal data.

There’s more coming too: Just this week, the state Internet regulator revealed additional draft rules for AI, slated to be passed later this year, including a requirement for algorithms to be registered with the authorities.

Under the expanded proposal, providers of generative AI services must apply a range of prohibitions on what their tech can produce — with the rules banning content that subverts government power and authority or questions national unity, for example, along with bans on objectionable content such as ethnic discrimination and terrorism. So the censorship looks to be dialling up.

China is clearly wasting no time in establishing a framework to control this powerful new iteration of automation tech, even as internet tech giants compete to roll out their own ChatGPT competitors.

One question that arises is whether the level of control and surveillance being sought by the Chinese Communist Party over generative AI inputs and outputs ends up having a chilling effect on local tech giants’ ability to develop competitive systems. Western companies, including the U.S. AI giants leading the charge, face a far lighter level of regulation that may help them innovate more quickly and win the metaphorical ‘AI arms race’ while Chinese tech giants are grappling with layers of state-imposed red tape.

Elsewhere: Indecision or inaction…

While much is still to be firmed up re: the EU’s incoming AI rules, the bloc remains strides ahead of other Western democracies on grappling with how to regulate artificial intelligence — thanks to its decision to prioritize rule-making for “trustworthy and human-centric” AI several years ago.

But, over in the US, lawmakers are considering what an “accountability mechanism” for “trustworthy AI” might look like — echoing some of the language the EU used when lawmakers were drafting their own risk-based approach to AI. The discussion sounds familiar: With U.S. policymakers talking about the need for “responsible AI” which considers risks in areas like privacy, security, bias and so on. 

“We are still in the early days of the development of these systems… It’s clear they are going to be game-changing across many sectors… But it’s also becoming clear that there’s cause for concern about the consequences and potential harms from AI system use,” said Alan Davidson, the assistant secretary of commerce for communications and information, earlier this week — announcing a government inquiry on how to guarantee trust and accountability in AI systems. “There are risks — risks to privacy, security and safety, potential bias and discrimination. Risks to trust and democracy and implications for our jobs, economy and the future of work.”

The latest effort builds on earlier steps, such as work done around auditing AI and an “AI Bill of Rights” the White House produced last year. However those recommendations for privacy protections, safety measures and checks against discrimination were still voluntary. So it’s a far cry from drafting a legislative framework. And the Biden administration’s policy moves in this area remain at an early, consultative stage — lagging far behind tech developments. (Plus, of course, with a U.S. presidential election looming next year there’s little time for the current government to set ‘Made in America’ guardrails for generative AI. Any such effort would presumably have to wait for a second Biden term.)

The U.K. government, meanwhile, recently confirmed it will eschew setting any fixed rules for applying AI for the foreseeable future — outlining a light touch regime in a white paper last month, accompanied by loud rhetoric about wanting to fire up economic growth via AI innovation. Under the plan, it will expect existing (overburdened) regulators to produce sectoral guidance for AI, with no more money (nor powers) to compel model makers to implement recommendations. So, basically, no rules; just ‘take a risk’ self regulation. The U.K. is also in the process of watering down the domestic data protection regime. (But — just to throw a curve ball — the government has also drafted sweeping Internet content regulations, aka the Online Safety Bill, under which platforms will face amped up pressure to tackle illegal content, including deepfake porn shared without consent which is set to be criminalized.)

India has taken a similar tack to the U.K. on stepping away from dedicated AI regs, recently signalling it won’t be rushing to regulate generative AI — as it says it sees the sector as a “significant and strategic” area for the nation (and even a “kinetic enabler of the digital economy and innovation ecosystem”, as the IT ministry put it in perfect tech buzzword bingo).

The EU’s risk-based framework for AI may therefore set the de facto regional (and even global) standard — certainly for any U.K. businesses wanting clarity on how to approach liability and the ability to scale usage of their products across the bloc.

Fines for breaching the AI Act’s high risk provisions were proposed by the Commission at up to 4% of global annual turnover (or €20M, whichever is greater). But enforcement atop complex learning technologies is, clearly, a vast new and intricate challenge. And it will be years before anyone gets to measure how policymakers have performed.
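To make the ‘whichever is greater’ mechanics concrete, here’s a trivial sketch of how such a penalty cap works — `max_penalty` is a hypothetical helper and the turnover figure is purely illustrative; the percentage and fixed sums are the ones cited above:

```python
def max_penalty(global_annual_turnover_eur: float,
                pct_cap: float = 0.04,
                fixed_cap_eur: float = 20_000_000) -> float:
    """Upper bound of a 'percentage of turnover or fixed sum, whichever is greater' fine."""
    return max(pct_cap * global_annual_turnover_eur, fixed_cap_eur)

# Illustrative only: for a company with €1B in global annual turnover,
# the 4% cap (€40M) exceeds the €20M floor.
print(max_penalty(1_000_000_000))  # 40000000.0
```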

Still, the EU’s AI Act could help clear up one grey area in relation to generative AIs trained on data processed without a lawful basis under the GDPR: The proposal may contain powers for regulators to order non-compliant models destroyed — whereas it’s less certain the EU’s data protection rulebook empowers DPAs to order an AI model’s complete destruction in such a scenario (vs the more bounded ask that unlawfully processed personal data is deleted, which risks leaving the model itself untouched).

Even more incoming rules for platforms in the EU

As if that wasn’t enough, the European Union has (yet) more digital rules landing shortly — having completed a major ecommerce services and platform update last year which is coming into application this year and next: Aka the Digital Services Act (DSA) and Digital Markets Act (DMA). The pair of regulations apply to digital services, marketplaces and platforms (in the case of the DSA); and to a subset of intermediating/overbearing tech giants which get designated as running “core platform services” (DMA).

Neither of these laws will apply directly to makers of generative AI tools, according to the European Commission — at least not yet. Or that’s what the EU’s executive told us when we asked. But a spokesperson suggested they may apply indirectly, i.e. if regulated platforms and core platform services are making use of generative AI technology.

This is because, in the case of the DSA, algorithmic systems are in scope of the risk assessment and audit requirements for so-called very large online platforms or search engines (aka VLOPs/VLOSE). 

And here again LLMs’ tendency to make stuff up (hallucination/disinformation) could create regulatory risks for larger platforms that opt to embed generative AI services.

“Both the DSA and DMA will apply to respective entities in scope and the latter may use algorithmic systems such as (but not exclusively) Large Language Models (LLMs),” a Commission spokesperson told us, noting that the DSA defines a platform as “a hosting service that, at the request of a recipient of the service, stores and disseminates information to the public, unless that activity is a minor and purely ancillary feature of another service”.

“The plug-ins functionality do not change the statu[s] of ChatGPT, if there is no hosting, storing or dissemination of information to the public on the request of the recipient of the service,” they went on. “Where a LLM is integrated into another service, this could potentially bring the LLM into scope of the DSA where it forms part of a regulated service under the DSA. This would cover in particular the risks linked to inauthentic use and automatic exploitation.

“The algorithmic systems used by VLOPs/VLOSEs will be subject to specific rules, irrespective of what technologies they use. Notably, companies eventually designated as VLOPs and VLOSEs are expected to conduct a risk assessment under Art.34 of the DSA which covers algorithmic systems. Such systems would also be subject to audit under Art.37.”

“As part of the ongoing implementation of the DSA and DMA, the Commission services continuously monitor all technological developments and innovative online services. Ensuring that these two pieces of legislation are future-proof has been a key objective of the Commission throughout the process of writing the proposal, and assisting in the negotiations. At this stage, we are not in a position to comment on any specific company or service,” the spokesperson added.

The DMA, an ex ante competition reboot aimed squarely at (pre-generative AI boom) tech giants like Google, defines just ten core platform service types, including online search engines and web browsers, that can be designated as “gatekeepers” (i.e. subject to the regulation) — if the companies providing the services meet certain quantitative thresholds, including turnover, market capitalisation, geographical scope and size of user base.
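As a rough illustration of how that designation test works in principle, the logic is a financial-scale test combined with a user-base test. The sketch below is hypothetical: the `Provider` type and `meets_quantitative_thresholds` helper are inventions for illustration, and the threshold constants are illustrative values only — the binding figures (and the full criteria) sit in Article 3 of the DMA.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    eu_turnover_eur: float
    market_cap_eur: float
    monthly_active_end_users: int
    yearly_active_business_users: int

# Illustrative values only -- the binding figures live in Article 3 of the DMA.
TURNOVER_THRESHOLD = 7_500_000_000
MARKET_CAP_THRESHOLD = 75_000_000_000
END_USER_THRESHOLD = 45_000_000
BUSINESS_USER_THRESHOLD = 10_000

def meets_quantitative_thresholds(p: Provider) -> bool:
    """Sketch of the DMA size test: financial scale plus user-base scale."""
    financial_scale = (p.eu_turnover_eur >= TURNOVER_THRESHOLD
                       or p.market_cap_eur >= MARKET_CAP_THRESHOLD)
    user_scale = (p.monthly_active_end_users >= END_USER_THRESHOLD
                  and p.yearly_active_business_users >= BUSINESS_USER_THRESHOLD)
    return financial_scale and user_scale
```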

LLMs are not currently listed as a core platform service so can’t be designated gatekeepers under the DMA. But the Commission made a point of noting that this list “could be extended in the future following a market investigation, should [we] find that new services display features of weak contestability or systemic unfair practices”. So it will, presumably, be keeping a weather eye on developments in generative AI. 

“While LLMs are not as such on the list of core platform services in the DMA, it seems likely that LLMs could be used to support individual core platform services provided by gatekeepers,” the Commission spokesperson also told us — suggesting requirements may again be applied indirectly, i.e. via obligations placed on gatekeepers which find themselves subject to the regulation’s list of ‘dos and don’ts’.

Or, put another way, if a gatekeeper tries to extend the market power of an existing core platform service (e.g. Internet search) through LLM technology (say Google’s Bard or Microsoft’s New Bing) their application of this powerful new form of automation could in theory be subject to DMA rules. Say, for example, the regulation’s ban on gatekeepers self preferencing in search results. Which could mean they face an up-front ban on deploying biased search-with-AI chatbots that always recommend their own products. (Albeit that would obviously be a dumb and anti-competitive trick for either to try pulling off so seems pretty unlikely.)

The DMA’s prescriptive rules were not drafted with LLMs or other generative AI services in mind so the first version of the regulation may offer little in the way of meaningful checks on gatekeeper usage of this form of powerful automation. It would likely need to be adapted as/when specific competition concerns arise — so, again, it would be up to the Commission to propose changes.

As regards DSA obligations on VLOPs/VLOSEs the most immediate consideration may well be the aforementioned “hallucination” risks — whereby LLMs could become a vector for societally damaging disinformation. So large platforms may need to be proactive about how they apply them.

VLOPs/VLOSEs will be designated by the Commission later this year, so it remains to be seen which of the larger platforms will fall in-scope of these obligations (but tech giants like Google and Microsoft look like a cert). Platforms will have four months after being designated to comply with the DSA, including producing the necessary risk assessments.

Last year the EU also beefed up its Code of Practice against Disinformation. And while this initiative still isn’t a legal instrument — it remains voluntary self regulation — abiding by the code is being seen as a form of best practice which can count towards DSA compliance, creating an incentive to stick to it. (While, on the flip side, breaches of the DSA can attract fines of up to 6% of global annual turnover so that threat may compel platforms not to stray too far from the bloc’s digital rulebook.)

Among the commitments Code signatories agreed to are to deploy measures to reduce manipulative behaviour used to spread disinformation, such as fake accounts, bot-driven amplification, impersonation and malicious deep fakes — all of which look relevant to risks created by generative AI hallucinations. They also agreed to provide users with better tools to spot, understand and flag disinformation; beef up transparency of recommender systems; and adapt them to limit the propagation of disinformation, among a number of other commitments.

What about copyright — or wrongs?

We’ve also seen copyright litigation firing up on a number of fronts in relation to generative AI makers’ decisions not to license content to train models — which can sometimes be commanded to produce work in the style of named artists and writers or generate snippets of protected code on demand without the necessary credit (and/or payment).

The theme here is protected content being used without up-front permission/licensing by commercial AI platforms that stand accused of trying to free-ride to profit off of others’ labor. (Although, technically speaking, being a not-for-profit (as OpenAI, for example, used to be) does not provide an escape from any legal requirements to license protected content they may use.)

Examples include the class action suit against Microsoft, GitHub and OpenAI which accuses them of violating copyright law by allowing GitHub’s code-generating AI, Copilot, to produce licensed code snippets without credit; or the legal case against popular AI art tools, Midjourney and Stability AI, which alleges they infringed on the rights of millions of artists by training their tools on web-scraped images; or the case brought by stock image giant Getty also against Stability AI — for reportedly using millions of images from its site without permission to train its art-generating AI, Stable Diffusion.

Machine-pastiche on demand is obviously not the same thing as human artists drawing inspiration from others’ work to hone their own craft. But the question remains how exactly existing copyright law applies to these new tools — which can also, of course, be a boon for supporting human creativity and productivity. So while litigation seeks to establish whether/how current laws may be applied, the overarching debate is whether copyright needs updating to clarify how to apply it to AI, and to ensure that human creativity and labor is protected in an era of increasingly capable AIs — and, if it does, where exactly the lines should be re-drawn.

Copyright is often a contentious area. But it’s fair to say generative AI has fired new fuel into an oft-raging battleground. Some (like digital rights group the EFF) argue that use of works of art as training data to build generative art AI models which produce output that does not amount to literal copying is fair use (i.e. not a breach of copyright law) — given copyright generally only protects an artist’s creative expression, rather than their ideas (nor even, necessarily, a distinctive style). Others disagree — arguing that use of protected artworks to train AI models without licensing that usage breaks the law.

“All of the data you put into the AI that would technically be use of copyright protected works that would typically be a type of copyright licence,” says Dr Hayleigh Bosher, a senior lecturer in IP law at Brunel University, London. “The same way that Spotify licences all the music that’s on their platform from the rights holders… and then we listen to it on the other end, there’s a licence that goes on in the background. And so my understanding of copyright law currently would be that that still applies to AI. So you would have to licence the use of the input data, based on the current law. But, again, it’s up for debate. Some people would disagree with me and we don’t know for sure because the law has never been applied in that way… Copyright law in particular is specifically designed around the human creative process.”

A key issue is current legal tests for copyright infringement weren’t devised with generative AI mimics in mind.

“The test for copyright infringement very much depends on the creative process of the creator. And the question is about did they take a substantial part? Did they copy, intentionally or unconsciously, the original work and did they take that bit of the human element… what makes the copyright original — the personality, touch that’s been put on the work to make it original. That’s the bit that you can’t copy in copyright for infringement. The problem with AI-created works is that because it’s generating works in a different way to how a human would that test doesn’t really apply because you probably wouldn’t be able to pinpoint exactly what they had copied…. It’s taken a little bit of everything.”

Bosher argues copyright law needs to be updated to clarify how it applies in this unfolding era of generative AI art, music, film and so on. And to safeguard human endeavor — all that inherently human blood, sweat and tears and the societally valuable production that results from it — pointing out copyright law has always evolved to adapt to novel tools that change creative production and cultural consumption.

“Copyright law should always evolve with technology,” she argues. “We got new copyright law when there was a photocopier. We got new copyright law when there was the camera phone… There’s new copyright law when we got the internet. And we get new laws when there’s social media. We used to always update the law to keep up with technology and culture and consumption and creativity. And that’s really normal.

“The thing with AI is that it’s evolving so fast that the challenge for the lawyers and the legislators is that they’re already behind schedule. They need to start thinking about this. And they need to speed up their processes. [OpenAI has] already come out with a new version — we’re on version 4 of that chatbot — and still, the policymakers are [debating] should we do something about it?”

Appetite among lawmakers to update copyright laws for the generative AI era is clearly lagging tech developments. Indeed, in the U.K., ministers recently proposed to careen in the opposite direction — by removing hurdles to AI data mining. However the plan was met by outrage from the country’s creative industries and the government appears to have had a rethink. (Although it has also signalled it favors a light touch approach to AI, rather than a bespoke legal framework, so presumably it won’t be rushing to update IP licensing law either.) So it remains to be seen how the copyright vs wrongs issue evolves.

On the flip side in the copyright arena, there is the (also) contested question of whether AI-generated works should be copyrightable themselves. “Whether AI can make a work that’s capable of being protected as copyright is currently an unanswerable question in the law,” says Bosher. “It’s just about how you frame your understanding of copyright law. My view is that it shouldn’t be [protected] and even if it is, it shouldn’t be protected to the same level — because it’s not doing the same thing.”

The Writers Guild of America has been one swift mover here — putting out a draft set of rules last month that seeks assurances from major movie studios during a contract renegotiation that AI-generated text can only be used as research material and can’t be covered by IP. The EFF dubs the move a “smart strategy that zeroes in on the Guild’s domain: protecting its members” — summing up the proposal thusly: “That means that if a studio wants to use an AI-generated script, there can be no credited author and no copyright. In a world where studios jealously guard the rights to their work, that’s a major poison pill. Under this proposal, studios must choose between the upfront cost of paying a writer what they are worth, and the backend cost of not having control over the copyright to the product.”

Where this will all shake out remains to be seen but the economic and cultural value provided by creative industries suggests artists and writers may, ultimately, come away with more protections for their labor than software engineers — a field where there is, conversely, more demand than (humans can) supply, so more of an argument for applying automation to scale productivity. (Coders’ labor may also simply be more easily rechannelled into higher level work atop automated code, than creative human labor that’s forced to abandon making art.)

The whole point of copyright is “the encouragement of learning and creativity and culture”, according to Bosher, who argues simply automation doesn’t align with that mission. “It’s kind of a social contract where we give rights to people who make stuff and then in exchange we get culture and knowledge and learning and it progresses society. There’s actually quite a high service type of law, trying to make the world a better place through creativity and culture. AI creativity is not aligning with that same purpose,” she says.

“For me, it comes down to the fact that if you think about what is the purpose of this AI program that you’ve created? And is it trying to make the world a better place? Is it solving problems? Is it helping society? And maybe some people see it in a short sighted way. They’re like, yeah, this is easier now, I don’t have to write the blog post. But let’s think long term about the kind of world we want to live in. And is that going in a good direction?… I just think we should value the human creator and understand the value that has to society, and I just don’t think AI should be replacing that.”

“Do we need AI to make a new song for us? I don’t think we do. It’s already a saturated market,” she adds. “AI is not solving a problem for us in that situation… We have other problems that need solving in the music industry in particular — we’ve got data issues, we’ve got inclusivity issues. If [someone said] we’ve invented this AI, it’s gonna solve these problems for you, then I’d be like great! Encourage that creativity. But at the minute, especially in the creative industries, AI creativity doesn’t need to be supported.”

Intellectual property is a private right, which means it’s up to rights holders to take enforcement action against entities they believe are infringing their protected content — hence why we’re seeing legal requests and litigation firing up on various fronts. In another example just this week, the Financial Times reported that Universal Music Group has asked streaming platforms including Spotify to block AI services from scraping melodies and lyrics from their copyrighted songs — with the music giant saying it has a “moral and commercial responsibility” to its artists to try to prevent unauthorised use of their music, and also leaning on “platform partners” to get on the same side of the fight and “prevent their services from being used in ways that harm artists”, as it put it.

Despite a rising number of legal requests and lawsuits by rights holders targeting generative AI tech, Bosher suggests the lack of clarity around how copyright law applies is likely preventing more from taking steps to protect their IP at this point.

Given the risks of losing a test case, many rights holders may be keeping their powder dry and waiting for lawmakers to clarify how the law applies (or, indeed, update it to reboot protections) before moving to litigate.

Of course, the risk is if they wait too long usage of the technology might become embedded as a mainstream utility and lawsuits that seek to unpick that may be met with public outrage, heaping reputational damage on the rights holders attempting to slay popular tools. So the onus really is on policymakers to get a handle on a thorny issue and figure out a fair way forward.

“For someone to bring a test case they have to be in a very financially secure situation where they’re willing to risk losing, and you’re not just going to lose the money in the case if you lose, you’re also going to lose your business model,” she notes. “So we need clarity. And also we need some decisions to be made… Are we going to protect AI created works? If so, what does that look like? What’s the scope? And how are we going to protect?”

You say hallucination, AI say defamation/disinformation!

Litigation could also fall on generative AI makers via claims brought under existing libel laws.

Reuters recently reported that OpenAI could be facing a defamation suit in Australia over false claims ChatGPT generated about a regional mayor. The news agency said the technology had falsely stated that the elected mayor of Hepburn Shire, Brian Hood, had served time in prison for bribery.

We’ve dug into this topic before — and, as with copyright, the first-order legal issue is a lack of clarity around how the law applies to automated defamatory statements. So it may well take a test case like Hood’s (if indeed he brings one) to start setting some precedents.

That said, given the speed with which generative AI is being embedded into our systems and workflows, lawmakers may be forced to grapple with delineating liability for generative AI falsehoods sooner rather than later — before these tools scale unaccountable fake news out of control.

The production of fake details about named individuals is not unusual where generative AI (or even more basic forms of automation) is concerned. It appears to occur pretty regularly with LLMs, a technology that’s prone to filling in gaps in its training data by making stuff up. To the point where OpenAI and other model makers have a word for it — referring to these factual failures by AI chatbots as “hallucinations”.

It’s less clear whether this truth flaw essentially comes baked in to general purpose AIs (which can’t literally know everything but can be asked anything) — or whether developers of these models will, at some point, figure out far more effective systems for preventing the outputting of statements that are at best wrong and might also risk being viewed as defamation.

In the release of GPT-4, OpenAI claimed the model “significantly reduces hallucinations relative to previous models”, which it also said had “been improving with each iteration” — claiming it had scored 40% higher on its topic-based internal evaluations than GPT-3.5. But OpenAI also conceded GPT-4 “still is not fully reliable” — confirming that “it [still] ‘hallucinates’ facts and makes reasoning errors”.

In our experience using ChatGPT, at times the company seems to have experimented with shifting its ‘safety sliders’ for this purpose. For example by dialling up the likelihood that the chatbot will refuse to respond to queries asking it for information about named individuals — which is one hard-stop way to prevent it spewing outrageous falsehoods.
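To illustrate what a hard-stop of that kind could look like in principle — this is purely a hypothetical sketch, not a description of OpenAI’s systems — a pre-generation filter might simply refuse prompts that mention a person’s name. The example below assumes the spaCy NLP library with its small English model, and `guarded_reply`/`generate` are made-up names for illustration:

```python
# Hypothetical sketch of a 'refuse questions about named individuals' guardrail.
# Not how OpenAI's systems work; assumes spaCy and its small English model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

REFUSAL = "I can't help with requests for information about specific individuals."

def guarded_reply(prompt: str, generate) -> str:
    """Refuse prompts that mention a person's name; otherwise call the model."""
    doc = nlp(prompt)
    if any(ent.label_ == "PERSON" for ent in doc.ents):
        return REFUSAL
    return generate(prompt)

# Usage, with `generate` being whatever model call is wired up:
# print(guarded_reply("Tell me about the mayor of Hepburn Shire", my_model_call))
```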

Evidently, though, the raison d’etre of generative AI is to be useful. Which demands it has the capacity to respond to questions it’s asked. Which means that considerations about dialling up safety checks to curb “hallucinations” are in clear tension with the core USP claimed for ChatGPT, as a universal helpmate.

Moreover, the more powerful these models get, the more potential risks emerge as a consequence of the expanded field of content they’re capable of producing — as OpenAI acknowledges in its blog post, writing: “GPT-4 poses similar risks as previous models, such as generating harmful advice, buggy code, or inaccurate information. However, the additional capabilities of GPT-4 lead to new risk surfaces”. So, clearly, tackling hallucinations is not a simple problem and the notion of a complete fix might be wishful thinking.

In the meanwhile, reports of ChatGPT’s capacity to compose plausible-sounding but erroneous personal histories — not always obviously defamatory, given an apparent tendency to sometimes be biased towards more conventional success paths (but the content is still wrong) — continue to surface, suggesting OpenAI remains very far from getting on top of fast-scaling AI-generated lies.

In other instances the chatbot has dreamt up the titles of fake books and newspaper articles that were not, in fact, written by the named individual the technology claimed. Or made up citations of (fictional) research papers — even sometimes attributing them to real people who didn’t write them either. Any one of which could wreak reputational damage, depending on what citations and references the bot is hallucinating into existence this time.

On top of this, there is the overarching issue that such confident sounding lies also work to blur the line between reality and fantasy which may present major, collective risks for human civilization — atop of the individual harms.

One thing is clear: “Hallucination” remains a big problem for developers of generative AI in spite of the misleadingly superficial-sounding label they’ve sought to badge the issue with.

Notably they did not pick the word ‘disinformation’ — a term that is widely discussed by policymakers in the context of social media content moderation, alongside other toxic issues like hate speech (if not typically as tightly regulated). But if you reframe AI hallucination as AI-generated disinformation it’s easier to see what type of regulations might apply to such content.

As noted above, the EU has its Code of Practice against Online Disinformation, for example — whose signatories include OpenAI investor Microsoft; and Google, maker of the ChatGPT rival Bard, to name two of the several tech giants who have committed to applying measures to combat the spread of online falsities.

One perhaps surprising additional revelation that’s emerged in recent weeks is that the GDPR is set up to regulate a sub-set of AI-generated disinformation — at least per the Italian DPA’s interpretation of it — given it’s ordered OpenAI to provide tools so people can ask for corrections to false statements ChatGPT generates about them. And if the company can’t technically correct the AI-generated disinformation it’s been told it will have to delete the person’s data entirely, in a bid to prevent recurring falsities about named individuals.

If this interpretation of the GDPR is sustained, the implication is that scores of EU citizens could request removal of their data from learning algorithms — reshaping what generative AIs are able to output by exerting existing long-standing rights to privacy and protection against personal disinformation. So regulating hallucinating AIs is already shaping up to be a major battleground.