Can AI Be Used to Train Models on Copyrighted Works Without Permission?

The intersection of artificial intelligence and copyright law presents a fascinating legal landscape for Australian organisations. As AI systems continue to advance, many developers train their models on vast datasets that may include copyrighted materials. But is this practice lawful in Australia? Actuate IP Melbourne notes that this question has significant implications for tech companies, content creators, and research institutions alike.

Key Takeaways

AI training that creates copies of copyrighted works may infringe reproduction rights under Australian law
Fair dealing exceptions offer limited protection for AI training activities
Practical risk management strategies include licensing, using permissible content, and maintaining thorough documentation
International cases are influencing the evolving Australian legal landscape
Organisations should implement a structured approach to copyright compliance for AI training

What “Training a Model” Means for Copyright

Data Ingestion and Copies Made During Training

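When an AI model is trained, various copies are created along the way. These include cached files of the original works, tokenised representations that convert content into sequences of numbers, and model checkpoints that preserve the state of training. Each of these may amount to a reproduction of a work under Australian law, which treats storage in any form (whether visible or not) as a “material form”. Tokenised data, for instance, can usually be decoded back into the original text, whereas whether the model weights in a checkpoint embody any particular work is far more contested.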
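To illustrate why a tokenised copy may still reproduce a work, the minimal sketch below uses a hypothetical toy tokeniser (real models use subword schemes such as byte-pair encoding, but the reversibility point is similar). The names `build_vocab`, `encode` and `decode` are illustrative only. The numeric IDs are all that is needed to rebuild the original text exactly.

```python
import re

# Hypothetical toy tokeniser for illustration only. Each token is a whitespace
# run, a word, or a single punctuation mark, so every character is captured.
TOKEN_PATTERN = re.compile(r"\s+|\w+|[^\w\s]")


def build_vocab(text: str) -> dict[str, int]:
    """Map each distinct token to an integer ID, in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in TOKEN_PATTERN.findall(text):
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Convert text into the list of numerical token IDs a model would train on."""
    return [vocab[tok] for tok in TOKEN_PATTERN.findall(text)]


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Invert the mapping: the IDs alone are enough to rebuild the text."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


if __name__ == "__main__":
    work = "It was the best of times, it was the worst of times."
    vocab = build_vocab(work)
    ids = encode(work, vocab)
    print(ids)
    # The "numerical values" still reproduce the work character for character.
    assert decode(ids, vocab) == work
```

The round trip is exact because the pattern captures whitespace as well as words; a subword tokeniser achieves the same effect differently.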

Output Risks: Verbatim Reproduction vs Transformed Output

The outputs generated by AI models present varying levels of legal risk. When a model produces content that closely resembles or directly reproduces copyrighted material, this creates clear legal exposure. On the other hand, highly transformed outputs that draw on statistical patterns rather than specific works may present lower – but not zero – legal risk.

Distinction Between Copying and Learning Statistical Patterns

Courts may distinguish between literal copying of works and the process of learning abstract patterns from those works. This nuanced distinction could prove critical in future cases, as learning patterns may be viewed differently from expressive copying that retains the creative elements of original works.

Australian Legal Framework Relevant to Model Training

Copyright Act 1968 — Core Rights Implicated

The Copyright Act 1968 (Cth) grants copyright owners several exclusive rights that AI training activities may infringe. These include the reproduction right (engaged when copies are made during training), the adaptation right (which, for literary, dramatic and musical works, covers specific transformations such as translations and dramatisations rather than mere changes of file format), and the communication right (engaged when works are made available online or electronically transmitted, for example through model outputs).

Fair Dealing Exceptions and Their Limits

Australia’s fair dealing exceptions permit use of copyright works only for specified purposes: research or study, criticism or review, parody or satire, reporting news, professional advice, and (since 2017) access by persons with a disability. These exceptions are considerably narrower than the open-ended “fair use” doctrine in the US and may not cover commercial AI training at scale. Australia also has no specific text and data mining exception, and none of the existing fair dealing purposes was designed with AI systems in mind.

“The application of fair dealing exceptions to AI training presents one of the most challenging questions in Australian copyright law today. The boundaries remain untested in court, creating significant uncertainty for technology companies.” – Actuate IP

Moral Rights and Contract-Based Restrictions

Beyond copyright infringement, AI training may implicate creators’ moral rights of attribution and integrity. In addition, many websites and digital platforms impose terms of service that expressly prohibit automated processing or scraping, creating parallel contractual risks even where a copyright exception might apply.

Rights That Are Not Available in Australia

Unlike the European Union, Australia does not have a sui generis database right that protects collections of data independently of copyright. Some data compilations that would be protected in Europe may therefore be usable more freely in Australia, provided they don’t contain individual copyright works and the compilation itself lacks the independent intellectual effort needed to attract protection as a literary work.

International Cases and Regulatory Activity

Major US and EU Cases With Persuasive Impact

While not binding in Australia, international decisions such as the US Google Books case have shaped thinking about mass digitisation. More recent litigation involving AI companies such as Stability AI, Midjourney and OpenAI offers insight into how courts may approach AI training. Australian courts may treat overseas decisions as persuasive when addressing novel copyright questions, although US fair use reasoning translates imperfectly to Australia’s narrower fair dealing regime.

Policy Developments Abroad

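Regulatory approaches in the EU, UK and US are evolving rapidly. Some jurisdictions have introduced text and data mining exceptions: the EU’s 2019 Digital Single Market Directive, for example, permits text and data mining subject to rights holders’ ability to opt out of non-research uses, while the UK’s exception is limited to non-commercial research. These developments may influence Australian policymakers and guide organisations operating across multiple territories.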

Assessing Infringement Risk

Factors That Increase Legal Exposure

Several factors can raise the legal risk profile of AI training activities, including:

High volume or proportion of copyrighted material in training data
Training on commercially valuable or recent works
Outputs that closely resemble specific source materials
Lack of attribution or transparent data provenance

Evidence Courts Will Seek

In potential litigation, courts will likely examine training logs, dataset inventories, sample outputs, and technical documentation about model architecture. Organisations should maintain these records with the expectation that they may need to be produced in legal proceedings.

Possible Remedies and Enforcement

If infringement is established, remedies may include injunctions restraining use of trained models, either damages (which may reflect lost licensing revenue) or an account of profits, additional damages for flagrant infringement, and orders to delete datasets or models derived from infringing materials.

Practical Compliance Strategies

Licensing and Permissions

Obtaining appropriate licences for training data is the safest approach. This may include direct licensing from copyright holders, blanket agreements with publishers or collecting societies, or participation in emerging AI training licensing schemes.

Preferencing Permissible Content Sources

Focusing on public domain materials, content released under permissive Creative Commons terms (particularly the CC0 public domain dedication, or CC-BY, which still requires attribution), and datasets with explicit AI training permissions can substantially reduce legal exposure.
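One way to put this preference into practice is to tag each candidate source with its licence status during rights clearance and keep only those on an allowlist, routing everything else to review. The minimal sketch below assumes hypothetical licence labels (“public-domain”, “CC0”, “CC-BY”) and a hypothetical `ai_training_permitted` flag; these are illustrative names, not an industry standard.

```python
from dataclasses import dataclass

# Hypothetical licence labels treated as lower-risk in this sketch.
PERMISSIVE = {"public-domain", "CC0", "CC-BY"}


@dataclass(frozen=True)
class Source:
    name: str
    licence: str                         # e.g. "CC-BY", "all-rights-reserved"
    ai_training_permitted: bool = False  # explicit permission from the rights holder


def is_permissible(src: Source) -> bool:
    """Accept a source if its licence is allowlisted or training is expressly permitted."""
    return src.licence in PERMISSIVE or src.ai_training_permitted


def partition(sources: list[Source]) -> tuple[list[Source], list[Source]]:
    """Split sources into (usable, needs_review) rather than silently discarding any."""
    usable = [s for s in sources if is_permissible(s)]
    needs_review = [s for s in sources if not is_permissible(s)]
    return usable, needs_review
```

Returning a review list, rather than dropping excluded sources, keeps a record of what was considered and why it was set aside. CC-BY items would still need their attribution details carried through.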

Data Governance and Documentation

Maintaining comprehensive records of dataset composition, provenance, and rights status is both a legal safeguard and operational best practice. These records should capture what content was used, its source, the applicable licences, and how it was processed.
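As a rough illustration, such records could be kept as a manifest with one entry per item, fingerprinted with a SHA-256 hash so the exact content used can be identified later. The field names (`source_url`, `licence`, `legal_basis`, `processing`) are hypothetical choices for this sketch; what the record must actually contain is a question for legal advisers.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    sha256: str       # fingerprint of the exact bytes used in training
    source_url: str   # where the item was obtained
    licence: str      # applicable licence or permission
    legal_basis: str  # e.g. "licence agreement", "public domain"
    processing: list[str] = field(default_factory=list)  # steps applied, in order


def make_record(content: bytes, source_url: str, licence: str, legal_basis: str) -> ProvenanceRecord:
    """Create a provenance entry keyed by the SHA-256 hash of the content."""
    return ProvenanceRecord(hashlib.sha256(content).hexdigest(), source_url, licence, legal_basis)


def to_json_line(record: ProvenanceRecord) -> str:
    """Serialise one record as a single JSON line for an append-only manifest."""
    return json.dumps(asdict(record))


def write_manifest(records: list[ProvenanceRecord], path: str) -> None:
    """Write one JSON object per line (JSON Lines), which is easy to append to and audit."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(to_json_line(rec) + "\n")
```

Hashing the content, rather than relying on a URL alone, means the record still identifies the exact material used even if the source page later changes or disappears.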

Technical Measures to Reduce Risk

Implementing technical controls such as content filters to exclude high-risk materials, data anonymisation techniques, and output monitoring systems can help manage legal exposure throughout the AI lifecycle.
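Output monitoring can start with something as simple as measuring what fraction of the word n-grams in a generated output also appear in a protected reference text; a high score suggests near-verbatim reproduction worth human review. The function names, the n-gram length of 8 and the 0.5 threshold below are arbitrary assumptions for illustration, not legal tests. A low score does not establish that no substantial part of a work was taken.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lower-cased, whitespace-split word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also occur in the reference text."""
    out_grams = ngrams(output, n)
    if not out_grams:  # output shorter than n words: nothing to compare
        return 0.0
    return len(out_grams & ngrams(reference, n)) / len(out_grams)


def flag_for_review(output: str, references: dict[str, str],
                    n: int = 8, threshold: float = 0.5) -> list[str]:
    """Return IDs of reference works whose overlap with the output meets the threshold."""
    return [ref_id for ref_id, text in references.items()
            if overlap_ratio(output, text, n) >= threshold]
```

Because the sketch splits on whitespace, punctuation stays attached to words; a production system would normalise text more carefully and would likely need an index rather than comparing every output against every reference.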

Contract Terms With Partners

When working with AI vendors or partners, clear contractual terms should address copyright compliance, including warranties about training data, indemnification provisions, and responsibilities for handling third-party claims.

Responding to Claims

Immediate Operational Steps

Upon receiving a copyright claim, organisations should preserve relevant records, consider temporarily restricting access to challenged models or outputs, and document all steps taken in response.

Legal Defence Strategies

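Potential defences may include fair dealing arguments or showing that no substantial part of any particular work was reproduced (Australian law has no general “transformative use” defence of the kind found in US fair use analysis). Negotiating retrospective licences may also resolve a claim where appropriate.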

Stakeholder Communication

Maintaining transparent communication with affected parties can help resolve disputes before they escalate to litigation. This includes establishing clear channels for copyright holders to raise concerns.

Practical Checklist for Legal Compliance

Before Training

Before beginning AI training:

Conduct a rights clearance assessment for proposed training materials
Document the legal basis for using each dataset component
Implement technical measures to track data provenance
Consider consulting with copyright specialists for high-risk projects

During Training

While training is underway, maintain detailed logs of all data used, processing steps applied, and model configurations to demonstrate responsible practices if questioned later.

After Deployment

Post-deployment, regularly audit model outputs for potential copyright issues, implement a takedown procedure for problematic content, and stay current with evolving case law and regulations.

Policy Outlook for Australia

Areas to Watch for Reform

Australia may follow other jurisdictions in introducing specific exceptions for text and data mining or AI training. The government has indicated interest in reviewing copyright law to address emerging technologies.

Industry Standards Development

Cross-industry agreements and voluntary standards for responsible AI training are emerging globally and may provide practical guidance even before formal legal changes occur in Australia.

Conclusion

The legal status of training AI models on copyrighted works in Australia remains in flux, with significant grey areas that create both opportunities and risks for organisations. A thoughtful approach combining legal, technical, and operational measures can help manage these risks while still enabling innovation. As this area continues to evolve, staying informed about legal developments and implementing robust compliance practices will be essential for anyone working with AI systems. For specialised guidance on navigating the complex intersection of AI and copyright law in Australia, Actuate IP can provide targeted advice based on your specific circumstances and risk profile.