The intersection of artificial intelligence and copyright law presents a fascinating legal landscape for Australian organisations. As AI systems continue to advance, many developers train their models on vast datasets that may include copyrighted materials. But is this practice lawful in Australia? Actuate IP Melbourne notes that this question has significant implications for tech companies, content creators, and research institutions alike.
Key Takeaways
- AI training that creates copies of copyrighted works may infringe reproduction rights under Australian law
- Fair dealing exceptions offer limited protection for AI training activities
- Practical risk management strategies include licensing, using permissible content, and maintaining thorough documentation
- International cases are influencing the evolving Australian legal landscape
- Organisations should implement a structured approach to copyright compliance for AI training
What “Training a Model” Means for Copyright
Data Ingestion and Copies Made During Training
When an AI model is trained, various types of copies are created throughout the process. These include cached files of original works, tokenised representations that transform content into numerical values, and model checkpoints that preserve the state of training. Each of these copies potentially triggers copyright concerns under Australian law.
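To make the point concrete, the toy sketch below (purely illustrative, not drawn from any real training pipeline) shows how a single work can survive ingestion as three distinct artefacts: a cached byte copy, a tokenised numerical representation, and checkpoint state derived from the tokens. Each artefact is a separate potential "reproduction" for copyright purposes.

```python
import hashlib

def cache_raw(text: str) -> bytes:
    """Stage 1: a byte-for-byte cached copy of the source work."""
    return text.encode("utf-8")

def tokenise(text: str) -> list[int]:
    """Stage 2: a toy word-level tokeniser mapping words to integer IDs.
    Real systems use subword tokenisers, but the point is the same:
    the work persists as a recoverable numerical representation."""
    vocab: dict[str, int] = {}
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]

def checkpoint(token_ids: list[int]) -> str:
    """Stage 3: state derived from the work, preserved on disk."""
    return hashlib.sha256(bytes(token_ids)).hexdigest()

work = "an original literary work"
copies = [cache_raw(work), tokenise(work), checkpoint(tokenise(work))]
print(len(copies))  # three distinct artefacts derived from one work
```

Deleting the original file after ingestion does not remove the other two artefacts, which is why each stage needs separate consideration.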
Output Risks: Verbatim Reproduction vs Transformed Output
The outputs generated by AI models present varying levels of legal risk. When a model produces content that closely resembles or directly reproduces copyrighted material, this creates clear legal exposure. On the other hand, highly transformed outputs that draw on statistical patterns rather than specific works may present lower – but not zero – legal risk.
Distinction Between Copying and Learning Statistical Patterns
Courts may distinguish between literal copying of works and the process of learning abstract patterns from those works. This nuanced distinction could prove critical in future cases, as learning patterns may be viewed differently from expressive copying that retains the creative elements of original works.
Australian Legal Framework Relevant to Model Training
Copyright Act 1968 — Core Rights Implicated
The Australian Copyright Act 1968 grants several exclusive rights to copyright holders that AI training activities may infringe. These include reproduction rights (when copies are made during training), adaptation rights (when works are transformed into different formats), and communication rights (when works are made available to others through model outputs).
Fair Dealing Exceptions and Their Limits
Australia’s fair dealing exceptions permit limited use of copyrighted works for specific purposes such as research or study, criticism or review, parody or satire, and reporting news. However, these exceptions are narrower than the open-ended “fair use” doctrine in the US and are unlikely to cover commercial AI training at scale. Amendments in 2017 added a fair dealing exception for access by persons with a disability, but Australia has no general quotation exception, and none of the existing exceptions was designed with AI systems in mind.
“The application of fair dealing exceptions to AI training presents one of the most challenging questions in Australian copyright law today. The boundaries remain untested in court, creating significant uncertainty for technology companies.” – Actuate IP
Moral Rights and Contract-Based Restrictions
Beyond copyright infringement, AI training may implicate moral rights of attribution and integrity. Additionally, many digital works come with contractual terms of service that explicitly prohibit automated processing or scraping – creating parallel legal risks even where copyright exceptions might apply.
Rights That Are Not Available in Australia
Unlike the European Union, Australia does not have a sui generis database right that protects collections of data independent of copyright. This means that some data compilations that might be protected in Europe could potentially be used more freely in Australia, provided they don’t contain individual copyrighted works.
International Cases and Regulatory Activity
Major US and EU Cases With Persuasive Impact
While not binding in Australia, international cases such as the Google Books decision in the US have shaped thinking about mass digitisation. More recent litigation involving AI companies like Stability AI, Midjourney, and OpenAI provides insight into how courts may approach AI training issues. Australian courts often look to these precedents when addressing novel copyright questions.
Policy Developments Abroad
Regulatory approaches in the EU, UK, and US are evolving rapidly, with text and data mining exceptions being introduced in some jurisdictions. These developments may influence Australian policymakers and guide organisations operating across multiple territories.
Assessing Infringement Risk
Factors That Increase Legal Exposure
Several factors can raise the legal risk profile of AI training activities, including:
- High volume or proportion of copyrighted material in training data
- Training on commercially valuable or recent works
- Outputs that closely resemble specific source materials
- Lack of attribution or transparent data provenance
Evidence Courts Will Seek
In potential litigation, courts will likely examine training logs, dataset inventories, sample outputs, and technical documentation about model architecture. Organisations should maintain these records with the expectation that they may need to be produced in legal proceedings.
Possible Remedies and Enforcement
If infringement is established, remedies may include injunctions to cease use of trained models, damages based on lost licensing revenue, and orders to delete datasets or models derived from infringing materials.
Practical Compliance Strategies
Licensing and Permissions
Obtaining appropriate licences for training data represents the safest approach. This may include direct licensing from copyright holders, blanket agreements with publishers or collecting societies, or participation in emerging AI training licensing schemes.
Preferencing Permissible Content Sources
Focusing on public domain materials, content under permissive Creative Commons licences (particularly CC0 or CC-BY), and datasets with explicit AI training permissions can substantially reduce legal exposure.
Data Governance and Documentation
Maintaining comprehensive records of dataset composition, provenance, and rights status is both a legal safeguard and operational best practice. These records should include what content was used, its source, applicable licenses, and how it was processed.
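As a minimal sketch of what such a record might look like, the snippet below builds one provenance entry per ingested item, with a content hash, source, and claimed licence. The field names are illustrative assumptions, not a recognised standard.

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(content: bytes, source_url: str, licence: str) -> dict:
    """Build one manifest entry for a single ingested item."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),  # identifies exactly what was ingested
        "source": source_url,                           # where the item came from
        "licence": licence,                             # claimed rights status, e.g. "CC-BY-4.0"
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

record = provenance_record(b"sample text", "https://example.com/work", "CC0-1.0")
manifest = json.dumps([record], indent=2)  # append-only manifest, one entry per item
```

In practice such a manifest would be written to durable storage alongside the dataset, so that the rights basis for every item can be produced on demand.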
Technical Measures to Reduce Risk
Implementing technical controls such as content filters to exclude high-risk materials, data anonymisation techniques, and output monitoring systems can help manage legal exposure throughout the AI lifecycle.
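A content filter of the kind described above can be as simple as an allowlist check run before training. The sketch below is a hedged example only: the SPDX-style licence identifiers, the `robots_allowed` flag, and the record layout are all assumptions about how a pipeline might label its data.

```python
# Keep only items whose licence is on an explicit allowlist and whose
# source permits automated use. Identifiers follow SPDX style.
ALLOWED_LICENCES = {"CC0-1.0", "CC-BY-4.0", "public-domain"}

def is_trainable(item: dict) -> bool:
    """True if the item's claimed licence and crawl permissions allow training."""
    return item.get("licence") in ALLOWED_LICENCES and item.get("robots_allowed", False)

corpus = [
    {"id": 1, "licence": "CC0-1.0", "robots_allowed": True},
    {"id": 2, "licence": "all-rights-reserved", "robots_allowed": True},
    {"id": 3, "licence": "CC-BY-4.0", "robots_allowed": False},
]
kept = [item["id"] for item in corpus if is_trainable(item)]
print(kept)  # [1]
```

The design choice here is deliberately conservative: anything lacking a positive rights signal is excluded, which matches the risk-reduction goal even at the cost of discarding usable data.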
Contract Terms With Partners
When working with AI vendors or partners, clear contractual terms should address copyright compliance, including warranties about training data, indemnification provisions, and responsibilities for handling third-party claims.
Responding to Claims
Immediate Operational Steps
Upon receiving a copyright claim, organisations should preserve relevant records, consider temporarily restricting access to challenged models or outputs, and document all steps taken in response.
Legal Defence Strategies
Potential defences may include fair dealing arguments, demonstrating substantial transformation of source materials, or negotiating retrospective licences where appropriate.
Stakeholder Communication
Maintaining transparent communication with affected parties can help resolve disputes before they escalate to litigation. This includes establishing clear channels for copyright holders to raise concerns.
Practical Checklist for Legal Compliance
Before Training
Before beginning AI training:
- Conduct a rights clearance assessment for proposed training materials
- Document the legal basis for using each dataset component
- Implement technical measures to track data provenance
- Consider consulting with copyright specialists for high-risk projects
During Training
While training is underway, maintain detailed logs of all data used, processing steps applied, and model configurations to demonstrate responsible practices if questioned later.
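One lightweight way to keep such logs is an append-only stream of structured JSON events, one per pipeline step. The event names and fields below are hypothetical; any schema will do so long as it captures what data was used, how it was processed, and how the model was configured.

```python
import json

log_lines: list[str] = []

def log_event(event: str, **details) -> None:
    """Append one JSON line per pipeline event (kept in memory for the demo)."""
    log_lines.append(json.dumps({"event": event, **details}, sort_keys=True))

log_event("dataset_loaded", manifest="manifest-v3.json", items=120000)
log_event("preprocess", step="deduplication", items_removed=4210)
log_event("train_start", model="demo-lm", epochs=3)

# In production these lines would go to durable, tamper-evident storage,
# since they may later need to be produced in legal proceedings.
```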
After Deployment
Post-deployment, regularly audit model outputs for potential copyright issues, implement a takedown procedure for problematic content, and stay current with evolving case law and regulations.
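A rough audit of the kind described can compare word n-grams in model outputs against known source texts and flag heavy overlap for review. The approach and thresholds below are illustrative only; n-gram overlap is a screening heuristic and does not establish legal “substantial similarity”.

```python
def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """All word n-grams in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(output: str, source: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also appear in the source."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(source, n)) / len(out)

source = "the quick brown fox jumps over the lazy dog near the river bank"
verbatim = "the quick brown fox jumps over the lazy dog"
original = "a completely different sentence about something else entirely here"

print(overlap_ratio(verbatim, source))  # 1.0: near-verbatim reproduction
print(overlap_ratio(original, source))  # 0.0: no shared 5-grams
```

Outputs scoring above a chosen threshold would be routed to human review and, where warranted, into the takedown procedure.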
Policy Outlook for Australia
Areas to Watch for Reform
Australia may follow other jurisdictions in introducing specific exceptions for text and data mining or AI training. The government has indicated interest in reviewing copyright law to address emerging technologies.
Industry Standards Development
Cross-industry agreements and voluntary standards for responsible AI training are emerging globally and may provide practical guidance even before formal legal changes occur in Australia.
Conclusion
The legal status of training AI models on copyrighted works in Australia remains in flux, with significant grey areas that create both opportunities and risks for organisations. A thoughtful approach combining legal, technical, and operational measures can help manage these risks while still enabling innovation. As this area continues to evolve, staying informed about legal developments and implementing robust compliance practices will be essential for anyone working with AI systems. For specialised guidance on navigating the complex intersection of AI and copyright law in Australia, Actuate IP can provide targeted advice based on your specific circumstances and risk profile.