Understanding IP Challenges in AI Due Diligence

Navigating IP in AI Due Diligence

Due diligence on artificial intelligence (AI) solutions calls for a specialized approach, particularly regarding intellectual property (IP). Unlike traditional software, AI models are dynamic: they often integrate third-party data and rely on ongoing training. This complexity demands a clear understanding of ownership structures and licensing rights.

Understanding Ownership Structures

A fundamental step in the due diligence process is determining the ownership structure of the AI solution, which encompasses its software, models, and training data. This assessment is critical to mitigating potential legal risks and ensuring compliance.

Key Questions for IP Title Assessment

Has the vendor developed the software and model in-house or commissioned a third party?
Does the software or model incorporate, or is it built using, third-party code, foundation models, or other third-party intellectual property?
Does the algorithm, model, or any component leverage open-source components?

Vendors should answer these questions fully enough to confirm whether they own each critical component or hold it under license. The answers will shape the structure of the contractual relationship.
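The questions above can be tracked per component in a simple register. The following is a minimal sketch, not a prescribed method: the `Origin` categories mirror the three questions, while the `Component` fields, the `rights_documented` flag, and the example component names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    # Categories taken from the key questions; the labels are illustrative.
    IN_HOUSE = "developed in-house"
    COMMISSIONED = "commissioned from a third party"
    THIRD_PARTY = "third-party code, foundation model, or other IP"
    OPEN_SOURCE = "open-source component"


@dataclass
class Component:
    name: str
    origin: Origin
    # Hypothetical flag: True once the vendor has evidenced ownership
    # or licensed rights for this component.
    rights_documented: bool


def open_title_questions(components):
    """Return (name, origin label) pairs for components still lacking evidence."""
    return [(c.name, c.origin.value) for c in components if not c.rights_documented]


# Hypothetical example register.
components = [
    Component("inference service", Origin.IN_HOUSE, True),
    Component("base model", Origin.THIRD_PARTY, False),
    Component("tokenizer", Origin.OPEN_SOURCE, False),
]

for name, origin in open_title_questions(components):
    print(f"Follow up with vendor: {name} ({origin})")
```

Recording the origin next to each open item keeps the follow-up request specific, since the evidence to ask for differs between owned and licensed components.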

Licensing Considerations

In scenarios where the AI solution is licensed, it is essential to clarify:

Does the vendor have the ability to grant a license with a right to sublicense the solution to customers?
Does that sublicensing right cover the customers’ intended purposes?
Does the license allow customization of the solution for the customer?

These responses may influence the contract’s structure, especially in software-as-a-service (SaaS) models where end customers access the solution directly from the vendor’s platform.

Key IP Considerations on Training Data

For AI solutions that have been pretrained, a thorough due diligence process should assess:

The type of data used for pre-training purposes.
The sources and methods of obtaining training data, such as vendor-owned data, third-party licensed data, open datasets, or data obtained through web scraping.

Special attention should be given to data obtained through web scraping, as copyright exceptions vary by jurisdiction. Due diligence should confirm that the vendor has procedures in place to mitigate infringement risks during model training.
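One way to apply this in practice is to inventory the training data sources by type and review the web-scraped ones first. The sketch below assumes the four source types listed above; the `TrainingSource` class, the `review_queue` function, and the sample source names are hypothetical.

```python
from dataclasses import dataclass

# Source types as listed in the text; the identifiers are illustrative.
SOURCE_TYPES = {
    "vendor_owned",
    "third_party_licensed",
    "open_dataset",
    "web_scraped",
}


@dataclass
class TrainingSource:
    name: str
    source_type: str

    def __post_init__(self):
        # Reject unclassified sources so gaps in the inventory surface early.
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.source_type!r}")


def review_queue(sources):
    """Order source names so that web-scraped data is reviewed first."""
    ordered = sorted(sources, key=lambda s: s.source_type != "web_scraped")
    return [s.name for s in ordered]


# Hypothetical inventory supplied by a vendor.
sources = [
    TrainingSource("support tickets", "vendor_owned"),
    TrainingSource("crawl snapshot", "web_scraped"),
    TrainingSource("licensed news archive", "third_party_licensed"),
]

print(review_queue(sources))
```

Because Python's sort is stable, sources other than the web-scraped ones keep the order in which the vendor listed them.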

Managing Input and Output Data

The rights over input data must be assessed, considering whether it is customer-owned or third-party data. Additionally, it is vital to understand vendor policies and procedures for handling input data:

How the vendor processes and stores input data, including segregation from vendor and third-party data.
The vendor’s retention and destruction policies for input data.

Expectations regarding the vendor’s use of input data should also be clarified:

Is the data used solely for the customer’s purposes and benefit?
Can the vendor use input data to improve the AI tool for other users?
Are there any other permissible uses?

Regarding output data, it is essential to determine whether the data produced is newly generated content or responses based on predefined rules, such as predictions or recommendations. Understanding IP rights in the output data is crucial, especially if the data will be used or distributed outside the customer’s home jurisdiction.

Business Continuity: Addressing Vendor Insolvency Risks

Because the customer depends on the vendor for ongoing maintenance and support of the AI solution, it is exposed to the risk of vendor insolvency. To mitigate this, due diligence should explore whether an escrow agreement is in place, ensuring continued access in the event of vendor failure. This agreement should specify:

When and how customers can access the source code and relevant documentation.
Which usage rights are then granted.
Who bears the cost of maintaining access and updating the solution.

Conclusion

The due diligence process aims to map the pre-existing IP rights in the AI solution and the rights newly created through its use, ensuring a clear understanding of:

Who owns what.
What rights each party holds in the solution and in the data generated through its use.

Ultimately, the goal is to mitigate risks of IP infringement while ensuring contractual consistency between rights obtained up-front and those licensed to users, as well as between rights generated downstream and their possible upstream use.