
Legacy Contract Checklist

Many SaaS contracts signed between 2018 (the GDPR/CCPA inflection point, when privacy and security clauses entered the contracting mainstream) and today’s AI-feature wave were negotiated for a different technical reality.

Agreements were negotiated for a world in which vendors hosted software, stored customer data, and improved products through ordinary debugging and feature refinement. Today, those same platforms may include AI-enabled functionality that changes what familiar clauses mean in practice.

This guide (download below) examines common SaaS terms from the GDPR/CCPA era, and reframes them for the AI era.


Each section identifies sleeper clauses to spot, shows how the “before” and “after” meanings can diverge, and offers questions to help you assess whether legacy contract language still fits your organization’s current use of AI-enabled products.

Improve the Services clause

Customer Data may be used to “improve the services.” This language may now be broad enough to support AI training, tuning, or development of new capabilities. Same words, very different implications.

What to look for: Vendor may use Customer Data to “improve,” “enhance,” or “develop” the services.
Before: Usually meant debugging, performance tuning, and feature refinement.
After: May now support model training, tuning, dataset development, and cross-customer learning.
Questions to ask now: Does “improve” include training or tuning AI models? Is use limited to support and service delivery, or does it extend to product development?

Definition of Customer Data

If Customer Data is defined too narrowly, newer AI-related data types may fall outside the contract’s main protections.

What to look for: An older or narrow definition of “Customer Data.”
Before: Often focused on uploaded files, records, and account information.
After: May fail to capture prompts, outputs, annotations, corrections, logs, or other AI-related interaction data.
Questions to ask now: Are prompts and outputs expressly included? Are logs, metadata, and user corrections covered by the same protections?

Usage Data, Telemetry, and Analytics

Telemetry clauses that once seemed operational may now reach some of the most valuable behavioral data in an AI-enabled product.

What to look for: Vendor rights to collect usage data, analytics, telemetry, or statistical information.
Before: Usually meant service monitoring, uptime, and product analytics.
After: May include prompts, outputs, user behavior, and corrections that are highly valuable for AI optimization.
Questions to ask now: What exactly is included in telemetry? Are prompts, outputs, and correction data excluded from broader reuse?

De-Identified or Aggregated Data

A de-identified data clause may sound low-risk, but in an AI context it can support training, commercialization, and new inference risk.

What to look for: Vendor may use de-identified, anonymized, or aggregated data without restriction.
Before: Often accepted as low-risk benchmarking or reporting language.
After: May support training datasets, commercialization, or uses that create re-identification and inference risk.
Questions to ask now: What standard of de-identification applies? Can de-identified data be used for model training or product development? Does the definition of de-identification align with your regulatory environment? Terms such as anonymization, pseudonymization, and de-identification carry different meanings and legal effects under different regulatory regimes (GDPR, CCPA/CPRA, HIPAA).

Feedback Clauses

A standard feedback clause may now sweep in prompt refinements, workflow explanations, and corrections that are highly valuable to AI vendors.

What to look for: Broad vendor rights to use “feedback,” “suggestions,” or “ideas.”
Before: Usually understood as ordinary comments on product usability.
After: May be read to include prompt refinements, workflow explanations, annotations, or corrections to AI outputs.
Questions to ask now: How is “feedback” defined? Does it exclude prompts, outputs, confidential information, and customer workflows?

Confidentiality Provisions

Traditional confidentiality language may not clearly answer the AI question that matters most: can the vendor use your confidential information to train or tune its systems?

What to look for: Standard confidentiality language with no AI-specific use restrictions.
Before: Designed to prevent disclosure and misuse in conventional SaaS operations.
After: May not clearly prevent confidential inputs from being used to train, tune, or shape AI systems.
Questions to ask now: Does confidentiality prohibit training or tuning on confidential information? Does it address outputs that may reflect customer inputs?

License Grants

A broad license to use and modify Customer Data may allow more data use than intended, once the service includes AI functionality.

What to look for: Broad licenses allowing vendor to use, copy, modify, or create works from Customer Data.
Before: Usually narrowly scoped to allow data use only as necessary for hosting and operating the service.
After: May be cited as authority for reuse, transformation, or long-term retention in AI-enabled environments.
Questions to ask now: Is the license limited to providing and supporting the service? Does it expressly exclude model training, derivative development, and commercialization?

Data Security

Security clauses from the pre-AI SaaS era may not reflect the larger data stores and expanded attack surface of AI-enabled products.

What to look for: Legacy security language that assumes a relatively contained SaaS environment, rather than a product that may involve multiple AI services, subprocessors, or external model layers. No clear commitments around data minimization, retention limits, segmentation, access controls for expanded data stores, or vendor oversight of downstream AI infrastructure.
Before: Focus was on unauthorized access and breach risk, against a limited set of customer records in a relatively stable hosted software environment.
After: AI-enabled products may create broader data collection, more copies of sensitive information, more system interconnections, and more places where data can be stored, processed, exposed, or attacked.
Questions to ask now: Does the clause reflect the product’s current AI architecture? What new data types are now stored or exposed? Has the vendor added new subprocessors or model-layer dependencies? Are retention, access, and segmentation controls adequate for the expanded data environment?

Indemnification

Traditional indemnities were not drafted with output-related claims, training data disputes, or AI provenance issues in mind.

What to look for: Traditional IP-only and third-party claim allocation.
Before: Drafted for conventional software risk.
After: May not clearly cover output-related IP claims, training data provenance issues, or embedded third-party content.
Questions to ask now: Does indemnity cover AI-generated outputs? Who bears the risk if training data or generated content creates third-party claims?

Termination and Deletion

A deletion clause may remove the source data without addressing whether the vendor has already extracted lasting value from it.

What to look for: Vendor will return or delete Customer Data on termination.
Before: Focus was on active stored data.
After: May not address whether data has already influenced models, embeddings, derived datasets, or tuning artifacts.
Questions to ask now: What happens to data already used in training or tuning? Does deletion extend to derived AI artifacts or only source data?

Audit Rights and Transparency

Conventional audit rights may sound reassuring while offering very little visibility into actual AI data practices.

What to look for: Conventional audit rights with limited operational visibility.
Before: Often enough for privacy and security verification.
After: May not provide meaningful visibility into training practices, data lineage, or downstream AI providers.
Questions to ask now: Can the customer verify whether its data is used in training or testing? What transparency rights exist around AI functionality and data flows?

Subprocessors and AI Supply Chain

A routine subprocessor clause may now cover a much broader AI ecosystem than the customer expects.

What to look for: Routine subprocessor language.
Before: Usually referred to hosting, storage, or support vendors.
After: May now include model providers, API layers, and other downstream AI infrastructure.
Questions to ask now: Who are the relevant downstream AI providers? Do the same data use restrictions flow through the vendor’s AI stack? Will the customer be notified of changes in underlying model providers?

Data Processing Addendum (or Missing DPA)

A legacy SaaS relationship may have no DPA at all, or a DPA that no longer matches current AI-enabled data flows, reuse practices, or inference risk.

What to look for: No DPA, an outdated DPA, or a vendor-form DPA that does not address current AI functionality, newer categories of data, evolving subprocessors, or vendor-side internal use.
Before: Where a DPA existed, it was often built for ordinary hosted software: storage, access, support, security controls, and a relatively stable processing chain.
After: AI-enabled services may involve prompts, outputs, logs, annotations, user corrections, model providers, broader internal use, and data flows that create inference risk even where the vendor is not disclosing raw customer data. An older DPA may say little or nothing about these practices.
Questions to ask now: Is there a DPA at all, and does it govern the current service as actually used? Does it cover prompts, outputs, logs, and other AI-related interaction data? Does it address internal use, subprocessor changes, retention, and model-related data flows? Does it account for inference risk, including the possibility that patterns, profiles, or sensitive conclusions may be derived from customer data even without direct disclosure of the source data?

Derivative Works / Derived Data

Derived data language may be where a vendor claims the downstream value created from your prompts, outputs, and usage signals.

What to look for: Vendor rights to create derivative works, derived data, or derived analytics.
Before: Often read as ordinary internal reporting or analysis language.
After: May be invoked to justify retention of downstream value extracted from prompts, outputs, feedback, or usage data, including trained model assets.
Questions to ask now: Can the vendor create derivative models or datasets from customer interactions? Does the contract clearly distinguish service delivery from value extraction?

What to do next

Start with the contracts that sit closest to sensitive data, important workflows, or newly activated AI features, not necessarily the highest-spend vendors.

A targeted AI Addendum can be an efficient way to begin the conversation with critical vendors about training, reuse, retention, and other AI-era issues that older SaaS terms may not clearly address.

Need a starting point? I help legal teams identify priority contracts for AI review and draft practical AI Addenda.
