General

The ClauseBase platform makes use of Large Language Models (LLMs) in many different contexts, which are described on the next page.

Service provider used: Microsoft (Azure) by default

ClauseBase currently uses GPT-4o as the default LLM, hosted by Microsoft Azure, except if:

the customer opts for one of the optional alternative LLM vendors offered by ClauseBase, such as OpenAI, Noxtua, Mistral or Anthropic (Claude)
the customer uses their own key for any of the vendors

Microsoft is not in the business of developing GPT, so it can guarantee in very strong language that it does not reuse any customer data to improve either GPT or any of its own products, and that it does not share customer data with OpenAI. These confidentiality guarantees are similar to how Microsoft will never reuse a customer’s DOCX files that are stored in its Office 365 cloud.

Moreover, in mid-March 2025 Microsoft granted ClauseBase the abuse monitoring exception, meaning that Microsoft does not store any content submitted to the LLMs, not even for "abuse monitoring" (e.g., to check whether users are looking for tips on hiding a body in their garden, or requesting recipes for building bombs).

Xayn offers very similar guarantees for its Noxtua LLM, which should not be surprising for an LLM created by, hosted in and targeted at Europe.

Similar guarantees — although less explicit than Microsoft and Xayn — are made by the optional vendors that the customer can choose:

LLM server locations

We currently make use of the following Microsoft Azure servers for GPT:

Sweden, for enterprise customers in Europe
France, for users in the United Kingdom, and for our non-enterprise customers in European countries other than Belgium, Germany & Switzerland
United States East, for our enterprise customers in the United States
Australia East, for enterprise customers in Australia
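
The list above can be read as a simple lookup from customer type and country to server location. The following is only an illustrative sketch, not ClauseBase's actual routing code: the function name `azure_location`, the use of two-letter country codes and the deliberately incomplete `EUROPE` set are assumptions made for the example. It also assumes that the United Kingdom rule takes precedence over the general European rule, and it returns `None` for any case the list does not cover.

```python
from typing import Optional

# Countries excluded from the "France" rule for non-enterprise customers,
# as stated in the list above (Belgium, Germany, Switzerland).
EXCLUDED_FROM_FRANCE = {"BE", "DE", "CH"}

# Hypothetical and deliberately incomplete set of European country codes,
# used for illustration only.
EUROPE = {"BE", "DE", "CH", "FR", "NL", "SE", "ES", "IT"}


def azure_location(country: str, enterprise: bool) -> Optional[str]:
    """Return the server location label from the list above.

    Returns None when the list does not say which server applies.
    """
    # Assumption: the United Kingdom rule applies to all UK users.
    if country == "GB":
        return "France"
    if country in EUROPE:
        if enterprise:
            return "Sweden"
        if country not in EXCLUDED_FROM_FRANCE:
            return "France"
        return None  # not covered by the list above
    if enterprise and country == "US":
        return "United States East"
    if enterprise and country == "AU":
        return "Australia East"
    return None  # not covered by the list above


print(azure_location("NL", enterprise=True))
print(azure_location("NL", enterprise=False))
```

Note that this sketch only picks the default location. It does not model the dynamic rerouting within a zone that is described next.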

In Azure, we use the "Data Zone Standard" setting. As a result, within the same large geographical zone, Microsoft may dynamically reroute requests to another server location based on traffic (for example, under heavy traffic, users who would normally use France may be served by the Swedish server). To be clear: as explained by Microsoft, this setting does not allow data to cross the boundaries of the large zone, e.g. data from within Europe will remain in Europe, and data within the US will remain in the US.

Noxtua

Instead of Azure GPT-4o, you can also opt for Noxtua Legal AI, Europe’s first sovereign Legal AI. Trained on legal texts labeled by experts, this sovereign European AI is specifically tailored to your needs as a law professional. This makes Noxtua your secure and independent European AI alternative.

Noxtua is operated by Xayn, which hosts Noxtua on the Open Telekom Cloud in Germany.

Optional OpenAI

While Microsoft's version is technically almost identical to the one used by OpenAI, by default the ClauseBase platform does not make any use of OpenAI's services due to the poor confidentiality track record of OpenAI and our legal audience's concerns with respect to confidentiality.

However, customers can optionally choose OpenAI, in addition to (or as a replacement for) Azure GPT-4o. We offer fine-grained controls that allow customers to choose which LLMs their users can select, even varying the choice per module and/or user profile (e.g., "partners can use any LLM for any drafting task; senior lawyers can choose between Claude and Azure GPT-4o; junior lawyers are only allowed to use Mistral, except for drafting new clauses").
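
To make the quoted example concrete, such a policy can be thought of as a table keyed by user profile and module. The sketch below is purely illustrative and does not reflect how ClauseBase stores or evaluates these settings: the names `POLICY`, `allowed_llms`, the profile keys and the module name `draft_new_clause` are all hypothetical. It also assumes one reading of the example, namely that junior lawyers may not use any LLM for drafting new clauses.

```python
# Hypothetical list of selectable LLMs, taken from the vendors named on this page.
ALL_LLMS = frozenset({"Azure GPT-4o", "OpenAI", "Noxtua", "Mistral", "Claude"})

# Hypothetical policy table: profile -> module -> allowed LLMs.
# The "*" key is the default for modules without a specific rule.
POLICY = {
    "partner": {"*": ALL_LLMS},
    "senior_lawyer": {"*": frozenset({"Claude", "Azure GPT-4o"})},
    "junior_lawyer": {
        "*": frozenset({"Mistral"}),
        # Assumption: no LLM at all for drafting new clauses.
        "draft_new_clause": frozenset(),
    },
}


def allowed_llms(profile: str, module: str) -> frozenset:
    """Return the LLMs a user profile may select for a given module."""
    rules = POLICY.get(profile, {})
    # A module-specific rule wins over the profile default;
    # unknown profiles get no LLM at all.
    return rules.get(module, rules.get("*", frozenset()))


print(sorted(allowed_llms("senior_lawyer", "review")))
print(sorted(allowed_llms("junior_lawyer", "draft_new_clause")))
```

Falling back to an empty set for unknown profiles is a deliberately conservative choice for the sketch: anything not explicitly allowed is denied.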

For all its cleverness, it is remarkable how completely OpenAI has dropped the marketing ball on this confidentiality issue. OpenAI does learn from the input of its users, similar to how Google learns which answers users prefer by checking whether they reformulate their query or keep clicking on different hyperlinks to find their answer.

OpenAI limits the learning process to the end-user versions of ChatGPT. Initially, OpenAI also learned from user queries submitted through its API (i.e., when third-party developers incorporate the LLM in their own products), but the company claims to have stopped this practice since the end of March.

Even though end-users can opt out of the learning process through a simple software setting, and even though it no longer applies at all to API use, the damage was done. For most legal experts, the reputation of OpenAI is burned, and (leaving aside the quality problem of hallucinations) many law firms downright refuse to allow any use of LLMs at all, for fear that confidential client data would be exposed in the next version of the LLM.

While OpenAI has obviously done this to itself, the fear in the legal community is unfounded. LLMs learn from billions of data points, so adding a particular fragment of client information is like adding a single drop of water to an ocean of existing information. It is probably much more likely that a client’s information is exposed through other means (gossip, data breaches, lost laptop, …). Moreover, as explained below, LLMs do not literally store text, so the better analogy is probably adding “a vague drop of water” to an existing ocean.

However, customers can optionally use their own OpenAI subscription, if they wish — see below.

Customer's own subscription

Customers are free to use their own key (subscription) for GPT-4 Turbo / GPT-4o in Azure or OpenAI. In fact, ClauseBase actively encourages customers to do this, to get better capacity and direct invoicing from Microsoft/OpenAI.

As stated above, ClauseBase also offers customers the possibility to use vendors other than Microsoft, e.g. Meta's Llama (hosting possible in many locations), Xayn Noxtua, Anthropic Claude (US) or Mistral Large (France).

Updates

19 March 2025: added the Microsoft abuse monitoring exception
2 April 2025: added explicit references for Noxtua

