General

The ClauseBase platform makes use of Large Language Models (LLMs) in many different contexts, which are described on the next page.

Service provider used: Microsoft (Azure)

ClauseBase currently uses GPT-4 Turbo, hosted on Microsoft Azure, unless a customer uses their own key (OpenAI or Microsoft).

Microsoft is not in the business of developing GPT, so it can guarantee in very strong language that it does not reuse any customer data to improve either GPT or any of its own products, and that it does not share customer data with OpenAI. These confidentiality guarantees are similar to how Microsoft will never reuse a customer’s DOCX files stored in its Office 365 cloud.

LLM server locations

We currently make use of the following Microsoft Azure servers for GPT:

  • France, for enterprise customers in Europe

  • United Kingdom, for users in the United Kingdom, and for our non-enterprise customers in European countries other than Belgium, Germany and Switzerland

  • United States East, for our enterprise customers in the United States

  • Australia East, for enterprise customers in Australia
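The list above can be read as a simple lookup. The following is only a minimal sketch of that reading, not ClauseBase's actual routing logic: the function name, the parameters and the location labels are hypothetical, and it assumes that the United Kingdom rule takes precedence for any user located there. Cases that the list does not cover return None rather than a guess.

```python
from typing import Optional

# Location labels copied from the list above (not Azure region identifiers).
FRANCE = "France"
UNITED_KINGDOM = "United Kingdom"
US_EAST = "United States East"
AUSTRALIA_EAST = "Australia East"

# Countries the list excludes from the non-enterprise United Kingdom rule.
EXCLUDED_NON_ENTERPRISE = {"Belgium", "Germany", "Switzerland"}


def server_location(area: str, country: str, enterprise: bool) -> Optional[str]:
    """Return the server location per the list above, or None if not covered.

    Hypothetical helper: `area` is a coarse label such as "Europe".
    """
    # Assumption: users in the United Kingdom are always served from there.
    if country == "United Kingdom":
        return UNITED_KINGDOM
    if area == "Europe":
        if enterprise:
            return FRANCE
        if country not in EXCLUDED_NON_ENTERPRISE:
            return UNITED_KINGDOM
        return None  # not specified in the list
    if enterprise and country == "United States":
        return US_EAST
    if enterprise and country == "Australia":
        return AUSTRALIA_EAST
    return None  # not specified in the list


if __name__ == "__main__":
    print(server_location("Europe", "Netherlands", enterprise=True))
    print(server_location("Europe", "Netherlands", enterprise=False))
```

Returning None for unlisted combinations keeps the sketch from asserting anything the list does not state, for example about non-enterprise customers in Belgium, Germany or Switzerland.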

No OpenAI

While Microsoft's version is technically almost identical to the one used by OpenAI, the ClauseBase platform does not make any use of OpenAI's services, because of OpenAI's poor confidentiality track record and our legal audience's concerns about confidentiality.

For all its cleverness, it is remarkable how OpenAI has completely dropped the ball on its marketing around this confidentiality issue. OpenAI does learn from the input of its users, similar to how Google learns which answers users prefer by checking whether they reformulate their query or keep clicking on different hyperlinks to find their answer.

OpenAI limits the learning process to the end-user versions of ChatGPT. Initially, OpenAI also learned from user queries submitted through its API (i.e., when third-party developers incorporate the LLM in their own products), but the company claims to have stopped this practice as of the end of March.

Even though end-users can opt out of the learning process through a simple software setting, and even though it no longer applies at all to API use, the damage was done. For most legal experts, OpenAI's reputation is burned, and (leaving aside the quality problem of hallucinations) many law firms flatly refuse to allow any use of LLMs at all, for fear that confidential client data would be exposed in the next version of the LLM.

While OpenAI has obviously done this to itself, the fear in the legal community is unfounded. LLMs learn from billions of data points, so adding a particular fragment of client information is like adding a single drop of water to an ocean of existing information. A client’s information is probably far more likely to be exposed through other means (gossip, data breaches, a lost laptop, …). Moreover, as explained below, LLMs do not literally store text, so the better analogy is probably “a vague drop of water” added to an existing ocean.

However, customers can optionally use their own OpenAI subscription if they wish (see below).

Customer's own subscription

Customers are free to use their own key (subscription) for GPT-4 Turbo in Azure or OpenAI. In fact, ClauseBase actively encourages customers to do this, to get better capacity and direct invoicing from Microsoft/OpenAI.
