- OpenAI has announced Privacy Filter, an open-source model that helps detect and remove personally identifiable information (PII) before data is sent to the cloud, reducing data leakage risks in AI.
- The model has 1.5 billion parameters but activates only 50 million per processing step, which keeps inference light enough to run on laptops or in web browsers.
- It uses a Sparse Mixture-of-Experts architecture and a 128,000-token context window, enabling the processing of long documents like legal contracts without losing context.
- The model applies a Viterbi decoder over BIOES tags (Begin, Inside, Outside, End, Single) so that predicted PII spans have consistent boundaries and redaction removes whole entities rather than fragments.
- It identifies 8 types of PII, including personal names, contact information, numerical identifiers, and secrets such as API keys or passwords.
- It lets businesses process data on-device, which supports compliance with GDPR and HIPAA.
- Released under the Apache 2.0 license, it permits commercial use and customization and does not require products built on it to open their source code.
- The community has praised the model as “small but powerful” and well suited to low-cost, real-world AI pipelines.
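
To make the Viterbi-plus-BIOES point in the list above concrete, here is a minimal sketch of constrained decoding for a single entity type. It is not Privacy Filter's actual decoder: the per-token scores, the single-type tag set, and the helper names `allowed`, `viterbi_bioes`, and `redact` are all assumptions made for illustration.

```python
import math

TAGS = ["O", "B", "I", "E", "S"]  # Outside, Begin, Inside, End, Single


def allowed(prev, nxt):
    """BIOES rule: an open span (B or I) must continue (I) or close (E)."""
    if prev in ("B", "I"):
        return nxt in ("I", "E")
    return nxt in ("O", "B", "S")


def viterbi_bioes(scores):
    """Return the best-scoring tag sequence that is valid under BIOES.

    scores: one dict per token mapping tag -> log-score
    (a hypothetical stand-in for the model's per-token output).
    """
    if not scores:
        return []
    # A sequence cannot start in the middle or at the end of a span.
    best = {t: (scores[0][t] if t in ("O", "B", "S") else -math.inf) for t in TAGS}
    back = []
    for tok in scores[1:]:
        new_best, pointers = {}, {}
        for t in TAGS:
            score, prev = max((best[p], p) for p in TAGS if allowed(p, t))
            new_best[t] = score + tok[t]
            pointers[t] = prev
        best = new_best
        back.append(pointers)
    # A sequence cannot end with a span still open.
    path = [max(("O", "E", "S"), key=lambda t: best[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def redact(tokens, tags, mask="[REDACTED]"):
    """Replace each tagged span with one mask token."""
    out = []
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            out.append(tok)
        elif tag in ("B", "S"):
            out.append(mask)  # I and E tokens are absorbed into this mask
    return " ".join(out)


# Toy scores: a per-token argmax would give O, B, O, O, which leaves a span open.
tokens = ["Email", "Jane", "Doe", "today"]
scores = [
    {"O": 0.0, "B": -5.0, "I": -5.0, "E": -5.0, "S": -5.0},
    {"O": -3.0, "B": -0.2, "I": -5.0, "E": -5.0, "S": -2.0},
    {"O": -0.9, "B": -5.0, "I": -5.0, "E": -1.0, "S": -3.0},
    {"O": 0.0, "B": -5.0, "I": -5.0, "E": -5.0, "S": -5.0},
]
tags = viterbi_bioes(scores)
print(tags)                  # ['O', 'B', 'E', 'O']
print(redact(tokens, tags))  # Email [REDACTED] today
```

In this toy input the highest-scoring tag for "Doe" taken alone is `O`, which would cut the name in half. Decoding the whole sequence under the transition rules instead selects `B` then `E`, so the full name is masked as one unit.
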
📌 Privacy Filter marks a major step in OpenAI's return to open source: a 1.5-billion-parameter model that activates only 50 million parameters per step, supports a 128,000-token context window, and detects 8 types of sensitive data. The tool helps businesses comply with GDPR and HIPAA while reducing data leakage risks at the start of the pipeline. However, OpenAI warns that it is only a support tool and does not guarantee absolute protection, especially in sensitive fields like healthcare or law.
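
The gap between 1.5 billion total and 50 million active parameters follows from sparse routing: for each input, a router picks a few experts and the rest stay idle. The sketch below shows generic top-k routing only. The number of experts, the value of `k`, the router scores, and the function `moe_forward` are assumptions for illustration, since the summary above does not describe Privacy Filter's routing details.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, k=2):
    """Run only the k highest-scoring experts and mix their outputs.

    Experts outside the top k are never called, which is why a sparse
    model touches only a fraction of its parameters per step.
    """
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Four toy "experts", each a scalar function standing in for a sub-network.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token

output, used = moe_forward(1.0, router_logits, experts, k=2)
print(used)    # [1, 3]
print(output)  # 3.0

# Share of parameters active per step, using the figures quoted above.
active_fraction = 50_000_000 / 1_500_000_000
print(f"{active_fraction:.1%}")  # 3.3%
```

With the quoted figures, roughly one parameter in thirty is active on any given step, which is consistent with the claim that the model can run on a laptop or in a browser.
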

