Databricks, a company that helps large enterprises build custom artificial intelligence models, has introduced a machine-learning technique that boosts an AI model's performance without requiring clean labeled data. Jonathan Frankle, the chief AI scientist at Databricks, has spent the past year talking with clients to identify the biggest challenges they face in deploying reliable AI.
Frankle identifies "dirty data" as a primary issue, noting that, although many organizations possess data and have a clear vision of their objectives, the absence of clean data complicates the task of accurately fine-tuning a model. It is rare for companies to have pristine fine-tuning data that can be fed directly into a model's prompt or application programming interface.
The technique developed by Databricks aims to enable businesses to deploy their own agents to perform tasks without letting data quality stand in the way. It offers a glimpse of some key strategies engineers are adopting to improve advanced AI models, especially in scenarios where high-quality data is scarce. The method combines reinforcement learning, which allows AI models to improve through practice, with "synthetic," or AI-generated, training data.
Current models from OpenAI, Google, and DeepSeek rely heavily on reinforcement learning along with synthetic training data. WIRED has reported that Nvidia intends to acquire Gretel, a company specializing in synthetic data, showing how the industry as a whole is moving into this space.
The Databricks method takes advantage of the fact that, given enough attempts, even a weak model can perform well on a given task or benchmark. This approach to boosting a model's performance is known as "best-of-N": Databricks trained a model to predict, based on examples, which of the N outputs human testers would prefer. The resulting Databricks reward model (DBRM) can then improve the performance of other models without further labeled data.
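To make the idea concrete, here is a minimal sketch of best-of-N selection with a reward model. Databricks has not published its implementation, so the function names, interfaces, and the choice of N below are illustrative assumptions, not the company's actual code.

```python
# Hypothetical sketch of best-of-N selection. The generate and score
# callables stand in for a language model and a reward model like DBRM;
# their interfaces are assumptions for illustration.

from typing import Callable, List

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # samples one candidate answer
    score: Callable[[str, str], float],  # reward model: higher is better
    n: int = 16,                         # illustrative; N is not published
) -> str:
    """Sample n candidate outputs and return the one the reward model prefers."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    scores = [score(prompt, c) for c in candidates]
    # Pick the candidate with the highest reward-model score.
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```

The reward model replaces the human labeler at selection time, which is what removes the need for clean labeled data on each new task.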
DBRM is subsequently employed to select the best outputs from a model, creating synthetic training data for further fine-tuning, so that the model produces a better output on its first attempt. Databricks calls this approach Test-time Adaptive Optimization (TAO). Frankle notes that the method uses relatively lightweight reinforcement learning to bake the benefits of best-of-N directly into the model.
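The loop described above can be sketched as follows, reusing the best_of_n function from the earlier example. Again, this is a hedged illustration under assumed interfaces; model.generate and model.finetune are hypothetical, and the article does not specify the exact reinforcement-learning update Databricks uses.

```python
# Hypothetical sketch of a TAO-style round: use the reward model to pick
# the best sampled output per prompt, then fine-tune on those picks as
# synthetic training data. Reuses best_of_n from the sketch above.

def build_synthetic_dataset(prompts, generate, score, n=16):
    """Build (prompt, best output) pairs via best-of-N selection."""
    return [
        {"prompt": p, "completion": best_of_n(p, generate, score, n=n)}
        for p in prompts
    ]

def tao_round(model, prompts, score, n=16):
    """One round: sample, select with the reward model, fine-tune."""
    data = build_synthetic_dataset(prompts, model.generate, score, n=n)
    # Fine-tune on the reward-model-selected outputs so the model tends
    # to produce the preferred answer on its first try. The article
    # describes this step as lightweight reinforcement learning;
    # model.finetune is an assumed interface, not a real API.
    model.finetune(data)
    return model
```

The design point is that the expensive sampling and scoring happen during training rounds, so the deployed model answers in a single pass rather than generating N candidates at inference time.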
The research by Databricks indicates that the TAO method becomes more effective as it is scaled up to larger, more capable models. Although reinforcement learning and synthetic data are widely used, integrating them to improve language models is a relatively new and technically complex approach.
Databricks maintains a transparent stance regarding its AI development processes, intending to demonstrate its capability to produce powerful custom models for clients. The company previously disclosed to WIRED the development of DBRX, a sophisticated open-source large language model, built from scratch.