Date: 2023-08-02

Fig. The prompt with table schemas as context to generate SQL query

Incorporating context into prompts for an LLM offers advantages such as reducing training costs and enhancing security in certain cases.

For example, when integrating a data platform or warehouse with an LLM to support natural language queries, the LLM's output can be constrained by including only authorized context in each prompt. Because the model sees only the content a user is permitted to view, the user can query only that content, which improves the system's overall security.

On the other hand, if the LLM memorizes the entire database schema through fine-tuning, users may be able to elicit information from it that they do not have permission to access.
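
The context-injection approach can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the table names, the `SCHEMAS` and `PERMISSIONS` dictionaries, and the `authorized_schemas` and `build_prompt` helpers are all hypothetical, and a real system would read permissions from the warehouse's own access-control layer.

```python
# Hypothetical schemas and role permissions, for illustration only.
SCHEMAS = {
    "orders": "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL);",
    "customers": "CREATE TABLE customers (id INT, name TEXT, region TEXT);",
    "salaries": "CREATE TABLE salaries (employee_id INT, amount DECIMAL);",
}

PERMISSIONS = {
    "analyst": {"orders", "customers"},
    "hr": {"salaries"},
}


def authorized_schemas(role: str) -> list[str]:
    """Return the schemas of the tables this role may read, in a stable order."""
    allowed = PERMISSIONS.get(role, set())
    return [SCHEMAS[name] for name in sorted(allowed) if name in SCHEMAS]


def build_prompt(role: str, question: str) -> str:
    """Assemble a text-to-SQL prompt that contains only authorized schemas."""
    context = "\n".join(authorized_schemas(role))
    return (
        "Given the following table schemas:\n"
        f"{context}\n\n"
        "Write a SQL query that answers the question below. "
        "Use only the tables listed above.\n"
        f"Question: {question}\n"
        "SQL:"
    )


prompt = build_prompt("analyst", "What is the total order value per region?")
print(prompt)
```

Because the `salaries` schema never enters the analyst's prompt, the model has no description of that table to draw on. Filtering the prompt is only one layer of protection: the generated SQL should still be executed under the user's own database permissions.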

The limitations of prompt engineering

General-purpose LLMs limit prompt length. For example, OpenAI's current text-davinci-003 and gpt-3.5-turbo models have a limit of about 4,096 tokens, shared between the prompt and the generated completion. If a smaller LLM is used, the limit may be as low as 512 tokens.

When injecting context into the prompt, it is necessary to define rules for selecting context so that the prompt stays within the limit. OpenAI has stated that GPT-4 can support a context window of up to 32k tokens, which would significantly raise the upper limit on prompt context.
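
One possible rule for selecting context is a greedy token budget, sketched below. Everything here is an assumption made for illustration: the `estimate_tokens` heuristic (about four characters per token) is only a rough stand-in for the target model's real tokenizer, and the `select_context` helper, its parameters, and the default budget values are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def select_context(snippets, question, limit=4096, reserved_for_answer=512,
                   count_tokens=estimate_tokens):
    """Keep snippets, in the given priority order, that still fit the budget.

    The budget is the model limit minus the tokens reserved for the answer
    and the tokens used by the question itself. A snippet that does not fit
    is skipped, and later (smaller) snippets are still considered.
    """
    budget = limit - reserved_for_answer - count_tokens(question)
    selected = []
    for snippet in snippets:
        cost = count_tokens(snippet)
        if cost <= budget:
            selected.append(snippet)
            budget -= cost
    return selected


# Three context snippets of roughly 100, 1000, and 50 estimated tokens.
snippets = ["a" * 400, "b" * 4000, "c" * 200]
chosen = select_context(snippets, "How many orders?",
                        limit=512, reserved_for_answer=256)
print([len(s) for s in chosen])
```

Passing the counter in as `count_tokens` makes it straightforward to swap the heuristic for an exact tokenizer. The caller is expected to order `snippets` by relevance, since earlier entries claim the budget first.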

Conclusion

Because LLM capabilities have improved significantly in both depth and breadth, the prompt engineering workflow, which is centered on prompt-based development, may require less data and achieve higher accuracy on natural language processing (NLP) and computer vision (CV) problems than a typical machine learning model-based workflow. This would reduce the resources required to develop intelligent products and could accelerate iteration.

By organizing appropriate context in the prompt, we can use the capabilities of LLMs flexibly and build more secure intelligent products.

Nevertheless, collecting production data and feedback remains critical in the prompt engineering workflow, because it is an essential way to evaluate the effectiveness of a prompt. When context alone cannot meet product requirements, the machine learning model-based workflow can still be used: new datasets can be leveraged to fine-tune the LLM so that it acquires the specific knowledge required.