Syntax Error-Free and Generalizable Tool Use for LLMs: ToolDec Enables Generalizable Tool Selection


Researchers propose TOOLDEC, a finite-state machine-guided decoding algorithm for LLMs that reduces tool-call errors and improves tool use.
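
This excerpt contains no code, so the snippet below is only a rough sketch of the underlying idea of finite-state machine-guided decoding: build a token-level finite-state machine (a trie over the tokenized tool names) that says which tokens are legal at each decoding state. The function name `build_tool_fsm`, the toy tool list, and the character-level "tokenizer" are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a token-level finite-state machine over the
# valid tool names, represented as a trie of nested dicts. Each state maps an allowed
# next token to its successor state; an empty dict marks the end of a complete name.
from typing import Dict, List

def build_tool_fsm(tool_names: List[str], tokenize) -> Dict[str, dict]:
    fsm: Dict[str, dict] = {}
    for name in tool_names:
        state = fsm
        for tok in tokenize(name):
            state = state.setdefault(tok, {})
    return fsm

# Toy example with a character-level "tokenizer" (a real system would use the LLM's own tokenizer).
tools = ["add", "subtract", "square_root"]
fsm = build_tool_fsm(tools, tokenize=list)
print(sorted(fsm))        # tokens that may start a tool name: ['a', 's']
print(sorted(fsm["s"]))   # tokens allowed after 's': ['q', 'u']
```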

Authors: Kexun Zhang, UC Santa Barbara (equal contribution); Hongqiao Chen, Northwood High School (equal contribution); Lei Li, Carnegie Mellon University; William Yang Wang, UC Santa Barbara.

Table of Links: Abstract and Intro; Related Work; ToolDec: LLM Tool Use via Finite-State Decoding; Experiment: ToolDec Eliminates Syntax Errors; Experiment: ToolDec Enables Generalizable Tool Selection; Conclusion and References; Appendix.

5. EXPERIMENT: TOOLDEC ENABLES GENERALIZABLE TOOL SELECTION

In this experiment, we show that TOOLDEC is able to efficiently generalize to new tools without fine-tuning on extra data.

5.1 FINE-TUNING BASELINE: TOOLKENGPT

ToolkenGPT is a fine-tuning approach to tool use that learns a special token for every tool. To generalize to new tools, ToolkenGPT still needs additional data and extra fine-tuning involving the use of the new tools. We demonstrate that TOOLDEC does not need such extra data or fine-tuning.
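
For context, ToolkenGPT's core idea is to represent each tool as an extra vocabulary token ("toolken") whose embedding is learned, so adopting a new tool means learning a new embedding from tool-specific data. A minimal, hypothetical sketch of that general setup with Hugging Face transformers (not ToolkenGPT's actual training code; the model name and token strings are placeholders) might look like this:

```python
# Hypothetical sketch of the "one special token per tool" idea behind ToolkenGPT,
# not the original implementation. Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; ToolkenGPT builds on LLaMA-family models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One new vocabulary entry ("toolken") per tool.
tool_tokens = ["<tool:add>", "<tool:subtract>", "<tool:square_root>"]
tokenizer.add_special_tokens({"additional_special_tokens": tool_tokens})
model.resize_token_embeddings(len(tokenizer))

# Only the new embeddings would be trained on tool-demonstration data; generalizing
# to an unseen tool requires new data and another round of tuning, which is exactly
# the cost TOOLDEC avoids.
```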

Benchmark on Knowledge Graph Relations. To test TOOLDEC's generalizability on a larger set of tools, we also evaluate on KAMEL, a question-answering dataset containing a total of 234 knowledge relations that resemble the characteristics of APIs. More examples can be found in Appendix A.4. KAMEL contains many more tools than FuncQA, and they are also more complex and diverse: the number of arguments per tool varies from 1 to 3, and their types include strings, locations, dates, numbers, and other ad-hoc types.
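
To make the description above concrete, here are some hypothetical KAMEL-style relation signatures; the names are invented for illustration (the real relations are listed in the KAMEL dataset and Appendix A.4 of the paper) and only show what tool calls with one to three typed arguments might look like.

```python
# Hypothetical KAMEL-style knowledge-relation tools (names invented for illustration).
# Argument counts range from 1 to 3 and types mix strings, locations, dates, and numbers.
from datetime import date

def place_of_birth(person: str) -> str: ...                    # 1 argument (string)
def population_of(city: str, as_of: date) -> int: ...          # 2 arguments (location, date)
def distance_between(a: str, b: str, unit: str) -> float: ...  # 3 arguments

# A tool call the model would have to emit for "Where was Ada Lovelace born?":
call = 'place_of_birth("Ada Lovelace")'
```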

TOOLDEC can achieve better accuracy without in-context documentation than the RestGPT baseline achieves with documentation. TOOLDEC was also able to maintain comparable accuracy even on unseen tools and achieved 8x better accuracy on multi-hop problems, underscoring its generalizability.

Consequently, TOOLDEC still significantly outperformed the baseline in terms of correctness, as indicated by the correct path ratio, raising it by 8 points. These results suggest that TOOLDEC can generalize to unseen tools without relying on tool documentation.

In Experiment II, we show how TOOLDEC, once fine-tuned on a given set of seen tools, does not need additional data and further fine-tuning to adopt unseen tools.

We compare TOOLDEC only to ToolkenGPT trained on the synthetic dataset proposed by the original study. We use the accuracy of tool calls as a metric, which is determined by the proportion of responses that invoke the correct knowledge relation.
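
As a concrete reading of that metric, here is a small hypothetical helper that scores a batch of model responses by whether they invoke the annotated knowledge relation; the function names, the regex, and the toy data are assumptions for illustration, not the paper's evaluation code.

```python
# Hypothetical scorer for "accuracy of tool calls": the proportion of responses
# that invoke the correct knowledge relation.
import re

def called_relation(response: str) -> str | None:
    """Extract the relation name from a call like place_of_birth("Ada Lovelace")."""
    m = re.search(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(", response)
    return m.group(1) if m else None

def tool_call_accuracy(responses: list[str], gold_relations: list[str]) -> float:
    correct = sum(called_relation(r) == g for r, g in zip(responses, gold_relations))
    return correct / len(gold_relations)

# Toy example: 2 of 3 responses invoke the annotated relation.
preds = ['place_of_birth("Ada Lovelace")', 'spouse_of("Ada Lovelace")', 'The answer is Paris.']
gold = ["place_of_birth", "spouse_of", "place_of_birth"]
print(tool_call_accuracy(preds, gold))  # -> 0.666...
```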

5.2 IN-CONTEXT LEARNING BASELINE: RESTGPT

Since TOOLDEC needs access to the next-token distribution, we use Vicuna-based RestGPT as the baseline. For our method, we remove all tool documentation from the prompt, leaving only the instructions for reasoning.

Benchmark on APIs for Real-World Web Services. We evaluate on RestBench, which consists of tasks in real-world scenarios involving TMDB, a website for movie information, and Spotify, an online music player.
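
Because this excerpt omits implementation details, the following is only a hedged sketch of why access to the next-token distribution matters: at each step, tokens the finite-state machine does not allow are masked out of the distribution before the next token is chosen, which is possible with an open model such as Vicuna but not with a closed, text-only API. The FSM here uses the same nested-dict trie shape as the earlier snippet, and `logits_fn` is an assumed stand-in for the model.

```python
# Sketch only (not the paper's implementation): greedy decoding where, at every step,
# tokens the finite-state machine forbids are masked out of the next-token distribution.
import math
from typing import Callable, Dict, List

# FSM as nested dicts: each state maps an allowed token to its successor state;
# an empty dict is an accepting state with no outgoing transitions.
ToolFSM = Dict[str, dict]

def constrained_decode(logits_fn: Callable[[List[str]], Dict[str, float]],
                       fsm: ToolFSM, max_steps: int = 16) -> str:
    state, generated = fsm, []
    for _ in range(max_steps):
        if not state:                 # accepting state: a complete tool name was emitted
            break
        logits = logits_fn(generated)
        # Mask: only tokens with an outgoing FSM transition may be chosen.
        best = max(state, key=lambda tok: logits.get(tok, -math.inf))
        generated.append(best)
        state = state[best]
    return "".join(generated)

# Toy run: FSM for the tool names "add" and "sub", with a dummy logits function.
fsm = {"a": {"d": {"d": {}}}, "s": {"u": {"b": {}}}}
dummy_logits = lambda toks: {"a": 0.1, "s": 2.0, "u": 1.0, "b": 1.0, "d": 1.0}
print(constrained_decode(dummy_logits, fsm))  # -> "sub"
```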

RestBench consists of RESTful APIs. Since HTTP methods such as GET and POST have a format different from the tool call and tool argument format of TOOLDEC, we rewrote these APIs to follow TOOLDEC's format. We use the correct path rate proposed by the original paper as the metric to measure accuracy. The correct path rate is the proportion of model outputs that contain the correct tool call path annotated by humans.

5.3 EXPERIMENT RESULTS
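
As an illustration of that rewriting step (the exact mapping used in the paper is not shown in this excerpt), a REST endpoint might be wrapped as a named tool call roughly as below; the `rest_to_tool_call` helper, the endpoint, and the parameter names are hypothetical.

```python
# Hypothetical illustration of rewriting a RESTful endpoint into TOOLDEC's
# "tool_name(arguments)" call format; not the conversion actually used in the paper.

def rest_to_tool_call(method: str, path: str, params: dict) -> str:
    """e.g. GET /search/movie with {"query": "Inception"} -> get_search_movie('Inception')."""
    name = method.lower() + "_" + path.strip("/").replace("/", "_").replace("-", "_")
    args = ", ".join(repr(v) for v in params.values())
    return f"{name}({args})"

# TMDB-style example (the endpoint and parameter names are illustrative):
print(rest_to_tool_call("GET", "/search/movie", {"query": "Inception"}))
# -> get_search_movie('Inception')
```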

Generalization to Unseen Math Functions. TOOLDEC significantly outperformed ToolkenGPT on total accuracy.

Generalization to Unseen Knowledge Graph Functions. We present our results on KAMEL in Figure 5b. As the number of available tools increased, the two ICL methods suffered from the context length limit and experienced a significant drop in accuracy. ToolkenGPT, fine-tuned on the first 30 tools, was also unable to generalize to more tools.
