DataJarvis

LLM-powered autonomous agent system for data workflows

What is DataJarvis

The DataJarvis is a LLM-powered (large language model) autonomous agent system designed by the BingViz-Data team. DataJarvis utilizes the capabilities of LLM agents to streamline and automate complex workflows, enhancing efficiency and productivity for data management.

Basic Functions: Chart Generation & Data Explanation

Line Chart Generation

Line Chart Generation

Pie Chart Generation

Pie Chart Generation

Data Explanation

Data Explanation

Advanced Functions: Tool Usage & Visualization

Daily Report Generation

Daily Report Generation

Tool User Interface

Tool User Interface

Tool Creation & Usage: Maker & Visual Integration

Tool Creation Process

Tool Creation Process

Visual Tool Integration

Visual Tool Integration

Agent → Agent Flow → DataJarvis

Agent is the fundamental unit of DataJarvis. The LLM (e.g., Copilot / GPT-4) functions as the agent’s brain, integrated with memory, tools and a planning framework. An agent can be used to solve an atomic task in a specific field (e.g., PGSQL code generation, draw bar chart).

For a complicated task, multiple agents constitute an agent flow, where data moves from upstream to downstream. A complicated task is decomposed into many atomic tasks and assigned to various agents in the flow. The agents share tools and memory. The agent flow is a pipeline framework that solves a complex problem step by step — a powerful general problem solver.

Agent Overview

Agent = LLM + Planning + Tools + Memory

Agent overview diagram

Planning

Memory

How Graph RAG Constructs Vectors

  1. Data Retrieval: retrieve relevant data from a knowledge base.
  2. Data Integration: integrate retrieved data with the input context.
  3. Vector Generation: generate high-quality vectors representing entities and relations.

Tool use & Tool build

Tool use diagram
Function call and Tool use (OpenAPI Integration)
  1. Trigger tools when the response requires it.
  2. Implement custom tools for tasks (e.g., DB queries, running tests).
  3. Integrate tool responses back into conversation context.

Future: allow users to build tools in real time; Tool Maker Agent saves tools for reuse (TODO)

Agent Flow Overview

Agent flow diagram

In the current DataJarvis, specific agents within an agent flow rely on magic commands for activation (e.g., $run, $explain). In the future, DataJarvis will be able to automatically generate agent flows with planning (task decomposition) ability.

Technical Architecture of DataJarvis

Frontend: Streamlit

Advantages

Disadvantages

Backend: Python on Azure Web App

Advantages

Disadvantages

DataJarvis Demo Video

Watch DataJarvis in action! This demonstration showcases the core capabilities of our LLM-powered autonomous agent system for data workflows.

Experience the power of DataJarvis: from natural language queries to automated data analysis and visualization

TO-DO & Done

Tutorial & Example

PGSQL code generation

What are the dau of Bing-Android, Start-Android in recent 7 days?
PGSQL code generation example

Run PGSQL

$run
Run PGSQL screenshot

Explain the data

$explain$ please explain chart in 100 words
Explain chart screenshot

Draw EChart for data visualization

$visual$ please draw a pie chart.
ECharts pie chart

Use Tool

$usetool$ Please generate daily check report on July 10th, 2024.
Tool use 1 Tool use 2

Plan

Related Work

Reference

Issue

If you have any issues, please contact kexin.chen@microsoft.com or ckqqqq@bupt.edu.cn.