Duckdb数据源

简介 #

DuckDB 是一款进程内(in-process)分析型数据库,无需独立服务器即可处理大规模数据集。SmartChart 支持 DuckDB 作为数据源,可以直接用类 PostgreSQL 语法查询 CSV、Excel、Parquet 等文件,并支持跨数据库联合查询(如 MySQL 直连)。

特性 说明
免部署 单文件进程内运行,无独立服务依赖
多源关联 可同时连接 MySQL、PostgreSQL 等外部数据库
语法兼容 类 PostgreSQL,支持 PIVOT 交叉表等分析语法
安装方式 pip install duckdb 即可
应用场景 #

Duckdb是一进程内分析数据库,它可以在无需维护分布式多服务器系统的情况下处理出人意料的大型数据集 你还可以使用 duckdb做多源关联查询

使用方法 #
  • 首先您需要安装Duckdb的包
pip install duckdb
  • 新建连接

在连接地址填写完整的路径名即可, 如D:/abc/my.duckdb

  • duckdb采用类postgrep的SQL语法, 也有一些特殊的语法, 比如:
csv文件导入数据并建表
CREATE TABLE 'cars' AS SELECT * FROM read_csv_auto('cars.csv');
CREATE TEMP TABLE t1 AS SELECT * FROM read_csv('path/file.csv');
COPY lineitem FROM 'lineitem.csv';
Excel读取与写入
SELECT * FROM read_xlsx('test.xlsx', header = true);
CREATE TABLE test (a DOUBLE, b DOUBLE);
COPY test FROM 'test.xlsx' WITH (FORMAT xlsx, HEADER);
SELECT * FROM test;
CREATE TABLE test AS
    SELECT *
    FROM (VALUES (1, 2), (3, 4)) AS t(a, b);
COPY test TO 'test.xlsx' WITH (FORMAT xlsx, HEADER true);
Mysql读取与写入
INSTALL mysql;
LOAD mysql;
ATTACH 'host=localhost user=root port=0 database=mysql' AS mysqldb (TYPE mysql);
USE mysqldb;

CREATE SECRET (
    TYPE mysql,
    HOST '127.0.0.1',
    PORT 0,
    DATABASE mysql,
    USER 'mysql',
    PASSWORD ''
);

ATTACH '' AS mysql_db (TYPE mysql);
ATTACH 'database=my_other_db' AS mysql_db (TYPE mysql);
       
CREATE SECRET mysql_secret (
    TYPE mysql,
    HOST '127.0.0.1',
    PORT 0,
    DATABASE mysql,
    USER 'mysql',
    PASSWORD ''
);
ATTACH '' AS mysql_db (TYPE mysql, SECRET mysql_secret);
       
CREATE TABLE duckdb_table AS FROM mysql_db.mysql_table;
交叉表查询
CREATE TABLE Cities (
    Country VARCHAR, Name VARCHAR, Year INTEGER, Population INTEGER
);
INSERT INTO Cities VALUES
    ('NL', 'Amsterdam', 2000, 1005),
    ('NL', 'Amsterdam', 2010, 1065),
    ('NL', 'Amsterdam', 2020, 1158),
    ('US', 'Seattle', 2000, 564),
    ('US', 'Seattle', 2010, 608),
    ('US', 'Seattle', 2020, 738),
    ('US', 'New York City', 2000, 8015),
    ('US', 'New York City', 2010, 8175),
    ('US', 'New York City', 2020, 8772);

PIVOT Cities
ON Year
USING sum(Population);

PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country;

PIVOT Cities
ON Year IN (2000, 2010)
USING sum(Population)
GROUP BY Country;

PIVOT Cities
ON Country, Name
USING sum(Population);

PIVOT Cities
ON Year
USING sum(Population) AS total, max(Population) AS max
GROUP BY Country;

PIVOT Cities
ON Year
USING sum(Population)
GROUP BY Country, Name;