Vector Database

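This page describes the smart-vector vector database. The reference code below uses StarRocks for vector computation; an enterprise-grade vector solution requires the Professional edition: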

CREATE DATABASE smartdb;
-- select the new database so the table is created inside it
USE smartdb;
CREATE TABLE IF NOT EXISTS vectors(
    collection STRING NOT NULL,
    document STRING,
    id BIGINT AUTO_INCREMENT,
    sr STRING,
    embedding ARRAY<FLOAT>,
    answer STRING,
    c STRING,
    m STRING,
    owner VARCHAR(50),
    update_time DATETIME DEFAULT CURRENT_TIMESTAMP
)
DUPLICATE KEY (collection)
COMMENT 'smart embeddings'
DISTRIBUTED BY HASH(collection);

Create a custom connector with the code below; for how to register and use it, see "Custom Data Source".

from smart_chart.common.smartvector import SmartVectorDB, Text2VecEmbeddingFunction, Text2VecDashscopeFunction
text_vector = Text2VecEmbeddingFunction()  # convert text to vectors with a local model
# text_vector = Text2VecDashscopeFunction()  # convert text to vectors with Alibaba Dashscope

def dataset(*args, **kwargs):
    prompt = args[0][0]  # query text
    db_config = args[1]  # connection settings of the data source
    table = db_config.get('table', 'vectors')
    db_config['metric'] = 'distance'
    return SmartVectorDB(db_config=db_config, text_vector=text_vector, table=table).get(prompt)


def insert_dataset(*args, **kwargs):
    contents = args[0]  # header row followed by data rows
    table = args[1]
    connect_dict = args[3]
    doc_index = contents[0].index('document')
    contents[0].append('embedding')
    for item in contents[1:]:
        # embed each document and store the vector as a string
        item.append(str(text_vector(item[doc_index])[0]))
    SmartVectorDB(db_config=connect_dict, text_vector=text_vector)._execute_load(contents, table)
    return len(contents) - 1  # number of rows written
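
To make the row transformation in insert_dataset easier to follow, here is a minimal standalone sketch of the same step. It does not use SmartVectorDB: fake_text_vector and add_embedding_column are hypothetical names introduced only for illustration, and the fake embedding simply returns the text length so the result is predictable. Unlike the connector code, the sketch returns a copy instead of modifying the input in place.

```python
from typing import Callable, List


def fake_text_vector(text: str) -> List[List[float]]:
    """Hypothetical stand-in for the embedding function: returns a list holding one vector."""
    return [[float(len(text)), 1.0]]


def add_embedding_column(contents: List[list],
                         embed: Callable[[str], List[List[float]]]) -> List[list]:
    """Return a copy of contents with an 'embedding' column added to the header and each row."""
    header, *rows = contents
    doc_index = header.index('document')  # column that holds the text to embed
    new_header = header + ['embedding']
    # Each vector is stored as its string form, as in insert_dataset.
    new_rows = [row + [str(embed(row[doc_index])[0])] for row in rows]
    return [new_header] + new_rows


contents = [
    ['collection', 'document', 'answer'],
    ['test', 'Who wrote it?', 'John'],
]
result = add_embedding_column(contents, fake_text_vector)
print(result[0])      # header with the extra 'embedding' column
print(result[1][-1])  # string form of the vector for the first data row
```

The first row of contents must be the header and must contain a 'document' column; otherwise the index lookup raises ValueError.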

Querying from a dataset

Who is the author of Smartchart
collection,answer
collection='test' and owner='John'
2
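
The source does not spell out what each of the four lines means. The sketch below assumes they are, in order, the query text, the columns to return, a filter condition, and the number of results. Under that assumption, the lookup behaves conceptually like a filtered nearest-neighbour search. This is a brute-force illustration in plain Python, not how SmartVectorDB or StarRocks implements it; euclidean and search are hypothetical helpers, and Euclidean distance is an assumed metric.

```python
import math
from typing import Callable, Dict, List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(rows: List[Dict], query_vec: List[float], columns: List[str],
           keep: Callable[[Dict], bool], top_k: int) -> List[Dict]:
    """Filter rows, sort by distance to query_vec, return the top_k rows with chosen columns."""
    candidates = [r for r in rows if keep(r)]
    candidates.sort(key=lambda r: euclidean(r['embedding'], query_vec))
    return [{c: r[c] for c in columns} for r in candidates[:top_k]]


rows = [
    {'collection': 'test', 'owner': 'John', 'answer': 'A', 'embedding': [0.0, 0.0]},
    {'collection': 'test', 'owner': 'John', 'answer': 'B', 'embedding': [1.0, 1.0]},
    {'collection': 'test', 'owner': 'Mary', 'answer': 'C', 'embedding': [0.1, 0.1]},
    {'collection': 'test', 'owner': 'John', 'answer': 'D', 'embedding': [5.0, 5.0]},
]

hits = search(
    rows,
    query_vec=[0.2, 0.2],               # stands in for the embedded query text
    columns=['collection', 'answer'],   # second line of the query
    keep=lambda r: r['collection'] == 'test' and r['owner'] == 'John',  # third line
    top_k=2,                            # fourth line
)
print(hits)
```

Row C is the closest vector overall but is excluded by the owner filter, so the two nearest remaining rows (A, then B) are returned.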

Writing data works the same way as ds_save, for example:

dataset=['category name', 'question', 'answer', '$username']
print(ds_save(1, [[],['collection','document','answer','owner'],dataset]));