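For the smart-vector vector database. The reference code below uses StarRocks for vector computation; an enterprise-grade vectorization solution requires the Professional edition: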
create database smartdb;
CREATE TABLE IF NOT EXISTS vectors(
    collection String not null,
    document String,
    id bigint AUTO_INCREMENT,
    sr String,
    embedding Array<Float>,
    answer string,
    c String,
    m String,
    owner varchar(50),
    update_time DATETIME DEFAULT CURRENT_TIMESTAMP
)
DUPLICATE KEY (collection)
COMMENT 'smart embeddings'
DISTRIBUTED BY HASH(collection);
Create a custom connector; for usage, see "Custom Data Sources".
from smart_chart.common.smartvector import SmartVectorDB, Text2VecEmbeddingFunction, Text2VecDashscopeFunction

text_vector = Text2VecEmbeddingFunction()  # convert text to vectors with a local model
# text_vector = Text2VecDashscopeFunction()  # convert text to vectors with Alibaba DashScope


def dataset(*args, **kwargs):
    promote = args[0][0]  # the query text (prompt)
    db_config = args[1]
    table = db_config.get('table', 'vectors')
    db_config['metric'] = 'distance'
    return SmartVectorDB(db_config=db_config, text_vector=text_vector, table=table).get(promote)


def insert_dataset(*args, **kwargs):
    contents = args[0]  # first row is the header, remaining rows are data
    table = args[1]
    connect_dict = args[3]
    docIndex = contents[0].index('document')
    contents[0].append('embedding')
    for item in contents[1:]:
        # embed the document text and store the first vector as a string
        item.append(str(text_vector(item[docIndex])[0]))
    SmartVectorDB(db_config=connect_dict, text_vector=text_vector)._execute_load(contents, table)
    return len(contents) - 1  # number of data rows written
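To make the data shape handled by insert_dataset concrete, here is a minimal sketch that replaces the real embedder with a hypothetical stand-in (fake_text_vector and add_embedding_column are illustrative names, not part of smart_chart). It assumes, as the code above suggests, that the embedder returns a list of vectors and that the first row of contents is the header.

```python
def fake_text_vector(text):
    """Hypothetical stand-in for the real embedder: returns a list holding one small vector."""
    return [[float(len(text)), float(sum(map(ord, text)) % 7)]]


def add_embedding_column(contents, embed):
    """Append an 'embedding' column to a header-plus-rows table, in place."""
    doc_index = contents[0].index('document')
    contents[0].append('embedding')
    for row in contents[1:]:
        # Store the vector as text, mirroring str(text_vector(...)[0]) above
        row.append(str(embed(row[doc_index])[0]))
    return len(contents) - 1


rows = [
    ['collection', 'document', 'answer', 'owner'],
    ['demo', 'abc', 'an answer', 'John'],
]
inserted = add_embedding_column(rows, fake_text_vector)
print(rows[0])      # header now ends with 'embedding'
print(rows[1][-1])  # the vector, serialized as a string
print(inserted)     # number of data rows processed
```

The sketch stops before the database load; in the connector above that step is handled by SmartVectorDB.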
Querying from a dataset:
Who is the author of Smartchart
collection,answer
collection='测试' and owner='John'
2
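The query above can be read as a nearest-neighbor lookup. The pure-Python sketch below illustrates the general idea under two assumptions that the source does not confirm: that the four lines are the question, the fields to return, a filter condition, and the number of results, and that the 'distance' metric means Euclidean distance. In practice the computation is performed by SmartVectorDB inside StarRocks; the names l2_distance and nearest are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, rows, fields, keep, k):
    """Filter rows, rank them by distance to query_vec, return k rows with the chosen fields."""
    candidates = [r for r in rows if keep(r)]
    candidates.sort(key=lambda r: l2_distance(query_vec, r['embedding']))
    return [{f: r[f] for f in fields} for r in candidates[:k]]


# Toy data: embeddings are made up for illustration
rows = [
    {'collection': '测试', 'owner': 'John', 'answer': 'A1', 'embedding': [0.0, 1.0]},
    {'collection': '测试', 'owner': 'John', 'answer': 'A2', 'embedding': [1.0, 0.0]},
    {'collection': '测试', 'owner': 'Mary', 'answer': 'A3', 'embedding': [0.9, 0.1]},
    {'collection': '测试', 'owner': 'John', 'answer': 'A4', 'embedding': [0.5, 0.5]},
]

result = nearest(
    [1.0, 0.1],                     # stand-in for the embedded question
    rows,
    ['collection', 'answer'],       # fields to return
    lambda r: r['collection'] == '测试' and r['owner'] == 'John',
    2,                              # number of results
)
print(result)
```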
Writing works the same way as the ds_save method, for example:
dataset = ['category name', 'question', 'answer', '$username']
print(ds_save(1, [[],['collection','document','answer','owner'],dataset]));