
Indexing OneLake data into Elasticsearch - Part II

Author: 五速梦信息网
Date: 2026-04-20 08:10


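Authors: Gustavo Llermaly and Jeffrey Rengifo, from Elastic

This article has two parts. This second part shows how to index OneLake data into Elastic, and search it, using a custom connector.

In this article, we'll use what we learned in Part 1 to create a custom OneLake connector for Elasticsearch.

We have already uploaded some OneLake documents and indexed them into Elasticsearch for search. However, that only works as a one-time upload. To keep the data in sync, we need a more elaborate system.

Fortunately, Elastic has a connector framework for developing custom connectors that fit our needs. We'll now build a OneLake connector following the article "How to create custom connectors for Elasticsearch".

Steps

1. Connector bootstrap
2. Implementing the BaseDataSource class
3. Authentication
4. Running the connector
5. Configuring the schedule

Connector bootstrap

For background, Elastic connectors come in two types:

- Elastic managed connectors: fully hosted and run by Elastic Cloud.
- Self-managed connectors: hosted by the user, and must be deployed in your own infrastructure.

Custom connectors fall into the "connector client" category, so we need to download and deploy the connector framework.

First, clone the connectors repository:

    git clone https://github.com/elastic/connectors

Now add the dependencies you'll use at the end of the requirements/framework.txt file. In this case:

    azure-identity==1.19.0
    azure-storage-file-datalake==12.17.0

With that, the repository is ready and we can start coding.

Implementing the BaseDataSource class

You can find the full working code in this repository. Here we'll walk through the core parts of the onelake.py file.

After the imports and the class declaration, we must define the __init__ method, which captures the configuration parameters.

    """OneLake connector to retrieve data from datalakes"""

    from functools import partial

    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    from connectors.source import BaseDataSource

    ACCOUNT_NAME = "onelake"


    class OneLakeDataSource(BaseDataSource):
        """OneLake"""

        name = "OneLake"
        service_type = "onelake"
        incremental_sync_enabled = True

        # Here we can enter the data that we'll later need to connect our connector to OneLake.
        def __init__(self, configuration):
            """Set up the connection to the azure base client

            Args:
                configuration (DataSourceConfiguration): Object of DataSourceConfiguration class.
            """
            super().__init__(configuration=configuration)
            self.tenant_id = self.configuration["tenant_id"]
            self.client_id = self.configuration["client_id"]
            self.client_secret = self.configuration["client_secret"]
            self.workspace_name = self.configuration["workspace_name"]
            self.data_path = self.configuration["data_path"]

Next, you can configure the form the UI will display for filling in these parameters, using the get_default_configuration method, which returns a configuration dictionary.

        # Method to generate the Enterprise Search UI fields for the variables we need to connect to OneLake.
        @classmethod
        def get_default_configuration(cls):
            """Get the default configuration for OneLake

            Returns:
                dictionary: Default configuration
            """
            return {
                "tenant_id": {
                    "label": "OneLake tenant id",
                    "order": 1,
                    "type": "str",
                },
                "client_id": {
                    "label": "OneLake client id",
                    "order": 2,
                    "type": "str",
                },
                "client_secret": {
                    "label": "OneLake client secret",
                    "order": 3,
                    "type": "str",
                    "sensitive": True,  # To hide sensitive data like passwords or secrets
                },
                "workspace_name": {
                    "label": "OneLake workspace name",
                    "order": 4,
                    "type": "str",
                },
                "data_path": {
                    "label": "OneLake data path",
                    "tooltip": "Path in format DataLake.Lakehouse/files/Folder path",
                    "order": 5,
                    "type": "str",
                },
                "account_name": {
                    "tooltip": "In most cases this is 'onelake'",
                    "default_value": ACCOUNT_NAME,
                    "label": "Account name",
                    "order": 6,
                    "type": "str",
                },
            }

Then we configure the methods that download OneLake documents and extract their content.

        async def download_file(self, file_client):
            """Download file from OneLake

            Args:
                file_client (obj): File client

            Returns:
                generator: File stream
            """
            try:
                download = file_client.download_file()
                stream = download.chunks()

                for chunk in stream:
                    yield chunk
            except Exception as e:
                self._logger.error(f"Error while downloading file: {e}")
                raise

        async def get_content(self, file_name, doit=None, timestamp=None):
            """Obtains the file content for the specified file in `file_name`.

            Args:
                file_name (obj): The file name to process to obtain the content.
                timestamp (timestamp, optional): Timestamp of blob last modified. Defaults to None.
                doit (boolean, optional): Boolean value for whether to get content or not. Defaults to None.

            Returns:
                str: Content of the file or None if not applicable.
            """
            if not doit:
                return

            file_client = await self._get_file_client(file_name)
            file_properties = file_client.get_file_properties()
            file_extension = self.get_file_extension(file_name)

            doc = {
                "_id": f"{file_client.file_system_name}_{file_properties.name}",  # workspacename_data_path
                "name": file_properties.name.split("/")[-1],
                "_timestamp": file_properties.last_modified,
                "created_at": file_properties.creation_time,
            }

            can_be_downloaded = self.can_file_be_downloaded(
                file_extension=file_extension,
                filename=file_properties.name,
                file_size=file_properties.size,
            )

            if not can_be_downloaded:
                return doc

            extracted_doc = await self.download_and_extract_file(
                doc=doc,
                source_filename=file_properties.name.split("/")[-1],
                file_extension=file_extension,
                download_func=partial(self.download_file, file_client),
            )

            return extracted_doc if extracted_doc is not None else doc

To make our connector visible to the framework, we need to declare it in the connectors/config.py file. To do so, we add the following entry to sources:

    "sources": {
        …
        "onelake": "connectors.sources.onelake:OneLakeDataSource",
        …
    }

Authentication

Before testing the connector, we need to obtain the client_id, tenant_id and client_secret that the connector will use to access the workspace.

We'll use service principals as the authentication method. An Azure service principal is an identity created for use with applications, hosted services, and automated tools to access Azure resources.

The steps are:

1. Create an application and collect the client_id, tenant_id and client_secret
2. Enable service principals in the workspace
3. Add the service principal to the workspace

You can follow this tutorial step by step.

Ready? Now it's time to test the connector.

Running the connector

With the connector ready, we can now connect it to our Elasticsearch instance.

Go to Search > Content > Connectors > New connector and select Customized Connector.

Choose a name, then select "Create and attach an index" to create a new index with the same name as the connector.

You can now run it with Docker or from source. In this example, we'll use "Run from source".

Click "Generate Configuration", then paste the contents of the box into a config.yml file in the project root. The service_type field must match the connector name declared in connectors/config.py. In this case, replace changeme with onelake.

Now you can run the connector with the following commands:

    make install
    make run

If the connector initializes correctly, you should see its startup messages in the console.

Note: if you get a compatibility error, check your connectors/VERSION file and compare it with your Elasticsearch cluster version (see "Version compatibility with Elasticsearch"). We recommend keeping the connector version and the Elasticsearch version in sync. In this article, we use version 8.15 for both Elasticsearch and the connectors.

If everything went well, our local connector will communicate with our Elasticsearch cluster, and we'll be able to configure it with our OneLake credentials.

We'll now index the documents from OneLake. To do so, click Sync > Full Content to run a full content sync.

Once the sync completes, the console should report that it has finished.

In the Enterprise Search UI, you can click "Documents" to see the indexed documents.

Configuring the schedule

You can use the UI to schedule recurring content syncs as needed, so the index stays updated and in sync with OneLake.

To configure scheduled syncs, go to Search > Content > Connectors and select your connector. Then click "Scheduling".

Alternatively, you can use the update connector scheduling API, which accepts CRON expressions.

Conclusion

In this second part, we took our setup one step further by using the Elastic connector framework and developing our own OneLake connector to communicate easily with our Elastic Cloud instance.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training starts. Elasticsearch includes new features to help you build the best search solution for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.

Original: Indexing OneLake data into Elasticsearch - Part II - Elasticsearch Labs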
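As a complement to the scheduled sync described above, here is a minimal sketch of how a request to the update connector scheduling API could be assembled. It only builds the HTTP method, path and JSON body; it does not contact a cluster. The helper name build_scheduling_request and the connector id are hypothetical, and the sketch assumes the API takes a Quartz-style cron expression (6 or 7 fields) in an "interval" field for each sync type.

```python
import json


def build_scheduling_request(connector_id, cron, full=True, incremental=False):
    """Build (method, path, body) for the update connector scheduling API.

    Hypothetical helper for illustration; it is not part of the connectors
    framework. Assumes `cron` is a Quartz-style expression such as
    "0 0 0 * * ?" (seconds, minutes, hours, day of month, month, day of week).
    """
    field_count = len(cron.split())
    if field_count not in (6, 7):
        raise ValueError(
            f"expected a Quartz-style cron expression with 6 or 7 fields, got {field_count}"
        )

    body = {
        "scheduling": {
            # Access-control syncs are left disabled in this sketch.
            "access_control": {"enabled": False, "interval": cron},
            "full": {"enabled": full, "interval": cron},
            "incremental": {"enabled": incremental, "interval": cron},
        }
    }
    return "PUT", f"/_connector/{connector_id}/_scheduling", json.dumps(body, indent=2)


if __name__ == "__main__":
    # "my-onelake-connector" is a placeholder id; use the id shown in the UI.
    method, path, body = build_scheduling_request("my-onelake-connector", "0 0 0 * * ?")
    print(method, path)
    print(body)
```

The printed method, path and body can be pasted into the Kibana Dev Tools console or sent with any HTTP client that is authenticated against your cluster. Keeping request construction separate from sending makes the payload easy to inspect before it changes a live schedule.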