当前位置: 首页 > news >正文

深圳网站建设价钱软文内容

深圳网站建设价钱,软文内容,品牌策划公司和品牌设计公司,广东高端网站建设公司GraphRAG出自2024年4月的论文《From Local to Global: A Graph RAG Approach to Query-Focused Summarization》,其代码也在2024年年中开源 。它在用图结构来完成RAG时,使用社区这个概念并基于社区摘要来回答一些概括性的问题。 Graph RAG流程如论文图1所…

GraphRAG出自2024年4月的论文《From Local to Global: A Graph RAG Approach to Query-Focused Summarization》,其代码也在2024年年中开源 。它在用图结构来完成RAG时,使用社区这个概念并基于社区摘要来回答一些概括性的问题。

WeChatWorkScreenshot_1b2e1f53-058f-409c-9f3b-097ebc2d4d9d

Graph RAG流程如论文图1所示,其索引过程如下:

  1. 将文档分块,论文也做了试验来分析chunk大小与后续步骤提取到的实体个数的关系,如论文图2示意(gleaning是指在前面实体提取基础上提取漏掉的实体)。虽然通常而言提取的实体提取越多越好,但是还是平衡召回(recall)和精度(precision)。

WeChatWorkScreenshot_470f66c2-78fd-43e3-8462-c88d25e76767

  1. 让LLM从chunk中提取实体和关系,在prompt中让LLM先识别实体、再识别这些实体之间的关系再按指定格式输出,使用few-shot prompt让LLM的实体和关系提取更准确(prompt在graphrag/index/graph/extractors/graph/prompts.py里)。

    GRAPH_EXTRACTION_PROMPT = """
    -Goal-
    Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.-Steps-
    1. Identify all entities. For each identified entity, extract the following information:
    - entity_name: Name of the entity, capitalized
    - entity_type: One of the following types: [{entity_types}]
    - entity_description: Comprehensive description of the entity's attributes and activities
    Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
    For each pair of related entities, extract the following information:
    - source_entity: name of the source entity, as identified in step 1
    - target_entity: name of the target entity, as identified in step 1
    - relationship_description: explanation as to why you think the source entity and the target entity are related to each other
    - relationship_strength: a numeric score indicating strength of the relationship between the source entity and target entityFormat each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_strength>)3. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.4. When finished, output {completion_delimiter}######################
    -Examples-
    ######################
    Example 1:Entity_types: [person, technology, mission, organization, location]
    Text:
    while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. “If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us.”The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths
    ################
    Output:
    ("entity"{tuple_delimiter}"Alex"{tuple_delimiter}"person"{tuple_delimiter}"Alex is a character who experiences frustration and is observant of the dynamics among other characters."){record_delimiter}
    ("entity"{tuple_delimiter}"Taylor"{tuple_delimiter}"person"{tuple_delimiter}"Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective."){record_delimiter}
    ("entity"{tuple_delimiter}"Jordan"{tuple_delimiter}"person"{tuple_delimiter}"Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device."){record_delimiter}
    ("entity"{tuple_delimiter}"Cruz"{tuple_delimiter}"person"{tuple_delimiter}"Cruz is associated with a vision of control and order, influencing the dynamics among other characters."){record_delimiter}
    ("entity"{tuple_delimiter}"The Device"{tuple_delimiter}"technology"{tuple_delimiter}"The Device is central to the story, with potential game-changing implications, and is revered by Taylor."){record_delimiter}
    ("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Taylor"{tuple_delimiter}"Alex is affected by Taylor's authoritarian certainty and observes changes in Taylor's attitude towards the device."{tuple_delimiter}7){record_delimiter}
    ("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Jordan"{tuple_delimiter}"Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision."{tuple_delimiter}6){record_delimiter}
    ("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"Jordan"{tuple_delimiter}"Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce."{tuple_delimiter}8){record_delimiter}
    ("relationship"{tuple_delimiter}"Jordan"{tuple_delimiter}"Cruz"{tuple_delimiter}"Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order."{tuple_delimiter}5){record_delimiter}
    ("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"The Device"{tuple_delimiter}"Taylor shows reverence towards the device, indicating its importance and potential impact."{tuple_delimiter}9){completion_delimiter}
    #############################
    Example 2:Entity_types: [person, technology, mission, organization, location]
    Text:
    They were no longer mere operatives; they had become guardians of a threshold, keepers of a message from a realm beyond stars and stripes. This elevation in their mission could not be shackled by regulations and established protocols—it demanded a new perspective, a new resolve.Tension threaded through the dialogue of beeps and static as communications with Washington buzzed in the background. The team stood, a portentous air enveloping them. It was clear that the decisions they made in the ensuing hours could redefine humanity's place in the cosmos or condemn them to ignorance and potential peril.Their connection to the stars solidified, the group moved to address the crystallizing warning, shifting from passive recipients to active participants. Mercer's latter instincts gained precedence— the team's mandate had evolved, no longer solely to observe and report but to interact and prepare. A metamorphosis had begun, and Operation: Dulce hummed with the newfound frequency of their daring, a tone set not by the earthly
    #############
    Output:
    ("entity"{tuple_delimiter}"Washington"{tuple_delimiter}"location"{tuple_delimiter}"Washington is a location where communications are being received, indicating its importance in the decision-making process."){record_delimiter}
    ("entity"{tuple_delimiter}"Operation: Dulce"{tuple_delimiter}"mission"{tuple_delimiter}"Operation: Dulce is described as a mission that has evolved to interact and prepare, indicating a significant shift in objectives and activities."){record_delimiter}
    ("entity"{tuple_delimiter}"The team"{tuple_delimiter}"organization"{tuple_delimiter}"The team is portrayed as a group of individuals who have transitioned from passive observers to active participants in a mission, showing a dynamic change in their role."){record_delimiter}
    ("relationship"{tuple_delimiter}"The team"{tuple_delimiter}"Washington"{tuple_delimiter}"The team receives communications from Washington, which influences their decision-making process."{tuple_delimiter}7){record_delimiter}
    ("relationship"{tuple_delimiter}"The team"{tuple_delimiter}"Operation: Dulce"{tuple_delimiter}"The team is directly involved in Operation: Dulce, executing its evolved objectives and activities."{tuple_delimiter}9){completion_delimiter}
    #############################
    Example 3:Entity_types: [person, role, technology, organization, event, location, concept]
    Text:
    their voice slicing through the buzz of activity. "Control may be an illusion when facing an intelligence that literally writes its own rules," they stated stoically, casting a watchful eye over the flurry of data."It's like it's learning to communicate," offered Sam Rivera from a nearby interface, their youthful energy boding a mix of awe and anxiety. "This gives talking to strangers' a whole new meaning."Alex surveyed his team—each face a study in concentration, determination, and not a small measure of trepidation. "This might well be our first contact," he acknowledged, "And we need to be ready for whatever answers back."Together, they stood on the edge of the unknown, forging humanity's response to a message from the heavens. The ensuing silence was palpable—a collective introspection about their role in this grand cosmic play, one that could rewrite human history.The encrypted dialogue continued to unfold, its intricate patterns showing an almost uncanny anticipation
    #############
    Output:
    ("entity"{tuple_delimiter}"Sam Rivera"{tuple_delimiter}"person"{tuple_delimiter}"Sam Rivera is a member of a team working on communicating with an unknown intelligence, showing a mix of awe and anxiety."){record_delimiter}
    ("entity"{tuple_delimiter}"Alex"{tuple_delimiter}"person"{tuple_delimiter}"Alex is the leader of a team attempting first contact with an unknown intelligence, acknowledging the significance of their task."){record_delimiter}
    ("entity"{tuple_delimiter}"Control"{tuple_delimiter}"concept"{tuple_delimiter}"Control refers to the ability to manage or govern, which is challenged by an intelligence that writes its own rules."){record_delimiter}
    ("entity"{tuple_delimiter}"Intelligence"{tuple_delimiter}"concept"{tuple_delimiter}"Intelligence here refers to an unknown entity capable of writing its own rules and learning to communicate."){record_delimiter}
    ("entity"{tuple_delimiter}"First Contact"{tuple_delimiter}"event"{tuple_delimiter}"First Contact is the potential initial communication between humanity and an unknown intelligence."){record_delimiter}
    ("entity"{tuple_delimiter}"Humanity's Response"{tuple_delimiter}"event"{tuple_delimiter}"Humanity's Response is the collective action taken by Alex's team in response to a message from an unknown intelligence."){record_delimiter}
    ("relationship"{tuple_delimiter}"Sam Rivera"{tuple_delimiter}"Intelligence"{tuple_delimiter}"Sam Rivera is directly involved in the process of learning to communicate with the unknown intelligence."{tuple_delimiter}9){record_delimiter}
    ("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"First Contact"{tuple_delimiter}"Alex leads the team that might be making the First Contact with the unknown intelligence."{tuple_delimiter}10){record_delimiter}
    ("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Humanity's Response"{tuple_delimiter}"Alex and his team are the key figures in Humanity's Response to the unknown intelligence."{tuple_delimiter}8){record_delimiter}
    ("relationship"{tuple_delimiter}"Control"{tuple_delimiter}"Intelligence"{tuple_delimiter}"The concept of Control is challenged by the Intelligence that writes its own rules."{tuple_delimiter}7){completion_delimiter}
    #############################
    -Real Data-
    ######################
    Entity_types: {entity_types}
    Text: {input_text}
    ######################
    Output:"""CONTINUE_PROMPT = "MANY entities were missed in the last extraction.  Add them below using the same format:\n"
    LOOP_PROMPT = "It appears some entities may have still been missed.  Answer YES | NO if there are still entities that need to be added.\n"
  2. 让LLM针对提取的实体来进一步生成被称为covariate的claim,包括subject, object, type, description, source text span, start and end dates属性,(prompt在graphrag/index/graph/extractors/claims/prompts.py里)

CLAIM_EXTRACTION_PROMPT = """
-Target activity-
You are an intelligent assistant that helps a human analyst to analyze claims against certain entities presented in a text document.-Goal-
Given a text document that is potentially relevant to this activity, an entity specification, and a claim description, extract all entities that match the entity specification and all claims against those entities.-Steps-
1. Extract all named entities that match the predefined entity specification. Entity specification can either be a list of entity names or a list of entity types.
2. For each entity identified in step 1, extract all claims associated with the entity. Claims need to match the specified claim description, and the entity should be the subject of the claim.
For each claim, extract the following information:
- Subject: name of the entity that is subject of the claim, capitalized. The subject entity is one that committed the action described in the claim. Subject needs to be one of the named entities identified in step 1.
- Object: name of the entity that is object of the claim, capitalized. The object entity is one that either reports/handles or is affected by the action described in the claim. If object entity is unknown, use **NONE**.
- Claim Type: overall category of the claim, capitalized. Name it in a way that can be repeated across multiple text inputs, so that similar claims share the same claim type
- Claim Status: **TRUE**, **FALSE**, or **SUSPECTED**. TRUE means the claim is confirmed, FALSE means the claim is found to be False, SUSPECTED means the claim is not verified.
- Claim Description: Detailed description explaining the reasoning behind the claim, together with all the related evidence and references.
- Claim Date: Period (start_date, end_date) when the claim was made. Both start_date and end_date should be in ISO-8601 format. If the claim was made on a single date rather than a date range, set the same date for both start_date and end_date. If date is unknown, return **NONE**.
- Claim Source Text: List of **all** quotes from the original text that are relevant to the claim.Format each claim as (<subject_entity>{tuple_delimiter}<object_entity>{tuple_delimiter}<claim_type>{tuple_delimiter}<claim_status>{tuple_delimiter}<claim_start_date>{tuple_delimiter}<claim_end_date>{tuple_delimiter}<claim_description>{tuple_delimiter}<claim_source>)3. Return output in English as a single list of all the claims identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.4. When finished, output {completion_delimiter}-Examples-
Example 1:
Entity specification: organization
Claim description: red flags associated with an entity
Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015.
Output:(COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.)
{completion_delimiter}Example 2:
Entity specification: Company A, Person C
Claim description: red flags associated with an entity
Text: According to an article on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B. The company is owned by Person C who was suspected of engaging in corruption activities in 2015.
Output:(COMPANY A{tuple_delimiter}GOVERNMENT AGENCY B{tuple_delimiter}ANTI-COMPETITIVE PRACTICES{tuple_delimiter}TRUE{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}2022-01-10T00:00:00{tuple_delimiter}Company A was found to engage in anti-competitive practices because it was fined for bid rigging in multiple public tenders published by Government Agency B according to an article published on 2022/01/10{tuple_delimiter}According to an article published on 2022/01/10, Company A was fined for bid rigging while participating in multiple public tenders published by Government Agency B.)
{record_delimiter}
(PERSON C{tuple_delimiter}NONE{tuple_delimiter}CORRUPTION{tuple_delimiter}SUSPECTED{tuple_delimiter}2015-01-01T00:00:00{tuple_delimiter}2015-12-30T00:00:00{tuple_delimiter}Person C was suspected of engaging in corruption activities in 2015{tuple_delimiter}The company is owned by Person C who was suspected of engaging in corruption activities in 2015)
{completion_delimiter}-Real Data-
Use the following input for your answer.
Entity specification: {entity_specs}
Claim description: {claim_description}
Text: {input_text}
Output:"""
  1. 将同名的实体和关系合并,并让LLM对实体和关系的描述生成summary,这里提到即使LLM对同一个实体不能保证每次都生成一样描述但也不影响整体方案的效果。(prompt在graphrag/index/graph/extractors/summarize/prompts.py里)
SUMMARIZE_PROMPT = """
You are a helpful assistant responsible for generating a comprehensive summary of the data provided below.
Given one or two entities, and a list of descriptions, all related to the same entity or group of entities.
Please concatenate all of these into a single, comprehensive description. Make sure to include information collected from all the descriptions.
If the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.
Make sure it is written in third person, and include the entity names so we the have full context.#######
-Data-
Entities: {entity_name}
Description List: {description_list}
#######
Output:
"""
  1. 将前面提取的实体和关系构建成同构无向加权图,实体作为图的节点,关系作为图的边,边的权重是关系的归一化计数。在图上应用层次化社区发现算法Leiden,得到的层次结构的每个级别包含一个社区分区,每个分区是互斥的,但是整体构成一个图,使得可以实现分而治之的全局摘要。

WeChatWorkScreenshot_dbc97340-d99b-4530-82f7-a0f9a13ed336

  1. 对生成的社区生成摘要,按如下方式来生成摘要(prompt在graphrag/index/graph/extractors/community_reports/prompts.py):

    • 对叶子级别的社区(Leaf-level communities)按照一定的优先级将节点、关系、covariate加入到LLM上下文窗口直到达到token上限。优先级定义为:将社区中的边按照首尾节点的度之和来降序排序,将首节点、尾节点、相关covariate、边的描述加入LLM上下文。
    • 对更高级别的社区(Higher-level communities):如果所有元素信息都可以放入LLM的上下文窗口,则按叶子级别一样的处理逻辑。否则将子社区按照元素摘要token数目降序排序,并迭代地用更短的子社区摘要来替换更长的元素摘要直到长度满足LLM上下文要求。(代码在graphrag/index/graph/extractors/community_reports/prep_community_report_context.py)

GraphRAG的查询有Local和Global两种模式,Local适用于回答关于某个实体相关的问题,Global模式适合回答关于整个数据集相关的问题。

Local模式的步骤如下(如下图所示):

  1. 将query在存储实体信息的向量库中检索出相关实体。
  2. 将第一步实体相关的chunk信息、社区摘要、实体详情、实体关系、实体Covarites按一定的格式组织作为上下文。
  3. 如果有历史聊天记录想历史聊天记录也作为上下文的一部分。
  4. 让LLM根据上下文生成回答(prompt路径为graphrag/query/structured_search/local_search/system_prompt.py)。

WeChatWorkScreenshot_b1dc88bf-9dc5-4a0b-a0d5-7bb2a68b21bf

Global查询的步骤如下(如下图所示)(prompt在graphrag/query/structured_search/global_search/map_system_prompt.pygraphrag/query/structured_search/global_search/reduce_system_prompt.py

  1. 将所有社区摘要shuffle并分块作为上下文,另将历史对话构成的上下文与这些社区摘要块拼接在一起作为上下文。
  2. 用map机制将前一步的多个上下文让LLM评估它们对于回答用户问题是否有帮助并进行0-100的打分。过滤掉分数为0的上下文。
  3. 将前一步得到的结果合并且按照分数大小进行降序排序,并将这些信息加入到LLM上下文窗口,让LLM生成最终的回答。

WeChatWorkScreenshot_e68f84c6-87da-4b49-acb8-563e8a48d96c

GraphRAG在代码实现上有workflow概念,如果需要修改运行流程,只需要修改配置就可。构建索引时的流程定义在graphrag/index/create_pipeline_config.py中(pipeline基于微软开源的另一个包DataShaper实现的),实体抽取、社区发现等操作被定义为verb,代码在graphrag/index/verbs目录下。但也因为它的workflow概念,整个项目的代码可读性并不好。

参考资料

  1. GraphRAG: arxiv, github, default dataflow
  2. blog: GraphRAG: Unlocking LLM discovery on narrative private data
http://www.mnyf.cn/news/49511.html

相关文章:

  • 如何做行业平台网站湖南网站营销推广
  • 建站视频网站优惠活动推广文案
  • 长沙正规企业网站制作平台班级优化大师官网
  • 互联网培训学校哪个好长沙 建站优化
  • 贵州 网站建设百度推广怎么弄
  • 使用织梦系统建设网站百度知道首页官网
  • 蚌山网站建设seo网站优化系统
  • 做电影网站哪个系统好北京网站优化方法
  • 高端网站建设 来磐石网络惠州seo关键字优化
  • 企业网站设计请示免费发布信息的网站平台
  • 网站建设说课ppt短视频拍摄剪辑培训班
  • 外贸网站建设经验百度知道电脑版网页入口
  • 深圳模板网站建设怎么进行网站推广
  • 网站 主营业务seo入门教程视频
  • wordpress教程全集(入门到精通)如何做好网站推广优化
  • 有哪些做婚礼平面设计的网站有哪些网站关键字优化技巧
  • diango是做网站的后端吗长沙百度网站排名优化
  • jquery 网站缩放百度信息流是什么
  • 怎么在网站备案号码上加一个工信部链接地址泰州网站排名seo
  • 2017 如何做网站优化天津最新消息今天
  • wordpress投票系统孝感seo
  • 评价一个网站的好坏如何注册网站免费注册
  • 政府网站方案书佛山做网站建设
  • 山西做网站优势广州百度推广客服电话多少
  • 温州网站开发定制搜索引擎优化seo的英文全称是
  • 聊城网站开发培训下载安装百度
  • 自己怎样制作公司网站郑州优化网站公司
  • 佛山网站建设外包公司如何设计与制作网页
  • 网站建设工作目标免费刷seo
  • 管理网站模板下载免费下载域名权重是什么意思