<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.0" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">1683-1470</journal-id>
<journal-title-group>
<journal-title>Data Science Journal</journal-title>
</journal-title-group>
<issn pub-type="epub">1683-1470</issn>
<publisher>
<publisher-name>Ubiquity Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5334/dsj-2015-007</article-id>
<article-categories>
<subj-group>
<subject>Proceedings paper</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Research on an Agricultural Knowledge Fusion Method for Big Data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Xie</surname>
<given-names>Nengfu</given-names>
</name>
<email>xienengfu@caas.cn</email>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Wensheng</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ma</surname>
<given-names>Bingxian</given-names>
</name>
<email>ise_mabx@ujn.edu.cn</email>
<xref ref-type="aff" rid="aff-2"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Xuefu</given-names>
</name>
<email>zhangxuefu@caas.cn</email>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sun</surname>
<given-names>Wei</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Guo</surname>
<given-names>Fenglei</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
</contrib-group>
<aff id="aff-1">Key Laboratory of Digital Agricultural Early-Warning Technology, Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081, China</aff>
<aff id="aff-2">School of Information Science and Engineering, University of Jinan, Jinan 250022, China</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2015-05-22">
<day>22</day>
<month>05</month>
<year>2015</year>
</pub-date>
<volume>14</volume>
<elocation-id>7</elocation-id>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2015 The Author(s)</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License (CC-BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See <uri xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.datascience.codata.org/article/view/dsj-2015-007/"/>
<abstract>
<p>The objective of our research is to develop an ontology-based agricultural knowledge fusion method that can serve as a comprehensive basis for resolving agricultural information inconsistencies, analyzing data, and discovering new knowledge. A recent survey has provided a detailed comparison of various fusion methods used with Deep Web data (<xref ref-type="bibr" rid="B5">Li, 2013</xref>). In this paper, we propose an effective agricultural ontology-based knowledge fusion method that leverages recent advances in data fusion, such as semantic web and big data technologies, to enhance the identification and fusion of new and existing data sets and thereby make big data analytics more feasible. We describe the fusion method in detail, including agricultural ontology building, fusion rule construction, and an evaluation module. Empirical results show that this knowledge fusion method is useful for knowledge discovery.</p>
</abstract>
<kwd-group>
<kwd>Ontology</kwd>
<kwd>Big data</kwd>
<kwd>Agriculture</kwd>
<kwd>Knowledge fusion</kwd>
<kwd>Information integration</kwd>
<kwd>Inconsistency</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>1 Introduction</title>
<p>Currently, most people use the Internet and the World Wide Web to browse for and obtain information. In the web environment, however, users often cannot obtain the complete, correct, and timely information or knowledge that directly affects their judgment and decision-making, because the information is heterogeneous and arrives at big data scale. Knowledge fusion can be seen as an advanced information integration approach. Information integration focuses on how to find relevant information, whereas knowledge fusion merges this information to create knowledge that is more complete, less uncertain, and less conflicting than the input (<xref ref-type="bibr" rid="B2">Hu, Hu, Sekhari, Peng, &amp; Cao, 2011</xref>). This reduces the cost of data access and enhances the value of the discovered data. Research on the theory, methods, tools, and development of web-oriented knowledge fusion has become an important concern for knowledge-oriented services (<xref ref-type="bibr" rid="B9">Stegmaier, 2010</xref>; <xref ref-type="bibr" rid="B10">Wang, 2009</xref>).</p>
<p>In the 1960s, the international academic community began to research knowledge fusion, but early scholars did not explicitly put forward the concept of knowledge fusion. In the late 1980s, the rise of knowledge engineering increased attention to knowledge fusion. Feigenbaum (<xref ref-type="bibr" rid="B1">1983</xref>) put forward a &#8220;knowledge principle&#8221;, in which knowledge fusion is one of the most important functional modules. Douglas Lenat&#8217;s Cyc project, built upon Feigenbaum&#8217;s knowledge principle, was an artificial intelligence project that attempted to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning (<xref ref-type="bibr" rid="B4">Lenat &amp; Guha, 1990</xref>).</p>
<p>KRAFT (Knowledge Reuse And Fusion/Transformation) aimed to combine database and artificial intelligence technologies to allow scientists and engineers to find and exploit knowledge available on the Internet. KRAFT was a close collaboration between universities and industry (<xref ref-type="bibr" rid="B8">Preece, 2001</xref>). Following KRAFT, knowledge fusion has attracted many researchers. Hunter and Williams (<xref ref-type="bibr" rid="B3">2010</xref>) advocated a knowledge-based approach to merging semi-structured information. They used fusion rules to manage the semi-structured information that was input for merging. These fusion rules were a form of scripting language that defined how structured reports should be merged. The work assumed that structured news reports did not require natural language processing and used fusion rules to handle their inconsistencies and uncertainty. Fusionplex was a system for integrating multiple heterogeneous and autonomous information sources that used data fusion to resolve factual inconsistencies among the individual sources. To accomplish this, the system relied on source features, that is, metadata on the merits of each information source (<xref ref-type="bibr" rid="B7">Motro, 2004</xref>). A dynamic ontology construction method based on analyzing knowledge requirements has also been proposed for more effective knowledge fusion (<xref ref-type="bibr" rid="B6">Liu, 2014</xref>).</p>
<p>In the next sections, we discuss the agricultural knowledge fusion problem and propose a general architecture for our fusion method. Finally, we describe the knowledge fusion process in detail.</p>
</sec>
<sec>
<title>2 Related Technological Aspects</title>
<sec>
<title>2.1 Agriculture Big Data Technologies</title>
<p>Agricultural big data refers to big data concepts, techniques, and methods applied in the agricultural domain. In addition to the general big data characteristics of large volume, variety of modalities, rapid generation, and low value density, agricultural big data are pervasive and contralateral, among other characteristics. Agricultural production and research generate a large amount of data; in particular, the application of information and communications technology (ICT) in agriculture will produce more in-depth agricultural data, with volumes soon reaching the zettabyte (ZB) level. Integrating and subsequently mining these data will play an extremely important role in the development of modern agriculture. Big data technologies, including data-processing models and emerging tools, are being developed for the implementation of our fusion system.</p>
</sec>
<sec>
<title>2.2 Ontologies</title>
<p>In general, an ontology is an explicit specification of a conceptualization (<xref ref-type="bibr" rid="B9">Stegmaier et al., 2010</xref>). Nevertheless, the term ontology has been controversial in current AI practice, and so far no formal definition exists. In our work, we have chosen to use the term domain-specific ontology (DSO). In practical terms, developing an agricultural ontology (AgriOnto) involves three steps:</p>
<list list-type="bullet">
<list-item>
<p>building a domain-specific knowledge hierarchy;</p>
</list-item>
<list-item>
<p>defining slots of the categories and representing axioms; and</p>
</list-item>
<list-item>
<p>acquiring knowledge, that is to say, filling in the specific data values for slots.</p>
</list-item>
</list>
</sec>
<sec>
<title>2.3 Information Integration and Knowledge Fusion</title>
<p>Knowledge fusion is closely related to information integration, and the two terms are sometimes used as synonyms. More precisely, information integration focuses on how to find related information, while knowledge fusion focuses on how to derive accurate and complete information from the results of information integration. Knowledge fusion can therefore be understood as a high-quality integration method that aims to resolve conflicts in integrated data; information documents can be integrated to guarantee that information is understandable by machines.</p>
<p>It is well recognized that information integration based on a ranking function has very limited value in selecting the correct value from diverse web resources because inconsistencies exist among information from different agricultural information sources. Our proposed approach is a six-step data flow process built on information integration, here called primary knowledge fusion (PKF) (Figure <xref ref-type="fig" rid="F1">1</xref>). First, related information is extracted from the PKF through a query. Second, semantic analysis determines whether each piece of information is an instance of a concept in the agricultural ontology (Agri-ontology) and what knowledge it contains. Third, each instance is annotated according to the ontology. Fourth, the instances are grouped into clusters by instance similarity. Fifth, the instances are fused according to knowledge fusion rules. Finally, the fused result is evaluated, and a new knowledge object (KO) is produced.</p>
<fig id="F1">
<label>Figure 1</label>
<caption>
<p>The six steps of our approach to knowledge fusion.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/Fig01_web.jpg"/>
</fig>
<p>When multiple agricultural information sources provide inconsistent information, the knowledge fusion method is called upon to produce new information (knowledge) that is complete and accurate.</p>
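<p>The six steps can be sketched as a small pipeline. The following Python sketch is illustrative only: the record layout, the toy ontology dictionary, the clustering key (equal concept names), the choice of the mean as fusion rule, and all function names are hypothetical and do not describe the implemented system.</p>
<preformat><![CDATA[
```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy ontology: concept name -> set of slots (assumption for illustration).
ONTOLOGY = {"Potato": {"price"}, "Wheat": {"price"}}

# Hypothetical records as they might come back from information integration.
RECORDS = [
    {"source": "IS1", "concept": "Potato", "price": 2.0},
    {"source": "IS2", "concept": "Potato", "price": 3.0},
    {"source": "IS3", "concept": "Wheat", "price": 1.0},
    {"source": "IS4", "concept": "Tractor", "price": 9000.0},  # not in the ontology
]

def extract(records, concept):
    """Step 1: select the records related to the query."""
    return [r for r in records if r["concept"] == concept]

def is_instance(record):
    """Step 2: semantic analysis, reduced here to an ontology membership test."""
    return record["concept"] in ONTOLOGY

def annotate(record):
    """Step 3: keep only the slots that the ontology defines for the concept."""
    slots = ONTOLOGY[record["concept"]]
    return {"concept": record["concept"],
            "slots": {s: record[s] for s in slots if s in record}}

def cluster(instances):
    """Step 4: group instances; similarity is simplified to equal concept names."""
    groups = defaultdict(list)
    for inst in instances:
        groups[inst["concept"]].append(inst)
    return groups

def fuse(group, slot, rule):
    """Step 5: apply one fusion rule to the values of one slot."""
    return rule([inst["slots"][slot] for inst in group if slot in inst["slots"]])

def evaluate(name, slot, value):
    """Step 6: accept a non-empty result and wrap it as a knowledge object (KO)."""
    if value is None:
        raise ValueError("fusion produced no value")
    return {"K_Name": name, "units": [(slot, value)]}

def run(concept, slot, rule):
    found = [r for r in extract(RECORDS, concept) if is_instance(r)]
    groups = cluster([annotate(r) for r in found])
    return evaluate(concept, slot, fuse(groups[concept], slot, rule))

ko = run("Potato", "price", mean)
print(ko)
```
]]></preformat>
<p>In this sketch each step is a separate function, so that a different similarity measure or fusion rule can be substituted without changing the rest of the flow.</p>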
</sec>
</sec>
<sec>
<title>3 Agricultural Knowledge Fusion Model</title>
<p>The agricultural knowledge fusion method provides integrated knowledge. It involves not only delivering valuable information to users via links but also analyzing and merging the results from agricultural information sources, for example by resolving inconsistencies among results and removing duplicates, based on the agricultural domain ontology.</p>
<disp-quote>
<p><bold><italic>Definition 1</italic>:</bold> Given a set of agricultural information sources (AISS), the PKF can be defined as a triple PKF = (AISS, M, Q), where:</p>
<list list-type="bullet">
<list-item>
<p>AISS = {IS<sub>1</sub>, IS<sub>2</sub>, &#8230;, IS<sub>n</sub>}.</p>
</list-item>
<list-item>
<p>M is the mapping between the global ontology and the ontologies of the sources in AISS, defined as M = (&#937;, O, g), where:</p>
<list list-type="bullet">
<list-item>
<p>&#937; = {&#937;<sub>1</sub>, &#937;<sub>2</sub>, &#8230;, &#937;<sub>n</sub>}, &#937;<sub>i</sub> is the ontology of IS<sub>i</sub>.</p>
</list-item>
<list-item>
<p>O is a global ontology.</p>
</list-item>
<list-item>
<p><italic>g</italic>(&#937;<sub>i</sub>) is the mapping of &#937;<sub>i</sub> into O.</p>
</list-item>
</list>
</list-item>
<list-item>
<p><italic>Q</italic> is the user query.</p>
</list-item>
</list>
<p><bold><italic>Definition 2</italic>:</bold> Given PKF = (AISS, M, Q), the agricultural knowledge fusion problem is defined as AKF = (PKF, f, FR), where:</p>
<list list-type="bullet">
<list-item>
<p>PKF is the primary knowledge fusion.</p>
</list-item>
<list-item>
<p>f is the operating function with f(PKF) = {&#969;<sub>1</sub>, &#969;<sub>2</sub>, &#8230;, &#969;<sub>n</sub>}, where &#969;<sub>i</sub> is an information instance annotated by the ontology.</p>
</list-item>
<list-item>
<p>FR = {fr<sub>1</sub>, fr<sub>2</sub>, &#8230;, fr<sub>n</sub>} is a set of knowledge fusion rules for attributes in agricultural ontology.</p>
</list-item>
</list>
<p><bold><italic>Definition 3</italic>:</bold> Given AKF = (PKF, f, FR), the solution K satisfies:</p>
<list list-type="bullet">
<list-item>
<p>&#8704;s&#8712;slot(O), if &#8707;fr&#8712;FR, then K&#183;s = fr(s,&#969;).</p>
</list-item>
<list-item>
<p>K |= Q means that K is the answer to Q. In this paper, K is the knowledge object and is described as K = (K_Name, ((s<sub>1</sub>, v<sub>1</sub>), (s<sub>2</sub>, v<sub>2</sub>), &#8230;, (s<sub>n</sub>, v<sub>n</sub>))). We call (s<sub>i</sub>, v<sub>i</sub>) a knowledge unit, where s<sub>i</sub> is a slot (attribute) of a concept in the ontology and v<sub>i</sub> is the value of s<sub>i</sub> for an instance.</p>
</list-item>
</list>
</disp-quote>
<p>The above definitions describe the agricultural knowledge fusion model in detail and give a formal description of how to find a solution that merges the information from multiple agricultural information sources into consistent knowledge that answers users&#8217; queries.</p>
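<p>Definitions 1 to 3 can be read as plain data structures. The sketch below is one possible encoding in Python; the class and field names are hypothetical, the mapping g is reduced to a slot-renaming dictionary that the solver does not use, and the fusion rules shown (max and min) are placeholders chosen only to make the example runnable.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class PKF:
    """Definition 1: PKF = (AISS, M, Q). All field names are illustrative."""
    aiss: List[str]                      # information sources IS_1 ... IS_n
    mapping: Dict[str, Dict[str, str]]   # g: per source, local slot -> global slot
    query: str                           # Q, reduced here to a concept name

@dataclass
class AKF:
    """Definition 2: AKF = (PKF, f, FR)."""
    pkf: PKF
    instances: List[Dict[str, Any]]      # f(PKF): annotated instances w_1 ... w_n
    rules: Dict[str, Callable[[List[Any]], Any]]  # FR: one rule per global slot

@dataclass
class KnowledgeObject:
    """Definition 3: K = (K_Name, ((s_1, v_1), ..., (s_n, v_n)))."""
    name: str
    units: List[Tuple[str, Any]] = field(default_factory=list)

def solve(akf: AKF) -> KnowledgeObject:
    """For every slot s that has a rule fr in FR, set K.s = fr(s, w)."""
    ko = KnowledgeObject(akf.pkf.query)
    for slot, rule in sorted(akf.rules.items()):
        values = [w[slot] for w in akf.instances if slot in w]
        if values:
            ko.units.append((slot, rule(values)))
    return ko

pkf = PKF(aiss=["IS1", "IS2"],
          mapping={"IS1": {"cost": "price"}, "IS2": {"price": "price"}},
          query="Potato")
akf = AKF(pkf=pkf,
          instances=[{"price": 2.0, "yield": 30}, {"price": 3.0, "yield": 28}],
          rules={"price": max, "yield": min})
k = solve(akf)
print(k)
```
]]></preformat>
<p>Slots without a rule in FR are simply left out of K in this sketch, which mirrors the condition in Definition 3 that a knowledge unit is produced only when a rule exists for the slot.</p>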
</sec>
<sec>
<title>4 Agricultural Knowledge Fusion Architecture</title>
<p>In this paper, we propose a general agri-ontology-based knowledge fusion architecture, as shown in Figure <xref ref-type="fig" rid="F2">2</xref>. The architecture consists of three main aspects: 1) the agricultural ontology and fusion rules are the cornerstones of the convergence of agricultural knowledge; 2) agricultural ontology-based knowledge representation and matching, as well as mining and automatically selecting fusion rules based on the properties of a concept, are the key components of knowledge fusion; 3) in order to find more accurate knowledge to satisfy users&#8217; queries, assessment of the fusion results is necessary to enhance knowledge fusion. Together, these parts form a complete knowledge fusion system.</p>
<fig id="F2">
<label>Figure 2</label>
<caption>
<p>The agricultural knowledge fusion architecture.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/Fig02_web.jpg"/>
</fig>
<sec>
<title>4.1 AgriOnto</title>
<p>AgriOnto is the formal definition of agriculture and its relationships (see Figure <xref ref-type="fig" rid="F3">3</xref>). The definition and relationships form an integrated hierarchy of agriculture. With the labor object as the center of the agriculture hierarchy, we divide agriculture knowledge into seven taxa: labor object, production process, production technology, agriculture engineering, agriculture branch, agriculture environment, and agriculture regulation. Putting the labor object as the center of the agriculture knowledge hierarchy aims to aid those users who want labor object knowledge to access related knowledge of other taxa.</p>
<fig id="F3">
<label>Figure 3</label>
<caption>
<p>Some parts of the agriculture knowledge hierarchy. A well-designed hierarchy of agriculture knowledge is very useful for building an agriculture ontology. Our AgriOnto is built on this hierarchical structure.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/Fig03_web.jpg"/>
</fig>
</sec>
<sec>
<title>4.2 Fusion Rules</title>
<p>Each fusion rule, such as Min, Max, and Avg, can be regarded as an aggregation function of the kind used in databases (<xref ref-type="bibr" rid="B13">Xie et al., 2012</xref>). We divide fusion rules into two types: the single data fusion rule and the multi data fusion rule.</p>
<disp-quote>
<p><bold><italic>Definition 4</italic>:</bold> The single data fusion rule (SFR) is a type of aggregation function such that:</p>
<p>
<disp-formula>
<alternatives>
<mml:math id="Eq001-mml">
<mml:mrow>
<mml:mi>f</mml:mi><mml:mo>:</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>&#x2192;</mml:mo><mml:mi>D</mml:mi>
</mml:mrow>
</mml:math>
<tex-math id="M1">
    \documentclass[10pt]{article}
    \usepackage{wasysym}
    \usepackage[substack]{amsmath}
    \usepackage{amsfonts}
    \usepackage{amssymb}
    \usepackage{amsbsy}
    \usepackage[mathscr]{eucal}
    \usepackage{mathrsfs}
    \usepackage{pmc}
    \usepackage[Euler]{upgreek}
    \pagestyle{empty}
    \oddsidemargin -1.0in
    \begin{document}
    \[
  f:{D_1} \times {D_2} \times {\rm{ }}.{\rm{ }}.{\rm{ }}.{\rm{ }} \times {D_n} \to D
    \]
    \end{document}
</tex-math>
<graphic xlink:href="eqn/e001.gif"/>
</alternatives>
</disp-formula>
</p>
<p>where D<sub>i</sub> is the value domain of the i-th argument, and all domains have been unified into a single domain D, so that D<sub>1</sub> = D<sub>2</sub> = &#8230; = D<sub>n</sub> = D. Given v<sub>i</sub> &#8712; D (i = 1, 2, &#8230;, n), <italic>f</italic>(v<sub>1</sub>, v<sub>2</sub>, &#8230;, v<sub>n</sub>) = v with v &#8712; D. In this paper, the SFRs include Majr (Majority rule), Max, Min, Avg, Minr (Min-Priority rule), etc.</p>
<p><bold><italic>Definition 5</italic>:</bold> The multi data fusion rule (MFR) is a type of aggregation function such that:</p>
<p>
<disp-formula>
<alternatives>
<mml:math id="Eq002-mml">
<mml:mrow>
<mml:mi>f</mml:mi><mml:mo>:</mml:mo><mml:msub>
<mml:mi>D</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>&#x00D7;</mml:mo><mml:msub>
<mml:mi>D</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>&#x00D7;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>&#x00D7;</mml:mo><mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>&#x2192;</mml:mo><mml:msup>
<mml:mn>2</mml:mn>
<mml:mi>D</mml:mi>
</mml:msup>
</mml:mrow>
</mml:math>
<tex-math id="M2">
    \documentclass[10pt]{article}
    \usepackage{wasysym}
    \usepackage[substack]{amsmath}
    \usepackage{amsfonts}
    \usepackage{amssymb}
    \usepackage{amsbsy}
    \usepackage[mathscr]{eucal}
    \usepackage{mathrsfs}
    \usepackage{pmc}
    \usepackage[Euler]{upgreek}
    \pagestyle{empty}
    \oddsidemargin -1.0in
    \begin{document}
    \[
  f:{D_1} \times {D_2} \times {\rm{ }}.{\rm{ }}.{\rm{ }}.{\rm{ }} \times {D_n} \to {2^D}
    \]
    \end{document}
</tex-math>
<graphic xlink:href="eqn/e002.gif"/>
</alternatives>
</disp-formula>
</p>
<p>Given v<sub>i</sub> &#8712; D (i = 1, 2, &#8230;, n), <italic>f</italic>(<italic>v<sub>1</sub>, v<sub>2</sub></italic>, &#8230;, <italic>v<sub>n</sub></italic>) = D&#8242;, where D&#8242; &#8838; D. The MFRs include CInt (Interval rule), Or, and And.</p>
</disp-quote>
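<p>Definitions 4 and 5 differ only in the result type: an SFR returns a single value of D, whereas an MFR returns a subset of D. A minimal Python sketch follows. The function names are hypothetical, and the readings of the Majority and Or rules (most frequent value, set of all distinct values) are assumptions, since these rules are named here without a formal definition.</p>
<preformat><![CDATA[
```python
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Set

Value = float

def majr(values: List[Value]) -> Value:
    """Majority rule (assumed reading): the most frequent value wins."""
    return Counter(values).most_common(1)[0][0]

# Single data fusion rules: D x D x ... x D -> D
SFR: Dict[str, Callable[[List[Value]], Value]] = {
    "Majr": majr,
    "Max": max,
    "Min": min,
    "Avg": mean,
}

def or_rule(values: List[Value]) -> Set[Value]:
    """Or rule (assumed reading): keep every distinct reported value."""
    return set(values)

# Multi data fusion rules: D x D x ... x D -> 2^D (a subset of D)
MFR: Dict[str, Callable[[List[Value]], Set[Value]]] = {
    "Or": or_rule,
}

reported = [12.0, 13.0, 13.0, 16.0]
single = {name: rule(reported) for name, rule in SFR.items()}
multi = MFR["Or"](reported)
print(single, multi)
```
]]></preformat>
<p>Keeping the two families in separate tables makes the difference in return type explicit: every entry of SFR yields one element of D, and every entry of MFR yields a set drawn from D.</p>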
<p>In general, the single data fusion rule and the multi data fusion rule cannot be applied directly to an information set. Instead, we must analyze the query and the answer type and then define a combination of fusion rules. In practice, the user usually takes part in rule selection to complete the knowledge fusion process. We have defined 13 fusion operator rules based on the global ontology. For example, the closed interval operator is a fusion operator defined as follows:</p>
<disp-quote>
<p><bold><italic>Definition 6</italic>:</bold> Given a domain D and a set of possible values on it, D&#8242; = {v<sub>1</sub>&#8242;, v<sub>2</sub>&#8242;, &#8230;, v<sub>n</sub>&#8242;}, the closed interval operator (CInt) satisfies:</p>
<p>
<disp-formula>
<alternatives>
<mml:math id="Eq003-mml">
<mml:mrow>
<mml:mi>C</mml:mi><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msup>
<mml:mi>D</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo stretchy='false'>]</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x00A0;if&#x00A0;</mml:mtext><mml:mo>&#x2200;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:msub>
<mml:msup>
<mml:mi>v</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo><mml:msup>
<mml:mi>D</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mo>,</mml:mo><mml:mtext>&#x00A0;then&#x00A0;</mml:mtext><mml:msub>
<mml:msup>
<mml:mi>v</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>&#x2208;</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:msub>
<mml:mi>v</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo stretchy='false'>]</mml:mo>
</mml:mrow>
</mml:math>
<tex-math id="M3">
    \documentclass[10pt]{article}
    \usepackage{wasysym}
    \usepackage[substack]{amsmath}
    \usepackage{amsfonts}
    \usepackage{amssymb}
    \usepackage{amsbsy}
    \usepackage[mathscr]{eucal}
    \usepackage{mathrsfs}
    \usepackage{pmc}
    \usepackage[Euler]{upgreek}
    \pagestyle{empty}
    \oddsidemargin -1.0in
    \begin{document}
    \[
  CInt(D') = [{v_i},{\rm{ }}{v_j}],{\rm{ if }}\forall {\rm{ }}{v'_i} \in D',{\rm{ then }}{v'_i} \in [{v_i},{\rm{ }}{v_j}]
    \]
    \end{document}
</tex-math>
<graphic xlink:href="eqn/e003.gif"/>
</alternatives>
</disp-formula>
</p>
<p><bold><italic>Example 1</italic>:</bold> If there exist three possible tuples, v<sub>1</sub> = (Wang da hong; age; 12), v<sub>2</sub> = (Wang da hong; age; 13), and v<sub>3</sub> = (Wang da hong; age; 15), then we obtain CInt({v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>}) = (Wang da hong; age; [12&#8211;15]).</p>
</disp-quote>
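<p>Definition 6 leaves the interval endpoints implicit; Example 1 suggests that the tightest interval, from the minimum to the maximum reported value, is intended. The Python sketch below makes that assumption explicit. The tuple layout (subject, slot, value) follows Example 1, while the function name cint and its error handling are hypothetical.</p>
<preformat><![CDATA[
```python
from typing import List, Tuple

Triple = Tuple[str, str, float]

def cint(tuples: List[Triple]) -> Tuple[str, str, Tuple[float, float]]:
    """Closed interval rule: fuse values for one subject and slot into [min, max]."""
    if not tuples:
        raise ValueError("CInt needs at least one value")
    keys = {(subject, slot) for subject, slot, _ in tuples}
    if len(keys) != 1:
        raise ValueError("all tuples must describe the same subject and slot")
    values = [value for _, _, value in tuples]
    subject, slot = next(iter(keys))
    return (subject, slot, (min(values), max(values)))

v1 = ("Wang da hong", "age", 12)
v2 = ("Wang da hong", "age", 13)
v3 = ("Wang da hong", "age", 15)
fused = cint([v1, v2, v3])
print(fused)
```
]]></preformat>
<p>By construction every input value lies inside the returned interval, which is the condition stated in Definition 6.</p>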
<p>In our fusion rule selection, each rule is restricted by conditions that can be deduced from the characteristics of the rule and from the query, which is defined as follows:</p>
<disp-quote>
<p><bold><italic>Definition 7</italic>:</bold> Given query ontology &#937;, a knowledge fusion query can be formally defined:</p>
<p>
<disp-formula>
<alternatives>
<mml:math id="Eq004-mml">
<mml:mrow>
<mml:mi>o</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x007B;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo><mml:mi>&#x0020;</mml:mi><mml:mi>f</mml:mi><mml:msub>
<mml:mi>r</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo>?</mml:mo><mml:mo>,</mml:mo><mml:mn>&#8230;</mml:mn><mml:mo>,</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo><mml:mi>&#x0020;</mml:mi><mml:mi>f</mml:mi><mml:msub>
<mml:mi>r</mml:mi>
<mml:mi>n</mml:mi>
</mml:msub>
<mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mo>?</mml:mo><mml:mo>&#x007D;</mml:mo><mml:mo>&#x007C;</mml:mo><mml:mi>c</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi>
</mml:mrow>
</mml:math>
<tex-math id="M4">
    \documentclass[10pt]{article}
    \usepackage{wasysym}
    \usepackage[substack]{amsmath}
    \usepackage{amsfonts}
    \usepackage{amssymb}
    \usepackage{amsbsy}
    \usepackage[mathscr]{eucal}
    \usepackage{mathrsfs}
    \usepackage{pmc}
    \usepackage[Euler]{upgreek}
    \pagestyle{empty}
    \oddsidemargin -1.0in
    \begin{document}
    \[
  o \cdot \{ ({s_1}, f{r_1}) = ?,...,({s_n}, f{r_n}) = ?\} |cnt
    \]
    \end{document}
</tex-math>
<graphic xlink:href="eqn/e004.gif"/>
</alternatives>
</disp-formula>
</p>
<p>where o &#183; {(s<sub>1</sub>, fr<sub>1</sub>) = ?, &#8230;, (s<sub>n</sub>, fr<sub>n</sub>) = ?} represents the query objects, and cnt is a set of constraint conditions. Here o is a concept or instance in &#937;, s<sub>i</sub> is a slot (attribute) of o, and fr<sub>i</sub> is a fusion rule. If fr<sub>i</sub> is omitted, the query reduces to a general query in traditional information integration.</p>
<p><bold><italic>Example 2</italic>:</bold> Given the query Potato &#183; (price, Avg), the knowledge fusion system should return the average of the set of potato prices returned by information integration. If Avg is NULL, the knowledge fusion system will return the potato prices in the same way as traditional information integration. Often a user can select a rule according to his or her preference.</p>
</disp-quote>
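<p>A query of the form in Definition 7 pairs each requested slot with an optional fusion rule. The Python sketch below is an illustration under stated assumptions: the FusionQuery class, the rule table, and the sample prices are hypothetical, the constraint set cnt is left out, and a rule of None stands for the omitted rule, in which case the unfused values are returned as in traditional information integration.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Union

# Hypothetical rule table; only three of the fusion operators are sketched.
RULES: Dict[str, Callable[[List[float]], float]] = {"Avg": mean, "Max": max, "Min": min}

@dataclass(frozen=True)
class FusionQuery:
    concept: str                 # o: a concept or instance in the query ontology
    slot: str                    # s: a slot (attribute) of o
    rule: Optional[str] = None   # fr: fusion rule name, or None if omitted

def answer(query: FusionQuery,
           integrated: Dict[str, Dict[str, List[float]]]) -> Union[float, List[float]]:
    """Return the fused value, or the raw value list when no rule is given."""
    values = integrated[query.concept][query.slot]
    if query.rule is None:
        return list(values)
    return RULES[query.rule](values)

# Hypothetical output of information integration for the concept Potato.
integrated = {"Potato": {"price": [2.0, 3.0, 2.5]}}

fused = answer(FusionQuery("Potato", "price", "Avg"), integrated)
plain = answer(FusionQuery("Potato", "price"), integrated)
print(fused, plain)
```
]]></preformat>
<p>The same query object thus covers both cases in Example 2: with Avg it yields one fused price, and without a rule it yields the list of prices unchanged.</p>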
<p>In query ontology &#937;, we define a default rule for each slot of a concept. There are two slot types: meta-slot and composite-slot. A meta-slot is a slot that cannot be divided semantically, while a composite-slot can be divided into several meta-slots. For example, the slot IdentityNo of the concept person is a meta-slot, but Name is usually a composite-slot consisting of the meta-slots first-name and last-name. A fusion rule for a meta-slot is always pre-defined according to the meta-slot definition, but a composite-slot usually needs a concatenation rule. In order to obtain a high-quality answer, we extend the slots of a concept with additional slots used to filter out useless information. These additional slots are called data quality slots and include:</p>
<list list-type="bullet">
<list-item>
<p><bold>Authority (DQ<sub>a</sub>)</bold> Authority measures the probability that the information in an information source is correct.</p>
</list-item>
<list-item>
<p><bold>Timeliness (DQ<sub>t</sub>)</bold> Timeliness estimates the goodness (or badness) of the information in an information source in terms of time.</p>
</list-item>
<list-item>
<p><bold>Completeness (DQ<sub>c</sub>)</bold> Completeness is the degree to which all data relevant to an application domain have been recorded in an information source.</p>
</list-item>
</list>
<p>Therefore, given a concept and its slot set {a<sub>1</sub>, a<sub>2</sub>, &#8230;, a<sub>n</sub>}, the extended slot set is {a<sub>1</sub>, a<sub>2</sub>, &#8230;, a<sub>n</sub>, DQ<sub>a</sub>, DQ<sub>t</sub>, DQ<sub>c</sub>}.</p>
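<p>The slot extension and the concatenation rule can be pictured as follows. This Python sketch is hypothetical throughout: the score range of 0 to 1, the threshold, the rule of filtering on the weakest score, and the helper names are assumptions made for illustration, since no measurement or combination scheme for the data quality slots is specified here.</p>
<preformat><![CDATA[
```python
from typing import Dict, List

DQ_SLOTS = ["DQa", "DQt", "DQc"]  # authority, timeliness, completeness

def extend_slots(slots: List[str]) -> List[str]:
    """Build the extended slot set {a_1, ..., a_n, DQa, DQt, DQc}."""
    return list(slots) + DQ_SLOTS

def concatenate(instance: Dict[str, str], meta_slots: List[str]) -> str:
    """Concatenation rule for a composite slot built from meta-slots."""
    return " ".join(instance[m] for m in meta_slots)

def keep(instance: Dict[str, float], threshold: float = 0.5) -> bool:
    """Assumed filter: drop instances whose weakest quality score is too low."""
    return min(instance[q] for q in DQ_SLOTS) >= threshold

person_slots = extend_slots(["IdentityNo", "Name"])
name = concatenate({"first-name": "Da hong", "last-name": "Wang"},
                   ["first-name", "last-name"])
candidates = [
    {"price": 2.0, "DQa": 0.9, "DQt": 0.8, "DQc": 0.7},
    {"price": 9.9, "DQa": 0.2, "DQt": 0.9, "DQc": 0.9},
]
usable = [c for c in candidates if keep(c)]
print(person_slots, name, usable)
```
]]></preformat>
<p>Filtering on the data quality slots before fusion keeps a low-authority value such as the second candidate from influencing rules like Avg or Max.</p>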
</sec>
<sec>
<title>4.3 Knowledge Inconsistency Problem Analysis</title>
<p>In general, knowledge consistency means a judgment is in accord with both historical judgments and current facts. On the other hand, inconsistency means a contradiction between the historical judgments and current facts. From the aspect of ontology, consistency means that the logic relationships of the terminology are consistent while inconsistency means conflicts exist between some parts of the ontologies. For example, we define grain crops and cash crops as disjoint classes that do not have the same instances. If the class wheat belongs to both grain crops and cash crops, an inconsistency will occur.</p>
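<p>The disjoint-class example can be checked mechanically. The following Python sketch uses a deliberately small, hypothetical encoding (a subclass table and a list of disjoint class pairs) rather than a description logic reasoner, and reports every class that falls under two classes declared disjoint.</p>
<preformat><![CDATA[
```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: class -> direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "wheat": {"grain crops", "cash crops"},
    "cotton": {"cash crops"},
    "grain crops": {"crops"},
    "cash crops": {"crops"},
}
DISJOINT: List[Tuple[str, str]] = [("grain crops", "cash crops")]

def ancestors(cls: str) -> Set[str]:
    """All superclasses reachable from cls (transitive closure)."""
    seen: Set[str] = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen

def find_conflicts() -> List[Tuple[str, str, str]]:
    """Return (class, a, b) whenever class is below both disjoint classes a and b."""
    conflicts = []
    for cls in sorted(SUBCLASS_OF):
        above = ancestors(cls)
        for a, b in DISJOINT:
            if a in above and b in above:
                conflicts.append((cls, a, b))
    return conflicts

conflicts = find_conflicts()
print(conflicts)
```
]]></preformat>
<p>Only wheat is reported, because it is the only class placed under both grain crops and cash crops; cotton, which belongs to cash crops alone, raises no conflict.</p>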
<p>In this paper, agricultural ontology consistency includes consistency between the ontology definition and the knowledge based on the ontology. This means that conflicting knowledge cannot be derived from the knowledge base. Generally, for a given knowledge base, whether conflicting knowledge arises depends on the following conditions:</p>
<list list-type="order">
<list-item>
<p>The consistency of concept definitions. That is to say, the formal definition carries the same meaning as the informal one. Take the concept &#8220;dogs&#8221; as an example. If the formal definition of dogs is the same as that of the concept cats, an inconsistency exists.</p>
</list-item>
<list-item>
<p>The consistency of concept extension. Under formal or informal concept definitions, conflicting knowledge can arise through concept interpretation (including reasoning). For example, cats can catch mice, but we cannot say that mice can catch cats.</p>
</list-item>
<list-item>
<p>The consistency of axioms. The axiom system must not produce conflicting knowledge.</p>
</list-item>
</list>
<p>From the viewpoint of knowledge application, the knowledge base can guide users to make correct decisions and ensure that no confusing conclusions arise. In brief, consistency is an important criterion with which to evaluate an ontology-based knowledge base. Knowledge inconsistency will lead to unreliable service, which threatens knowledge correctness. This paper proposes a method with which to check ontology consistency.</p>
<disp-quote>
<p><bold><italic>Definition 8</italic>:</bold> Given a knowledge base K, the knowledge inconsistency problem is a triple KI = (K, Y, Q), where:</p>
<list list-type="bullet">
<list-item>
<p>Y = {y<sub>1</sub>, y<sub>2</sub>, &#8230;, y<sub>n</sub>} is a knowledge operation set.</p>
</list-item>
<list-item>
<p>Q is a given knowledge query.</p>
</list-item>
</list>
<p><bold><italic>Definition 9</italic>:</bold> Given a knowledge inconsistency problem KI = (K, Y, Q), a knowledge conflict in K satisfies the following conditions:</p>
<list list-type="bullet">
<list-item>
<p>&#8707; k, k<sub>11</sub>, k<sub>12</sub>, &#8230;, k<sub>1j</sub> &#8712; K, y<sub>11</sub>, y<sub>12</sub>, &#8230;, y<sub>1j</sub> &#8712; Y, <inline-formula>
<alternatives>
<mml:math id="Eq005-mml">
<mml:mrow>
<mml:mstyle displaystyle='true'>
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mi>j</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn><mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy='false'>(</mml:mo><mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mn>1</mml:mn><mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy='false'>)</mml:mo>
</mml:mrow>
</mml:mstyle><mml:mo>&#x007C;</mml:mo><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x2227;</mml:mo><mml:mtext>k</mml:mtext><mml:mo>&#x2192;</mml:mo><mml:mtext>Q</mml:mtext>
</mml:mrow>
</mml:math>
<tex-math id="M5">
    \documentclass[10pt]{article}
    \usepackage{wasysym}
    \usepackage[substack]{amsmath}
    \usepackage{amsfonts}
    \usepackage{amssymb}
    \usepackage{amsbsy}
    \usepackage[mathscr]{eucal}
    \usepackage{mathrsfs}
    \usepackage{pmc}
    \usepackage[Euler]{upgreek}
    \pagestyle{empty}
    \oddsidemargin -1.0in
    \begin{document}
    \[
    \sum\limits_1^j {{y_{1l}}({k_{1l}})} | = k \wedge {\rm{k}} \to {\rm{Q}}
    \]
    \end{document}
</tex-math>
<graphic xlink:href="eqn/e004a.gif"/>
</alternatives>
</inline-formula>. The symbol |= means &#8220;entails&#8221; (can be reasoned out) and &#8594; means &#8220;can satisfy&#8221;.</p>
</list-item>
<list-item>
<p>&#8707; k, k<sub>21</sub>, k<sub>22</sub>, &#8230;, k<sub>2j</sub> &#8712; K, y<sub>21</sub>, y<sub>22</sub>, &#8230;, y<sub>2j</sub> &#8712; Y, <inline-formula>
<alternatives>
<mml:math id="Eq006-mml">
<mml:mrow>
<mml:mstyle displaystyle='true'>
<mml:munderover>
<mml:mo>&#x2211;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mi>j</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn><mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy='false'>(</mml:mo><mml:msub>
<mml:mi>k</mml:mi>
<mml:mrow>
<mml:mn>2</mml:mn><mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo stretchy='false'>)</mml:mo>
</mml:mrow>
</mml:mstyle><mml:mo>&#x007C;</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x00AC;</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x2227;</mml:mo><mml:mo>&#x00AC;</mml:mo><mml:mtext>k</mml:mtext><mml:mo>&#x2192;</mml:mo><mml:mtext>Q</mml:mtext>
</mml:mrow>
</mml:math>
<tex-math id="M6">
    \documentclass[10pt]{article}
    \usepackage{wasysym}
    \usepackage[substack]{amsmath}
    \usepackage{amsfonts}
    \usepackage{amssymb}
    \usepackage{amsbsy}
    \usepackage[mathscr]{eucal}
    \usepackage{mathrsfs}
    \usepackage{pmc}
    \usepackage[Euler]{upgreek}
    \pagestyle{empty}
    \oddsidemargin -1.0in
    \begin{document}
    \[
\sum\limits_1^j {{y_{2l}}({k_{2l}})} | = \neg k \wedge \neg {\rm{k}} \to {\rm{Q}}
    \]
    \end{document}
</tex-math>
<graphic xlink:href="eqn/e004b.gif"/>
</alternatives>
</inline-formula>.</p>
</list-item>
</list>
</disp-quote>
<p>From the above definitions, we see that the knowledge base is inconsistent if it contains, or allows us to derive, two contradictory pieces of knowledge. It is therefore very important to find a mechanism or method for checking knowledge base inconsistency (<xref ref-type="bibr" rid="B11">Xie, 2012</xref>).</p>
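<p>As a minimal sketch of how the conditions of Definition 9 could be checked mechanically, the following Python fragment treats each knowledge operation in Y as a function that derives new facts from the facts known so far, and reports a conflict when both a statement k and its negation can be derived. The string encoding of facts, the "not " prefix for negation, the toy rules, and all function names are illustrative assumptions and not part of the method described in this paper; the query Q is omitted for brevity.</p>
<preformat preformat-type="code">
```python
from typing import Callable, FrozenSet, Iterable, Set

Fact = str
# A knowledge operation derives new facts from the facts known so far (assumed encoding).
Operation = Callable[[FrozenSet[Fact]], Iterable[Fact]]


def negate(fact: Fact) -> Fact:
    """Return the negation of a fact, using a hypothetical 'not ' prefix."""
    return fact[4:] if fact.startswith("not ") else "not " + fact


def closure(knowledge: Iterable[Fact], operations: Iterable[Operation]) -> Set[Fact]:
    """Apply every operation repeatedly until no new fact is derived."""
    facts = set(knowledge)
    ops = list(operations)
    changed = True
    while changed:
        changed = False
        for op in ops:
            new_facts = set(op(frozenset(facts))) - facts
            if new_facts:
                facts |= new_facts
                changed = True
    return facts


def find_conflicts(knowledge, operations) -> Set[Fact]:
    """Return every positive fact k such that both k and 'not k' are derivable."""
    facts = closure(knowledge, operations)
    return {f for f in facts if not f.startswith("not ") and negate(f) in facts}


# Hypothetical toy rules, one per knowledge operation.
def rule_predator(facts):
    return ["cat catches mouse"] if "cat is predator of mouse" in facts else []


def rule_prey(facts):
    return ["not cat catches mouse"] if "mouse is predator of cat" in facts else []


K = ["cat is predator of mouse", "mouse is predator of cat"]
conflicts = find_conflicts(K, [rule_predator, rule_prey])
print(conflicts)
```
</preformat>
<p>In this toy knowledge base, the two source statements lead to a derived fact and its negation, so the check returns that fact as a conflict. A real ontology reasoner would replace the hand-written rules.</p>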
</sec>
</sec>
<sec>
<title>5 Agrionto-Based Knowledge Fusion</title>
<sec>
<title>5.1 Equivalent Entity Distinguishing</title>
<p>Equivalent entity distinguishing uses a clustering algorithm to group entities that refer to the same object into categories according to their identity slots (IS); that is, if IS(entity1) = IS(entity2), then entity1 is equivalent to entity2 from the entity viewpoint (entity1 &#8776; entity2). We regard two such entities as different descriptions of the same object. From the definition of equivalent entities, we can derive the following propositions. Proposition 1: if E1 &#8776; E2 &#8743; E2 &#8776; E3, then E1 &#8776; E3. Proposition 2: if E1 &#8776; E2 &#8743; E2 &#8800; E3, then E1 &#8800; E3. Proposition 3: if E1 &#8776; E2, then E2 &#8776; E1.</p>
<p>In order to determine whether two entities are equivalent, we need to analyze the values of the identity slots, which may differ in the following ways:</p>
<list list-type="bullet">
<list-item>
<p><bold>Abbreviation.</bold> An abbreviation is a shorter way to say something, for example, Massachusetts = Mass.</p>
</list-item>
<list-item>
<p><bold>Synonym.</bold> Two words that are synonyms represent the same entity or concept, for instance, corn and maize.</p>
</list-item>
<list-item>
<p><bold>Prefix &amp; Suffix.</bold> An abbreviation formed from the first or last letter of each word, for example, IM = Instant Messaging.</p>
</list-item>
</list>
<p>If the data in the identity slots have been pre-processed in this way and IS(entity1) = IS(entity2), then entity1 &#8776; entity2.</p>
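<p>The pre-processing and comparison described above can be sketched as follows. This is a simplified illustration that groups entities by exact equality of a normalized identity slot and does not stand in for a full clustering algorithm; the lookup tables, the slot name "name", and the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical lookup tables; a real system would load these from a curated vocabulary.
ABBREVIATIONS = {"mass.": "massachusetts", "im": "instant messaging"}
SYNONYMS = {"maize": "corn"}


def normalize(value: str) -> str:
    """Map an identity-slot value to a canonical form."""
    key = " ".join(value.lower().split())  # lower-case and collapse whitespace
    key = ABBREVIATIONS.get(key, key)      # expand known abbreviations
    return SYNONYMS.get(key, key)          # map synonyms to one preferred term


def group_equivalent(entities: List[Dict[str, str]],
                     slot: str = "name") -> List[List[Dict[str, str]]]:
    """Group entities whose normalized identity-slot values are equal."""
    groups = defaultdict(list)
    for entity in entities:
        groups[normalize(entity[slot])].append(entity)
    return list(groups.values())


entities = [
    {"name": "Corn", "source": "A"},
    {"name": "maize", "source": "B"},
    {"name": "Wheat", "source": "A"},
]
groups = group_equivalent(entities)
print([[e["name"] for e in g] for g in groups])
```
</preformat>
<p>Because the groups are formed by equality of the normalized value, the resulting relation is symmetric and transitive, in line with Propositions 1 and 3.</p>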
</sec>
<sec>
<title>5.2 Fusion Method</title>
<p>In our research, we define fusion rules at attribute granularity. Each fusion rule can be viewed as an aggregation function of the kind used in databases, such as Min, Max, and Avg. In general, single data fusion rules and multi data fusion rules cannot be applied directly to an information set. Instead, we need to analyze the query and the answer type and then define the necessary combination of fusion rules. Usually, however, a user needs to participate in rule selection to complete the knowledge fusion process. Generally, the attribute constraint, which is affected by the query, determines the rule selection. We divide knowledge fusion into attribute fusion, instance fusion, and concept fusion.</p>
<p>&#8226; <bold>Attribute fusion</bold></p>
<p>Attribute fusion merges the different values of an attribute. Consider the query &#8220;What price is the wheat at market 1?&#8221; (see Figure <xref ref-type="fig" rid="F4">4</xref>). The information fragments of two equivalent instances are extracted from the information sources. In this case, the two price values are inconsistent, so the final fused price is &#8220;1.925&#165;/kilo&#8221; using the Avg rule. This is especially useful when one of the price values results from an editing error.</p>
<fig id="F4">
<label>Figure 4</label>
<caption>
<p>The extracted information fragments of two instances.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/Fig04_web.jpg"/>
</fig>
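<p>To illustrate the idea of fusion rules as aggregation functions, the following sketch applies a named rule to the values reported for one attribute. The two input prices are assumed example values chosen so that the Avg rule yields 1.925; the actual values shown in Figure 4 are not reproduced here, and the rule table and function name are hypothetical.</p>
<preformat preformat-type="code">
```python
from decimal import Decimal
from typing import Callable, Dict, List

# Fusion rules modelled as aggregation functions over the candidate values.
FUSION_RULES: Dict[str, Callable[[List[Decimal]], Decimal]] = {
    "Min": min,
    "Max": max,
    "Avg": lambda values: sum(values, Decimal(0)) / len(values),
}


def fuse_attribute(values: List[Decimal], rule: str) -> Decimal:
    """Fuse the values reported for one attribute with the named rule."""
    if not values:
        raise ValueError("no values to fuse")
    return FUSION_RULES[rule](values)


# Assumed example prices in yuan per kilo (not taken from Figure 4).
prices = [Decimal("1.90"), Decimal("1.95")]
fused = fuse_attribute(prices, "Avg")
print(fused)
```
</preformat>
<p>Decimal arithmetic is used here so that the averaged price is exact; which rule is appropriate for a given attribute remains a choice for the user, as noted above.</p>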
<p>&#8226; <bold>Instance fusion</bold></p>
<p>Instance fusion merges equivalent instances that provide different descriptions of the same object (see Figure <xref ref-type="fig" rid="F5">5</xref>). Because most information sources describe only part of an object, the fused result is the union of the equivalent instances, with shared attributes merged by attribute fusion.</p>
<fig id="F5">
<label>Figure 5</label>
<caption>
<p>The instance fusion process.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/Fig05_web.jpg"/>
</fig>
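<p>A minimal sketch of instance fusion as a union of attributes is given below. It assumes that each instance is a flat attribute-value mapping and that a fusion rule may be supplied per attribute; the fallback rule of keeping the first reported value, the sample fragments, and all names are assumptions made for illustration.</p>
<preformat preformat-type="code">
```python
from typing import Callable, Dict, List

Instance = Dict[str, object]


def first_value(values: List[object]) -> object:
    """Default rule (an assumption): keep the first value reported."""
    return values[0]


def fuse_instances(instances: List[Instance],
                   rules: Dict[str, Callable[[List[object]], object]]) -> Instance:
    """Union the attributes of equivalent instances, fusing shared attributes by rule."""
    collected: Dict[str, List[object]] = {}
    for instance in instances:
        for attribute, value in instance.items():
            collected.setdefault(attribute, []).append(value)
    return {
        attribute: rules.get(attribute, first_value)(values)
        for attribute, values in collected.items()
    }


# Hypothetical fragments describing the same object from two sources.
a = {"name": "wheat", "market": "market 1", "price": 1.9}
b = {"name": "wheat", "price": 2.1, "origin": "Shandong"}
fused = fuse_instances([a, b], {"price": max})
print(fused)
```
</preformat>
<p>The fused instance carries every attribute that appears in either fragment, which reflects the observation that each source describes only part of the object.</p>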
<p>&#8226; <bold>Concept fusion</bold></p>
<p>Concept fusion takes into account the correlations among equivalent instances by combining the different instances that the clustering algorithm has divided into separate sets of equivalent instances (see Figure <xref ref-type="fig" rid="F6">6</xref>).</p>
<fig id="F6">
<label>Figure 6</label>
<caption>
<p>The concept fusion process.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/Fig06_web.jpg"/>
</fig>
</sec>
</sec>
<sec>
<title>6 Conclusion</title>
<p>Data have become a strategic resource as important as natural and human resources; they carry great implicit value and have attracted the attention of both the scientific and business communities. With the recent rapid growth in the amount of data, existing data processing technology has great difficulty meeting the demands placed on it, and the data are very difficult to mine. In this paper, we propose a generic agricultural knowledge fusion method that fuses information from diverse information sources so that a more comprehensive basis can be obtained for data analysis and knowledge discovery in agricultural big data. In recent years, information systems integration and business integration have received much attention (<xref ref-type="bibr" rid="B12">Xie &amp; Wang, 2010</xref>; <xref ref-type="bibr" rid="B11">Xie, 2012</xref>). We must now pay attention to the integration of agricultural data in the area of big data because once the data are gathered and stored in an integrated database, they acquire new value. This paper describes how to make full use of agricultural information from the perspective of knowledge fusion technology, which will accelerate the correct use of agricultural knowledge and provide a knowledge basis for big data mining. In the future, we will further study data consistency, ontology-based rules, and fusion algorithms and conduct more application tests in the open agricultural big data environment.</p>
</sec>
</body>
<back>
<ack>
<title>7 Acknowledgments</title>
<p>This work was supported by key projects of the Ministry of Agriculture on the cultivation of new varieties of genetically modified organisms (No. 2014ZX0801101B) and the CAAS Agricultural Science and Technology Innovation Program.</p>
</ack>
<ref-list>
<ref id="B1">
<label>1</label>
<mixed-citation>Feigenbaum, E. &amp; McCorduck, P. (1983) The fifth generation: artificial intelligence and Japan&#8217;s computer challenge to the world. Reading: Addison-Wesley Publishing Company.</mixed-citation>
</ref>
<ref id="B2">
<label>2</label>
<mixed-citation>Hu, X., Hu, J., Sekhari, A., Peng, Y.H., &amp; Cao, Zh.M. (2011) A Fuzzy Knowledge Fusion Framework for Terms Conflict Resolution in Concurrent Engineering. <italic>Concurrent Engineering: R&amp;A (CERA) 19</italic>(1), pp 71&#8211;84.</mixed-citation>
</ref>
<ref id="B3">
<label>3</label>
<mixed-citation>Hunter, A. &amp; Williams, M. (2010) Qualitative Evidence Aggregation using Argumentation. <italic>COMMA 2010</italic>, pp 287&#8211;298.</mixed-citation>
</ref>
<ref id="B4">
<label>4</label>
<mixed-citation>Lenat, D.B. &amp; Guha, R.V. (1990) <italic>Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project</italic>. Addison-Wesley Publishing Company, Inc.: CA.</mixed-citation>
</ref>
<ref id="B5">
<label>5</label>
<mixed-citation>Li, X., Dong, X. L., Lyons, K. B., Meng, W., &amp; Srivastava, D. (2013) Truth finding on the deep web: Is the problem solved? <italic>PVLDB 6</italic>(2).</mixed-citation>
</ref>
<ref id="B6">
<label>6</label>
<mixed-citation>Liu, J., Xu, W., &amp; Jiang, H. (2014) Research on Dynamic Ontology Construction Method for Knowledge Fusion in Group Corporation. <italic>Advances in Intelligent Systems and Computing 278</italic>, pp 289&#8211;298.</mixed-citation>
</ref>
<ref id="B7">
<label>7</label>
<mixed-citation>Motro, A. &amp; Anokhin, P. (2004) Utility-based Resolution of Data Inconsistencies. In <italic>Proc. International Workshop on Information Quality in Information Systems 2004</italic>, Paris, France, pp 35&#8211;43.</mixed-citation>
</ref>
<ref id="B8">
<label>8</label>
<mixed-citation>Preece, A., Hui, K., Gray, W., &amp; Marti, P. (2001) Designing for Scalability in a Fusion System. <italic>Knowledge Based Systems, 14</italic>(3-4), pp 173&#8211;179.</mixed-citation>
</ref>
<ref id="B9">
<label>9</label>
<mixed-citation>Stegmaier, F., B&#252;rger, T., D&#246;ller, M., &amp; Kosch, H. (2010) Knowledge Based Multimodal Result Fusion for Distributed and Heterogeneous Multimedia Environments: Concept and Ideas. <italic>Adaptive Multimedia Retrieval</italic>, pp 61&#8211;73.</mixed-citation>
</ref>
<ref id="B10">
<label>10</label>
<mixed-citation>Wang, C.Y., Hu, B., &amp; Li, P. (2009) Empirical Study of Knowledge Fusion Process within Chinese High-Tech Industry Clusters Based on Information Fusion Method. <italic>JIKM 8</italic>(4), pp 353&#8211;361.</mixed-citation>
</ref>
<ref id="B11">
<label>11</label>
<mixed-citation>Xie, N. (2012) Research on the Inconsistency Checking in Agricultural Knowledge Base. In <italic>Proc. CCTA (1)</italic>, pp 290&#8211;296.</mixed-citation>
</ref>
<ref id="B12">
<label>12</label>
<mixed-citation>Xie, N. &amp; Wang, W. (2010) Research on 3G Technologies-Based Agricultural Information Resource Integration and Service. <italic>CCTA 2009</italic>, pp 114&#8211;120.</mixed-citation>
</ref>
<ref id="B13">
<label>13</label>
<mixed-citation>Xie, N., Wang, W., Yang, X., &amp; Jiang, L. (2012) Rule-Based Agricultural Knowledge Fusion in Web Information Integration. <italic>SENSOR LETTERS 10</italic>, pp 1&#8211;4.</mixed-citation>
</ref>
</ref-list>
</back>
</article>