Abstract

Researchers have been working on different aspects of representing scientific research papers as machine-processable knowledge graphs. This study extends that work to the domain of social science research papers—specifically aiming to construct scholarly knowledge graphs that represent research results together with their supporting arguments, showing how research claims are incrementally constructed following the discourse structure of the paper. The study analyzed 30 sociology research papers that aimed to find or confirm cause-effect relationships between concepts. It analyzed the causal structure of the research result statements in the papers and investigated what additional information is provided by other statements in the papers performing the argument/rhetorical functions of general statement, topic centrality, literature review, research gap, research objective, research method, research result, and research contribution. A cause-effect information frame that lists all possible roles in a causal information structure was used to guide the analysis. We show that a scholarly knowledge graph representing the research results in a social science research paper must include information extracted from multiple statements in the paper, to provide a comprehensive view of the constructed knowledge. Together, these statements, taken in sequence, form the argument flow of the paper, showing how knowledge is constructed. The study also implemented a prototype graph visualization application to help users examine the causal structure of the research result statements and how that structure is incrementally constructed when the information content of the other statements is viewed in sequence. This tool enables readers to explore conflicting or unexpected results, trace how cause-effect relationships are generalized or specialized, and understand the overall causal information structure accumulated across the paper.

1. Introduction

Scientific and social science research results are reported mainly in research papers published as journal articles and conference papers. There is an emerging research and development (R&D) effort to represent scientific research results and processes in ontologies and knowledge graphs that are machine-processable (Gutierrez and Sequeda, 2020; Ehrlinger and Wöß, 2016)—to support semantic information retrieval, generation of literature overviews, and updating of systematic reviews (Brack et al, 2020; Slaughter et al, 2015). This study contributes to extending this effort to social science research. Zou (2020) provided a broad survey of the potential applications of knowledge graphs in general.

A knowledge graph represents information or knowledge as a graph (i.e., network) structure of nodes and directed edges (links), with the nodes representing classes and entities (i.e., class instances) and the edges representing relationship types. The nodes or edges may be associated with properties or attributes (referred to as DatatypeProperty in the Web Ontology Language (OWL) 2) (World Wide Web Consortium, 2012). The term knowledge graph is often treated as synonymous with ontology in the literature (Bergman, 2019; Hogan et al, 2021; Ehrlinger and Wöß, 2016), and knowledge graphs are often represented using the Resource Description Framework (RDF) (World Wide Web Consortium, 2014) and OWL 2 formalisms. However, knowledge graphs are increasingly being stored in graph database management systems, using the systems' built-in database schemas for ease of retrieval and processing. Khoo et al (2024) proposed that the term ontology be reserved for formal representations (e.g., using RDF/OWL 2) that support inferencing by computer programs and artificial intelligence (AI) agents, and that the term knowledge graph be used for representations oriented towards supporting human information seeking and use.
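
To make the node, edge and attribute description concrete, the following is a minimal sketch of an in-memory property graph in Python. It assumes a plain dictionary-based store rather than RDF/OWL 2 or a graph database; the class names, the `kind` and `epistemic_status` attributes, and the example nodes are illustrative choices, not part of any standard or of the schema used in this study.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                   # class or entity name
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    relation: str                                # relationship type
    properties: dict = field(default_factory=dict)


class KnowledgeGraph:
    """Minimal property graph: nodes, directed typed edges, and attributes."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, relation, **properties):
        # Reject dangling edges: both endpoints must already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, relation, properties))

    def outgoing(self, node_id):
        """All edges leaving the given node."""
        return [e for e in self.edges if e.source == node_id]


# Illustrative example loosely based on Ferraro et al (2016).
kg = KnowledgeGraph()
kg.add_node("c1", "childhood disadvantage", kind="concept")
kg.add_node("e1", "adult health problems", kind="concept")
kg.add_edge("c1", "e1", "cause", epistemic_status="observed")
for e in kg.outgoing("c1"):
    print(kg.nodes[e.source].label, f"-[{e.relation}]->",
          kg.nodes[e.target].label, e.properties)
```

Keeping attributes in a free-form dictionary mirrors the way graph database systems attach properties to both nodes and edges; an RDF/OWL 2 encoding would instead need reification or similar patterns to attach attributes to an edge.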

The question we address in this study is what should be included in a knowledge graph representing the research results of a social science study, as reported in a research paper. Previous research in scientific knowledge graph construction has focused on extracting concepts and relations from research abstracts (e.g., Rossanez et al, 2020; Dessì et al, 2021), as research abstracts are readily available on the Web and are short and concise. However, abstracts leave out many details of the research results as well as the supporting arguments and evidence. This is especially so for social science research papers, where the main research result may be modified by moderator variables, or the effect may vary for different subclasses of the main concept. Norouzi et al (2025, p. 140) pointed out that social science research papers use more “nuanced and ambiguous language when discussing causality, [with] the tendency to employ hedging or tentative statements about causal relationships, and the statement of causal claims as implicit rather than explicit”. They found that sentences in social science papers tend to be longer and have more complex syntactic structures. They concluded that it is important to analyze causal statements across the fulltext of a paper, from introduction to conclusion. However, knowledge graph construction based on the fulltext of research papers raises the question of what to select from the text to represent in the knowledge graph. One approach is to identify all the research result statements in the paper, and then the main supporting arguments and evidence for the research result claims. In addition, the knowledge graph can trace how the research claims are incrementally constructed in the “argument flow” of the research paper: from the introductory general statement, assertion of topic centrality, literature review summary, research gap identification and research objective statement, to the research result claim and contribution statements.

In this study, we manually identified the summary research result statements from the Discussion, Conclusion or Research Results section of a sample of sociology research papers. However, a summary research result statement taken out of context does not give a complete “picture” or understanding of the knowledge constructed in the paper. Other statements in the paper are needed to “flesh out” the summary result and indicate how the knowledge is constructed.

It is well known that different statements in a research paper have different rhetorical and argumentative functions. Researchers in applied linguistics (especially genre studies) have analyzed the rhetorical structure of research papers following Swales’ (1990) Create A Research Space (CARS) framework. They have identified rhetorical functions such as topic centrality, topic generalization, research gap, research objective and research contribution (e.g., Kanoksilapatham, 2005; Kathpalia and Khoo, 2020). Cheng and Khoo (2022) adopted these terms and re-framed them as types of argumentative functions. Topic generalization (or general statement) and literature review statements indicate what is general knowledge, established knowledge, or pre-conceived assumptions, to be contrasted with the research result. Topic centrality and research gap statements indicate which part of the research result is of particular interest. The research objective statement often parallels the research result in the overall information structure, but the research result may report unexpected findings and often provides more specific information and more details. We propose that a knowledge graph representing the research results in a social science research paper needs to include information extracted from multiple statements in the paper, to provide a comprehensive view of the constructed knowledge. These statements, taken in sequence, represent the argument flow of the paper, showing how knowledge is constructed.

To sum up, this study sought to find out:

• What kinds of cause-effect information (i.e., which roles in the causal information structure) are often found in statements representing different argument/rhetorical functions (e.g., research gap and research objective), and what kinds of changes in the causal information structure occur from one cause-effect statement to the next?

• Comparing research result statements with the corresponding research objective statements, what additional information do the research objective statements provide?

• A general statement or topic centrality statement often occurs at the beginning of the Introduction section. Which roles in the causal information structure do these statements highlight as being important in the study?

• How should the information (i.e., causal roles) in cause-effect statements and the changes in causal information structure from one statement to the next be presented visually to users?

The focus of our analysis was on sociology research that investigated cause-effect relationships, that is, research aiming to construct causal knowledge. We analyzed 30 such papers, examining the causal information structure of the summary research result statements (usually found at the beginning of the Conclusion or Discussion section, or the end of the Research Results section of a paper). We then analyzed what additional information is provided by selected statements in the Introduction and Conclusion sections of the paper: general statement, topic centrality, literature review summary, research gap, research objective, research method summary, research result summary and research contribution statements. We implemented a prototype graph visualization application to help users examine the causal structure of the research result statements, and how the causal structure is incrementally constructed when the statements are viewed in sequence. Though the scope of the analysis is sociology research papers, the results are expected to apply more generally to social science research papers.

2. Literature Review

The literature on scientific knowledge graphs (and scholarly knowledge graphs for social science research) has focused on four issues: the knowledge graph schema design (i.e., ontology design), including the classes, relationship types and class/relation attributes used; automatic construction of the knowledge graph by performing entity and relation extraction on the scientific text; knowledge graph completion methods, especially for predicting relations between existing entities; and graph embedding methods to represent relations and other graph structures as vectors for ease of retrieval, comparison and other applications. Most papers addressed two or more of these issues, as they are interrelated. This literature review focuses on knowledge graph schema design for representing cause-effect statements in research papers together with their supporting arguments and evidence; such graphs can be referred to as causal argument knowledge graphs. Previous research (e.g., Luan et al, 2018) in scientific knowledge graph construction has focused on research abstracts, as these are easier to obtain and process, but some studies (Binder et al, 2022; Jain et al, 2020) have examined the issues of representing the argument structure of the whole research paper, which is the end goal of this study.

Different studies have represented cause-effect information in different degrees of detail. Some studies simply identified sentences that contained cause-effect information (e.g., Norouzi et al, 2025) or extracted just the text spans representing the cause and the effect. Such papers usually focused on methods for automatic entity and relation extraction from text. A few studies have represented more attributes of the cause-effect relation, such as the polarity of the effect (i.e., whether the cause increases or decreases the effect), the effect size (i.e., the magnitude of the effect), and the “epistemic” or belief status of the relation (e.g., whether the relation is hypothesized or observed). Some studies have represented moderator variables or the conditions under which the cause-effect relation holds. Jiang et al (2019) urged the representation of the Condition (a kind of qualifier) to indicate under what condition the research claim holds. Such information is more difficult to extract, as it is often mentioned in a subordinate clause of a claim sentence.

Magnusson and Friedman (2021) adopted a fairly detailed cause-effect representation including the causal relation attributes of likelihood, strength and direction, relation qualifiers (i.e., the conditions under which the causal relation applies), supporting evidence (e.g., a study, theory or method), and the epistemic status of the causal relation (i.e., whether hypothesized, assumed or observed). In addition to representing cause-effect relations, they identified comparative, predictive, statistical and association relations in the scientific claim sentences. Their cause-effect schema design is quite similar to ours. After manually annotating 901 sentences, they used them to train a transformer model to extract entities and relations from a test sample of 100 sentences. While they adopted a fairly detailed cause-effect representation and used a state-of-the-art information extraction method based on a transformer model, their experiments made use of a curated dataset of sentences. Applying the knowledge graph schema and construction method to the fulltext of research papers will raise additional issues.

Some studies (e.g., Clark et al, 2014; Wang et al, 2020) adopted different perspectives on scientific research, with corresponding implications for their knowledge graph schema design and semantics. Vogt et al (2020) focused on the contributions of the research, though this necessarily included the research results. They proposed a representation framework called Research Contribution Model that included the classes objective, method, activity, agent, material, and result from which the contribution is derived. They illustrated the knowledge graph design and construction based on abstracts from the domains of medicine, computer science and agriculture. The focus was on the overall argument structure of the research study rather than on a detailed conceptual representation of the research results or contribution. Though they illustrated a detailed representation of quantitative research results, they did not show how the conceptual structure of causal research results may be represented.

Chen et al (2023) viewed scientific research through the lens of problem-solving. They constructed problem-solving knowledge graphs that represented four types of relationships: problem-solving, problem hierarchy, solution hierarchy and association relation. This representation is more coarse-grained than our representation as they did not represent the conceptual structure of the problem-solving statements.

Though the majority of studies have focused on extracting cause-effect relations from research abstracts, a few studies have sought to identify cause-effect relations from the fulltext of the research paper. Inevitably, this raises the question of which parts of the paper to focus on, as well as how the discourse structure and argument structure are represented in the knowledge graph. Our main goal is to construct causal argument knowledge graphs focusing on how the research results (or scientific claims) are supported by other relationships, by evidence, and by arguments. Another issue is how to represent the way causal claims evolve or are gradually constructed from the beginning of the paper to the end.

De Waard et al (2009) sought to construct an argumentation graph for scientific papers representing the relationships between hypotheses, claims and research evidence, including the epistemic status of the statements (i.e., known fact, experiment result or hypothesis) and highlighting the rhetorical and argument flow. They referred to this as a Hypotheses, Evidence and Relationships (HypER) approach. They pointed out that this “shift to author intent means shifting our conceptualization of the text towards discourse: that is, a move from viewing the text as a collection of verbs and nouns, to a view of the contextualized pragmatic language used for science” (de Waard et al, 2009, p. 2).

Pertsas and Constantopoulos (2017) proposed a Scholarly Ontology that modeled research activities as a kind of business process. The ontology focuses on events and activities that scholars engage in, and supports three perspectives: agent and intentionality, procedure and intellectual framework, and the resources used and produced. The ontology is broad enough in scope to import more specific ontologies covering specific aspects of research.

Recently, Song et al (2022) constructed an argumentation graph to link relations and supporting arguments within a paper (intra-article relations) and across papers (inter-article relations). Their case study involved 12 papers on the Technology Acceptance Model (TAM). They sought to represent the evolution of knowledge within and across papers. They used 12 intra-article argumentation relationships from their Scientific Paper Argument Ontology (Zhou et al, 2019), and 11 inter-article argumentation relations, some of which were imported from the Citation Typing Ontology (Shotton, 2010). Their intra-article argumentation classes and relations are a combination of argumentative functions, rhetorical functions and information types. They adopted categories from Toulmin’s argument framework (i.e., claim, evidence, warrant, backing, and rebuttal); the rhetorical categories of background, research problem, hypothesis, and conclusion; and information types comprising theoretical evidence (with subclasses work, principle, and formula) and factual evidence (with subclasses experience, example, data, experiment, and fact). They did not model the detailed conceptual structure of the statements, such as the cause-effect relation claims.

In a parallel study, Wang et al (2022) used largely the same framework to analyze the argument structure of 20 papers each from the disciplines of library and information science (LIS) and biomedical research. They found that in the biomedical research papers, factual evidence in the form of data, methods and experiments was more important than theoretical evidence in the form of formulae and cited works, which were more prominent in LIS papers. The LIS papers tended to use theories and causal relationships to derive hypotheses for experimental confirmation, whereas biomedical papers focused on research problems and the design of experiments to address them.

These frameworks for modeling the argument and rhetorical structure of research papers and the research activities described in them are well-founded. However, there have been no user studies to confirm their usefulness. We argue that these models of argument, rhetorical and research activity structure of research papers need to be extended to include the detailed conceptual (or information) structure of the statements that are usually represented as nodes in argument knowledge graphs. Our earlier study (Cheng and Khoo, 2021) showed how comparison structures link cause and effect subclasses, attributes and aspects to provide support for the cause-effect relationships. Representing the interaction between argument, rhetorical, activity and conceptual structures of research papers will make the scientific knowledge graph very complex, but it will allow intelligent applications to hypothesize relationships (i.e., knowledge graph completion), identify conflicting research claims and synthesize summaries for different purposes. Simplified sub-graphs highlighting these inferences and their text summaries can be presented in a user-friendly form. Our current study contributes to this goal by analyzing how the conceptual details of cause-effect relations change and evolve along the argument/rhetorical flow of general statement, topic centrality, literature review summary, research gap, research objective, research result and research contribution.

3. Analysis Framework

Our analysis was focused on seven types of statements, listed in Table 1. These categories have been used as rhetorical functions in discourse analysis frameworks following Swales’ (1990) CARS (Create A Research Space) model. We adopted these terms and expanded them to construct our typology of information/argument element types, to carry out information/argument structure analyses (Cheng and Khoo, 2022). Several authors in the field of argumentation have pointed out the interrelatedness of argumentation and rhetoric. Hinton (2019, p. 96) noted “it is clear that any investigation into how language is used to put across arguments cannot remain aloof from considerations of rhetorical impact”.

Table 1. Argument/rhetorical elements analyzed in this study.
Argument/rhetorical element | Definition
Topic centrality | Indicates the importance of the research topic or an aspect of it. It has two subtypes: centralizing the topic, and/or indicating the reason for centrality.
General statement | Indicates a broad or sweeping claim, often a generalization of previous research results, statement of established or general knowledge, or statement of a practical problem or research issue. It is sometimes referred to as topic generalization.
Literature review summary | Summarizes or generalizes information (e.g., results and objectives) from the literature.
Research gap | Indicates that a specific research issue has not been adequately investigated and deserves further study.
Research objective | Indicates a research objective of the current study, implying that it is well-founded and well worth studying.
Research method summary | Indicates a research method, implying that it is appropriate for addressing a research objective.
Research result summary | Indicates a research result of the current study.
Research contribution | Indicates a research contribution of the current study.

We selected these seven argument/rhetorical elements because our earlier analysis of the Introduction sections of sociology research papers (Cheng and Khoo, 2022) found the following typical sequence of argument/rhetorical elements:

(General statement or Topic centrality) …

Literature review summary …

(Research issue or Research gap or Research question) … Research objective … Research contribution/recommendation

A corresponding analysis of the Abstracts of the same set of papers found a similar sequence of argument/rhetorical elements with the addition of the research result element. These argument/rhetorical elements can be considered to be linked together into a coherent argument chain, forming the main “spine” of the paper. As the Abstract and Introduction sections can be considered to present an overview of the arguments in a research paper, we propose that these statements can help to flesh out the causal knowledge summarized in the research result statement.
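
The typical sequence above can be expressed as an ordering constraint. The sketch below assigns each element type a slot number, treating the alternatives grouped in parentheses as sharing a slot, and checks that a paper's labeled statements never move back to an earlier slot. The strict "never move back" rule and the informal label strings are our simplifying assumptions; actual papers may revisit earlier moves.

```python
# Slot of each element type in the typical Introduction sequence described
# above; elements sharing a slot are alternatives for that position.
TYPICAL_ORDER = {
    "general statement": 0,
    "topic centrality": 0,
    "literature review summary": 1,
    "research issue": 2,
    "research gap": 2,
    "research question": 2,
    "research objective": 3,
    "research contribution": 4,
    "recommendation": 4,
}


def follows_typical_sequence(labels):
    """True if the labeled statements never step back to an earlier slot.

    Elements may be skipped or repeated; an unknown label raises KeyError.
    """
    ranks = [TYPICAL_ORDER[label] for label in labels]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


# Introduction statements of Ferraro et al (2016), in order of appearance.
ferraro_intro = ["general statement", "literature review summary",
                 "research gap", "research objective"]
print(follows_typical_sequence(ferraro_intro))  # True
```

A stricter check could also require certain slots to be present, but Table 3 shows that elements such as topic centrality are frequently absent, so skipping is allowed here.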

Our analysis and modeling of the causal information structure are based on our cause-effect information frame, developed in a previous study (Cheng and Khoo, 2022). This frame, outlined in Fig. 1, specifies all the possible types of causal roles that we can identify.

Fig. 1.

The Cause-Effect information frame.
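
As an illustration of how such a frame can be held in code, here is a sketch of a data structure with one slot per causal role. The role names are our own, drawn from labels used elsewhere in this paper (cause concept, subclass of cause, effect concept, mediator, moderator, conditions, evidence data source and method, applied theory, target population); the full frame in Fig. 1 may contain more roles, so treat this as a partial, hypothetical encoding.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class CauseEffectFrame:
    """Hypothetical subset of cause-effect frame roles (illustrative names)."""
    cause_concept: Optional[str] = None
    cause_subclasses: List[str] = field(default_factory=list)
    effect_concept: Optional[str] = None
    effect_subclasses: List[str] = field(default_factory=list)
    relation: Optional[str] = None        # wording of the relation, e.g. "threaten"
    relation_type: Optional[str] = None   # e.g. "cause-effect" or "association"
    moderators: List[str] = field(default_factory=list)
    mediators: List[str] = field(default_factory=list)
    conditions: List[str] = field(default_factory=list)
    data_source: Optional[str] = None     # evidence.data_source
    method: Optional[str] = None          # evidence.method
    applied_theory: Optional[str] = None
    target_population: Optional[str] = None

    def filled_roles(self) -> List[str]:
        """Names of the roles that have a non-empty value, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# Research result statement from Ferraro et al (2016), hand-coded for illustration.
result = CauseEffectFrame(
    cause_concept="childhood disadvantage",
    cause_subclasses=["low SES", "family composition", "child abuse"],
    effect_concept="adult health problems",
    relation="are consequential to",
    relation_type="cause-effect",
    mediators=["lifestyle risks", "social psychological resources"],
)
print(result.filled_roles())
# ['cause_concept', 'cause_subclasses', 'effect_concept', 'relation', 'relation_type', 'mediators']
```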

In analyzing how causal knowledge is constructed when the paper steps through the seven types of argument/rhetorical elements (statements), we identified the following types of changes in the causal information structure of the statements:

• Type 1: Specifying the cause and/or effect concept, or narrowing or broadening them.

• Type 2: Generalizing or specializing the cause-effect relation (e.g., generalizing a cause-effect relation to an association or correlation relation, or the reverse).

• Type 3: Indicating moderator or mediator factors.
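
The three change types can be approximated by comparing the roles filled in two consecutive statements. The sketch below represents each statement's frame as a dictionary of role names (hypothetical names, not the frame's own labels) and reports which roles changed under each type, applied here to the literature review and research gap statements from Ferraro et al (2016) discussed next. It only detects that a role was added or altered; deciding whether a concept was narrowed or broadened would need a concept hierarchy, which this sketch does not model.

```python
# Hypothetical frame representation: a dict mapping role names to a string
# or a list of strings. Role names are illustrative only.
TYPE_1_ROLES = ("cause_concept", "cause_subclasses", "effect_concept", "effect_subclasses")
TYPE_2_ROLES = ("relation", "relation_type")
TYPE_3_ROLES = ("moderators", "mediators")


def changed(before, after, role):
    """True if `after` gives `role` a value that differs from `before`."""
    return bool(after.get(role)) and after.get(role) != before.get(role)


def classify_changes(before, after):
    """Map each change type to the roles that changed from `before` to `after`."""
    result = {}
    for label, roles in (("Type 1", TYPE_1_ROLES),
                         ("Type 2", TYPE_2_ROLES),
                         ("Type 3", TYPE_3_ROLES)):
        hits = [role for role in roles if changed(before, after, role)]
        if hits:
            result[label] = hits
    # Roles outside the three types (e.g., conditions) are reported separately.
    known = set(TYPE_1_ROLES + TYPE_2_ROLES + TYPE_3_ROLES)
    other = [role for role in after if role not in known and changed(before, after, role)]
    if other:
        result["other"] = other
    return result


# Concept labels are normalized by hand (an assumption), so wording differences
# such as "adult health problems" vs "health in later life" are not Type 1 changes.
lit_review = {"cause_concept": "early experiences", "effect_concept": "adult health",
              "relation": "links", "relation_type": "association"}
research_gap = {"cause_concept": "early experiences", "effect_concept": "adult health",
                "relation": "threaten", "relation_type": "cause-effect",
                "moderators": ["(unspecified)"], "mediators": ["(unspecified)"],
                "conditions": ["(unspecified)"]}

print(classify_changes(lit_review, research_gap))
# {'Type 2': ['relation', 'relation_type'], 'Type 3': ['moderators', 'mediators'], 'other': ['conditions']}
```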

In addition, we examined whether the general statement and topic centrality statements focus on the cause concept, the effect concept, the causal relation between them, or some other role in the cause-effect frame. We also compared the research result statements with the associated research objective statements to find out whether the research objective statements contain any extra information not found in the research result statements.

A walk-through using statements taken from Ferraro et al (2016) (“Childhood disadvantage and health problems in middle and later life”) illustrates our analysis. Consider the following two statements, a literature review summary leading to a research gap:

Literature review

Text: Considerable research demonstrates important links between early experiences and adult health problems, …

Research gap

Text: … but questions remain as to how and under what conditions early experiences threaten health in later life

The basic causal information structure underlying these two statements can be characterized as:

(childhood experiences) –[cause]-> (adult situation).

Comparing the two statements, we observe the following changes to the causal structure:

• Specialization of the relation: the literature review statement specifies a generic association relation (links), whereas the research gap statement specifies threaten, a subclass of the cause-effect relation.

• Addition of roles to the causal structure: the research gap statement (“how and under what conditions”) adds the roles of conditions, moderator variables and mediator variables to the causal structure.

Now compare the research objective statement with the research result summary statement:

Research objective

Text: … we use longitudinal data from a national sample to examine links between multiple forms of early life disadvantage and multiple health problems in adulthood.

Research result

Text: The analysis reveals that all three domains of childhood disadvantage [low socioeconomic status (SES), family composition, and child abuse], are consequential to adult risks and resources. Frequent child abuse was related to W1 health problems [associated with lifetime smoking and heavy drinking], but this relationship was mediated by lifestyle risks and social psychological resources, namely lifetime smoking, family support, family strain, and personal control.

The research objective statement did not make changes to the causal structure except to promise to identify subclasses of the cause concept (i.e., types of childhood disadvantage) and the effect concept (i.e., types of adult health problems). The research result statement makes good on the promise: it identifies types of childhood disadvantage (i.e., low socioeconomic status, family composition, and child abuse) and types of adult health problems (i.e., health problems associated with lifetime smoking and heavy drinking). In addition, it identifies the mediator variables of lifestyle risks and social psychological resources (with the subclasses of lifetime smoking, family support, family strain, and personal control).
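
The incremental construction described above, where later statements add roles and values to those supplied earlier, can be sketched as a merge that remembers which statement first contributed each value. This is a hypothetical illustration under our own assumptions: frames are dictionaries with illustrative role names, and the concept labels in the objective and result statements ("multiple forms of early life disadvantage", "three domains of childhood disadvantage") are normalized by hand to a shared label.

```python
def merge_with_provenance(statements):
    """Accumulate role values along an ordered argument chain.

    `statements` is a list of (statement_label, frame) pairs; a frame maps a
    role name to a string or a list of strings. Returns a dict mapping each
    role to (value, label_of_first_statement_that_supplied_it) pairs.
    """
    merged = {}
    for label, frame in statements:
        for role, value in frame.items():
            values = value if isinstance(value, list) else [value]
            slot = merged.setdefault(role, [])
            seen = {v for v, _ in slot}
            for v in values:
                if v not in seen:          # keep only the first contributor
                    slot.append((v, label))
                    seen.add(v)
    return merged


# Hand-coded frames; concept labels are normalized by hand (an assumption).
objective = {"cause_concept": "childhood disadvantage",
             "effect_concept": "adult health problems",
             "relation_type": "association"}
result = {"cause_concept": "childhood disadvantage",
          "cause_subclasses": ["low SES", "family composition", "child abuse"],
          "effect_concept": "adult health problems",
          "relation_type": "cause-effect",
          "mediators": ["lifestyle risks", "social psychological resources"]}

merged = merge_with_provenance([("research objective", objective),
                                ("research result", result)])
for role, entries in merged.items():
    print(role, entries)
```

Keeping both relation types (association from the objective, cause-effect from the result) rather than overwriting one with the other preserves the change in the relation along the argument chain.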

We implemented a graph visualization application to present the causal information structure of the statements along the argument chain, showing how the causal knowledge is constructed (the application is accessible at https://kgraph.sg/argstructure/demo1.html and https://kgraph.sg/argstructure/demo2.html). Fig. 2 presents a visualization of the causal structure in the research objective and research result statements, showing how the Effect concepts in the two statements are related. Fig. 3 shows a merged visualization that combines the causal structure of both the research objective and research result statements, while indicating the contributions of each statement.

Fig. 2.

Visualization of the causal information structure for research objective and research result statements, showing how the Effect concepts in the two statements are related.

Fig. 3.

Merged causal structure of research objective and research result statements, showing the contributions of each.

4. Methods
Corpus

The 30 sociology research articles that we analyzed were sampled from a corpus of articles taken from the top ten sociology journals listed in InCites Journal Citation Reports (see Table 2), which was used for the first author’s PhD thesis research. All the articles were published in late 2015 or early 2016 volumes of the journals. Only articles that report studies involving data analysis were included; articles that report literature surveys or philosophical/theoretical discussions were excluded from the corpus. The articles sampled from the journal Qualitative Research did not include papers that met our criterion of investigating cause-effect relationships, so no paper from this journal was included in the current analysis.

Table 2. List of sociology journals, and the number of research papers taken from each.
Code | Journal title | Papers sampled
S01 | American Journal of Sociology | 3
S02 | Annals of Tourism Research | 2
S03 | Cornell Hospitality Quarterly | 4
S04 | European Sociological Review | 3
S05 | Gender & Society | 4
S06 | Information, Communication & Society | 3
S07 | Journal of Marriage and Family | 4
S08 | Social Networks | 4
S09 | Qualitative Research | 0
S10 | American Sociological Review | 3
Total | | 30

The papers that we analyzed focused on just one or two main cause-effect relationships. Thus, the causal relations of interest are mentioned many times with different amounts of information and different degrees of completeness. The focus is on explicit, empirically supported causal relations, which are easier to identify and analyze in studies involving data analysis.

For qualitative research papers, the causal relation is often not the focus of the research but plays a supporting or explanatory role. Thus, there may be many “small” causal relations which are not investigated extensively. The expressions of causality also tend to be implicit, nuanced and embedded within narrative or descriptive accounts. Causal relations may also manifest as emergent themes or interpretations rather than direct, testable hypotheses. Thus, the causal relations in qualitative research papers have to be processed and displayed differently.

Similarly, theoretical/philosophical and literature review articles, though rich in conceptual connections, were excluded because their causal arguments are frequently more abstract, high-level, or synthesized from multiple sources rather than presenting novel empirical findings of cause-effect. For example, theoretical/philosophical papers may propose causal arguments, but these are often hypothetical or conceptual rather than empirically tested results. Literature reviews, as they summarize existing knowledge, certainly contain various causal links from past studies. However, the challenge lies in attributing these links correctly to their original sources and distinguishing them from the review’s own interpretative claims.

The frequencies of the argument/rhetorical types (i.e., number of papers having the type of statement) are listed in Table 3.

Table 3. Number of papers having the following argument/rhetorical elements.
Argument/rhetorical element | Number of papers (% of 30 papers)
General statement | 20 (67%)
Topic centrality | 17 (57%)
Literature review summary | 27 (90%)
Research gap | 26 (87%)
Research method | 30 (100%)
Research objective | 29 (97%)
Research result | 30 (100%)

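
As a quick arithmetic check on Table 3, the percentages can be recomputed from the counts and the sample size of 30 papers. The sketch below simply rounds each share to the nearest whole percent; it is a verification aid, not part of the study's method.

```python
TOTAL_PAPERS = 30

# Counts copied from Table 3.
counts = {
    "General statement": 20,
    "Topic centrality": 17,
    "Literature review summary": 27,
    "Research gap": 26,
    "Research method": 30,
    "Research objective": 29,
    "Research result": 30,
}


def percentages(counts, total):
    """Share of papers containing each element, rounded to a whole percent."""
    return {element: round(100 * n / total) for element, n in counts.items()}


for element, pct in percentages(counts, TOTAL_PAPERS).items():
    print(f"{element}: {counts[element]} ({pct}%)")
```

None of the shares falls exactly on a half percent, so Python's round-half-to-even behavior does not affect the results, which agree with the percentages reported in the table.
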
5. Results

We present the analysis results organized into the three types of information structure changes described earlier, and highlight the contributions of the general statement, topic centrality and research objective statements. To clarify the results, we use the main argument chain of Ferraro et al (2016) as the illustrative example for the graph visualizations presented in the figures. Table 4 lists the main argument/rhetorical statements from this paper. Fig. 4 offers a visualization of the causal information structure for the first four argument/rhetorical statements.

Table 4. The main argument/rhetorical statements from Ferraro et al (2016).
| Argument claim | Statement from the paper (text) |
|---|---|
| General statement | [from Introduction section] It is understandable that {cause-concept} problems early in life, from {subclass-of-cause} low birth weight to {subclass-of-cause} economic deprivation, may {cause-effect-relation} influence {effect-concept} status attainment and {effect-concept} mental health, … |
| Topic centrality | None |
| Literature summarize/generalize | [from Introduction section] Considerable research demonstrates important {association-relation} links between {cause-concept} early experiences and {effect-concept} adult health problems, … |
| Research gap | [from Introduction section] but questions remain as to how and under what conditions {cause-concept} early experiences {cause-effect-relation} threaten {effect-concept} health in later life … |
| Research objective | [from Introduction section] First, we draw on {applied_theory/model/framework} recent theoretical developments in sociology and epidemiology to offer a conceptually integrated argument about the {cause-effect-relation} early origins of {effect-concept} health problems observed decades later. Second, and distinct from most prior studies, we use {evidence.data_source} longitudinal data from a national sample to examine {association-relation} links between {cause-concept} multiple forms of early life disadvantage and {effect-concept} multiple health problems in adulthood. |
| Research method | [from Method section and Measurement section] Data for this study come from {evidence.data_source} the National Survey of Midlife Development in the United States (MIDUS), a sample of {target.population} adults age 25 to 74 years. … Given the skewed distributions for these variables, we used {evidence.method} a negative binomial regression model for analyses (Long, 1997). |
| Research result | [from Discussion section] The analysis reveals that all {cause-concept} three domains of childhood disadvantage [subclasses: low socioeconomic status (SES), family composition, and abuse], {cause-effect-relation} are consequential to {mediator} adult risks and resources. Frequent {cause-concept} child abuse {association-relation} was related to {effect-concept} W1 health problems [Wave 1 problems: lifetime smoking, heavy drinking], but this relationship was mediated by {mediator} lifestyle risks and social psychological resources, namely lifetime smoking, family support, family strain, and personal control. |

Notes: Each label in braces marks the element of the words that immediately follow it.

Fig. 4.

Visualization of the causal information structure for General statement, Literature summarize, Research gap and Research objective statements.

The general statement claim lays out the main causal information structure:

It is understandable that problems early in life [i.e., childhood situation] … may influence status attainment and mental health [adult situation], …

The causal information structure represented in this statement can be characterized as:

(childhood situation) -[cause]-> (adult situation)

This general statement suggests that the causal relation is general knowledge, or at least plausible. It also starts sketching out a class hierarchy for the cause concept and the effect concept. The rest of the argument claims in the argument chain flesh out this causal information structure in various ways.
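As an illustration only, the incremental construction described above can be sketched in Python by treating each statement's causal information structure as a set of (subject, role, object) triples and merging them in argument-chain order. The `Triple` type, the role labels and `build_chain` are hypothetical names chosen for this sketch; they do not describe the prototype's actual data model. Each triple records the first argument/rhetorical statement that contributed it, so later statements show only what they add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge in a causal information structure (hypothetical schema)."""
    subject: str
    role: str
    obj: str


def build_chain(statements):
    """Merge per-statement triples in argument-chain order.

    `statements` is a list of (argument_type, [Triple, ...]) pairs.
    Returns a dict mapping each distinct triple to the argument type that
    first introduced it; repeated triples keep their original source.
    """
    graph = {}
    for argument_type, triples in statements:
        for t in triples:
            graph.setdefault(t, argument_type)
    return graph


chain = [
    ("General statement", [
        Triple("childhood situation", "cause", "adult situation"),
        Triple("low birth weight", "subclass-of", "childhood problems"),
    ]),
    ("Research result", [
        # already contributed by the general statement, so not re-attributed
        Triple("childhood situation", "cause", "adult situation"),
        Triple("child abuse", "subclass-of", "childhood problems"),
    ]),
]

for triple, source in build_chain(chain).items():
    print(f"{source}: ({triple.subject}) -[{triple.role}]-> ({triple.obj})")
```

Because Python dictionaries preserve insertion order, the printed structure also reflects the order in which the argument chain introduced each piece of information.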

5.1 Role of General Statement and Topic Centrality

The general statement and topic centrality statements often appear as the first sentence of the Introduction section, or at least in the first paragraph. A general statement often conveys general or established knowledge (usually with a citation), whereas a topic centrality statement points out the importance of the topic. These two argument/rhetorical elements overlap, and a single statement often serves both purposes. The general statement and topic centrality may highlight the cause concept or effect concept, or the relation between them. These statements help to highlight important concepts in the causal information structure, and can help the reader (as well as argument mining programs) to construct the class hierarchies for the cause and effect concepts.

Some statements highlight the relations that plausibly hold between the concepts, and help the reader (and argument mining programs) to identify the main causal information structure. This example general statement suggests that the causal relation is plausible:

[General statement] It is understandable that {cause-concept} problems early in life, from {subclass-of-cause} low birth weight to {subclass-of-cause} economic deprivation, may {cause-effect-relation} influence {effect-concept} status attainment and {effect-concept} mental health, …

These statements can also link the cause and effect concepts to other concepts in the literature—that is, linking to concepts in other research papers, potentially supporting the construction of a taxonomy for the research area:

[Topic centrality] {cause-concept} Assimilation and discrimination have long been at the center of discussion about {context.target_population} immigration {context.location} in the United States (Gordon 1961…). A large body of literature on name-giving emphasizes the sociological importance of {subclass-of-cause} names.

Finally, the topic centrality statement may extrapolate the cause-effect relation to a distal consequence to highlight the importance of the current study:

[Topic centrality] … {cause-concept} shared time is important for {effect-concept} marital well-being (e.g., Daly 2001; …), and that the quality of marital relationships is associated with the quality of parent-child relationships (e.g., Jekielek 1998…).

This can also help to link the research results reported in the current paper to results in other papers, and support the construction of an ontology for the research area.

We analyzed the general statement and topic centrality statements to identify whether they focused on the cause concept, effect concept or the relation between cause and effect. The intercoder agreement for this coding (between the two authors) was 0.97 using the Jaccard similarity measure. Table 5 gives the frequency counts of the cases where both authors agreed.
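For readers unfamiliar with the measure, the Jaccard similarity of two codings A and B is the size of their intersection divided by the size of their union. The paper does not specify how coding units were aggregated, so the sketch below assumes, for illustration, that each coder's output is a set of (paper, highlighted element) pairs pooled across papers. The codings are invented and do not reproduce the reported value of 0.97.

```python
def jaccard(codes_a, codes_b):
    """Jaccard similarity |A & B| / |A | B| between two coders' tag sets."""
    a, b = set(codes_a), set(codes_b)
    if not a and not b:
        return 1.0  # two empty codings agree trivially
    return len(a & b) / len(a | b)


# Hypothetical codings: (paper_id, highlighted element) pairs for each coder.
coder1 = {("P01", "cause"), ("P02", "effect"), ("P03", "cause"), ("P04", "relation")}
coder2 = {("P01", "cause"), ("P02", "effect"), ("P03", "cause"), ("P04", "cause")}

# 3 shared pairs out of 5 distinct pairs in total
print(round(jaccard(coder1, coder2), 2))  # 0.6
```

Unlike chance-corrected measures such as Cohen's kappa, Jaccard similarity simply counts overlap, which suits codings where a statement may receive several tags.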

Table 5. Concepts or relations highlighted by General statement and Topic centrality.
| Claim type | Highlights cause concept | Highlights effect concept | Highlights relation |
|---|---|---|---|
| General statement (N = 20, out of 30 papers) | 9 (45%) | 5 (25%) | 3 (15%) |
| Topic centrality (N = 17) | 8 (47%) | 3 (18%) | 1 (6%) |

Both general statement and topic centrality statements highlighted the cause concept about half the time, as illustrated in these examples:

[General statement] {cause-concept} Desegregation is now largely a policy of the past, and the school segregation of African-Americans has increased in districts that have been declared “unitary” (Clotfelter, Vigdor and Ladd 2006…).

[Topic centrality] Research on the {cause-concept} co-creation of an experience has recently been receiving a significant amount of attention in marketing and tourism research (e.g., Shaw et al, 2011…).

A focus on the effect concept occurred much less often. Here are two examples:

[General statement] {effect-concept} Attitudes towards the roles of men and women in society may be more or less traditional, and social norms within a country may be more or less restrictive in terms of which behaviours are and not socially acceptable for men or women.

[Topic centrality] This need is especially critical, given the large number of {effect-concept} non-travelers worldwide (Smith, Fralinger & Litvin, 2011).

A focus on the nature of the relation is infrequent, because focusing on the causal factor already implies a causal relation. In most cases, it can be assumed that the study sought to identify cause-effect relations, rather than to determine whether the relation is an association, correlation, prediction or cause-effect relation.

5.2 Type 1. Specifying the Cause or Effect Concept, or Narrowing or Broadening Them

This type of modification to the causal information structure includes:

• Adding a new cause or effect concept, which may be an alternative cause or effect to what is generally known or investigated in previous studies;

• Narrowing the cause or effect concept, by specifying a subclass of the concept, or adding an attribute or aspect of the concept, thus making the cause/effect more specific;

• Broadening the cause or effect concept, by moving up the class hierarchy to a more abstract concept (possibly suggesting that the study will extend its scope to additional sibling concepts in the class hierarchy).

Thus, this analysis sought to identify cases where the author zooms in to narrower concepts and zooms out to broader or more abstract concepts (see Table 6). This coding obtained an intercoder score of 0.65 (Jaccard similarity measure). Assuming the two coders assigned similar numbers of tags, this means that about 20% of each coder’s tags were not found in the other coder’s tagging, indicating that the coding involves some subjective judgement.
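The “about 20%” figure can be checked with a short calculation, under the simplifying assumption (not part of the stated coding procedure) that both coders assigned the same number of tags, n, of which k are shared. The union then contains 2n − k tags, so:

```latex
J = \frac{k}{2n - k} = 0.65
\;\Rightarrow\; k = 0.65\,(2n - k)
\;\Rightarrow\; 1.65\,k = 1.3\,n
\;\Rightarrow\; \frac{k}{n} = \frac{1.3}{1.65} \approx 0.79
```

Roughly 1 − 0.79 ≈ 21% of each coder's tags therefore have no counterpart in the other coder's tagging.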

Table 6. Specializing or generalizing the cause concept and effect concept in argument steps (N = 30).
| Argument step | Cause: add | Cause: narrower | Cause: broader | Effect: add | Effect: narrower | Effect: broader |
|---|---|---|---|---|---|---|
| General statement, Topic centrality or Literature summarize/generalize → Research gap | 5 | 1 | 0 | 2 | 4 | 0 |
| Research gap → Research objective | 3 | 3 | 0 | 0 | 3 | 0 |
| Research objective & Research method → Research result | 5 | 0 | 1 | 1 | 5 | 0 |
| Research result & Research gap → Research contribution | 0 | 0 | 3 | 1 | 0 | 4 |

Table 6 indicates that research gap statements introduced additional cause concepts 5 times (out of 30 papers) and specified narrower effect concepts 4 times, compared to the concepts in the preceding general statement and literature review statements. Similarly, research result statements added new cause concepts 5 times (compared to the research objective statement) and specified narrower effect concepts 5 times. The research contribution statement broadened the effect concept 4 times (compared to the research result). Admittedly, none of these patterns is compelling, as none occurred more than 10 times.

The conceptual zooming in and zooming out along the main argument chain can assist argument mining programs in constructing class hierarchies. In the example argument chain in Table 4, both the cause concept (childhood situation) and the effect concept (adult situation) are elaborated with subclasses to form class hierarchies. The main class hierarchies for the cause concept and effect concept are listed in Table 7, and the complete concept network is displayed in the graph visualization in Fig. 5.
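The add/narrower/broader distinction used in Table 6 can be operationalized once a class hierarchy is available. The sketch below is illustrative only: it encodes a few parent links drawn from the Table 4 example (low birth weight, child abuse and family composition as subclasses of childhood problems, which in turn falls under childhood situation), and `classify_step` is a hypothetical helper rather than the coding procedure used in the study. Any concept that is not an ancestor or descendant of the previous one, including a sibling, is counted as added, which is a simplification.

```python
# Hypothetical child -> parent links, based on concepts in the Table 4 example.
PARENT = {
    "childhood problems": "childhood situation",
    "low birth weight": "childhood problems",
    "child abuse": "childhood problems",
    "family composition": "childhood problems",
}


def ancestors(concept):
    """Return every concept above `concept`, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def classify_step(previous, current):
    """Label how `current` relates to `previous` along the argument chain."""
    if current == previous:
        return "same"
    if previous in ancestors(current):
        return "narrower"  # zooming in to a subclass
    if current in ancestors(previous):
        return "broader"   # zooming out to a superclass
    return "add"           # sibling or unrelated concept, e.g. an alternative cause


print(classify_step("childhood problems", "child abuse"))        # narrower
print(classify_step("low birth weight", "childhood situation"))  # broader
print(classify_step("low birth weight", "child abuse"))          # add
```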

Table 7. Class hierarchies for the cause concept and effect concept for the example argument chain.
childhood situation (synonyms: early in life, early experiences)
childhood problems (problems early in life, childhood disadvantage, early life disadvantage, early stressor, distal or “upstream” risk factors)
childhood health problems
low birth weight
childhood mental health issues
traumatic experience
childhood economic deprivation
low socioeconomic status
child abuse
family composition
adult situation
adult status attainment
adult health situation (health in later life)
adult health problems
adult mental health problems
Fig. 5.

Multiple interconnected class hierarchies for the cause concept and effect concept for the example argument chain.

The argument chain navigates down and up the class hierarchies. First, the general statement sketches out the class hierarchies:

• Cause concept: Childhood situation > Childhood problems > Low birth weight and Economic deprivation

• Effect concept: Adult situation > Status attainment and Mental health

The literature review, research gap and research objective statements specify broader concepts: early experiences/early life disadvantage cause adult health problems. The research result statement then zooms in to subclasses of the cause concept and effect concept: low socioeconomic status (SES), family composition, and child abuse cause health problems associated with lifetime smoking and heavy drinking. Finally, the research contribution statement zooms out to broad concepts: distal or “upstream” risk factors cause social factors and health.

This conceptual zooming in and out along the argument chain occurs in some papers. Writers on academic writing have described the Introduction section as having a funnel structure that moves from broad to specific (e.g., Annesley, 2010; Bahadoran et al, 2018; Plaxco, 2010). However, we observed that a reverse funnel sometimes occurs in the Discussion/Conclusion section, forming what can be termed an hour-glass pattern (see Fig. 6). We found the hour-glass pattern in 8 of the 30 papers (27%) for the cause concept, and in 7 papers (23%) for the effect concept. Examples of the hour-glass pattern of conceptual zooming in and zooming out across the argument chain are provided in Appendix Tables 10 and 11.
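Once each step's focal concept has been placed in a class hierarchy, the hour-glass pattern can be checked mechanically. The sketch below assumes a depth value per argument step is already available (for instance, the number of ancestors of the focal concept, with 0 the most general). The criterion, a rise to a peak followed by a fall with the peak strictly deeper than both ends, is a simplified rule chosen for illustration and is not the coding rule used in the study.

```python
def is_hourglass(depths):
    """True if specificity rises to a peak and then falls again.

    `depths` lists, for each argument step in order, how deep the focal
    concept sits in its class hierarchy (0 = most general). The sequence
    must be non-decreasing up to its first maximum, non-increasing after
    it, and the peak must be strictly deeper than the first and last steps.
    """
    if len(depths) < 3:
        return False
    peak = depths.index(max(depths))
    rising = all(a <= b for a, b in zip(depths[:peak], depths[1:peak + 1]))
    falling = all(a >= b for a, b in zip(depths[peak:], depths[peak + 1:]))
    return rising and falling and depths[0] < depths[peak] > depths[-1]


# Hypothetical depths for: literature summary, gap, objective, result, contribution
print(is_hourglass([1, 1, 2, 3, 1]))  # True: zoom in, then zoom out
print(is_hourglass([1, 2, 3]))        # False: funnel only, no zooming out
```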

Fig. 6.

The hour-glass pattern of conceptual zooming in and zooming out along the main argument chain.

One way of making a cause concept or effect concept more specific is to elaborate it with additional attributes or aspects, thus narrowing the scope of the causal information structure, as illustrated in the following examples:

[Research gap] Surprisingly, the debate on the pros and cons of {cause-concept} shared residence disregards the consequences for parents, …

[Research objective] We distinguish between two aspects of {cause-concept} residence arrangements: {aspect-of-cause} the main residence of the child (i.e., {subclass-of-cause-concept} mother, father, or shared residence) and {aspect-of-cause} the visitation of nonresident parents.

[Research gap] Despite frequent contact and the potential {cause-effect-relation} impact of {cause-concept} grown children on {effect-concept} parental well-being, however few studies have examined {subclass-of-cause-concept} daily experiences in this tie.

[Research objective] In this study we examined the {aspect-of-cause} modalities parents use to be {cause-concept} in touch with grown children {subclass-of-cause} on a daily basis (e.g., phone text, in person). … We also examined the {aspect-of-effect} emotional valence of daily experiences (i.e., pleasant, stressful).

Sometimes, a study investigated alternative causal factors:

[Objective] We assessed {association-relation} the nature and direction of the relationship between {aspect-of-cause} WFC [work-family conflict] and {aspect-of-effect} psychological distress for {cause-concept} mothers

[Result] A range of {alternative_cause} maternal, child family, and work covariates (with the exception of family size) were found to {association-relation} be associated with {cause-concept} WFC or {effect-concept} psychological distress {context.temporal} at each point in the family life cycle.

[Literature-summarize/generalize] Prior research that examines {effect-concept} biracials’ labeling choices {cause-effect-relation} emphasizes the importance of {cause-concept} family, peers and environmental context, …

[Gap] … but gives little attention to the {cause-effect-relation} influence of {cause-concept} nonracial social identities

5.3 Type 2. Elaborating on the Cause-effect Relation

Initially, we attempted to code whether there was a narrowing or broadening of the cause-effect relation. An example of broadening would be a cause-effect relation indicated in the research objective statement, followed by an association or correlation in the research result (especially if the result is based on statistical analysis). However, the intercoder agreement for this coding was poor. In most papers, it is clear that the authors intended a cause-effect relation even when they used the words “association” or “link”. In the example in Table 4, the authors occasionally used the more generic expressions “link” and “associated with” in the literature summarize/generalize statement, possibly for the sake of linguistic variety.

Instead, we analyzed whether the papers specified additional relations, or reversed the direction of the relation, or specified the polarity of the relation (i.e., positive or negative), or specified the effect size (see Table 8). The intercoder agreement for this coding was 0.8.
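The relation-level coding in Table 8 amounts to comparing the relation slots of two successive statements. The minimal sketch below assumes a frame with direction, polarity and effect-size slots; the field names and the `elaborations` helper are hypothetical, not the study's coding sheet. It reports which slots a later statement fills or changes, using the bidirectional WFC finding and the negative effect of economic growth on homicide rates quoted later in this section.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RelationFrame:
    """Relation slots of a cause-effect frame (hypothetical field names)."""
    cause: str
    effect: str
    direction: str = "cause->effect"   # or "bidirectional"
    polarity: Optional[str] = None     # "positive" or "negative"
    effect_size: Optional[str] = None  # e.g. a reported coefficient


def elaborations(earlier, later):
    """List relation slots that `later` fills or changes relative to `earlier`."""
    changed = []
    for f in fields(RelationFrame):
        if f.name in ("cause", "effect"):
            continue  # concept changes are coded separately (Table 6)
        before, after = getattr(earlier, f.name), getattr(later, f.name)
        if after is not None and after != before:
            changed.append(f.name)
    return changed


objective = RelationFrame("WFC", "psychological distress")
result = RelationFrame("WFC", "psychological distress", direction="bidirectional")
print(elaborations(objective, result))  # ['direction']

objective2 = RelationFrame("economic growth", "homicide rates")
result2 = RelationFrame("economic growth", "homicide rates", polarity="negative")
print(elaborations(objective2, result2))  # ['polarity']
```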

Table 8. Specializing or generalizing the research relation in argument steps.
| Argument step | Additional relation or reverse direction | Polarity (positive or negative) | Effect size (measurement) |
|---|---|---|---|
| General statement, Topic centrality or Literature summarize/generalize → Research gap | 2 | 0 | 0 |
| Research gap → Research objective | 1 | 0 | 0 |
| Research objective & Research method → Research result | 5 | 10 | 2 |
| Research result → Research contribution | 0 | 0 | 0 |

Ten (33%) of the papers specified the polarity of the cause-effect relation in the Research result statement, for example:

[Result] The results can be summarized as follows. First, CSR-brand fit {polarity.positive} enhances consumers’ personal and social identification with brands.

[Result] We find that {cause-concept} economic growth has a {polarity.negative} negative effect on {subclass-of-effect} homicide rates, whereas the {aspect-of-cause-concept} divorce rate and {aspect-of-cause-concept} income inequality have {polarity.positive} positive effects on {subclass-of-effect} homicide rates.

Authors may add relations from one argument/rhetorical element to the next. Here is an unusual case of a study finding a bidirectional cause-effect relation:

[Objective] We assessed {association-relation} the nature and direction of the relationship between {aspect-of-cause-concept} WFC [work-family conflict] and {aspect-of-effect-concept} psychological distress for {cause-concept} mothers

[Result] Our findings suggest that there are ongoing {cause-effect-relation} mutual influences between {cause-concept} WFC and {effect-concept} psychological distress whereby {polarity} deterioration in one can {cause-effect-relation} lead to ongoing {polarity} detriment in the other.

In this example, the author added a co-occurrence relation to the causal information structure:

[Result] An additional contributor to spillover as it is tested in this study could be timing: {cause-concept} Marital and {effect-concept} parent–child conflict may sometimes {co-occurrence} co-occur rather than occurring in sequence.

5.4 Type 3. Indicating Moderator or Mediator Factors

Table 9 indicates that research objective and research result statements sometimes add moderator factors, and to a lesser extent mediator factors.

Table 9. Adding moderator and mediator factors.
| Argument step | Moderator | Mediator |
|---|---|---|
| General statement/Topic centrality/Literature summarize/generalize → Research gap | 0 | 1 |
| Research gap → Research objective | 7 | 1 |
| Research objective & Research method → Research result | 5 | 4 |
| Research result & Research gap → Research contribution | 2 | 1 |

The example argument chain in Table 4 has a complex example of indicating mediator factors. First, the research gap statement identifies “how and under what conditions” as the research gap. The research objective statement implicitly refers to this when it states, “we … offer a conceptually integrated argument about the early origins of health problems”. The research result then identifies the mediating factors: “this relationship was mediated by lifestyle risks and social psychological resources, namely lifetime smoking, family support, family strain, and personal control.”

A moderator factor is sometimes specified in the research result statement, even when it is not mentioned in the research objective:

[Result] {comparison} The one exception to this pattern is {subclass-of-cause} Russian, predominantly Jewish, {common_concept} immigrants: for this group, sons with ethnic names ended up, on average, {difference-in-the-effect-concepts} in higher-earning occupations… [i.e., ethnic group as a moderator variable].

Here is an example of indicating a mediator factor:

[Result] Frequent {cause-concept} child abuse {association-relation} was related to {effect-concept} W1 health problems [Wave 1 problems: lifetime smoking, heavy drinking], but this relationship was mediated by {mediator} lifestyle risks and social psychological resources, namely lifetime smoking, family support, family strain, and personal control.
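Storing mediators and moderators in a knowledge graph raises a modelling question: in a property graph such as Neo4j, relationships connect nodes, not other relationships, so a moderator cannot be attached directly to a cause-effect edge. One common workaround, sketched below as a hypothetical schema rather than the prototype's actual data model, is to reify the claim as its own node and hang each role of the cause-effect frame off it. The example encodes the mediated child abuse finding quoted above.

```python
def reify(rel_id, cause, effect, mediators=(), moderators=()):
    """Represent one cause-effect claim as a relation node with role edges.

    The claim itself becomes a node (rel_id); cause, effect, mediators and
    moderators are all outgoing (rel_id, role, concept) edges, so every
    role in the frame can be queried the same way.
    """
    edges = [(rel_id, "cause", cause), (rel_id, "effect", effect)]
    edges += [(rel_id, "mediator", m) for m in mediators]
    edges += [(rel_id, "moderator", m) for m in moderators]
    return edges


for edge in reify("R1", "child abuse", "W1 health problems",
                  mediators=["lifestyle risks",
                             "social psychological resources"]):
    print(edge)
```

Modelling a mediator instead as a chain cause -> mediator -> effect would be simpler to draw, but it loses the fact that the two links belong to one reported claim.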

Occasionally, the author may extrapolate to a future or more distal consequence, in effect treating the effect concept as a mediator factor towards a distal effect:

[Objective] In this study we weaved together these four ideas to examine whether {cause-concept} mothers’ education is {polarity.positive} positively {association-relation} associated with {effect-concept} investments in children’s health during {context.temporal} early childhood

[Result] …we view these two sets of results (comparing changing and unchanging health needs) as new evidence for how {cause-concept} maternal education {association-relation} is linked to children’s {aspect-of-effect-concept} short- and long-term health.

This may be a means of indicating the importance of the research objective or research result.

5.5 Research Result Versus Research Objective Information

It is not surprising that the research result statement provides more detailed causal factors and more specific effects and relations than the research objective statement. The question arises whether the research objective statement provides any additional information not found in the research result. In other words, can the research objective statement be discarded once we know the research result?

Fig. 7 visualizes the additional concepts and roles (colored green) that the research objective statement adds to the research result for the paper by Ferraro et al (2016) (the main illustrative example). Notably, the research objective indicates the theoretical basis for the study and the type of data used.
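A rough, role-level version of this comparison can be computed directly from the tagged statements in Table 4. The sketch below is an assumption-laden illustration: `roles_added` is a hypothetical helper, and the (role, phrase) pairs are transcribed from Table 4 with phrase boundaries chosen by hand. It lists the roles that are filled in the research objective but absent from the research result.

```python
objective = [
    ("applied_theory/model/framework",
     "recent theoretical developments in sociology and epidemiology"),
    ("cause-effect-relation", "early origins"),
    ("effect-concept", "health problems"),
    ("evidence.data_source", "longitudinal data from a national sample"),
    ("association-relation", "links"),
    ("cause-concept", "multiple forms of early life disadvantage"),
    ("effect-concept", "multiple health problems in adulthood"),
]
result = [
    ("cause-concept", "three domains of childhood disadvantage"),
    ("cause-effect-relation", "are consequential to"),
    ("mediator", "adult risks and resources"),
    ("cause-concept", "child abuse"),
    ("association-relation", "was related to"),
    ("effect-concept", "W1 health problems"),
    ("mediator", "lifestyle risks and social psychological resources"),
]


def roles_added(objective, result):
    """Roles filled in the objective statement but not in the result statement."""
    return sorted({role for role, _ in objective} - {role for role, _ in result})


for role in roles_added(objective, result):
    print(role, "->", [phrase for r, phrase in objective if r == role])
```

Comparing roles rather than exact phrases avoids treating every rewording (such as “links” versus “was related to”) as new information.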

Fig. 7.

Visualization showing what additional concepts and roles (colored green) the research objective statement adds to the research result statement.

Comparing the research result with the research objective can highlight unexpected results not indicated in the research objective, as illustrated in the following two pairs of research objective and research result statements:

[Objective] In the present study we built on this daily diary literature by examining {cause-effect} bidirectional spillover from {concept1} the marital to {concept2} the parent–child dyad and {cause-effect} vice versa.

[Result] An additional contributor to spillover as it is tested in this study could be timing: {concept1} Marital and {concept2} parent–child conflict may sometimes {co-occurrence} co-occur rather than occurring in sequence.

[Objective] This study expands prior research by … testing the claim that {measurement:target_entity} leisure with children and family is more likely to be experienced as {difference1} leisure for {concept1} men and {difference2} work for {concept2} women (Shaw 2008).

[Result] However, and {comparison} contrary to expectations, the {concept2} mothers in this study {modality} did not seem to benefit {comparison} less from {difference2} leisure than did {concept1} fathers.

It can highlight negative and null results:

[Objective] {purpose} To test the classical conjecture above, I explore the extent to which {concept_set/concept1} absolute mobility, in conjunction with economic development and income inequality, {cause-effect} influences {concept2} class identification.

[Result] This study also found that {aspect1} national-level absolute mobility {modality} does not {cause-effect} influence {concept2} class identification.

It can also highlight exceptions or opposite results for a subset of the population or under certain circumstances:

[Objective] … we analyze {subclass1} the choice of first names to investigate the {cause-effect} causes and consequences of {concept1} immigrant assimilation at the individual level … We ask whether {common_concept} [comparison] immigrants who {attribute1} assimilated were {difference1} better able to climb the economic ladder than those who {attribute2} retained their ethnic identity.

[Result] {comparison} The one exception to this pattern is {subclass1} Russian, predominantly Jewish, {common_concept} immigrants: for this group, sons with {moderator} ethnic names ended up, on average, {difference1} in higher-earning occupations

6. Conclusion

We have analyzed how causal knowledge is constructed along the main argument chain in a sample of 30 sociology research papers. The main argument chain comprises a sequence of statements having the following argument and rhetorical functions: general statement, topic centrality, literature review summary, research gap, research objective, research method summary, research result summary and research contribution. In previous work, we have identified these as forming the main argument chain in the Introduction sections of sociology research papers (Cheng and Khoo, 2022).

The general statement and topic centrality statements highlight important concepts in the causal information structure, especially the causal factors that are the focus of the studies. A general statement often points out cause-effect relations that plausibly hold between two concepts being investigated in the study. The statements sometimes link the cause and effect concepts to other concepts in the literature, which can be used to construct class hierarchies for the cause and effect concepts. The topic centrality statement may extrapolate the cause-effect relation to a distal consequence to highlight the importance of the current study.

Conceptual zooming in and out occurs along the argument chain. The research result statements provide more detailed and specific causal factors and effects as well as moderator and mediator variables, while the research contribution statements sometimes provide broader concepts. We found the hour-glass pattern (zooming into more specific concepts and then zooming out to broad concepts) in about a quarter of the papers, for both the cause concepts and the effect concepts. Comparing the research result with the research objective can highlight unexpected results not indicated in the research objective, highlight negative and null results, as well as exceptions or opposite results under certain circumstances. Research result statements occasionally add new causal factors, specify narrower effect concepts, add moderator factors, and to a lesser extent mediator factors. The research contribution statements sometimes broaden the effect concept, and also sometimes compare the results with those of previous studies. They may also extrapolate the results to a possible distant consequence.

We have implemented a prototype knowledge graph visualization interface that visualizes the causal information structure of each statement along the argument chain, and how the concepts in the causal structure are linked across the argument elements and to class hierarchies, thereby revealing how information changes along the argument chain. This visualization interface was implemented as a Web application, with a graph database management system (Neo4j graph database on the AuraDB cloud service) as the backend, and a JavaScript graph visualization library (Cytoscape.js) on a Web page as the frontend. The system is designed to support summarization of research results across social science research papers, identification of research gaps, and inference of potential new causal relations that can serve as hypotheses for investigation.
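The paper does not describe how data pass from the Neo4j backend to the Cytoscape.js frontend. As one possible shape of that hand-off, the sketch below converts (source, label, target) triples, for example rows returned by a graph query, into the JSON element list that Cytoscape.js accepts. The triple format, the use of concept names as node ids, and `to_cytoscape` itself are assumptions made for illustration.

```python
import json


def to_cytoscape(triples):
    """Convert (source, label, target) triples into Cytoscape.js elements.

    Nodes take the form {"data": {"id": ..., "label": ...}} and edges
    {"data": {"id": ..., "source": ..., "target": ..., "label": ...}}.
    Concept names double as node ids in this sketch.
    """
    nodes = {}
    edges = []
    for i, (src, label, tgt) in enumerate(triples):
        for name in (src, tgt):
            nodes.setdefault(name, {"data": {"id": name, "label": name}})
        edges.append({"data": {"id": f"e{i}", "source": src,
                               "target": tgt, "label": label}})
    return list(nodes.values()) + edges


triples = [
    ("childhood situation", "cause", "adult situation"),
    ("childhood problems", "subclass-of", "childhood situation"),
]
print(json.dumps(to_cytoscape(triples), indent=2))
```

On the Web page, this list could be passed as the `elements` option of `cytoscape({...})`, with a style rule such as `label: 'data(label)'` on nodes and edges so that the concept names and role labels are displayed.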

To fulfil this vision, accurate information extraction from the full text of research papers, covering the details of the cause-effect information frame, is needed, together with argument mining methods to identify the argumentative and rhetorical functions of the sentences. Although the state of the art in automatic extraction of cause-effect information is supervised learning with deep-learning models such as BiLSTM, Transformer and BERT models (Ali et al, 2021), recent studies using ChatGPT (a large language model, LLM) have found that, with well-designed prompts, LLMs can yield information extraction accuracies close to those of supervised deep-learning models with much less effort (e.g., Zhu et al, 2024; Chan et al, 2024). Thus, it is now feasible to develop Web applications that automatically construct causal argument knowledge graphs from science and social science research papers.

In addition to extracting the details of cause-effect relationships from text and tagging sentences with argument/rhetorical functions, our analysis has highlighted that additional metadata should be provided to indicate meta-level discourse elements such as which element of the cause-effect relation is the focus of the study, whether a cause-effect relation is proposed or argued as being plausible, whether a cause-effect relation is extrapolated to distal consequences, conceptual zooming in and out, unexpected results, and exceptions to a research result under particular conditions.

Although the scope of our analysis is limited to sociology research papers, the results are expected to apply more generally to social science research papers. This is because our analysis framework is based on argument/rhetorical functions (e.g., research gap and research objective) that are common to many social science disciplines. We acknowledge that while the general argument/rhetorical structure is shared, the specific forms of causal claims and the level of quantitative rigor vary across disciplines. For example, economics and psychology are likely to have more complex mathematical and statistical expressions. Our analysis has highlighted some of the conceptual dynamics of knowledge construction, which is a meta-level phenomenon that is threaded through the argument/rhetorical structure of a research paper.

Availability of Data and Materials

The primary dataset for this study consists of sociology articles published in SSCI-indexed journals. Due to copyright restrictions and subscription-based access through university library databases, the full texts of these articles cannot be publicly shared by the authors.

Author Contributions

WNC: Writing (Original Draft, Review & Editing), Conceptualization, Methodology, Formal Analysis, Visualization, Supervision. CSGK: Writing (Review & Editing), Methodology, Formal Analysis, Visualization. Both authors contributed to editorial changes in the manuscript. Both authors read and approved the final manuscript. Both authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Acknowledgment

Not applicable.

Funding

This study was supported by the National Science and Technology Council, Taiwan (Grant No. 113-2410-H-003-187-MY2).

Conflict of Interest

The authors declare no conflict of interest.

Appendix

See Tables 10, 11.

Table 10. Example of the hour-glass pattern of conceptual zooming in and zooming out across the argument chain in Prickett and Augustine (2016).
Example from Prickett and Augustine (2016).
Literature summarize/generalize:
A robust literature highlights how maternal education differences in parenting practices when children are young are key predictors of inequalities in children’s academic development, which persist across the early life course and eventually translate to disparities in other indicators of adult well-being (e.g., wages labor force participation; Haveman & Wolf, 1995; …).
Research gap:
Yet within this framework the role of parenting in explaining maternal education differences in the development of children’s health has often been overlooked.
Research objective:
Such research laid the foundation for the current study, in which we examined maternal education differences in mothers’ health investment behaviors across early childhood (birth through age 5) and, going a step further, whether these differences were greatest during the time when children’s needs are the most demanding, sensitive to parental inputs, or foundational to long-term health.
Research result:
… we found that mothers with a college degree were most likely to practice more advantageous health investment behaviors in terms of preventative care, nutrition SHS exposure, car seat use, physical activity, and television watching throughout early childhood.
General result:
… we view these two sets of results (comparing changing and unchanging health needs) as new evidence for how maternal education is linked to children’s short- and long-term health above and beyond correlated sources of income or demographic factors, which we controlled for.
Research contribution:
These empirical findings, taken more broadly, shed new light on how mothers’ education is connected to the diverging destinies of children today.
Table 11. Example of the hour-glass pattern of conceptual zooming in and zooming out across the argument chain in Curtis (2016).
Example from Curtis (2016).
Literature summarize/generalize:
A long-standing argument holds that social mobility weakens class awareness (Marx, 1894; …). It is argued that mobility increases contact between social classes, which results in the depoliticization of economic issues, and reduces the potential for groups to achieve a ‘class for itself’ (Heath, 1981).
Research gap:
Nevertheless, although much early work on mobility alluded to its consequences for class identity, little empirical research has explored this relationship. … Aside from a few recent studies (Evans and Kelley, 2004; …), little research has been done pertaining to the influence of national context on class identification.
Research objective:
This research article begins to fill this gap by exploring the social and political implications of class identification and awareness across 33 modern societies. … this article explores how individual-level mobility and national-level economic conditions shape class identification in 33 countries. … I explore the extent to which absolute mobility, in conjunction with economic development and income inequality, influences class identification.
Research result:
I find that both respondent’s and father’s social class position strongly influence how people perceive their fit in the class structure. … it affects people in all class positions equally — i.e., class socialization is equally strong for all social classes.
I demonstrate that national-level absolute mobility does not influence class identification. … I also confirm the importance of income inequality in shaping class identification and awareness. This new evidence suggests that class identities are more pronounced when inequality is low rather than when it is high.
General result:
My results provide strong evidence to suggest that people understand their position in the class structure, and that inequality significantly affects this relationship.
Research contribution:
This is the first study to systematically explore the effect of both respondent’s occupational class and their class origin on class identification.

References

Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.