1 Institute of Biotechnology and Health, Beijing Academy of Science and Technology, 10094 Beijing, China
2 Department of Nutrition and Food Hygiene, School of Public Health, Southeast University, 210009 Nanjing, Jiangsu, China
3 Key Laboratory of Environmental Medicine and Engineering, Ministry of Education, 210009 Nanjing, Jiangsu, China
4 National Institute of Nutrition and Health, Chinese Center for Disease Control and Prevention, 100050 Beijing, China
5 Glycemic Index Research Unit, School of Applied Science, Temasek Polytechnic, 529757 Singapore, Singapore
6 Division of Nutrition, Department of Pediatrics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, 10400 Bangkok, Thailand
7 Department of Community Nutrition, Faculty of Human Ecology, IPB University, 16680 Bogor, Indonesia
8 Nutrition Society of Malaysia, 46150 Petaling Jaya, Selangor DE, Malaysia
†These authors contributed equally.
Abstract
Health and function claims are central to advancing nutrition science and regulatory practice, particularly across Asia, where the harmonization of nutrition labeling standards faces unique and complex challenges. Fragmented national regulatory frameworks, combined with the region’s diverse range of food ingredients and traditional herbal products, have created substantial inconsistencies in scientific evaluation practices and cross-border trade.
To establish a harmonized, evidence-based technical framework for evaluating the scientific evidence underpinning health and function claims for foods and food ingredients.
The Task Force on Health & Function Claims of the Federation of Asian Nutrition Societies (FANS) developed this guideline through a structured process of expert consultation, iterative peer review, and formal consensus building. The framework comprises six core components: (1) core principles; (2) standardized procedures for evidence evaluation; (3) criteria for assessing literature quality; (4) criteria for grading evidence strength; (5) requirements for evidence report preparation and (6) evidence-based recommendations.
This guideline defines standardized terminology for health and function claims, establishes core principles for evidence-based nutrition practice, specifies the requirements for qualified evaluators, details comprehensive and replicable literature evaluation procedures, and adopts a four-tiered evidence strength grading system (Grades A–D). It prescribes transparent, verifiable pathways for claim substantiation and periodic re-evaluation to uphold scientific rigor and ensure consumer protection.
This guideline aligns with prevailing international standards while being specifically adapted to the Asian regional context. It provides a unified technical framework for the evaluation of health and function claims in Asia. It is designed to facilitate regulatory convergence across the region, safeguard consumer interests, and foster responsible innovation in the functional food sector, while laying a robust foundation for collaborative regional nutritional research and regulatory alignment.
Keywords
- scientific evidence evaluation
- health and function claims
- evidence-based nutrition
- evidence strength
Asia’s functional food market has experienced exponential growth, driven by rising prevalence rates of diet-related non-communicable diseases (NCDs) including cardiovascular disease, diabetes, and obesity. These diseases, exacerbated by widespread dietary shifts toward nutrient-poor food patterns, place a substantial burden on public health systems and healthcare infrastructure across the region. Despite this urgent public health need, the establishment of a standardized and robust regulatory framework for functional food across Asia remains a critical challenge. Fragmented national regulatory approaches, regulatory ambiguity, and inconsistent definitions of key terms hinder both scientific innovation and cross-border trade. These challenges are further compounded by public skepticism toward conflicting health claims and insufficient transparency in functional food labeling. Such governance gaps reflect broader global concerns regarding nutrition misinformation and misaligned policy implementation, both of which erode consumer trust and negatively impact evidence-based dietary practices [1].
In response to these issues, the Task Force for Nutrition & Health Claim of the Federation of Asian Nutrition Societies (FANS) developed this technical guideline, which was presented and deliberated at the 2025 International Medical Innovation and Cooperation Forum in Fangchenggang, Guangxi, China, to gather stakeholder input and advance the harmonization of regional regulatory practices. As shown in Fig. 1, the guideline has four core objectives: (1) to harmonize regional evaluation frameworks by establishing criteria aligned with global regulatory models [2, 3, 4, 5, 6] while customizing methodological practices for the Asian context; (2) to empower scientific innovation by establishing a transparent, rigorous pathway for validating health and function claims of foods and food ingredients through standardized scientific evidence evaluation [7, 8]; (3) to protect consumers by enhancing accountability and transparency in health and function claims, ensuring that marketed products deliver measurable, evidence-based health benefits; and (4) to facilitate cross-border trade by promoting consistency in the regulation of health and function claims among Asian countries, providing a shared evaluation reference for cross-border markets, and fostering regional economic integration and innovation.
Fig. 1.
Core objectives and framework of the FANS consensus guideline. FANS, Federation of Asian Nutrition Societies.
This technical guideline reflects FANS’ commitment to advancing evidence-based nutrition policy across Asia. It establishes harmonized criteria for evaluating health and function claims, integrating the GRADE methodology [9], the evidence-based review conducted by the Chinese Nutrition Society [10], and consensus input from FANS member countries. By clarifying six core components (Fig. 1) for evidence evaluation, the guideline offers a blueprint for advancing evidence-based functional food regulations across Asia. We anticipate that this work can provide a foundational reference for the gradual convergence of nutrition research and regulatory development across Asian countries. Ultimately, this collaborative initiative seeks to strengthen public health interventions for the regional reduction of NCDs.
The development of these guidelines was initiated and led by the Health Claims Working Group, one of four specialist task forces established by the Executive Committee of the FANS in April 2024. The working group comprises core members from FANS member societies, with collective expertise spanning nutrition science, regulatory affairs, evidence-based methodology, and clinical nutrition.
The guidelines were developed through a structured, multi-stage consensus-building process aligned with international best practices for evidence-based guideline development, with key milestones outlined below:
Phase 1: Task force establishment and scope definition. The FANS Executive Committee formally established the Health Claims Working Group, chaired by Prof. YY and Dr. EST, to address the unmet need for harmonized health and function claim evaluation across Asia. The working group first defined the guideline’s core scope, primary objectives, and target end users (regulatory bodies, academic researchers, and the food industry) through reviewing regional regulatory gaps and established global evaluation frameworks.
Phase 2: Draft development and internal review. The Health Claims Working Group led the drafting of the preliminary guideline version. This version established a quantitative, replicable scoring system comprising two core levels: single literature quality assessment and comprehensive body of evidence evaluation. For single studies, each eligible publication was scored across four dimensions, with the total score (maximum 16 points) determining the quality rating. The overall body of evidence was evaluated across five elements, each graded as excellent, good, fair, or poor. Finally, a four-tiered system (Grades A–D) was applied based on the body of evidence assessment. The draft underwent three rounds of iterative internal review by the working group to refine standardized terminology, methodological details, and Asia-specific contextual considerations.
Phase 3: Multi-stakeholder consultation and consensus finalization. An Advisory Committee comprising distinguished experts from FANS member societies provided strategic guidance on the guideline’s scientific rigor and regulatory alignment, with critical feedback incorporated to strengthen the document’s robustness and regional applicability. The revised draft was presented and discussed at the International Medical Innovation and Cooperation Forum in Fangchenggang, Guangxi, China, to gather feedback from regional stakeholders including academic researchers, regulatory officials, and industry representatives. All comments from the Advisory Committee, consulted experts, and forum stakeholders were systematically synthesized. Disagreements were resolved through structured group deliberation or third-party arbitration by independent experts in nutrition epidemiology. The final version was approved by the FANS Health Claims Working Group and Advisory Committee.
To ensure alignment with international best practices and robust scientific evaluation, the following key references and methodological frameworks were utilized: (1) Codex Alimentarius Guidelines for Use of Nutrition and Health Claims (CAC/GL 23-1997) [11]; (2) WHO GRADE Handbook [9]; and (3) the evidence-based review conducted by the Chinese Nutrition Society [10].
This guideline specifies the core principles, literature retrieval methods, evaluation procedures, evidence strength grading criteria, and evidence-based recommendations for scientific evaluation of the relationship between functional food ingredients and health and function claims. It also outlines the standardized requirements for the format and content of evidence evaluation reports.
The guideline is applicable to the following use cases: (1) evaluating individual ingredients and their proposed functions for inclusion in functional food directories; (2) preparing supporting materials for the regulatory review of novel health and function claims; and (3) developing scientific guidelines or consensus related to research.
The following terms and definitions apply for the purposes of this guideline.
Any representation that states, suggests, or implies a relationship between a food or a constituent of that food and health. Health and function claims include the following three categories:
(1) Nutrient function claims: Nutrition claims that describe the physiological role of a nutrient in the growth, development, and normal functioning of the body.
(2) Maintenance and improvement of health status claims: Claims relating to specific beneficial effects of the consumption of a food or food constituents, within the context of the total diet, on the normal functions or biological activities of the body. Such claims relate to a positive contribution to health, the improvement of physiological function, or the modification or preservation of health status.
(3) Reduction of disease risk claims: Claims linking the consumption of a food or food constituent, within the context of the total diet, to a reduced risk of developing a disease or health-related condition. Risk reduction is defined as the significant modification of a major risk factor for a disease or health-related condition. Diseases have multiple risk factors, and modifying one of these factors may or may not exert a beneficial effect. The presentation of risk reduction claims must ensure, for example through the use of appropriate language and reference to other contributing risk factors, that consumers do not interpret them as disease prevention claims.
Foods that provide additional health benefits beyond basic nutritional properties. Different countries may refer to them by alternative terms such as “health or food supplements”, “health protection foods”, or “health foods”. Importantly, functional foods must not make claims to prevent, alleviate, treat, or cure any disease, disorder, or specific physiological condition.
Bioactive or functional components in food that exert additional health benefits beyond meeting basic nutritional requirements.
Peer-reviewed academic papers formally published in recognized academic journals.
A systematic, question-driven process centered on a predefined research question or hypothesis, involving the comprehensive collection and critical evaluation of the latest high-quality evidence to assess the relationship between foods, nutrients, and health outcomes. Decisions are informed by balancing benefits, risks, and resources.
The systematic retrieval and quality assessment of individual literature, conducted in accordance with evidence-based nutrition standards, based on study design and methodological rigor.
A comprehensive collection of all relevant information, including literature and materials from multiple studies and sources, compiled following systematic retrieval and strict screening in accordance with standardized methods.
A comprehensive assessment of all included literature to evaluate the sufficiency, consensus, and credibility of the evidence supporting the relationship between a functional food ingredient and a proposed health and function claim.
The systematic collection and screening of relevant literature, evaluation of literature quality and evidence strength, and completion of an evidence evaluation report with accompanying evidence-based recommendations.
Rigorously designed, high-quality human studies shall form the primary basis of the body of evidence. Animal and in vitro studies may only be used as background information to generate research hypotheses or illustrate mechanisms. Unpublished studies shall be limited to use as background and supportive material only. The volume of included literature and the size of study populations must be adequate; the evaluation shall be terminated if sample sizes or literature volume are insufficient to support robust conclusions.
Comprehensive evidence evaluation must be conducted by qualified professionals. Evaluators must hold a medical or nutrition background at associate-professor level or above, be fully familiar with GRADE methodology and this guideline, and ideally have participated in continuous training. At least two qualified evaluators must independently complete each evaluation; disagreements between evaluators require resolution through consensus or third-party arbitration. The evidence report review panel shall consist of
Nutrient function claims do not require full evaluation procedures if evidence of their essentiality and necessity for human health is available from authoritative sources. Full evaluation is mandatory for claims relating to the maintenance and improvement of health status, as well as reduction of disease risk. The standard evaluation procedures are not typically applicable to claims based on traditional culinary herbs. Such claims require distinct evaluation processes based on authorized traditional medicine documents or Pharmacopoeia entries.
Evidence reports and all supporting materials shall be published on publicly accessible platforms for open review and comment.
These procedures apply to evidence evaluation for maintenance and improvement of health status claims and reduction of disease risk claims. They may also be used for traditional culinary herbs where sufficient published literature exists. Evidence reports must be prepared by qualified evaluators in strict accordance with the procedures outlined below.
Based on the objectives, a testable hypothesis regarding the relationship between a specific ingredient and a proposed health benefit shall be defined, alongside the scope of the research and parameters for literature searching. This includes: defining the target population (e.g., age, physiological conditions, geographical context); specifying intervention/exposure details (e.g., ingredient/substance name, source, composition); defining comparison groups (e.g., unexposed populations, placebo, or different dose regimens); and defining primary and secondary outcomes, as well as validated criteria for their assessment.
Based on the predefined scientific questions, evaluators shall determine search keywords and formal inclusion/exclusion criteria, with explicit specification of Population, Intervention/Exposure, Comparison, Outcome, Study design (PICOS) elements [13].
Keywords and search syntax shall be developed in both English and the relevant official national language. Searches must be conducted across at least two of the following databases: PubMed, Web of Science, Cochrane Library, Embase, and Clinical Trial Registration databases. Clear inclusion/exclusion criteria shall be applied to screen for eligible literature [14]. Detailed information on all selected literature shall be presented in tabular format, including title, publication year, study type, methodology, sample size, intervention details (e.g., dose, intervention duration), primary and secondary outcomes, and reported adverse reactions. Full references of included studies shall be appended.
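As an illustration only, a search protocol of this kind could be recorded and checked with a short Python sketch. The class `SearchProtocol`, its field names, and `build_query` are hypothetical and not part of the guideline; the Boolean syntax is generic rather than specific to any database, and the example terms are placeholders. Building the query from the population, intervention, and outcome terms only is an assumption made for brevity.

```python
from dataclasses import dataclass, field

# Databases named in the guideline text; the key spellings are an assumption.
LISTED_DATABASES = {
    "PubMed",
    "Web of Science",
    "Cochrane Library",
    "Embase",
    "Clinical Trial Registration databases",
}


@dataclass
class SearchProtocol:
    """Hypothetical record of a PICOS-based search protocol."""
    population: list[str]
    intervention: list[str]
    comparison: list[str]
    outcome: list[str]
    study_design: list[str]
    databases: list[str] = field(default_factory=list)

    def validate(self) -> None:
        """Check that at least two of the listed databases are searched."""
        listed = set(self.databases) & LISTED_DATABASES
        if len(listed) < 2:
            raise ValueError("At least two of the listed databases must be searched")

    def build_query(self) -> str:
        """Join synonyms with OR inside each element and elements with AND."""
        blocks = []
        for terms in (self.population, self.intervention, self.outcome):
            if terms:
                blocks.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
        return " AND ".join(blocks)


protocol = SearchProtocol(
    population=["adults"],
    intervention=["ingredient X", "ingredient X extract"],  # placeholder terms
    comparison=["placebo"],
    outcome=["fasting glucose"],
    study_design=["randomized controlled trial"],
    databases=["PubMed", "Embase"],
)
protocol.validate()
print(protocol.build_query())
```

In practice the generated string would still need to be adapted to each database's own syntax and translated into the relevant national language.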
Evidence evaluation covers evidence quality (encompassing both single study quality and the overall quality of the body of evidence) and evidence strength. Comprehensive evaluation criteria are detailed in Appendix A. Fig. 2 summarizes the key steps of the evidence evaluation procedure.
Fig. 2.
Evidence evaluation flowchart.
Specifically, single study quality is scored across four core dimensions, with a maximum total score of 16 points: (a) study design type (0–4 points) [15, 16, 17, 18, 19, 20, 21, 22, 23], (b) implementation situation (0–5 points), (c) effect size (0–4 points) [24], and (d) health relevance (1–3 points). The total score for each study is used to assess single literature quality. The overall body of evidence is assessed across five elements: overall literature quality (the average of the total single literature quality scores across all included studies), consistency, effectiveness rate, population similarity, and applicability. Each element is graded as excellent, good, fair, or poor. For evidence strength grading, a four-tiered system (Grades A–D) is applied based on the body of evidence assessment, with predefined rules for grade adjustment based on bias risk, effect size, and other modifying factors (Appendix Table 14).
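The arithmetic of the single-study score can be sketched in Python as follows. The dimension ranges are taken from the text above; the function and key names are hypothetical, and the mapping from scores to quality ratings or to the excellent/good/fair/poor grades is defined in the appendix tables and is deliberately not reproduced here.

```python
from statistics import mean

# Score range for each dimension, as stated in the guideline text.
DIMENSION_RANGES = {
    "study_design": (0, 4),
    "implementation": (0, 5),
    "effect_size": (0, 4),
    "health_relevance": (1, 3),
}


def single_study_score(scores: dict[str, float]) -> float:
    """Return the total quality score of one study (maximum 16 points)."""
    if set(scores) != set(DIMENSION_RANGES):
        raise ValueError("Exactly the four scoring dimensions are required")
    for name, (low, high) in DIMENSION_RANGES.items():
        if not low <= scores[name] <= high:
            raise ValueError(f"{name} must lie between {low} and {high}")
    return sum(scores.values())


def overall_literature_quality(studies: list[dict[str, float]]) -> float:
    """Return the mean of the single-study total scores."""
    if not studies:
        raise ValueError("At least one study is required")
    return mean(single_study_score(s) for s in studies)


# Two made-up studies, for illustration only.
study_a = {"study_design": 4, "implementation": 4, "effect_size": 3, "health_relevance": 3}
study_b = {"study_design": 3, "implementation": 2.5, "effect_size": 2, "health_relevance": 2}
print(single_study_score(study_a))                      # 14
print(overall_literature_quality([study_a, study_b]))   # 11.75
```

Half points are accepted because some implementation criteria in Appendix A award 0.5.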
Evidence strength is graded using the adapted GRADE system [25, 26], as follows: Grade A denotes high-level evidence with sufficient data and broad scientific consensus; Grade B represents moderate-level evidence with adequate data and partial consensus; Grade C indicates low-level evidence with limited consensus that requires cautious application; and Grade D signifies insufficient evidence with uncertain conclusions.
The required content, format, and structural requirements for the evidence report are specified in Appendix B. Primary literature and statistical data shall be appended to the report. Additional supportive materials (authoritative books, regulatory documents, product labels) may also be included where relevant.
Based on the evidence report, the review panel shall formulate recommendations, as follows: For maintenance and improvement of health status claims, a minimum evidence strength grade of C is required. For reduction of disease risk claims, a minimum evidence strength grade of B is mandatory. Recommendations must reflect holistic consideration of the scientific evidence and societal factors, including formal assessment of risk of bias, population suitability, and scientific plausibility.
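The minimum-grade rule can be expressed as a simple check. The Python sketch below is illustrative; the claim-type keys and the function name are hypothetical. It covers only the grade threshold, which is a prerequisite: it does not replace the panel's holistic consideration of bias, population suitability, and plausibility.

```python
# Evidence strength grades ordered from weakest to strongest.
GRADE_ORDER = ["D", "C", "B", "A"]

# Minimum grades stated in the guideline; the key names are hypothetical.
MINIMUM_GRADE = {
    "health_status": "C",            # maintenance and improvement of health status
    "disease_risk_reduction": "B",   # reduction of disease risk
}


def meets_minimum_grade(claim_type: str, grade: str) -> bool:
    """Return True if the grade reaches the minimum for the claim type."""
    required = MINIMUM_GRADE[claim_type]
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(required)


print(meets_minimum_grade("health_status", "C"))            # True
print(meets_minimum_grade("disease_risk_reduction", "C"))   # False
```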
Evaluators bear full responsibility for the integrity and validity of the report. All adverse events reported in included studies must be fully disclosed. All potential conflicts of interest must be explicitly declared.
This guideline focuses on single ingredients or primary functional components of food products. Applicants are responsible for tracking scientific advancements and updating evidence evaluation reports at least every 5 years, with revisions or extensions to claims made as required by new evidence. Where the evidence strength grade for a health and function claim fails to meet the recommended minimum requirements, re-evaluation may be conducted no sooner than 2 years after the initial assessment, contingent on the availability of sufficient new peer-reviewed research literature. This document provides only technical guidance on principles for evidence evaluation, technical procedures, and recommendations and does not represent the final regulatory decision of any government authority.
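The two time limits above can be tracked with simple date arithmetic. The Python sketch below is a minimal illustration with hypothetical function names; it assumes that the periods are counted in calendar years and that 29 February rolls back to 28 February in non-leap years, neither of which is specified by the guideline.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def update_due(last_report: date) -> date:
    """Latest date by which the evidence report should be updated (5 years)."""
    return add_years(last_report, 5)


def earliest_reevaluation(initial_assessment: date) -> date:
    """Earliest date for re-evaluating a claim that missed the minimum grade (2 years)."""
    return add_years(initial_assessment, 2)


print(update_due(date(2025, 3, 1)))              # 2030-03-01
print(earliest_reevaluation(date(2024, 2, 29)))  # 2026-02-28
```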
Across many Asian cultures, foods with functional properties have formed an integral part of dietary and health practices for centuries. Against the backdrop of Asia’s rapidly expanding functional food market and fragmented regulatory landscapes for health and function claims, this guideline establishes a harmonized, evidence-based framework that is tailored to regional contexts while aligning with prevailing international standards. By defining core principles, literature retrieval methodologies, evaluation procedures, evidence strength grading criteria, standardized requirements for evidence report formats, and evidence-based recommendations for assessing the relationship between functional food ingredients and health and function claims, this guideline addresses the critical unmet need for a unified evaluation system across the region. Furthermore, we have added an illustrative example in Appendix C, which includes 10 RCTs in the final evidence evaluation [27, 28, 29, 30, 31, 32, 33, 34, 35, 36], to improve the practical applicability and readability of the guideline.
Central to this framework are three defining strengths that address key gaps in existing regional and global guidance. First, this guideline establishes a quantitative, fully replicable scoring system for literature quality appraisal and evidence strength grading, which delivers clear, operational guidance for regulators, researchers, and industry stakeholders across countries with varying levels of regulatory capacity and maturity in functional food oversight. Second, this guideline was developed through a rigorous, multi-stage consensus-building process with broad representation from FANS member societies across Asia, ensuring inherent regional relevance and acceptability for cross-border implementation. Finally, the guideline incorporates a mandatory periodic re-evaluation mechanism, with updates required at least every 5 years, ensuring the framework remains dynamic and adaptable to evolving scientific evidence in nutritional science.
In alignment with core evidentiary principles established in leading global regulatory frameworks, such as those of the European Food Safety Authority (EFSA) and the U.S. Food and Drug Administration (U.S. FDA) [2, 3, 4, 5, 6], this guideline prioritizes high-quality human intervention studies, particularly randomized controlled trials (RCTs), as the gold standard for evidentiary substantiation, mandates systematic literature retrieval and screening, requires transparent assessment of risk of bias, and sets a higher evidence strength threshold for reduction of disease risk claims, consistent with international best practices. However, a key distinction from global frameworks lies in its methodological approach. Whereas existing regulatory guidance primarily outlines overarching principles and key review questions, with final evaluations heavily reliant on the judgment of expert committees, this guideline introduces a standardized, quantitative scoring system for assessing both the quality of single studies and the overall body of evidence. This system provides more granular, operational guidance than the predominantly qualitative frameworks, and is specifically designed to enable consistent implementation across Asian countries with varying levels of regulatory capacity.
We anticipate that this guideline will serve as a cornerstone for regulatory convergence across Asian economies, reducing trade frictions stemming from inconsistent claim evaluation practices and enhancing public trust in functional foods. For academic researchers, the standardized methodological procedures will streamline evidence synthesis and accelerate innovation in nutritional science, particularly by supporting the evaluation of individual ingredients and their proposed health functions, the preparation of materials for novel health and function reviews, and the development of related technical guidelines or scientific consensus documents. For the food industry, the transparent pathways for claim substantiation and periodic re-evaluation will foster responsible product development while facilitating market access. For consumers, the guideline’s emphasis on evidence-based transparency and scientific rigor ensures that health and function claims are credible, measurable, and aligned with individual health needs.
Notwithstanding these strengths, the guideline has several notable limitations that must be acknowledged. First, this guideline focuses primarily on single functional ingredients, with limited detailed guidance for the evaluation of complex multi-ingredient formulations, which are increasingly prevalent in the Asian functional food market. Second, the widespread implementation and harmonization of the guideline are dependent on voluntary adoption by national regulatory bodies across Asia, which may lead to variability in practical application across different jurisdictions. Third, the scoring system has limited adaptability for very early-stage research with minimal human study data, which may restrict its utility for exploratory research on novel functional ingredients.
As scientific evidence and regulatory practices evolve, the dynamic and adaptable nature of this guideline will ensure its ongoing relevance. We envision that this framework will facilitate cross-sector collaboration among academia, regulatory bodies, and industry, driving the development of more precise, accessible, and equitable nutritional interventions for Asia’s aging populations. Ultimately, this guideline fulfills FANS’ mission to advance evidence-based nutrition policy, laying a robust foundation for improved public health outcomes, regional economic integration, and a more cohesive functional food ecosystem across Asia.
All core data supporting the findings of this consensus guideline are included within the manuscript and its appendices. The full dataset of the stakeholder consultation and consensus voting process is available from the corresponding author upon reasonable request, subject to confidentiality agreements for stakeholder contributions.
All authors have met the four ICMJE criteria for authorship. The specific contributions are as follows: Conceptualization: YY, EST; Methodology and Data Interpretation: JZ, GS, ZW, CL, KB, NC, HH; Consensus Process Coordination: YY, EST; Draft Writing and Original Draft Preparation: JZ, GS; Critical Review, Editing, and Revision: All authors; Supervision and Final Approval: All authors. All authors have read and approved the final submitted version of the manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.
Not applicable.
Not applicable.
This research received no external funding.
The authors declare no conflicts of interest. Kalpana Bhaskaran and Yuexin Yang are serving as the Editorial Board members of this journal. We declare that Kalpana Bhaskaran and Yuexin Yang had no involvement in the peer review of this article and have no access to information regarding its peer review. Full responsibility for the editorial process for this article was delegated to Torsten Bohn.
Supplementary material associated with this article can be found, in the online version, at https://doi.org/10.31083/IJVNR49566.
A.1 Study Design Types
Different study designs inherently vary in their evidence strength and susceptibility to bias or erroneous conclusions. This standard recognizes only human studies as valid primary evidence, ranked by evidence strength from highest to lowest as follows: systematic reviews, meta-analyses, randomized controlled trials (RCTs), cohort studies, case-control studies, and cross-sectional studies.
Randomized controlled trial (RCT): an interventional study design where participants are randomly assigned to either an experimental or a control group. Interventions and control exposures are administered prospectively, with subsequent comparison of outcomes to assess causal relationships between the intervention and health endpoints [15].
Cohort study: an observational research design with high validity for establishing associations between exposure and health outcomes. It tracks a defined population based on their exposure status (e.g., intake frequency or dosage of a specific ingredient), follows participants over time to observe the development of health outcomes, and employs statistical analysis to assess potential associations between exposure and outcomes. This methodology supports health risk-benefit evaluations of food ingredients [16].
Case-control study: a primarily retrospective methodology comparing individuals with and without a specific health condition (cases and controls, respectively) [17]. It investigates pre-disease exposure to the ingredient of interest, analyzing potential associations between exposure and the development of the health outcome [18].
Cross-sectional study: a snapshot observational investigation conducted at a single defined timepoint. It collects data on ingredient intake and health status within a defined population to analyze contemporaneous associations between exposure and health parameters [19].
Systematic review: a methodology that addresses a predefined research question through standardized literature retrieval, critical appraisal of included original studies, and qualitative synthesis of findings [20]. It integrates results from eligible studies to generate comprehensive, unbiased conclusions [21].
Meta-analysis: a quantitative synthesis of multiple studies included within a systematic review [22, 23]. It employs statistical methods to assess heterogeneity, test consistency across independent studies, and pool effect sizes for combined analysis.
For systematic reviews or meta-analyses, the cited original studies must undergo screening based on the eligibility criteria defined in this evidence evaluation process, followed by quality evaluation of each single study.
A.2 Single Literature Quality Evaluation
A.2.1 Study Design Type Scoring
The scoring criteria are shown in Table 1. Each research study is scored based on its study design, with additional adjustments for sample size and follow-up duration as specified.
| Study design | Score |
| Randomized controlled trial (RCT) | 4 |
| Quasi-randomized controlled trial (e.g., alternate allocation, date-based grouping) | 3 |
| Non-randomized comparative studies (e.g., non-randomized concurrent cohort studies, case-control studies, controlled interrupted time series studies with parallel controls) | 3 |
| Comparative studies without concurrent controls (e.g., historical controls, non-concurrent multi-arm studies, interrupted time series without parallel controls) | 2 |
| Case series (with pre-/post-intervention comparisons) or cross-sectional studies | 1 |
Note: Observational studies with large populations (
A.2.2 Implementation Situation Scoring
Evaluation criteria for research implementation are detailed in Tables 2–5, categorized by study design. Thresholds for grading may be adjusted based on the characteristics of the health function under evaluation, with formal justification provided for any modifications.
| Evaluation item | Grading criteria | Score |
| Sample size | | 1 |
| | | 0 |
| Blinding | Double-/Triple-blind | 2 |
| | Single-blind | 1 |
| | No blinding | 0 |
| Loss to follow-up rate | | 1 |
| | | 0 |
| Intervention duration | | 1 |
| | | 0 |
| Evaluation item | Grading criteria | Score |
| Incident cases | | 1 |
| | | 0 |
| Blinding | Blinding applied (e.g., outcome assessors, statisticians) | 1 |
| | No blinding | 0 |
| Loss to follow-up rate | | 1 |
| | | 0 |
| Confounding factors | Controlled (via study design or statistical analysis) | 1 |
| | Uncontrolled or key confounders omitted | 0 |
| Follow-up duration | | 1 |
| | | 0 |
| Evaluation item | Grading criteria | Score |
| Sample size | | 1 |
| | 50–100 cases in the case group | 0.5 |
| | | 0 |
| Blinding | Blinding applied (e.g., outcome assessors, statisticians) | 1 |
| | No blinding | 0 |
| Case-control matching | Appropriately matched | 1 |
| | Unmatched or inappropriate matching | 0 |
| Confounding factors | Controlled (via study design or statistical analysis) | 1 |
| | Uncontrolled or key confounders omitted | 0 |
| Statistical analysis | All eligible subjects included in the final analysis | 1 |
| | Incomplete inclusion of study subjects | 0 |
| Evaluation item | Grading criteria | Score |
| Data source | Clearly defined | 1 |
| | Ambiguous | 0 |
| Sample size | | 1 |
| | 1000–5000 participants surveyed | 0.5 |
| | | 0 |
| Non-response rate | | 1 |
| | | 0 |
| Inclusion/Exclusion criteria | Explicitly documented and applied | 1 |
| | Not documented or inconsistently applied | 0 |
| Confounding factors | Controlled (via study design or statistical analysis) | 1 |
| | Uncontrolled or key confounders omitted | 0 |
A.2.3 Effect Size
Effect size is a metric quantifying the magnitude of a health effect observed in a study. It determines whether statistically significant study results hold practical or clinical importance [24]. Clinical relevance is assessed based on changes in health function-related indicators and their established impact on clinical outcomes. Effect size scoring criteria are provided in Table 6.
| Effect size | Score |
| Statistically significant result with full clinical relevance across the entire confidence interval | 4 |
| Statistically significant result, but confidence interval includes values without clinical relevance | 3 |
| Statistically significant result, but no clinical relevance within the confidence interval | 2 |
| Non-statistically significant result, but confidence interval includes values with clinical relevance | 1 |
| Non-statistically significant result with no clinical relevance | 0 |
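The Table 6 logic can be sketched in Python as a small decision function. This is only an illustration under stated assumptions: the effect is expressed so that larger values are more beneficial, a value counts as clinically relevant when it reaches a minimum clinically important difference (`mcid`, a hypothetical threshold chosen by the evaluator), and the function and parameter names are not part of the guideline.

```python
def effect_size_score(significant: bool, ci_low: float, ci_high: float, mcid: float) -> int:
    """Return a Table 6 effect size score (0-4) for one outcome.

    ASSUMPTIONS (not from the guideline): larger effect values are more
    beneficial, and a value is clinically relevant when it is >= mcid.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")

    fully_relevant = ci_low >= mcid    # whole confidence interval is clinically relevant
    partly_relevant = ci_high >= mcid  # at least part of the interval is clinically relevant

    if significant:
        if fully_relevant:
            return 4
        if partly_relevant:
            return 3
        return 2
    # Non-significant results score 1 only if the interval reaches clinical relevance
    return 1 if partly_relevant else 0


# Hypothetical outcome: significant, 95% CI 0.2 to 2.0, mcid 0.5
print(effect_size_score(True, 0.2, 2.0, 0.5))
```

The evaluator still has to justify the chosen relevance threshold for each indicator; the function only encodes the five rows of the table.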
A.2.4 Health Relevance
For studies related to reduction of disease risk claims, outcome indicators evaluated for effect size are categorized as clinical endpoints, surrogate endpoints, or patient-relevant endpoints (outcomes most critical to patient quality of life). Each study is assigned a health relevance score with formal justification. For maintenance and improvement of health status claims, scores are based on the alignment between the observed effect size and the claimed health benefits. Scoring criteria are detailed in Table 7.
| Health and function claim type | Outcome category | Indicator examples | Score |
| Reduction of disease risk claim (e.g., Reduction of Type 2 Diabetes Risk) | Patient-Relevant Endpoints (Directly impacts quality of life) | Complication incidence rate | 3 |
| | Clinical Endpoints (Direct measures of disease occurrence) | Disease incidence rate, blood glucose levels, glycated hemoglobin (HbA1c) levels | 2 |
| | Surrogate Endpoints (Indirect risk markers) | Insulin resistance | 1 |
| Maintenance and improvement of health status claims (e.g., Maintaining Normal Blood Glucose) | Direct Indicators of the Health Claim (Explicitly linked to the claimed function) | Blood glucose levels | 3 |
| | Strongly Related Indicators (Highly associated with the claim) | Insulin levels | 2 |
| | Weakly Related Indicators (Indirect or supplementary markers) | Inflammatory markers, body weight | 1 |
A.2.5 Literature Quality Scoring
The literature quality score for a single study is calculated as:
Literature Quality Score = Study Design Type Score + Implementation Score + Effect Size Score + Health Relevance Score
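As a minimal sketch of this sum, the following Python snippet adds the four component scores for one study. The class and field names are illustrative assumptions, and the example values are hypothetical rather than taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class StudyScores:
    """Component scores for one study (names are illustrative, not from the guideline)."""

    design: float            # study design type score (Table 1)
    implementation: float    # implementation score, summed over the evaluation items
    effect_size: float       # effect size score (Table 6)
    health_relevance: float  # health relevance score (Table 7)

    def literature_quality(self) -> float:
        # Literature Quality Score = sum of the four component scores
        return self.design + self.implementation + self.effect_size + self.health_relevance


# Hypothetical RCT: design 4, implementation 4, effect size 3, health relevance 3
example = StudyScores(design=4, implementation=4, effect_size=3, health_relevance=3)
print(example.literature_quality())  # 14
```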
A.3 Comprehensive Analysis of the Body of Evidence
A.3.1 Literature Quality
The overall literature quality score of the body of evidence is calculated as:
Overall Score = (Sum of total scores of all included studies) / (Number of included studies)
The overall literature quality of the body of evidence is categorized as Excellent, Good, Fair, or Poor according to Table 8.
| Body of evidence literature quality score | Assessment level |
| 13–16 | Excellent |
| 9–12 | Good |
| 5–8 | Fair |
| 1–4 | Poor |
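The averaging step and the Table 8 lookup can be sketched as follows. Table 8 lists integer bands and does not state how a non-integer average (for example 12.5) is classified, so this sketch assumes that the lower bound of each band acts as a cutoff. That convention, the function names, and the example scores are assumptions for illustration only.

```python
from statistics import mean


def overall_quality_score(study_scores):
    """Average the total literature quality scores of all included studies."""
    if not study_scores:
        raise ValueError("at least one included study is required")
    return mean(study_scores)


def quality_level(overall):
    """Map an overall score to a Table 8 assessment level.

    ASSUMPTION: the lower bound of each band (13, 9, 5) is used as a cutoff,
    because the table does not define how non-integer averages are handled.
    """
    if overall >= 13:
        return "Excellent"
    if overall >= 9:
        return "Good"
    if overall >= 5:
        return "Fair"
    return "Poor"


# Hypothetical total scores for five included studies
scores = [9, 10, 11, 12, 13]
overall = overall_quality_score(scores)
print(overall, quality_level(overall))
```

Under this convention an average of 11.4 falls in the Good band.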
A.3.2 Consistency
Consistency refers to the degree of agreement among conclusions from all included studies. Grading is performed according to Table 9.
| Consistency of study results | Assessment level |
| All study results are directionally consistent | Excellent |
| | Good |
| 50–70% of study results are directionally consistent | Fair |
| | Poor |
A.3.3 Effectiveness Rate
The effectiveness rate is categorized based on the proportion of included studies that demonstrate a statistically significant beneficial effect of the substance or ingredient on the claimed health function, as outlined in Table 10.
| Effectiveness rate | Assessment level |
| 100% of included studies confirm the beneficial health effect of the substance/ingredient | Excellent |
| | Good |
| 50–70% of included studies confirm the beneficial health effect | Fair |
| | Poor |
A.3.4 Population Similarity
Population similarity is categorized based on the alignment between the study population and the target population for the claim, as defined in Table 11.
| Alignment between study and target populations | Assessment level |
| Study population fully matches the target population in all key characteristics. | Excellent |
| Study population is similar to the target population, with minor differences in non-critical characteristics. | Good |
| Study population differs in ethnicity but shares key demographic traits (e.g., age) with the target population, allowing cautious extrapolation. | Fair |
| Study population differs significantly from the target population in critical aspects, making extrapolation highly uncertain. | Poor |
A.3.5 Applicability
Applicability is assessed based on whether the findings from all included studies can be generalized to the target population, and the extent of contextual considerations required during application, as defined in Table 12.
| Generalizability of findings | Assessment level |
| Findings are directly applicable to the target population without modifications. | Excellent |
| Applicable to the target population but requires minor contextual considerations. | Good |
| Applicable to the target population but requires significant contextual considerations. | Fair |
| Findings cannot be generalized to the target population due to critical mismatches. | Poor |
A.4 Grade of Evidence Strength
The preliminary grade of evidence strength is determined by evaluating five criteria (literature quality, consistency, effectiveness rate, population similarity, and applicability) according to the standards in Table 13.
| Grade | Criteria |
| A | All five criteria graded as Excellent |
| B | 3–4 criteria graded Excellent/Good, with effectiveness rate graded as |
| C | 1–2 criteria graded Excellent/Good |
| D | No criteria graded Excellent/Good |
The preliminary grade is adjusted based on the factors listed in Table 14 to determine the final evidence strength grade.
| Factor category | Specific modifying factors | Adjustment rules |
| Factors reducing RCT evidence strength | (1) Risk of Bias: Severe/critical flaws in literature quality | Any single factor may lower the grade by 1 level (severe) or 2 levels (critical). |
| | (2) Inconsistency: Major, unresolved discrepancies in findings | |
| | (3) Uncertainty: Significant unresolved uncertainties in outcomes | |
| | (4) Imprecision: Incomplete or unreliable data | |
| | (5) Publication Bias: High likelihood of selective reporting of outcomes | |
| Factors increasing observational study evidence strength | (1) Large Effect Size: Rigorous observational studies show strong, consistent effects | Any single factor may raise the grade by 1 level (e.g., RR
| | (2) Dose-Response Relationship: Clear evidence of a dose-dependent effect | |
| | (3) Negative Bias: Confounding factors are likely to underestimate the observed effect | |
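The adjustment from preliminary to final grade can be pictured as a move along the ordered scale A to D. The sketch below is a hedged illustration: the function and parameter names are hypothetical, the caller supplies the net number of levels justified by the Table 14 factors, and clamping the result at A and D is an assumption, since the guideline does not describe what happens at the ends of the scale.

```python
GRADES = ["A", "B", "C", "D"]  # ordered from strongest to weakest evidence


def adjust_grade(preliminary: str, levels_down: int = 0, levels_up: int = 0) -> str:
    """Shift a preliminary grade by the levels justified under Table 14."""
    if preliminary not in GRADES:
        raise ValueError(f"unknown grade: {preliminary}")
    if levels_down < 0 or levels_up < 0:
        raise ValueError("levels must be non-negative")

    index = GRADES.index(preliminary) + levels_down - levels_up
    # ASSUMPTION: the result is clamped so it never goes above A or below D
    index = max(0, min(index, len(GRADES) - 1))
    return GRADES[index]


# Hypothetical case: preliminary grade B lowered by 1 level for a severe risk of bias
print(adjust_grade("B", levels_down=1))  # C
```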
A.5 Review Panel Recommendations and Conclusions
The review panel shall issue formal recommendations to maintain, upgrade, or downgrade the assigned grade of evidence strength, supported by documented professional judgment, consideration of societal factors, and explicit rationale for all decisions.
For maintenance and improvement of health status claims, a minimum evidence strength grade of C is required. For reduction of disease risk claims, a minimum evidence strength of grade B must be achieved. Meeting these minimum thresholds confirms that the scientific hypothesis is supported by robust evidence, and the evaluation may be submitted as part of national or regional functional food registration dossiers, or referenced in technical guideline or consensus document development.
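The two minimum thresholds can be expressed as a simple lookup. This is a minimal sketch. The claim-type keys and function name are hypothetical labels chosen for illustration. Only the required grades (C and B) come from the text above.

```python
GRADE_RANK = {"A": 0, "B": 1, "C": 2, "D": 3}  # lower rank = stronger evidence

# Minimum final grade required per claim type (keys are illustrative labels)
MINIMUM_GRADE = {
    "health_maintenance": "C",      # maintenance and improvement of health status claims
    "disease_risk_reduction": "B",  # reduction of disease risk claims
}


def meets_minimum(claim_type: str, final_grade: str) -> bool:
    """Return True if the final grade is at least as strong as the required minimum."""
    required = MINIMUM_GRADE[claim_type]
    return GRADE_RANK[final_grade] <= GRADE_RANK[required]


print(meets_minimum("health_maintenance", "B"))      # True
print(meets_minimum("disease_risk_reduction", "C"))  # False
```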
The evidence report should include the following structured sections.
Part I: Applicant and Evidence Evaluation Personnel Information
1. Applicant Information
Provide the following details in tabular format: applicant name, tax identification number (national identification number for individual applicants), address, contact number, primary contact person, organization type (enterprise, research institute, individual, etc.), name of the functional food ingredient, recommended intake level, proposed health and function claim, target population, unsuitable population (if relevant).
2. Evidence Evaluator Information
Provide the following details for each evaluator involved in the assessment: name, institutional affiliation, professional title, area of expertise/research field, certificate of completion of evidence evaluation training (if available), role in the evaluation process, and handwritten signature.
Part II: Report Body
1. Research Background and Objectives
(1) Background: Describe the scientific basis for the relationship between the ingredient and the proposed health and function claim, including: ingredient/source characteristics, established physiological mechanisms, existing population studies, and the international regulatory status of similar claims.
(2) Objectives: State the explicit purpose of the evidence report and the scientific and public health necessity of the proposed health and function claim.
2. Methodology and Procedures
The methodology section must include full details of the literature search, screening process, and inclusion/exclusion criteria.
(1) Provide search keywords and syntax in both English and the relevant official national language, including synonyms, alternate terms, truncated/wildcard terms, and related terms. Construct a search syntax using parentheses and Boolean operators (AND, OR, NOT) to define logical relationships between terms.
(2) Specify the databases searched. Report the search syntax and number of results retrieved for each database.
(3) Define explicit inclusion/exclusion criteria for eligible studies in a numbered list format. Provide a flow diagram (e.g., PRISMA-style) illustrating the screening process, changes in study numbers at each stage, and the rationale for exclusions. For multiple publications derived from the same intervention study population, include only one representative publication in the final analysis.
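The search syntax described in item (1) can be sketched as a small helper that joins synonyms with OR inside parentheses and joins concept groups with AND. This is a generic illustration, not the syntax of any particular database: the function name is hypothetical, the `*` truncation symbol is an assumption, and field tags or controlled vocabulary terms would need to be added per database.

```python
def build_query(term_groups):
    """Combine groups of synonyms into one Boolean search string.

    Terms within a group are joined with OR. Groups are joined with AND.
    Multi-word terms are wrapped in double quotes so they are searched as phrases.
    """
    clauses = []
    for group in term_groups:
        if not group:
            raise ValueError("each term group needs at least one term")
        quoted = [f'"{term}"' if " " in term else term for term in group]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Example terms for an ingredient group and an outcome group
exposure = ["fructooligosaccharide*", "oligofructose"]
outcome = ["constipation", "bowel movement", "stool frequency"]
print(build_query([exposure, outcome]))
# (fructooligosaccharide* OR oligofructose) AND (constipation OR "bowel movement" OR "stool frequency")
```

Reporting the generated string together with the number of hits per database covers the documentation requirement in item (2).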
3. Comprehensive Evidence Evaluation
This section must include the following components for the evidence assessment of the “[Substance/Ingredient] for [Health and Function] Claim”:
(1) Literature Summary Table of Body of Evidence: Present a table summarizing all studies included in the body of evidence, with the following details (non-exhaustive): study title, publication year, study type, methodology, sample size, study population/age, intake form, dosage, intervention/observation duration, key outcomes and reported adverse reactions.
(2) Single Literature Quality Scoring: Create a table listing, for each included study: study design type score and rationale, implementation status score and rationale, effect size score and rationale, health relevance score and rationale, and total literature quality score (sum of the above scores).
(3) Body of Evidence Overview and Quality Analysis: Describe the scope and key characteristics of the body of evidence. Calculate the total literature quality score across all included studies, determine the average score, and assign the corresponding assessment level as defined in Table 8.
(4) Integrated Evidence Assessment and Strength Grading:
(a) Preliminary Grade: Assign an initial evidence strength grade based on the assessment of literature quality, consistency, effectiveness rate, population similarity, and applicability.
(b) Final Grade: Adjust the preliminary grade by evaluating the modifying factors outlined in Table 14, with explicit rationale for all adjustments.
(c) Documentation: Record all scores, supporting rationales, and grade adjustments at each step of the assessment.
4. Conclusions and Recommendations
Clearly summarize the key findings: the tested substance/ingredient, target population, effective dosage range, primary health outcomes, overall evidence strength grade, presence or absence of documented adverse events.
5. Reference List
Provide a complete, formatted list of all references cited in the report, in accordance with a recognized academic referencing style.
Part III: Attachments
1. Table of contents and corresponding page numbers.
2. Full texts of key references included in the body of evidence.
3. International regulatory context: relevant regulations/policies from authoritative international bodies (e.g., EFSA, FDA) or systematic reviews on similar health claims.
4. Intake Data: domestic intake surveys and analyses of proposed functional food ingredients.
5. Supplementary Supporting Materials:
(1) Safety Data: Toxicological reports or adverse event database summaries.
(2) Preclinical Studies: Summaries of animal/in vitro studies supporting the proposed mechanisms.
Part IV: Review Panel Recommendations and Conclusions
This section is completed exclusively by the evidence report review panel. It must clearly articulate the panel’s formal viewpoints and recommendations, with a total length not exceeding 2 pages.
Based on the full evidence evaluation report, the assigned evidence strength grade, supporting materials, established scientific theory, and expert consensus, the panel shall formulate a recommended final evidence strength grade and actionable advice for the relationship between the functional food ingredient and the proposed health and function claim.
In the supporting rationale, the panel shall explain alignment or discrepancies with the initial evidence strength grade, addressing: (1) evidence validity: literature quality, sample size, mechanistic plausibility, and overall scientific coherence; (2) practical and theoretical significance: relevance to specific population groups, dosage units, and the proposed health and function claims of the ingredient; (3) reasonableness of evaluation metrics applied; (4) adverse reaction records in included literature; (5) adoption status of similar claims by authoritative regulatory agencies in other countries.
The section must conclude with a list of all review panel members’ institutional affiliations, professional titles, and handwritten signatures (minimum 5 members required).
To illustrate the evidence evaluation process in this guideline, we present an assessment for the health and function claim: fructooligosaccharide (FOS) for the maintenance and improvement of normal bowel movement.
1. Formulation of scientific questions
The core scientific question was defined as: Does oral supplementation with FOS (
2. Development of literature search strategy
Systematic search terms for the literature retrieval were developed in both English and Chinese, including fructooligosaccharides, oligofructose, oligosaccharides, constipation, bowel movement, intestinal transit time, defecation, stool consistency, stool frequency, and fecal wet weight. Predefined inclusion and exclusion criteria were established to align with the scientific question. Eligible studies were required to (1) focus on the target adult population (aged 18–65 years, including healthy adults with or without functional constipation), (2) measure the independent effect of FOS supplementation, (3) use an RCT study design, and (4) report at least one objective bowel function parameter. Studies were excluded if they (1) were conducted in populations with severe underlying disease such as malignant cancer, renal failure, or inflammatory bowel disease, (2) focused on special populations including young children, pregnant women, or adults aged over 70 years, (3) failed to isolate the independent effect of FOS, such as multi-ingredient intervention studies without an isolated FOS arm, or (4) consisted of in vitro studies, animal studies, review articles, case reports, or non-peer-reviewed grey literature.
3. Implementation of literature retrieval and screening
A systematic literature search was executed across seven databases: five English-language databases (PubMed, Cochrane Library, Embase, OVID, EBSCO) and two Chinese academic databases (CNKI, Wanfang Data), using the predefined search terms and syntax. Literature screening and selection followed the PRISMA 2020 Statement, with the full workflow detailed in Fig. 3. Following duplicate removal, title and abstract screening, and full-text eligibility assessment, 10 RCTs were included in the final evidence evaluation. The core characteristics and key findings of these studies are summarized in Supplementary Table 1 (Ref. [27, 28, 29, 30, 31, 32, 33, 34, 35, 36]).
Fig. 3. PRISMA flowchart of study selection. FOS, fructooligosaccharide.
4. Single literature quality scoring
Quality scoring for each included study was performed in accordance with the criteria defined in Appendix A of the guideline, across the four core dimensions of Study Design Type Score, Implementation Situation Score, Effect Size Score, and Health Relevance Score. For the implementation situation scoring, the grading threshold for intervention duration was adjusted from the default 2 months to 2 weeks. The reason for the adjustment was that the physiological effects of prebiotic FOS on bowel function have a rapid onset, with measurable changes in transit time, stool frequency, and stool properties observed within 2 weeks of supplementation in the majority of published studies. The 2-month threshold for RCT implementation scoring is designed for chronic disease risk reduction endpoints and is not appropriate for acute physiological changes in bowel function. Total literature quality scores for the 10 included studies ranged from 9 to 13 out of a maximum 16 points, with 6 studies achieving a total score of
5. Body of Evidence Assessment
The final body of evidence comprises 10 RCTs, including 4 parallel-arm RCTs and 6 cross-over RCTs, with a total of over 700 participants enrolled across the studies. The included studies investigated a wide range of FOS doses (5 g/d to 30 g/d), with intervention durations ranging from 4 days to 8 weeks, and all studies focused on objective, clinically relevant bowel function endpoints in adult populations aligned with the predefined target population.
The five core criteria for evidence strength grading were assessed in accordance with Appendix A, with the following results: (1) Overall literature quality: Good (average score 11.4); (2) Consistency: Good (71.4% of studies measuring the endpoint of stool frequency reported directionally consistent beneficial effects of FOS); (3) Effectiveness Rate: Good (80% of studies reported a statistically significant beneficial effect of FOS on at least one objective bowel function endpoint); (4) Population Similarity: Excellent (the study populations of all included studies fully match the target population of adults aged 18–65 years defined in the scientific question; the single study including participants up to 75 years had a majority of participants within the 18–65 age range, with no impact on overall population alignment); (5) Applicability: Good (while all included studies were conducted in non-Asian populations, the physiological mechanisms of FOS on bowel function are well-established and not population-specific; the findings are directly applicable to the target Asian adult population with only minor contextual considerations). In accordance with Table 13 of the guideline, the preliminary grade of evidence strength is Grade B, based on 1 criterion graded as Excellent, 4 criteria graded as Good, and an effectiveness rate graded as Good.
Modifying factors for evidence strength grading were assessed in accordance with Table 14. No factors were identified that would reduce the evidence strength: there was no severe risk of bias, major inconsistency in findings, critical imprecision, or high risk of publication bias. No factors were identified that would increase the evidence strength either: there was no extremely large effect size or consistent dose-response relationship across all studies that would warrant grade elevation. As such, no grade adjustment was required, and the final grade of evidence strength was confirmed as Grade B. It should be noted that mild gastrointestinal adverse events (most commonly flatulence and bloating) were reported in several included studies among participants receiving a high dose of FOS (
References
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


