Background and purpose: Information on topics and knowledge structures is an important indicator of trends, prospects and sustainability in research fields. Although many studies on physical activity (PA) have been published in Korea, no studies have been reported to explain knowledge structure (KS) and keyword topics. Therefore, this study intends to analyze and explain PA-related studies.
Research method: In this study, topic modeling and keyword network analysis were applied to explore the KS of domestic PA-related studies published in domestic journals. 83 journals and 782 studies published from 1996 to June 2019 were collected, and 5441 key research keywords were used as data.
Results and conclusions: The analysis identified five main areas of PA research in Korea: first, studies reporting the PA level of students from an educational point of view; second, studies verifying the validity and reliability of measurement tools; third, studies reporting the psychological and behavioral characteristics of PA participants; fourth, studies promoting PA participation among people with disabilities; and fifth, ongoing research on the key topics of health and obesity. By providing information on the major themes, keywords, and trends of domestic PA research, this study can serve as basic data for exploring the current status of global PA research.
In most countries, budgets are invested and various policies are established to promote PA. Awareness of the need to address insufficient physical activity is growing, as PA is one of the important factors in preventing chronic disease and promoting sustainable human health. As health and power are intrinsically linked in Korean society, the health of Korean adults is of national importance. Many previous studies have reported a strong link between PA participation in daily life and the prevention of conditions such as obesity, hypertension, dementia, and poor mental health [3,4]. Specifically, insufficient PA participation has led to socioeconomic problems: it is a leading cause of death for more than 3 million people each year and generates 67.5 billion dollars in medical costs per year [5,6].
Due to the importance of PA, a large amount of related research has been continuously published in a variety of disciplines, including medicine, behavior, and social science, since epidemiologic studies of PA were first published in 1953. PA-related research has increased over the past 20 years, rising by about 425% in 2021 compared to the 1990s. Although the contents of PA research vary, they can be summarized as follows: (1) investigations into the PA level of each country and verification of the relationship between socioeconomic level and PA level; (2) analyses of the correlation and effect between PA and various health-related variables; (3) the introduction of exercise and intervention methods to increase PA; (4) methodological approaches for accurate measurement and evaluation of PA levels; and (5) policy proposals for improving PA participation at national and global levels [7,9,10].
A number of studies have also systematically summarized and described the content of previous studies to identify trends and development levels, providing information such as future research predictions. However, as these previous studies employed a qualitative approach in the form of commentary and review, they have a limited ability to verify the KS of PA studies through quantitative methodology and data analysis [9,10].
KS refers to the information that comprises the knowledge system formed through scientific academic activities from a quantitative and relational perspective. Scientific knowledge generated through systematic academic activities has an original content and structure within each academic field. The characteristic information obtained by integrating and summarizing this knowledge is defined as the KS [11-13]. KS identification is performed in various disciplines as it generates useful data on research trends, detailed research areas, common research findings, and directions for future research [14,15].
Traditionally, the content analysis method has been used to explore the KS. Content analysis categorizes the contents of the research field, subject, object, etc. according to classification criteria selected by the researcher in advance. It employs a methodological system to identify the KS by analyzing the frequency and descriptive statistics of the classified categories. Although the content analysis method has long been used to explore KS, it suffers from a lack of objectivity when classifying and categorizing research contents due to the substantial subjectivity of the researcher.
Recently, knowledge network analysis (KNA), which exploits developments in computer science and statistical methodology, has been proposed as a method of identifying KS using bibliographic data. The KNA methodology generates knowledge information by discovering the pattern, type, and structure of knowledge in the relationships among the knowledge inherent in various bibliographic data (authors, years, keywords, citations, institutions, abstracts, etc.). KNA has the advantage of statistically structuring the relationships between knowledge through graph theory as well as visually illustrating the structured knowledge using graphs. Specifically, the research article keywords selected by the author are the most important type of bibliographic information, as they correspond to the knowledge information representing the characteristics of the research. Keywords are highly utilized bibliographic information because they commonly use academically standardized concepts and methodologically defined words. Keyword-based KNA is useful for exploring KS because it forms a network based on the co-occurrence frequency of each keyword. As such, keyword-based KNA has been used to identify the research trends and KS of various disciplines such as business administration, industrial engineering, and health science. In addition, topic modeling, like KNA, is an analytic methodology for generating key information from unstructured text data. It is an unsupervised learning-based text mining method that derives key topics by clustering the relationships between texts in large data sets. In particular, topic modeling identifies specific topics by capturing the connection structure of keywords that appear together in the text data [21,22]; that is, it creates topics by inferring implied topics from the connectivity between multiple texts. Recently, it has been applied in various ways in studies that search for topics based on texts.
Previous analyses of PA-related research using keywords and bibliometrics have provided information on trends, author citations, and popular keywords in studies published in individual journals or internationally recognized databases (e.g., the Web of Science) [7,24,25]. In addition, some studies have aimed to analyze and share the characteristics and trends of PA studies published in different countries [26-28]. A number of PA-related studies have also been conducted in South Korea (Korea). PA participation in Korea is at least 20% lower than the average among OECD countries, with sedentary behavior at more than 30%. Therefore, efforts to promote PA are required, and various studies have attempted to increase PA participation [30-32]. However, no previous research has analyzed the number of published PA studies, the study topics and content, or the most common areas of research. Moreover, very few studies make Korean PA research available as global public knowledge by translating it into English. Therefore, the purpose of this study is to explore the PA-related KS through a network analysis of keyword co-occurrence and topic modeling for studies published in Korea. Specifically, this study aims to provide information on basic data, keywords, changes in keywords over time, and the KS of Korean PA research published between 1996 and June 2019.
Studies published in Korea under the theme of PA were selected as the population, and data were collected through the following procedures and criteria. First, academic databases (DBs) were selected for data collection. Selection of the academic DB is an important step in exploring research trends and KS because the studies retrieved differ depending on the year each DB was created and its contracts with journals. In this study, three academic DBs that cover research in all academic fields were selected to collect PA-related studies in Korea, in order to minimize the bias in retrieved articles that arises from the choice of academic DB. The selected DBs were the Korea Citation Index (KCI) of the National Research Foundation of Korea, which is the national institution for academic research management in Korea; the Korean studies Information Service System (KISS) of Korean Studies Information; and the Research Information Sharing Service (RISS) of the Korea Education and Research Information Service.
Second, PA studies were collected from the selected academic DBs by searching for the keywords “physical activity”, “physical” and “activity”, or “inactivity” in the title and content. Initially, 1335 studies published from 1996 to June 2019 were collected (KCI = 672, RISS = 272, KISS = 391). A bibliographic list of the research collected from each DB was prepared, and 494 duplicate studies were removed by comparing the lists. We also removed 59 studies that had no author keywords. Through this process, 83 journals, 782 studies, and a total of 14121 keywords were selected as data.
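The de-duplication step can be sketched as follows. This is only an illustration under stated assumptions: the text says the lists were compared but not how, so matching on a normalized title, the helper names `normalize_title` and `merge_records`, and the toy records are all hypothetical. The final assertions simply check the counts reported above.

```python
def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so the same title matches across DBs."""
    return " ".join(title.lower().split())


def merge_records(*databases):
    """Merge record lists, keeping the first occurrence of each normalized title."""
    seen = set()
    merged = []
    for records in databases:
        for record in records:
            key = normalize_title(record["title"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Hypothetical toy records, not the study's data.
kci = [{"title": "Physical Activity of Students", "keywords": ["student"]}]
riss = [
    {"title": "physical  activity of students", "keywords": ["student"]},
    {"title": "Sedentary Behavior in Adults", "keywords": []},
]

unique = merge_records(kci, riss)
# Drop studies without author keywords, as described in the text.
with_keywords = [r for r in unique if r["keywords"]]

# Counts reported in the text: three DB totals, duplicates, and keyword-less studies.
assert 672 + 272 + 391 == 1335
assert 1335 - 494 - 59 == 782
```

Title matching is a simplification; a real comparison of bibliographic lists would probably also use authors, year, and journal.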
Third, the keywords were translated. Most Korean journals provide keywords in English, whereas some journals provide only Korean keywords. Two PhD students from English-speaking countries, one professor of English literature, and several researchers translated the Korean keywords into English. Subsequent back-translation into Korean was also performed.
Fourth, keyword cleansing and standardization were performed. The cleansing process accounts for variations in similar keywords due to singular and plural word use, abbreviations, spacing, parts of speech, etc. If the cleansing process is not performed during keyword analysis, keywords representing the same concept and knowledge are designated as different keywords, which may distort the relationship (similarity) between knowledge. Therefore, a standardization process for synonyms, analogues, and inclusive words was performed. For example, words with similar meanings such as “student”, “alumnus”, and “alum” were standardized as “student”. In addition, as the collected data were all PA-related studies, cases with the keyword “PA” were removed. Finally, a total of 5441 keywords (including duplicates) and 1229 individual keywords were selected as the data for this study.
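A minimal sketch of such a cleansing and standardization pass is shown below. The study's actual synonym list is not given, so the `SYNONYMS` map, the `REMOVED` set, and the function names are hypothetical and cover only the example from the text.

```python
import re

# Hypothetical synonym map; only the "student" example comes from the text.
SYNONYMS = {"alumnus": "student", "alum": "student", "students": "student"}

# Keywords dropped because every collected study is about PA (assumed spellings).
REMOVED = {"pa", "physical activity"}


def clean_keyword(raw: str) -> str:
    """Trim, lowercase, collapse spacing, then map synonyms to one standard form."""
    keyword = re.sub(r"\s+", " ", raw.strip().lower())
    return SYNONYMS.get(keyword, keyword)


def standardize(keywords):
    """Clean every keyword and drop empty or removed ones."""
    cleaned = (clean_keyword(k) for k in keywords)
    return [k for k in cleaned if k and k not in REMOVED]


print(standardize(["Student", " alumnus", "Alum", "PA", "Obesity"]))
# prints ['student', 'student', 'student', 'obesity']
```

Keeping the synonym map as plain data makes the standardization decisions easy to review, which matters because they directly shape the co-occurrence network.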
The analysis methods and procedures performed in this study were as follows. First, descriptive statistics were calculated to confirm the keyword frequency of the collected data, using IBM SPSS version 21.0 (IBM Corp., Armonk, NY, USA). Second, keyword-based KNA was performed based on the co-occurrence frequency and relationships of keywords. The KrKwic program was used to calculate the co-occurrence frequency and the matrix. Only keywords with a co-occurrence frequency of more than five were selected for analysis, because analyzing the data at a lower co-occurrence frequency cannot identify the KS contents of the network and leads to distortions in the results. Third, we performed network analysis on the constructed keyword matrix. In order to explore the KS through the keyword-based knowledge network, the influence of each keyword on the network must be calculated. This study selected degree centrality (DC) as the analytical index to determine the influence of keywords and the KS. The DC indicates how many connections (links) exist between a keyword (node) and other keywords; a keyword with more connections in the KS is more influential and represents a major area or knowledge of interest being studied in the discipline. Fourth, the latent Dirichlet allocation (LDA) algorithm was applied for topic modeling. LDA infers hidden variables such as context and document structure from observed variables such as documents and words, and can determine the topics of the entire document set, the topic proportions of each document, and the probability that each word belongs to each topic [21,22]. The topic and conditional probability distributions were estimated by applying the Markov chain Monte Carlo (MCMC) learning method. The Netminer 4.0 program (Cyram Inc., Gyeonggi, Korea) was used to perform topic modeling and keyword network analysis.
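To make the topic modeling step more concrete, the sketch below implements LDA with a collapsed Gibbs sampler, which is one common MCMC scheme. It is not a description of Netminer's internals, which the text does not specify: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy keyword lists are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on lists of keyword tokens.

    Returns (theta, phi): per-document topic proportions and
    per-topic word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                 # counts per doc/topic
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics                               # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        {w: (tw[w] + beta) / (total + n_words * beta) for w in vocab}
        for tw, total in zip(topic_word, topic_total)
    ]
    return theta, phi


# Hypothetical keyword lists, one per study.
docs = [
    ["health", "obesity", "exercise"],
    ["validity", "accelerometer", "measurement"],
    ["health", "exercise", "obesity", "disease"],
    ["accelerometer", "validity", "adult"],
]
theta, phi = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(phi):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

Here `theta` corresponds to the topic proportions of each document and `phi` to the probability of each word within each topic, the two outputs the paragraph above attributes to LDA.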
As a detailed analysis method, DC was calculated based on the 1-mode network analysis method. A visual representation of the KS was provided by mapping the network.
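The co-occurrence and DC calculation on a 1-mode keyword network can be sketched as below. This is a rough illustration, not the KrKwic or Netminer procedure: applying the threshold to keyword pairs, using unweighted links, normalizing by n - 1, and the names `cooccurrence`, `degree_centrality`, and `min_count` are all assumptions. Under this reading, the study's "more than five" rule would correspond to `min_count=6`.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(studies):
    """Count how often each unordered keyword pair appears in the same study."""
    pairs = Counter()
    for keywords in studies:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs, min_count=1):
    """Normalized degree centrality on the keyword-by-keyword (1-mode) network.

    Pairs that co-occur fewer than min_count times are not linked.
    """
    neighbors = {}
    for (a, b), count in pairs.items():
        if count >= min_count:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {keyword: len(linked) / (n - 1) for keyword, linked in neighbors.items()}


# Hypothetical keyword lists, one per study.
studies = [
    ["health", "obesity", "exercise"],
    ["health", "obesity"],
    ["health", "education"],
    ["obesity", "exercise"],
]
pairs = cooccurrence(studies)
dc = degree_centrality(pairs, min_count=2)
print(dc)
```

With `min_count=2`, only the "exercise"/"obesity" and "health"/"obesity" links survive, so "obesity" has a DC of 1.0 and the other two keywords 0.5, while "education" drops out of the network entirely.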
Finally, the rate of change was calculated based on the DC value in order to identify trends in keyword changes across periods. The study period was divided into two: the most recent period (2011 to June 2019) and the period before it (1996 to 2010). This split was chosen because approximately 75% (589 studies) of the data were published in the most recent period. We therefore compared keyword DC values between 1996 to 2010 (period 1) and 2011 to June 2019 (period 2), and ranked the increasing and decreasing keywords by the difference in DC between the two periods. The formula for calculating the rate of change from the keyword DC values is:
rate of change = (x_i - x_1) / x_1
where x_1 is the keyword DC value calculated in period 1 and x_i is the keyword DC value calculated in period 2.
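As a worked illustration of this formula, the function below returns the rate of change as a fraction; multiplying by 100 gives a percentage like those reported in the results. The function name and the DC values are hypothetical, and the zero check is an added assumption for keywords that do not appear in period 1.

```python
def rate_of_change(dc_period1: float, dc_period2: float) -> float:
    """Return (x_i - x_1) / x_1 as a fraction of the period 1 DC value."""
    if dc_period1 == 0:
        raise ValueError("rate of change is undefined when the period 1 DC is zero")
    return (dc_period2 - dc_period1) / dc_period1


# Hypothetical DC values, not taken from the study.
print(rate_of_change(0.25, 0.5) * 100)    # prints 100.0 (DC doubled)
print(rate_of_change(0.5, 0.125) * 100)   # prints -75.0 (DC fell by three quarters)
```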
The results for the top 30 keywords among the 1229 individual keywords and 5441 total keywords are shown in Table 1. The keyword with the highest frequency was “Health”, appearing 128 times and accounting for 2.23%. The second most frequent keyword was “Obesity”, appearing 111 times (2.04%). Thus, Korean PA studies are predominantly focused on “Health” and “Obesity”. The next most frequent keywords were “Education” (78 times, 1.36%), “Behavior” (74 times, 1.36%), and “Fitness” (69 times, 1.27%). The sum of the frequencies of the top 30 keywords was 1601, accounting for approximately 30% of all 5441 keywords. In other words, these 30 of the 1229 individual keywords accounted for approximately 30% of all keyword occurrences in Korean PA studies.
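The frequency and share calculation behind such a table can be sketched as follows. The study used SPSS for this step; the Python version, the function names, and the toy keyword list are illustrative assumptions only.

```python
from collections import Counter


def keyword_shares(keywords):
    """Return (keyword, count, percent of all occurrences), most frequent first."""
    counts = Counter(keywords)
    total = sum(counts.values())
    return [(kw, n, 100 * n / total) for kw, n in counts.most_common()]


def top_n_share(keywords, n):
    """Percent of all keyword occurrences covered by the n most frequent keywords."""
    return sum(share for _, _, share in keyword_shares(keywords)[:n])


# Hypothetical cleaned keyword list, including duplicates.
keywords = ["health", "health", "obesity", "education"]
shares = keyword_shares(keywords)
print(shares[0])                 # prints ('health', 2, 50.0)
print(top_n_share(keywords, 2))  # prints 75.0
```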
Table 2 shows the results of the topic modeling, which yielded five topics, each listed with its top ten keywords. In the first topic, the keywords “Health, Elderly, Promotion, Behavior, Exercise, Obesity, Sedentary, Quality of Life, Disease” were ranked according to the topic modeling probability distribution. This topic comprises keywords related to health and exercise for the elderly and to obesity-related diseases.
Keyword (probability distribution)
| Rank | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 |
In the second topic, the keywords “validity, accelerometer, actigraph, adult, metabolism, measurement” were ranked; this topic comprises keywords related to the validity of tools for measuring physical activity in adults. In the third topic, the keywords “education, school, disability, children, program, student, class, elementary school” were ranked; this topic comprises keywords related to the application of PA as a class education program at school for children in general and children with disabilities. In the fourth topic, keywords such as “life, leisure, time, stress, satisfaction, psychology, and sleep” were ranked; this topic comprises keywords related to the psychological effects of PA during leisure time. In the fifth topic, the keywords “self, effect, social, support, psychology, concept, and action” were ranked; this topic comprises keywords related to self-directed PA and social support.
The results of the most influential keywords among Korean PA studies are shown in Table 3. Based on the DC value, “Health” was the most influential keyword among the top 30 keywords, followed by “Obesity”, then “Education”, “Fitness”, and “Behavior”. Although there are some differences in the order of DC rankings, the results are similar to Table 1, which shows the keyword frequency.
| Rank | Keywords | Degree centrality | Rank | Keywords | Degree centrality |
Fig. 1 shows the results of mapping the knowledge structure. Many links connect the keywords (nodes) “Health” and “Obesity”, which are highly relevant keywords, and many keywords are linked to the top 10 keywords by DC value. Clusters of highly relevant co-occurring keywords are centered on “Psychology” in the upper left corner, “Disability” in the upper center, “Education” in the top right, “Validity” in the bottom left, and “Behavior” and “Social” in the bottom right. The keywords “Health”, “Obesity”, “Fitness”, and “Exercise” are linked to a number of keywords in the center.
Fig. 1. Keyword knowledge structure map of Korean PA studies.
The top five keywords with a large difference in DC values between the two periods are shown in Table 4. The keyword whose number of appearances increased the most in period 2 (2011 to 2019) compared to period 1 (1996 to 2010) was “Sedentary”, representing a change of 181.9%. In addition, the keyword “Environment” increased by 132.6%. Keywords including “Leisure”, “Disability”, and “Ecological” were also identified as increasing in popularity.
| Trend | Rank | Keyword | RC* (%) |
| Increasing popularity keywords | 1 | Sedentary | 181.93 |
| Decreasing popularity keywords | 1 | Exercise | -87.34 |
NOTE: RC*, rate of change of connection centrality based on the study period.
The keyword whose number of appearances decreased the most was “Exercise”, with a decrease of 87.3%, followed by “Psychology”, “Questionnaire”, “Fitness”, and “Promotion”.
To our knowledge, this is the first study to address PA-related studies in Korea. This study reveals important information on major PA topics based on keywords and publication status by year and by journal, based on 782 studies and 83 journals covering a 23-year period since 1996. This study is academically significant because it uses the knowledge network methodology to map the KS of 1229 individual keywords and 5441 keywords in total and explore the trends of keyword changes with time.
The keywords “Health” and “Obesity” showed the highest frequency, indicating that PA is closely associated with various health variables and that most studies analyzed the correlation between PA level and health-related variables in Koreans. Although the subjects of PA-related research around the world are diverse, previous studies [7,25,28] reported health outcomes as the largest category, and the present results suggest that the same holds in Korea: most PA studies are conducted from a health perspective. In addition, “disability”, “student”, and “child” appeared among the top 30 most frequent keywords. Specifically, a previous study analyzing barriers to PA participation among people with disabilities found that the keyword “transportation” was on a par with “individual” and “economic”. Studies on test tools for measuring the PA of children in general and of children with disabilities have also been reported, emphasizing the importance of PA in children [34-36]. This shows that a considerable amount of PA-related research on people with disabilities, students, and children is being conducted in Korea.
According to the KS of PA studies revealed by calculating the co-occurrence frequency and DC value of the keywords, the high frequency keywords “Health” and “Obesity” were ranked highly. This is because the DC value is calculated based on the co-occurrence frequency. The keywords of the top 30 DC values represent the academic characteristics of PA-related studies in Korea. Specifically, these keywords are linked to various keywords and expressed in the form of clusters that represent the Korean knowledge structure. Thus, the KS of PA research in Korea is based on health-related variables (health, obesity, fitness, and exercise) and can be divided into research areas related to (1) the educational perspective of students, (2) verification of the validity and reliability of the measurement tool, (3) psychological characteristics, (4) promoting PA participation amongst the disabled, and (5) social behavior and policy. Although it is not possible to directly compare these results with previous studies, the results of this study are similar to the trend analyses of previous PA studies [7,9].
According to the increasing and decreasing popularity of keywords identified from DC value trends over time, “Sedentary”, “Environment”, and “Leisure” are newer keywords. This agrees with the recent appearance of studies showing that the determinants of PA participation worldwide are highly related to sedentary lifestyles and environmental factors, a research trend that is expected to continue. In contrast, studies examining exercise, physical fitness, and the quality of analysis tools have decreased relative to other topics; this change in research trends could be explored in further research.
This study has some limitations related to the presentation of basic information in that only keywords found in bibliographic data were used. Specifically, the research results should be interpreted carefully because only the keywords with a co-occurrence frequency of more than five were selected as data to calculate the DC value among the various indicators of the network analysis. Moreover, it is not possible to directly compare our results with those of specific studies due to the lack of previous studies with a similar study design. Therefore, this discussion only involves an indirect comparison of PA-related research trends. Nevertheless, this study provides basic information on the publications, key applied keywords, knowledge structure, and keyword trends of PA-related studies in Korea. The results of this study can provide useful insights for future PA research.
This study is the first to compile and present publication information of PA-related studies published in Korean. In addition, quantitative information on Korean PA research is disclosed through keyword-based network analysis and topic modeling. Key Korean PA research areas predominantly include education, psychology, social and policy aspects, the quality of measurement tools, and people with disabilities, with a focus on health and obesity. In particular, research into sedentary lifestyles and the environment and ecology of PA participation has been identified as a recent trend. The results of this study have various research applications, as they provide basic information on the characteristics of Korean PA studies.
KS, Knowledge Structure; Korea, South Korea; PA, Physical Activity; KNA, knowledge network analysis; DB, database; DC, degree centrality; KCI, Korea Citation Index; KISS, Korean studies Information Service System; RISS, Research Information Sharing Service.
Study conduct: CHC, HRK, and SEL. Data collection: CHC, HRK, and SEL. Data analysis: CHC, HRK. Data interpretation: CHC, HRK, and SEL. Drafting manuscript: CHC. Revising the manuscript content: CHC, HRK, and SEL. All authors have read and agreed to the published version of the manuscript.
The data are publicly available; therefore, ethical approval and consent to participate were not required for this study.
Thanks to all the peer reviewers for their opinions and suggestions.
This research received no external funding.
The authors declare no conflict of interest.