Multimodal assessment of visitor perceptions of cultural ecosystem services in Wuyi Mountain National Park: a complementary analysis based on text and images

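  • Summary: Resolving the "cognitive mismatch" between what national park management supplies and what the public needs depends on accurately capturing visitors' perceptions of cultural ecosystem services (CES). Existing studies, however, rely mostly on a single data modality and struggle to capture the multidimensional, composite structure of visitor experience. To reveal visitor perceptions more fully, this study builds and validates a complementary "text-image" dual-perspective analytical framework designed for the "naturally unpaired" character of social media data, and applies it to Wuyi Mountain National Park. The framework uses a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model for multi-label CES classification and sentiment analysis of 2741 review texts, and a fine-tuned ResNet-50 model for landscape recognition in 3521 visitor images; the results from both modalities are then mapped onto a unified CES classification system for macro-level pattern comparison. The findings are: (1) text and images are markedly complementary in expressing CES: text is better at voicing inner "spiritual" experiences and "recreational" feelings, whereas images tend to record concrete "aesthetic" landscapes and "science education" scenes; (2) combining the two perspectives shows that visitor perception has a stable core structure of "aesthetic-recreational-spiritual", while perception of "cultural" and "educational" functions is relatively marginal; (3) sentiment analysis further pinpoints four interrelated management pain points: "price-service-queuing-transportation". The framework systematically generates more comprehensive evidence of visitor perception and provides a replicable methodological tool for experience assessment and targeted management in national parks.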

     

    Abstract: Bridging the gap between national park management practices and public needs hinges on a precise understanding of visitors' perceptions of cultural ecosystem services (CES). Current research in this domain predominantly relies on single-modal data, which fails to capture the multidimensional nature of visitor experiences. This study develops and validates a dual-perspective analytical framework that integrates textual and visual data. Unlike previous studies that depend on strictly paired data, this framework is specifically designed to accommodate the inherently unpaired nature of user-generated content (UGC) on social media, where texts and images often interpret the same experience from different yet complementary dimensions. Taking Wuyi Mountain National Park as an empirical case, a fine-tuned BERT model is applied to perform multi-label CES classification and sentiment analysis on 2741 textual reviews. A fine-tuned ResNet-50 model identifies landscape elements in 3521 visitor photographs, using 14 visual categories defined to align with the CES typology. Analytical results from both modalities are integrated into a unified CES classification framework to enable a macro-level comparison between textual and visual perspectives. There are three core findings. First, texts and images exhibit significant complementarity in representing CES. Texts excel at conveying intrinsic spiritual experiences and recreational sentiments, capturing internal emotional states that are difficult to represent visually. In contrast, images are more adept at documenting tangible aesthetic landscapes and educational scenes, such as geological formations, vegetation, and interpretive facilities. This complementary relationship between implicit perceptions and explicit landscapes suggests that single-modality data can lead to perceptual bias. Relying solely on text risks overlooking visually prominent landscape elements, while analyzing only images tends to underrepresent visitors' intrinsic experiential values. Second, the integrated dual-perspective analysis reveals that visitor perceptions are dominated by three core dimensions: aesthetic value, recreational value, and spiritual value. In other words, the visitor experience functions as an integrated whole rather than as a set of isolated service categories. In comparison, cultural heritage and educational functions remain relatively peripheral and exhibit a "see more but say less" pattern: images capture educational scenes far more frequently than textual descriptions do. Third, sentiment analysis identifies four interrelated management pain points: pricing, service quality, queuing, and transportation. These issues primarily affect recreational and aesthetic services and constitute the main drivers of negative visitor experiences. Methodologically, the study makes use of unpaired social media data and constructs a replicable, generalizable analytical framework, offering more comprehensive empirical support for assessing visitor perceptions in national parks. Practically, it clarifies the hierarchical and complementary structure of CES perceptions and identifies management shortcomings through sentiment feedback, providing actionable references for optimizing visitor experiences and addressing operational challenges. Finally, the adaptability of this framework to other protected areas, along with its potential for integration with emerging multimodal large models, is discussed to chart directions for future research.
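The macro-level comparison of unpaired modalities described in the abstract can be illustrated with a minimal sketch. It assumes that classifier outputs are already available: each review carries a set of CES labels (multi-label), and each photograph carries one visual class that is mapped to a CES category. The category names, the `VISUAL_TO_CES` mapping, and the toy inputs are hypothetical placeholders, not the study's actual 14 visual categories or its data.

```python
from collections import Counter

# Hypothetical CES categories, named after the dimensions mentioned in the abstract.
CES = ["aesthetic", "recreational", "spiritual", "cultural", "educational"]

# Hypothetical mapping from visual classes to CES categories; the abstract
# does not list the 14 visual categories used in the study.
VISUAL_TO_CES = {
    "geological_formation": "aesthetic",
    "vegetation": "aesthetic",
    "interpretive_facility": "educational",
    "rafting": "recreational",
    "temple": "cultural",
}

def text_shares(review_labels):
    """Share of reviews that mention each CES category (multi-label input)."""
    counts = Counter(label for labels in review_labels for label in set(labels))
    n = len(review_labels)
    return {c: counts[c] / n for c in CES}

def image_shares(image_classes):
    """Share of images whose visual class maps to each CES category."""
    counts = Counter(VISUAL_TO_CES[v] for v in image_classes)
    n = len(image_classes)
    return {c: counts[c] / n for c in CES}

def modality_gap(text, image):
    """Image share minus text share; positive means 'seen more than said'."""
    return {c: image[c] - text[c] for c in CES}

# Toy, made-up inputs for illustration only (not results from the study).
reviews = [
    ["spiritual", "recreational"],
    ["aesthetic"],
    ["recreational"],
    ["spiritual", "aesthetic"],
]
images = ["geological_formation", "vegetation", "interpretive_facility", "rafting"]

if __name__ == "__main__":
    gap = modality_gap(text_shares(reviews), image_shares(images))
    for category in CES:
        print(f"{category:>12}: {gap[category]:+.2f}")
```

Because the two samples are compared only through their category shares, no review needs to be matched to a particular photograph, which is what makes this kind of comparison workable for unpaired data.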

     
