The 3rd School of Statistical Thinking Seminar
- [Date and Time]
- Friday, September 12, 2025, from 13:00
No registration required; admission free
- [Venue]
- The Institute of Statistical Mathematics, Seminar Rooms D313/D314
- [Speaker]
- Yuan-chin Ivan Chang (Academia Sinica)
- [Title]
- Preserving Data Structure in Large-Scale Subsampling by PCA-Guided Quantile Sampling Method
- [Abstract]
- In this talk, we introduce Principal Component Analysis-guided Quantile Sampling (PCA-QS), a novel sampling framework designed to preserve both the statistical and geometric structure of large-scale datasets. Unlike conventional PCA, which reduces dimensionality at the cost of interpretability, PCA-QS retains the original feature space while using leading principal components solely to guide a quantile-based stratification scheme. This principled design ensures that sampling remains representative without distorting the underlying data semantics. We establish rigorous theoretical guarantees, deriving convergence rates for empirical quantiles, Kullback–Leibler divergence, and Wasserstein distance, thus quantifying the distributional fidelity of PCA-QS samples. Practical guidelines for selecting the number of principal components, quantile bins, and sampling rates are provided based on these results. Extensive empirical studies on both synthetic and real-world datasets demonstrate that PCA-QS consistently outperforms not only simple random sampling (SRS) but also recent state-of-the-art methods such as coreset and leverage score sampling, yielding better structure preservation and improved downstream model performance. Together, these contributions position PCA-QS as a scalable, interpretable, and theoretically grounded solution for efficient data summarization in modern machine learning workflows.
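The core idea described in the abstract — using leading principal-component scores only to guide a quantile-based stratification, then sampling rows in the original feature space — can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the speaker's implementation: the function name `pca_qs_sample`, its parameters, and the per-bin proportional allocation are hypothetical choices for exposition.

```python
import numpy as np

def pca_qs_sample(X, n_components=1, n_bins=10, rate=0.1, seed=None):
    """Illustrative sketch of PCA-guided quantile sampling (PCA-QS).

    Leading PC scores serve ONLY as a stratification guide; the
    returned indices select rows of X in its original feature space.
    NOTE: hypothetical signature, not the authors' actual code.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Leading principal directions via SVD of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T      # PC scores (guide only)
    guide = scores[:, 0]                   # stratify on the first PC here
    # Quantile bin edges give roughly equal-sized strata along the guide.
    edges = np.quantile(guide, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, guide, side="right") - 1,
                   0, n_bins - 1)
    keep = []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if idx.size == 0:
            continue
        # Proportional allocation within each stratum (at least one point).
        k = max(1, int(round(rate * idx.size)))
        keep.append(rng.choice(idx, size=k, replace=False))
    return np.sort(np.concatenate(keep))
```

Because each stratum contributes points in proportion to its size, the subsample tracks the empirical quantiles of the guiding component rather than drifting toward dense regions, which is the structure-preservation property the abstract contrasts with simple random sampling.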