The 1st Seminar of the School of Statistical Thinking (Hybrid)

【Date and Time】

Thursday, May 21, 2026, 15:00–16:30
Admission free

【Venue】

Conference Room 2 (D208), The Institute of Statistical Mathematics
(Registration for participation via Zoom is available.)

【Speaker】

Yuki Takazawa (髙澤祐槻), The University of Tokyo

【Title】

Analyzing Samples of Phylogenetic Trees: Density Estimation and Consensus Trees

【Abstract】

The problem of summarizing multiple trees arises in many areas of phylogenetics, including the analysis of bootstrap tree sets, the summarization of Bayesian posterior samples, and the comparison of gene trees. In this seminar, I will discuss how a sample of phylogenetic trees can be analyzed as data, focusing on estimating tree distributions and constructing representative consensus trees.
I will first give a brief introduction to phylogenetic tree estimation from sequence alignments. I will then describe the Billera–Holmes–Vogtmann (BHV) tree space, the geometric space of all phylogenetic trees on a fixed leaf set. Within this framework, I will introduce several approaches to modeling tree distributions, including our work on nonparametric density estimation that uses log-concavity as a shape constraint. Log-concave densities form a flexible nonparametric class that can nevertheless be estimated by maximum likelihood; our work extends this idea, originally developed in Euclidean space, to BHV tree space.
In the second part of the talk, I will turn to the problem of constructing a consensus tree, a single representative tree that summarizes a collection of trees. After reviewing standard consensus methods and their limitations for modern large-scale tree samples, I will present our recent work on constructing consensus trees from dissimilarity measures that capture finer differences between trees. Through simulations and large-scale real data analyses, we demonstrate that the proposed methods can improve consensus tree resolution while maintaining accuracy comparable to or better than that of existing methods.