RESEARCH QUESTIONS

What We Need to Prove — Organized by Urgency and Resource Requirements

How to Read This Document

steamHouse has strong design validity — a curriculum grounded in 1,100+ sources, a framework independently validated by multiple external research programs, and eight years of community implementation. What it does not yet have is outcome validity — formal evidence that the system produces the results the design predicts.

The research questions below define exactly what needs to be studied, in what order, and what resources each question requires. They are organized into three tiers: questions addressable with steamHouse's current community and resources, questions requiring modest external partnership, and questions requiring sustained research collaboration.

For potential research partners, this is an invitation. For potential funders, this is a roadmap showing where investment produces the highest-value evidence at each stage.

Tier 1: Testable Now

These questions can be addressed with steamHouse's existing community, tools, and team — with consultation from research advisors on methodology.

1.1 Portfolio Legibility Testing

Question: Can a reviewer unfamiliar with steamHouse parse a credential portfolio in under five minutes and accurately describe what it contains?

Method: Design a prototype portfolio for 3-5 participants at different developmental stages. Present it to 15-20 external reviewers (a mix of employers, college admissions officers, and program directors). Use structured interviews and think-aloud protocols to identify confusion points, trust signals, and information gaps.

What it would prove: Whether the portfolio format communicates effectively outside the ecosystem — or whether steamHouse-specific literacy is required to read it.

Resources needed: Portfolio prototype design. Reviewer recruitment. Research advisor for protocol design and analysis. Estimated timeline: 3-4 months.

1.2 Bootstrap Guide Adoption Pilot

Question: Will youth-serving organizations outside steamHouse adopt the Team Playbook and Bootstrap Guides when offered support for integration?

Method: Recruit 3-5 partner organizations (FLL teams, theater groups, sports teams, scout troops) for a one-season pilot. Provide Bootstrap Guides and light mentoring support. Document adoption patterns: what gets used, what gets modified, what gets dropped, and why.

What it would prove: Whether the framework generalizes beyond steamHouse Club's specific context — the core question for Stage 2 of the demand creation strategy.

Resources needed: Partner recruitment. Season-long support. Pre/post documentation. Research advisor for design-based implementation research (DBIR) methodology. Estimated timeline: one activity season (4-6 months).

1.3 Marker Self-Assessment Consistency

Question: Do participants' self-assessments show internal consistency? Do people who rate themselves highly on related markers (e.g., Scout Mindset and Growth Mindset) do so reliably, or is the self-assessment capturing noise?

Method: Analyze existing self-assessment data from the interactive 58-Marker Self-Rating tool on the website. Look for expected correlations between related markers, consistent patterns within marker types (Stars, Lenses, Keys), and response patterns that suggest thoughtful engagement versus random clicking.

What it would prove: Whether the markers, as currently defined, form coherent clusters that match the theoretical framework — a basic validity check that requires no new data collection.

Resources needed: Data export from existing tool. Statistical analysis (factor analysis, internal consistency measures). Research advisor for psychometric methodology. Estimated timeline: 2-3 months.
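The internal-consistency part of this analysis can be sketched in a few lines. The example below computes Cronbach's alpha for a hypothetical cluster of three related markers; the data and the cluster grouping are illustrative, and a real analysis would run on the exported self-rating data.

```python
def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of participants, each a list of item scores."""
    k = len(ratings[0])  # number of items (markers in the cluster)
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 self-ratings on three related markers (e.g., a mindset cluster)
sample = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
    [4, 4, 4],
]
print(f"Cronbach's alpha: {cronbach_alpha(sample):.2f}")
# Values above roughly 0.7 suggest the markers form a coherent cluster
```

A full factor analysis would go further, but alpha per predicted cluster is a cheap first check that the marker groupings hang together at all.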

Tier 2: Requires Modest External Partnership

These questions require research expertise and resources beyond steamHouse's current team — but are feasible with a single academic partner or research consultant.

2.1 Inter-Rater Reliability Study

Question: When multiple trained mentors independently assess the same participants on the same markers, how often do they agree? Does agreement vary by marker type (Stars vs. Lenses vs. Keys)?

Method: Train 6-8 mentors on the assessment framework using calibration exercises. Have pairs of mentors independently assess 15-20 participants across a defined set of markers (8-12 markers representing all three types). Calculate inter-rater reliability coefficients. Analyze by marker type to identify which categories achieve acceptable agreement and which need stronger rubrics.

What it would prove: Whether the verification system can produce consistent results — the most important research gap in the entire project.

Relevant expertise: Psychometrics, educational measurement, assessment design.

Resources needed: Mentor training program development. Assessment scheduling. Psychometric analysis. Estimated timeline: 6-9 months. Estimated cost: $15,000-$30,000 for research consultant or graduate research assistant.
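As a sketch of the core calculation, the example below computes Cohen's kappa for two mentors rating the same participants on one marker. The ratings are hypothetical; a full study would also report weighted kappa or an intraclass correlation, since the progression levels are ordinal.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same level at random
    expected = sum(counts_a[lvl] * counts_b[lvl] for lvl in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical progression-level ratings for 10 participants
a = ["Basic", "Applying", "Applying", "Integrating", "Basic",
     "Teaching", "Applying", "Basic", "Integrating", "Applying"]
b = ["Basic", "Applying", "Integrating", "Integrating", "Basic",
     "Teaching", "Applying", "Applying", "Integrating", "Applying"]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```

Raw percent agreement overstates reliability when a few levels dominate; kappa is the standard correction, which is why the study should report it rather than simple agreement rates.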

2.2 Developmental Progression Validation

Question: Do the four progression levels (Basic → Applying → Integrating → Teaching) reflect actual developmental trajectories? Are the level descriptions empirically defensible?

Method: Cross-sectional study comparing self-assessments and mentor assessments across age groups (10-12, 12-14, 14-16, 16+). Do older participants consistently score higher on markers where the framework predicts progression? Are there markers where the expected age-related pattern doesn't appear?

What it would prove: Whether the progression framework reflects real development or theoretical assumption — and which markers have the weakest developmental logic.

Relevant expertise: Developmental psychology, adolescent development, stage theory.

Resources needed: Structured assessment across age cohorts. Developmental psychology consultant. Estimated timeline: 6-12 months (can overlap with 2.1).
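The basic cross-sectional check described above reduces to comparing cohort means and asking whether they rise with age as the framework predicts. The sketch below uses hypothetical scores; a real analysis would test the trend statistically rather than just eyeballing monotonicity.

```python
# Hypothetical 1-4 progression-level scores on one marker, by age cohort
cohort_scores = {
    "10-12": [2, 1, 2, 3, 2],
    "12-14": [2, 3, 2, 3, 3],
    "14-16": [3, 3, 4, 3, 3],
    "16+":   [4, 3, 4, 4, 3],
}

means = {cohort: sum(s) / len(s) for cohort, s in cohort_scores.items()}
ordered = list(means.values())  # dicts preserve insertion order (Python 3.7+)
monotonic = all(lo <= hi for lo, hi in zip(ordered, ordered[1:]))

for cohort, m in means.items():
    print(f"{cohort}: mean = {m:.1f}")
print("Age-related increase:", monotonic)
```

Markers where this pattern fails to appear are exactly the ones whose level descriptions need rework.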

2.3 Construct Validity for Key Markers

Question: Do steamHouse's markers actually measure what they claim to measure? Does a high rating on "Scout Mindset" correlate with performance on established measures of intellectual humility or actively open-minded thinking?

Method: Select 6-8 high-priority markers. Identify validated instruments from the research literature that measure the same or closely related constructs. Administer both the steamHouse assessment and the established instrument to the same participants. Analyze convergent validity.

What it would prove: Whether steamHouse's markers are measuring real psychological constructs — or whether the behavioral descriptions, however well-written, are capturing something different from what the research literature measures.

Relevant expertise: Psychometrics, personality/character assessment, educational psychology.

Resources needed: Licensed assessment instruments. Participant recruitment. Statistical analysis. Estimated timeline: 6-9 months. Estimated cost: $10,000-$25,000 for instruments and analysis.
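The convergent-validity analysis is, at its core, a correlation between two score sets. The example below correlates hypothetical steamHouse "Scout Mindset" ratings with scores from an established instrument measuring a related construct; all numbers are illustrative.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

scout_mindset = [3, 4, 2, 5, 4, 3, 5, 2]          # steamHouse marker ratings
validated_scale = [11, 15, 9, 18, 14, 12, 17, 8]  # established-instrument scores
print(f"r = {pearson_r(scout_mindset, validated_scale):.2f}")
# A strong positive r supports convergent validity; a weak one suggests the
# marker is measuring something other than the established construct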

Tier 3: Requires Sustained Research Collaboration

These questions require multi-year partnerships with research institutions and represent the long-term evidence base for the credentialing system.

3.1 Longitudinal Outcome Tracking

Question: Do participants with strong steamHouse marker profiles actually demonstrate better outcomes — in college success, career performance, relationship quality, civic engagement, or life satisfaction — than comparable peers?

Method: Prospective cohort study tracking steamHouse participants and matched comparison groups over 5-10 years. Measure marker profiles at baseline and at intervals. Track outcomes across multiple domains. Control for self-selection effects (families who join steamHouse may differ from those who don't in ways that predict outcomes independently).

What it would prove: Whether the markers predict real-world performance — the ultimate validation question and the evidence that would make the credentialing system genuinely valuable to employers and institutions.

Relevant expertise: Longitudinal research design, program evaluation, educational outcomes research.

Resources needed: Multi-year research partnership with a university. IRB approval. Participant tracking infrastructure. Funding for research staff. Estimated timeline: 5+ years. Estimated cost: $100,000+ over the study period.

3.2 Fidelity and Adaptation Research

Question: When other organizations adopt the steamHouse framework, what's core (must be preserved for the system to work) and what's contextual (can be adapted to local conditions)? How do we measure fidelity for a framework designed to be adapted?

Method: Design-Based Implementation Research (DBIR) across multiple adoption sites. Document adaptations. Measure outcomes at each site. Identify which framework elements correlate with positive outcomes (core) and which can vary without affecting results (contextual).

What it would prove: Whether the framework is genuinely replicable — and what "replicable" actually means for a system that is designed to be adapted rather than franchised.

Relevant expertise: Implementation science, design-based research, community-based participatory research. The work of William Penuel (University of Colorado Boulder) on DBIR is directly relevant.

Resources needed: Multi-site implementation. Research team embedded at each site. Multi-year timeline. Estimated cost: $200,000+ depending on number of sites.

3.3 Employer Value Study

Question: Do employers who review steamHouse credential portfolios find them more useful than traditional transcripts, resumes, or other alternative credentials? Would they change hiring behavior based on steamHouse credentials?

Method: Present steamHouse credential portfolios alongside traditional application materials to hiring managers in a structured evaluation exercise. Measure information value, trust, and stated likelihood of influencing hiring decisions. Compare across industries and role types.

What it would prove: Whether the credential format creates actual demand — the final link in the chain from internal value to external recognition.

Relevant expertise: Workforce development research, employer engagement, labor economics.

Resources needed: Portfolio prototypes at multiple levels. Employer recruitment across industries. Research design and analysis. Estimated timeline: 12-18 months. Estimated cost: $30,000-$60,000.

The Research Roadmap

The questions are sequenced for a reason:

Tier 1 questions (testable now) establish baseline validity and identify obvious design problems before investing in larger studies. Tier 2 questions (modest partnership) build the psychometric foundation that makes Tier 3 questions meaningful. Tier 3 questions (sustained collaboration) produce the evidence that changes the credentialing system from "well-designed" to "proven."

The sequencing also maps to the demand creation stages: Tier 1 supports Stages 1-2 (internal value and peer recognition). Tier 2 supports Stage 3 (portfolio legibility). Tier 3 supports Stage 4 (ecosystem integration).

No single study validates the entire system. But each study adds evidence, and the cumulative effect is a credentialing system with progressively stronger empirical grounding.

[Return to the Landscape Brief →] · [Explore the 58 Development Markers →] · [See the Full Research Asks →]