ExpectedReturns GSoC Project

Building a fundamentals-driven factor research workflow inside the ExpectedReturns R package, as an open-source project through Google Summer of Code.

Posted Aug 1, 2025 Updated Sep 15, 2025

By Al Pakrosnis

2 min read

——– STILL WIP ——–

Project Snapshot

During Google Summer of Code, I moved the ExpectedReturns package from a collection of academic replications toward a quant-ready research environment. The work focused on three things: building trustworthy point-in-time fundamentals, converting them into reusable factor functions, and scaffolding a framework that portfolio researchers can iterate on right away.

Engineering Point-in-Time Fundamentals

A core deliverable was a reproducible pipeline for Microsoft (MSFT) fundamentals. Each parser pulls as-filed data through the qkiosk API, keeps it point-in-time, converts it to an xts time series, and saves it as a package dataset. Here is a representative slice from the EPS point-in-time parser:

  
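```r
library(qkiosk)  # qk_fn(), qk_ticker()
library(xts)

# EPS as originally filed (asfiled = TRUE), not as later restated
MSFT_epsPIT <- as.data.frame(qk_fn(qk_ticker("MSFT"), "EPS", asfiled = TRUE)[])
MSFT_epsPIT <- na.omit(MSFT_epsPIT[, c("fq", "fpe")])  # keep the EPS value (fq) and its date (fpe)
MSFT_epsPIT$fpe <- as.Date(as.character(MSFT_epsPIT$fpe), "%Y%m%d")  # YYYYMMDD -> Date
MSFT_epsPIT <- xts(as.numeric(MSFT_epsPIT$fq), order.by = MSFT_epsPIT$fpe)
save(MSFT_epsPIT, file = "data/MSFT_epsPIT.RData")
```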

The same pattern powers additional parsers for market cap, liquidity, momentum, cash-flow yield, free-cash-flow yield, and more (see inst/parsers/MSFT_*). Each dataset ships with a matching documentation file (R/MSFT_*.R), so analysts can look each one up in R's help pages.
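Point-in-time discipline ultimately means that a backtest on date d may only use values that were public on or before d. A minimal base-R sketch of that as-of lookup follows, assuming each value comes with the date it was filed. The function name `as_of_values` and the toy numbers are hypothetical illustrations, not part of ExpectedReturns or qkiosk.

```r
# Hypothetical helper (not part of ExpectedReturns): as-of lookup that returns,
# for each query date, the most recent value filed on or before that date.
as_of_values <- function(dates, filed, values) {
  stopifnot(length(filed) == length(values))
  ord <- order(filed)                 # findInterval() needs sorted breakpoints
  filed <- filed[ord]
  values <- values[ord]
  # Number of filings dated on or before each query date (0 = nothing public yet)
  idx <- findInterval(as.numeric(dates), as.numeric(filed))
  out <- rep(NA_real_, length(dates))
  out[idx > 0] <- values[idx[idx > 0]]
  out
}

# Toy example: three values and the (hypothetical) dates they became public
filed  <- as.Date(c("2024-04-25", "2024-07-30", "2024-10-30"))
values <- c(1, 2, 3)
dates  <- as.Date(c("2024-04-01", "2024-05-01", "2024-07-30", "2024-12-31"))
as_of_values(dates, filed, values)    # returns NA, 1, 2, 3
```

Here a value filed on date d counts as usable on d itself; shifting `filed` forward by a day is a simple way to be more conservative about same-day availability.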

Turning Fundamentals into Signals

Data is only useful once it turns into investable signals. I authored a suite of roxygen-documented helper functions that compute ratios such as price-to-earnings, earnings yield, cash-flow yield, and book-to-price. They validate their inputs and are designed to slot directly into backtests. For example, the earnings-yield helper checks the input type and inverts the P/E series, since earnings yield (E/P) is the reciprocal of P/E:

  
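```r
#' Earnings yield from a price-to-earnings series
#'
#' @param pe_data A data frame or xts object of P/E ratios.
#' @return Earnings yield (E/P), the element-wise reciprocal of `pe_data`.
earnings_yield <- function(pe_data) {
  if (!is.data.frame(pe_data) && !xts::is.xts(pe_data)) {
    stop("Input must be a data frame or an xts object.")
  }
  1 / pe_data
}
```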

These functions are paired with fetch_price_data utilities to keep factors synchronized with market prices, making the ExpectedReturns package a one-stop shop for fundamentals-driven factor research.
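EPS from filings is typically reported per quarter, while P/E ratios are usually quoted on trailing-twelve-month (TTM) earnings. A minimal base-R sketch of that step follows, assuming prices are already aligned observation by observation with the quarterly EPS values. `ttm_valuation` is a hypothetical name, not a function in the package.

```r
# Hypothetical helper (not part of ExpectedReturns): TTM P/E and earnings yield
# from prices and quarterly EPS that are already aligned observation by observation.
ttm_valuation <- function(price, eps_quarterly) {
  stopifnot(is.numeric(price), is.numeric(eps_quarterly),
            length(price) == length(eps_quarterly))
  # Sum of the current and three prior quarters; NA until four quarters exist
  ttm_eps <- as.numeric(stats::filter(eps_quarterly, rep(1, 4), sides = 1))
  pe <- price / ttm_eps
  data.frame(ttm_eps = ttm_eps, pe = pe, earnings_yield = 1 / pe)
}

# Toy inputs: five quarters of EPS and the price observed at each quarter
eps   <- c(1, 1, 1, 1, 2)
price <- c(40, 40, 40, 40, 50)
ttm_valuation(price, eps)  # rows 4-5: ttm_eps 4 and 5, pe 10 and 10, earnings_yield 0.1
```

The earnings yield here is 1 / P/E, the same relationship the `earnings_yield()` helper uses. Negative TTM earnings make both ratios negative, and zero TTM earnings give an infinite P/E, so a production version would likely flag those cases.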

Prototyping AQR-Style Momentum Workflows

Beyond single-factor metrics, I laid the groundwork for multi-factor research. The sandboxed AQR_AMOMX_largeCapMomentum.R script documents the full selection and rebalancing process behind AQR's large-cap momentum index, giving the team a template for institutional-grade replication work. In parallel, I started a generalized factor_framework() scaffold that will eventually rank securities, split them into long/short sleeves, and produce attribution output for any factor the package emits.

  
```r
# Scaffold: input validation is in place; ranking and attribution are still to come
factor_framework <- function(returns, factor, cutpoint = 0.5, longshort = TRUE) {
  if (!is.data.frame(returns) && !xts::is.xts(returns)) {
    stop("Returns input must be a data frame or an xts object.")
  }
  if (!is.data.frame(factor) && !xts::is.xts(factor)) {
    stop("Factor input must be a data frame or an xts object.")
  }
  # Ranking, portfolio construction, and performance attribution logic will live here
}
```
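The ranking step that factor_framework() still lacks can be illustrated for a single rebalance date. The sketch below is hypothetical: it uses the common 12-1 momentum definition (cumulative return from twelve months ago up to one month ago, skipping the latest month) as the factor, and it assumes `cutpoint` is the percentile rank separating the long sleeve from the short sleeve. Neither the function names nor that reading of `cutpoint` come from the package or from the AQR script.

```r
# Hypothetical sketch (not the package's implementation) of one rebalance.

# 12-1 momentum from month-end prices: rows are months (oldest first),
# columns are assets. Divides the price one month ago by the price twelve
# months ago, so the most recent month is skipped.
momentum_12_1 <- function(prices) {
  n <- nrow(prices)
  stopifnot(n >= 13)
  prices[n - 1, ] / prices[n - 12, ] - 1
}

# Rank assets on a factor and return the long-short (or long-only) sleeve return.
# Assumption: assets whose percentile rank is above `cutpoint` go long, the rest short.
long_short_return <- function(factor_values, next_returns, cutpoint = 0.5, longshort = TRUE) {
  stopifnot(length(factor_values) == length(next_returns))
  ok <- !is.na(factor_values) & !is.na(next_returns)
  f <- factor_values[ok]
  r <- next_returns[ok]
  pct <- rank(f) / length(f)          # percentile ranks in (0, 1]
  long <- r[pct > cutpoint]
  short <- r[pct <= cutpoint]
  if (length(long) == 0) return(NA_real_)
  if (!longshort) return(mean(long))
  if (length(short) == 0) return(NA_real_)
  mean(long) - mean(short)            # equal-weighted sleeves
}

# Toy example: 13 month-ends for two assets, then a four-asset sort
prices <- cbind(A = 100 * 1.01^(0:12), B = rep(100, 13))
momentum_12_1(prices)                 # A: 1.01^11 - 1, B: 0

f <- c(0.1, 0.4, 0.2, 0.8)            # factor scores
r <- c(0.01, 0.03, -0.02, 0.05)       # next-period returns
long_short_return(f, r)               # 0.04 - (-0.005) = 0.045
long_short_return(f, r, longshort = FALSE)  # 0.04
```

Equal weighting and a single percentile split keep the sketch short; AQR's index methodology and the finished factor_framework() may use different weights, sleeve sizes, and rebalance schedules.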

Impact

Quant-grade data discipline: Every factor now rests on as-filed, point-in-time data, guarding downstream research against look-ahead bias.
Reusable tooling: Analysts can mix and match parsers, helper functions, and documentation without spelunking through code.
Institutional alignment: The AQR momentum replication and the factor framework map directly onto workflows used by real quant teams.


What I’m Excited to Build Next

Complete the factor_framework() ranking logic and bundle factor-neutral portfolio analytics.
Expand the asset universe beyond MSFT by templating the parser pipeline.
Layer in visualization components (e.g., rolling factor spreads) for quick research readouts.

All of my work is available in the package's main GitHub repository: GitHub Link