Posts

Augmented Data Science: Method selection agent

tl;dr (never ai;dr): Method selection is the stage where human judgment matters most: the hard part is often not fitting a model but choosing a method whose assumptions match the business question and the data-generating process. We built a four-step agent skill (context gathering; causal, predictive, prescriptive, or descriptive framing; branch-specific clarifying questions; and an assumption-aware evidence table) to structure that choice. Testing the skill on a price-elasticity question showed how the agent's questions shape the identification strategy. Using two recent models, Claude Opus 4.7 emphasized instruments and design choices, while GPT-5.5 emphasized dose-response and machine-learning approaches. The skill provides directions, not decisions: the data scientist must still judge which assumptions are plausible and which claims stakeholders actually need...
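The four-step flow described above can be pictured as a thin pipeline. The sketch below is a hypothetical illustration of the skill's shape only; the function names, framings dictionary, and stub method candidates are my assumptions, not the actual implementation.

```python
from dataclasses import dataclass

# The four framings the skill chooses among (step 2).
FRAMINGS = ("causal", "predictive", "prescriptive", "descriptive")

@dataclass
class MethodCandidate:
    method: str            # e.g. "instrumental variables"
    assumptions: list      # what the data scientist must judge plausible
    evidence: str          # pointer into the evidence table

def run_method_selection(context: dict) -> list[MethodCandidate]:
    """Illustrative four-step flow: gather context, frame the question,
    ask branch-specific clarifying questions, emit an assumption-aware table."""
    # Step 1: context gathering (here, just validate the required keys).
    for key in ("business_question", "variables"):
        if key not in context:
            raise ValueError(f"missing context: {key}")

    # Step 2: framing -- in the real skill the agent classifies this.
    framing = context.get("framing", "descriptive")
    assert framing in FRAMINGS

    # Step 3: branch-specific clarifying questions (hypothetical examples).
    context["open_questions"] = {
        "causal": ["Is there a plausible instrument?", "Any natural experiment?"],
        "predictive": ["What is the prediction horizon?"],
        "prescriptive": ["Which decision lever is being optimized?"],
        "descriptive": ["Which segments matter to stakeholders?"],
    }[framing]

    # Step 4: assumption-aware candidates (stubs standing in for the table).
    if framing == "causal":
        return [MethodCandidate("instrumental variables",
                                ["instrument relevance", "exclusion restriction"],
                                "see evidence table")]
    return [MethodCandidate("gradient boosting",
                            ["training data represents deployment"],
                            "see evidence table")]
```

The key design point the post makes survives even in this toy: the skill returns candidates plus assumptions, and deciding whether those assumptions hold stays with the human.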

Augmented Data Science: Hypothesis search agent

tl;dr (never ai;dr): Generating testable hypotheses has mostly relied on the data scientist's experience and a literature review (guided by the business team). An LLM agent skill can structure and expand that search. We built a three-step skill (context gathering, causal vs. predictive framing, and an evidence-backed hypothesis table) and tested it on two retail business questions using Claude Opus 4.6 and GPT 5.4. Framing the problem and providing the agent with the available variables did most of the work: with minimal context, the causal/predictive distinction produced useful, literature-backed hypotheses. 86% of the references checked out; directional claims were reliable, but effect sizes were not. Consistent with prior work, we found that LLMs exaggerate the findings in existing research. This confirms our earlier conclusion: data science teams need to e...
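The finding that directional claims held up while effect sizes did not suggests a simple hygiene step before an agent-generated hypothesis table reaches stakeholders. The record shape and filter below are a hypothetical sketch of that step, not the skill's actual output format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hypothesis:
    statement: str                        # testable claim
    framing: str                          # "causal" or "predictive"
    direction: str                        # "+", "-", or "?"
    reference: str                        # citation the agent supplied
    reference_verified: bool              # did the citation check out?
    effect_size: Optional[float] = None   # treat as unreliable

def usable_hypotheses(table: list[Hypothesis]) -> list[Hypothesis]:
    """Keep only hypotheses whose citation was verified; trust the
    direction, but drop the (often exaggerated) effect size."""
    return [Hypothesis(h.statement, h.framing, h.direction,
                       h.reference, True, effect_size=None)
            for h in table if h.reference_verified]
```

In other words: the evidence table is a search result to be audited, with citations checked and magnitudes discarded, not a finding to be reported.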

Augmented Data Science: Human Intent, AI Execution

Image: Synergy between human intent and AI execution in data science

tl;dr (never ai;dr): LLMs are now part of data science workflows. The question is no longer if we use them, but how and where. We reviewed emerging research to create a framework defining the synergy between human and machine: human intent followed by (and bounding) LLM execution. Effective AI-assisted workflows decouple intent (the "what" and "why") from execution (the "how"). In this model, the data scientist acts as the orchestrator, while the LLM serves as the execution engine. Success hinges on assigning intent to the correct tasks: the data scientist sets goals and validates assumptions, while the LLM searches well-defined spaces and executes code subject to human validation. This post sets the stage for a new series in which we hope to introduce a task-level guide for data science leaders embedding LLMs int...
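One way to picture the intent/execution split is an orchestration loop in which every machine-proposed step passes through a human validation gate before it runs. The loop below is a minimal sketch under that framing; the callback names and signatures are illustrative assumptions, not an API from the post.

```python
from typing import Callable

def orchestrate(goal: str,
                propose_step: Callable[[str], str],
                execute: Callable[[str], str],
                approve: Callable[[str], bool],
                max_steps: int = 5) -> list[str]:
    """The human supplies the goal (intent); the machine proposes and
    executes steps (execution), but nothing runs unapproved."""
    results = []
    for _ in range(max_steps):
        step = propose_step(goal)       # LLM: search a well-defined space
        if not approve(step):           # human: validate the assumption
            break                       # intent bounds execution
        results.append(execute(step))   # LLM: run the approved step
    return results
```

The design choice worth noting is that `approve` sits between proposal and execution, so the human gate is structural rather than an afterthought appended to the output.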

Using generative models, well, to generate data

Image: Distributions of the real vs. synthetic data for selected variables

tl;dr (never ai;dr): One underappreciated use case of generative models is creating realistic tabular datasets that preserve the underlying statistical properties of the original data. Leading libraries for data synthesis include Synthetic Data Vault, YData-Synthetic, and Synthcity. Practical applications include navigating the bottlenecks of sharing sensitive data with vendors and augmenting datasets for rare events, such as product recalls. Ultimately, this approach enables a data-centric workflow even when data is scarce or biased, ensuring models are trained on a high-fidelity representation of reality.

Podcast-style summary by NotebookLM

Introduction: How can we use generative models beyond ...
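To make "preserve the statistical properties" concrete, here is a deliberately tiny, standard-library sketch that fits only per-column means and standard deviations and samples from independent normals. Libraries such as Synthetic Data Vault go much further (joint dependence structure, mixed types, constraints); this toy, with names of my own invention, only illustrates the fit-then-sample workflow.

```python
import random
import statistics

def fit_marginals(rows: list[dict]) -> dict:
    """Estimate per-column mean and standard deviation from real data."""
    return {c: (statistics.mean(r[c] for r in rows),
                statistics.stdev(r[c] for r in rows))
            for c in rows[0]}

def sample_synthetic(params: dict, n: int, seed: int = 0) -> list[dict]:
    """Draw synthetic rows matching the fitted marginal moments
    (normality and column independence are simplifying assumptions)."""
    rng = random.Random(seed)
    return [{c: rng.gauss(mu, sd) for c, (mu, sd) in params.items()}
            for _ in range(n)]

# Toy "real" table: 50 price/units rows.
real = [{"price": 10 + i * 0.1, "units": 100 - i} for i in range(50)]
params = fit_marginals(real)
synthetic = sample_synthetic(params, n=500)
```

A real library would also be evaluated on fidelity plots like the one in the figure above, comparing real vs. synthetic distributions column by column.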

Can GenAI accelerate the adoption of optimization?

Image courtesy of Andertoons

Podcast-style summary by NotebookLM

Solo post: Director's cut. The title, "Democratizing Optimization with Generative AI," reflects the approach of a recent paper that investigates why businesses fail to adopt advanced optimization models and assesses whether Generative AI (GenAI) can help bridge that gap.1 This is interesting. The authors argue that GenAI can:

- Offer an intuitive layer that provides visibility into the inputs (Insight),
- Make the model logic and constraints transparent (Interpretability), and
- Rapidly respond to change and create what-if scenarios (Interactivity and Improvisation).

This is called the 4I framework, and the paper provides a proof of concept from Microsoft's Cloud Supply Chain team. Most of the insights shared in the paper resonated with my own experience building and deploying optimization models. Optimization has always been too opaque for the teams who would benefit from it the most. Why? Beca...

A look back to look forward: Where do new ideas come from?

Image courtesy of Entertainment Weekly

Podcast-style summary by NotebookLM

Introduction to the business case: This was going to be a solo post, but I bring the Director into the conversation by asking her a question at the end. The inspiration for the article comes from the WSJ piece "Meet the United Airlines Executive Who Gets to Pick Its Hot New Routes" here. Alison Sider, a fellow Longhorn from my alma mater, UT Austin, interviews Patrick Quayle, the senior vice president of global network planning and alliances at United Airlines. She asks interesting questions on a topic of interest.1

How does United Airlines decide where to fly, and where not to fly? If a route is a proven cash cow, flying it is an easy decision, but of course all the competitors are already flying it too. Everyone knows the answer; the only thing left is optimization. Finding a successful route no other competitor has yet flown is a harder problem because there's no data, and thi...