Brian Patrick Kent

Things I read in April

reading
Interesting things I’ve read over the past month. Ramping up on LLM evals.
May 1, 2025
Brian Kent

Things I read while the algae grew in my fur - April 2025

reading
Interesting things I’ve read over the past month.
Apr 8, 2025
Brian Kent

LLM as piano teacher

llm usage
Here’s an AI use case I haven’t seen come up elsewhere: LLM as (beginner) piano teacher.
Mar 12, 2025
Brian Kent

Announcing Apricot, a better way to subscribe to content

personal
NLP
recsys
Social media has become the way to stay in the loop. News and long-form commentary, official government and corporate communications, educational content, professional development—it’s all on social media. Thing is, social media is also destroying lives and tearing society apart. Enter Apricot.
Mar 11, 2023
Brian Kent

Six keys to successful data science management

management
Data science management is a hard job. Here’s what I’ve learned from my years on both sides of the table—manager and individual contributor.
Oct 18, 2022
Brian Kent

My conversation with Manuel Bruscas, VP of Data Analytics

talks
reinforcement learning
MLOps
management
I chatted last week with Manuel Bruscas, VP of Data Analytics at Holaluz, about MLOps, reinforcement learning, and data science management.
Oct 16, 2022
Brian Kent

Making the leap: porting our Python data science blog to Quarto

meta
Quarto is the hot new technical publishing system, built by the company formerly known as RStudio. I decided to see how it would work for this Python-oriented blog.
Oct 12, 2022
Brian Kent

How to use PyTorch LSTMs for time series regression

code
python
iot
time series
forecasting
Most intros to LSTM models use natural language processing as the motivating application, but LSTMs can be a good option for multivariable time series regression and classification as well. Here’s how to structure the data and model to make it work.
Oct 27, 2021
Brian Kent

Applications of survival analysis (that aren’t clinical research)

survival analysis
Survival analysis has been a standard tool for decades in clinical research, but data scientists in other domains have mostly ignored it. Here are some applications for which you might use survival analysis, to jump-start your creative data science engine.
Aug 4, 2021
Brian Kent

How to compute Kaplan-Meier survival curves in SQL

code
sql
survival analysis
Decision-makers often care how long it takes for important events to happen. In this article, I show how to compute Kaplan-Meier survival curves and Nelson-Aalen cumulative hazard curves in SQL, so you can answer time-to-event questions directly in your SQL-based analytics tables and dashboards.
Jul 20, 2021
Brian Kent

How to construct survival tables from duration tables

code
python
survival analysis
The survival table is the workhorse of univariate survival analysis. Previously, I showed how to build a duration table from an event log; now I show how to take the next steps, from duration table to survival table and estimates of survival and hazard curves.
Jul 15, 2021
Brian Kent

A review and how-to guide for Microsoft Form Recognizer

reviews
information extraction
Form Recognizer is Microsoft Azure’s answer to Amazon Textract and Google Form Parser for information extraction from form documents. I put all three to the test on a challenging set of invoice documents and was surprised to see Microsoft’s service come up short.
Jul 13, 2021
Brian Kent

A review and how-to guide for Amazon Textract

reviews
information extraction
At just two years old, Amazon Textract is already the grizzled veteran of the fast-moving information extraction-as-a-service market. I put it to the test against Google Form Parser and Microsoft Form Recognizer to see which service has the best speed, accuracy, and developer experience.
Jul 2, 2021
Brian Kent

How to plot survival curves with Plotly and Altair

code
python
survival analysis
Survival curve plots are an essential output of survival analysis. The main Python survival analysis packages show only how to work with Matplotlib and hide plot details inside convenience functions. This article shows how to draw survival curves with two other Python plot libraries, Altair and Plotly.
Jun 29, 2021
Brian Kent

Google Form Parser, a review and how-to

reviews
information extraction
Google Form Parser is a new challenger in the information extraction arena, offering general-purpose off-the-shelf form extraction. We compared Form Parser to Amazon Textract and Microsoft Form Recognizer in terms of accuracy, speed, and ease of use. Take our code snippets and try it yourself!
Jun 15, 2021
Brian Kent

How to build duration tables from event logs with SQL

code
sql
survival analysis
Duration tables are a common input format for survival analysis but they are not trivial to construct. In our last article, we used Python to convert a web browser activity log into a duration table. Event logs are usually stored in databases, however, so in this article we do the same conversion with SQL.
Jun 9, 2021
Brian Kent

How to convert event logs to duration tables for survival analysis

code
python
survival analysis
Survival models describe how much time it takes for some event to occur. This is a natural way to think about many applications but setting up the data can be tricky. In this article, we use Python to turn an event log into a duration table, which is the input format for many survival analysis tools.
Jun 7, 2021
Brian Kent

A checklist for professionalizing machine learning models

MLOps
Data scientists are drawn to the latest and greatest machine learning tasks and models, even though tabular binary classification remains the industry workhorse. We should take more pride in professionalizing the models that we know to work, rather than reflexively chasing every new thing.
May 27, 2021
Brian Kent

Review: Statistical Rethinking, by Richard McElreath

reviews
This is an absolute gem of a book. McElreath has found an elusive combination: Statistical Rethinking is not only one of the best intro textbooks for both causal and Bayesian modeling, it’s also highly readable, even entertaining.
May 21, 2021
Brian Kent

Streamlit review and demo: best of the Python data app tools

python
code
data apps
reviews
Streamlit has quickly become the hot thing in data app frameworks. We put it to the test to see how well it stands up to the hype. Come for the review, stay for the code demo, including detailed examples of Altair plots.
May 12, 2021
Brian Kent

How to analyze a staged rollout experiment

experimentation
python
code
data apps
Recently we argued that confidence intervals are a poor choice for analyzing staged rollout experiments. In this article, we show a better way: a Bayesian approach that gives decision-makers the answers they really need. Check out our interactive Streamlit app first, then the article for the details.
May 7, 2021
Brian Kent

Research digest: what does cross-validation really estimate?

model evaluation
A new paper by Bates, Hastie, and Tibshirani reminds us that estimating a model’s predictive performance is tricky. For linear models at least, cross-validation does not estimate the generalization error of a specific model, as you would assume. How much does this matter for data science in practice?
Apr 26, 2021
Brian Kent

What we’re reading, April 2021 edition

reviews
The data science content firehose can be overwhelming; these are the pieces we think might be worth your time to check out. This month we’re focusing on causal inference.
Apr 7, 2021
Brian Kent

No, your confidence interval is not a worst-case analysis

experimentation
statistics
Confidence intervals are one of the most misunderstood concepts in statistics. Common sense says the lower bound of a confidence interval is a good estimate of the worst-case outcome, but the definition of the confidence interval doesn’t allow us to make this claim. Or does it? Let’s take a look.
Mar 31, 2021
Brian Kent

Choose your experiment platform wisely

experimentation
To build a culture of fast, reliable, evidence-based innovation, you need an experiment platform. These tools support each stage of the experiment process and, done well, become the beating heart of your infrastructure.
Mar 17, 2021
Brian Kent

In defense of statistical modeling

Data science remains hot but there is a persistent stream of articles that says the field is overhyped and that hiring managers and aspiring data scientists should focus more on engineering. Let’s remember why data science’s core skill of statistical modeling is so valuable.
Mar 10, 2021
Brian Kent

What we’re reading: February 22, 2021

reviews
What we’re reading for the week of February 22nd. The data science content firehose can be overwhelming. These are the pieces we think might be a good use of your time to read and study.
Feb 22, 2021
Brian Kent

Data before models, but problem formulation first

Recent tweets highlighted the importance of data annotation and curation in applied machine learning vs. model perfection. Data is indeed critical, but formulating a business problem as a data science task is even more foundational.
Jan 28, 2021
Brian Kent

Conversion rate modeling: worth the effort?

survival analysis
Conversion rates are essential for understanding and optimizing a business. In this article, we compare conversion rate modeling to a common analytics approach and show how to decide between the two methods.
Jan 9, 2021
Brian Kent

Modeling the customer journey

Modeling the customer journey can be one of the best ways for industry data scientists to deliver value. We break the customer journey metaphor down into smaller pieces.
Jan 9, 2021
Brian Kent
    • Copyright Brian Patrick Kent 2022-2025