Measuring productivity is hard, especially for knowledge workers and craftspeople who are building software products.
People are typically the most expensive asset for technology organizations, yet the internal systems for defining and measuring productivity feel archaic. They don’t match the methodical, iterative approach (data plus gut) that we use when building customer-facing products.
As more of our work happens in the cloud (and on distributed teams), we can now record contribution and interaction in a way that was impossible even a few years ago.
This post is not meant to be an ‘answer’ but a set of early thoughts (from someone who runs a distributed team at scale). I’m excited to see how our work evolves, and how our tools evolve to power more productive global and distributed work.
- How do we define “productive”?
- How do we think about this at both an individual and a team level?
- How do popularity and personal (and cultural) bias factor into this?
- How frequently / over what time period should we measure productivity? (Creative work happens in phases)
These are hard questions, and I don’t have clear answers. I do think there is room across the industry for in-person, hybrid, and fully distributed teams to define and measure all of these better.
In my mind, productivity is a function of volume, quality, and impact. It’s easy to say this, but these are all hard to define, track, and attribute, which is why this is still an unsolved problem.
The most productive teams can have no outstandingly productive individuals, and the least productive teams could have exceptionally productive individuals. It’s hard to actually parse individual productivity when running a larger organization.
As managers of managers, we don’t have sufficient tools to understand most of this, and most of our information is driven by sentiment and infrequent interactions with individuals on our teams.
For in-person teams, productivity is often judged by time spent in the office combined with the team’s perception of a person’s work (and their likeability). This is imprecise, and polluted by biases and proxies for actual productivity.
To measure productivity of teams and individuals, we should look to both quantitative and qualitative measures. I prefer measures that apply to individuals but can also roll up to both teams and projects.
Quantitative metrics for productivity can be hard to pick, measure, and implement in an organization. These types of metrics may make employees feel ‘watched’, don’t tell the full story, and could encourage suboptimal behavior aimed at gaming the metrics.
Quantitative metrics that roll up from individuals to the team and project level are best, because they give us the most flexible view of contribution and productivity, but these metrics are hard to select.
Here are a few ideas:
- Volume: A measure of activity across different tools (e.g. Slack messages, GitHub commits, words on internal Wikis). These metrics will vary by team and role, and need to be measurable across tools. I don’t think more activity necessarily implies more productivity but an absence of activity likely implies a lack of productivity.
- Quality: Quality is hard to measure. There are some heuristics we could use like rejected pull requests, bugs, live issues, comments/likes on internal memos. I don’t have a clear view if these would actually work at an atomic level without running some experiments to figure out if they are actually predictive of quality.
- Impact: First we need to define impact (e.g. OKRs, trackable metrics). What business or user goal are we trying to achieve? How do we attribute work directly to these goals, especially at the individual level? It’s easier to do at a project or team level, but still difficult to map work to impact accurately.
- Micro-contracts: I generally work using a series of micro-contracts. I commit to a specific scope and timeframe, and it’s usually an agreement between me and one other person. It would be valuable to track these micro-contracts in a transparent, two-player view (vs. separate todo lists) that integrates nicely with my existing workflow (e.g. a Slackbot). This is a feature, not a product, but it would be useful to capture data (e.g. missed, delayed, complete) on this over time across the organization.
- Engagement: We have an internal tool called P2 at Automattic (replaces email) which we use to write and store long form content and share it transparently with our colleagues. Much of my work as a manager is engaging with these P2 posts, leaving likes, comments and asking questions. We track some of this data internally, but don’t have a great view for managers to look at aggregate views for individuals, teams or projects (which would be helpful).
- Iteration Speed: I strongly believe that speed of iteration is a sustainable advantage for companies, and many companies lose this speed as they grow. Something which captures progress towards stated project goals, as well as the project state over time, would be very helpful especially when combined with some of the other individual metrics listed above.
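To make the micro-contract idea above a bit more concrete, here is a minimal sketch of what one record and its outcome could look like. Everything in it is hypothetical: the `MicroContract` fields, the `Outcome` names, and my reading of the three states (complete means delivered on or before the due date, delayed means delivered late, missed means past due and still undelivered). It is not an existing tool, just one way the data could be captured.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Outcome(Enum):
    COMPLETE = "complete"  # assumption: delivered on or before the due date
    DELAYED = "delayed"    # assumption: delivered, but after the due date
    MISSED = "missed"      # assumption: past due and never delivered


@dataclass
class MicroContract:
    """A two-person agreement on a specific scope and timeframe (hypothetical schema)."""
    owner: str
    counterpart: str
    scope: str
    due: date
    delivered: Optional[date] = None  # None means nothing has been delivered yet

    def outcome(self, today: date) -> Optional[Outcome]:
        """Classify the contract as of `today`; returns None while it is still open."""
        if self.delivered is not None:
            return Outcome.COMPLETE if self.delivered <= self.due else Outcome.DELAYED
        return Outcome.MISSED if today > self.due else None


def outcome_counts(contracts, today):
    """Count closed contracts by outcome; open contracts are ignored."""
    counts = Counter()
    for contract in contracts:
        result = contract.outcome(today)
        if result is not None:
            counts[result.value] += 1
    return dict(counts)
```

Keeping both names on the record is what makes the two-player view possible, since either person can pull up the same agreement.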
I’d start by tracking all these metrics individually (and creating a time series for each) and then worry about creating compound metrics later, for the metrics that are most accepted by the organization.
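As a rough illustration of tracking each metric as its own time series and rolling it up from individuals to teams, here is a small sketch. The `Event` shape, the metric names, and the choice of weekly buckets are all assumptions made for the example; a project-level roll-up would work the same way with one more field.

```python
from collections import defaultdict, namedtuple
from datetime import date, timedelta

# Hypothetical raw activity record: one row per person, metric, and day.
Event = namedtuple("Event", ["person", "team", "metric", "day", "value"])


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def roll_up(events, level):
    """Build one weekly time series per (person or team, metric).

    `level` is "person" or "team". Each metric stays in its own series;
    nothing is blended into a compound score.
    """
    if level not in ("person", "team"):
        raise ValueError("level must be 'person' or 'team'")
    series = defaultdict(lambda: defaultdict(int))
    for event in events:
        key = (getattr(event, level), event.metric)
        series[key][week_start(event.day)] += event.value
    # Sort each series by week so it reads as a time series.
    return {key: dict(sorted(weeks.items())) for key, weeks in series.items()}


events = [
    Event("ana", "growth", "commits", date(2024, 1, 1), 3),
    Event("ana", "growth", "commits", date(2024, 1, 3), 2),
    Event("ben", "growth", "commits", date(2024, 1, 4), 4),
    Event("ana", "growth", "commits", date(2024, 1, 8), 1),
    Event("ben", "growth", "wiki_words", date(2024, 1, 9), 250),
]

by_person = roll_up(events, "person")
by_team = roll_up(events, "team")
```

Because the same raw events feed both views, the individual and team numbers can never disagree with each other, which matters if these are ever shown side by side.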
Qualitative feedback is important as a source of both positive and developmental feedback for individuals and teams. This feedback contains the most bias, but is the most widely used and accepted form of assessment across our industry.
- Peer Review: This provides a good view of the person’s capabilities as a team player, and a measure of their ‘popularity’ with their peers. It’s also a good measure of team fit as different teams can have different microcultures.
- Manager Review: Managers typically write regular (e.g. annual) reviews which are a useful point in time synthesis but have recency bias (something of which I’m certainly guilty). Managers don’t record all the ‘micro feedback’ and ‘micro improvements’ in a transparent way to the organization (e.g. logged in a database) which I think could be interesting to try.
- Direct Report Review: This is similar to manager reviews, except it’s many to one for the manager. This is useful to check for consistency of feedback, which can be a stronger signal than an isolated piece of feedback which we could over-react to if particularly positive or negative.
- Self Review: This is an opportunity for individuals to synthesize what they think is most important and impactful retroactively. I personally find it useful, and think it’s also useful to kick off a conversation with a direct report (when there is discrepancy in perception). It’s also a good measure of self awareness (if self view is very different from the others).
In general, storing micro feedback (e.g. positive/developmental plus a string), and periodic summaries (discrepancies, sentiment by group, change over time) would be a good place to start, and then HR teams could perform analysis on the usefulness and predictiveness of each of these inputs over time.
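Here is a minimal sketch of what storing micro feedback and producing a periodic summary might look like. The `MicroFeedback` fields, the group labels, and the use of "share of positive entries" as a stand-in for sentiment are all assumptions for illustration, not a recommendation of a specific scoring method.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class MicroFeedback:
    """One small piece of feedback (hypothetical schema)."""
    subject: str       # person the feedback is about
    author_group: str  # e.g. "peer", "manager", "direct_report", "self"
    kind: str          # "positive" or "developmental"
    note: str          # the free-text string
    day: date


def summarize(entries, subject, start, end):
    """Share of positive feedback per author group for `subject` in [start, end]."""
    totals = defaultdict(lambda: [0, 0])  # group -> [positive count, total count]
    for entry in entries:
        if entry.subject == subject and start <= entry.day <= end:
            totals[entry.author_group][1] += 1
            if entry.kind == "positive":
                totals[entry.author_group][0] += 1
    return {group: pos / total for group, (pos, total) in totals.items()}


def self_gap(summary):
    """Self-view minus the average of the other groups; None if either side is missing."""
    others = [share for group, share in summary.items() if group != "self"]
    if "self" not in summary or not others:
        return None
    return summary["self"] - sum(others) / len(others)
```

Running `summarize` over consecutive periods gives the change over time, and `self_gap` is one crude way to surface the discrepancy between self view and everyone else’s view.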
Both the qualitative and the quantitative data would help the individual and the manager track the individual’s development and journey during their tenure at the organization. It would also make manager handoffs (team switches or people leaving the company) cleaner and easier without losing valuable organizational data.
A couple of additional open questions that I did not have good answers to:
- Meetings: I almost added meeting time/number of meetings to the quantitative metrics section, but I’m in two minds about them. I think recurring meetings are a ‘lazy’ way to have individuals and groups get together and make up an agenda. Large meetings with lots of passive attendees are also very ‘expensive’ for the organization, and likely better handled via written, asynchronous alternatives. I do think there are a few exceptions (1x1s for direct reports, project kickoffs, strategy/direction changes) which have their place to drive alignment and build relationships. I personally like short interactions about specific problems spun up in real time, but these are hard to coordinate if teams are over-scheduled.
- Real Time vs. Synthesized: For all of these metrics, how much should be available in real time vs. point in time? Real time plus history allows for better trend analysis, but point in time allows for synthesis. Point in time reviews suffer from recency bias, and most creative work happens in phases (with natural spikes and troughs).
- Transparency: For all the quantitative and qualitative metrics how transparent should this be to the individuals affected and the organization? If totally transparent, I worry about gamification but if totally hidden it feels like spying. I think it should be run as a contained experiment to see the impact on the culture.
- Popularity vs. Objective Metrics: As humans, how do we eliminate bias? We are often biased towards people who we perceive to be similar to ourselves. We are more likely to excuse poor performance from someone we like vs. someone we don’t. Measuring and accounting for bias is extremely difficult.
After writing this, I was left with more questions than answers, but feel confident that most organizations (distributed or hybrid) are not good at measuring and storing performance over time at the individual, team, or project level. I also think that the appropriate solution(s) will vary by company and its culture. Building tools to track performance and creating a culture of internal experimentation is necessary for organizations to get to better solutions across all levels.
I’m personally excited about innovation in this space as a manager, and an increased focus on output over facetime and popularity as companies work in a more distributed or hybrid work environment.
As a side note, this was an interesting view on how the folks at GitLab think about this — I appreciate them being open with sharing a lot of their practices in general.