Our imagination is stretched to the utmost, not, as in fiction, to imagine things which are not really there, but just to comprehend those things which are there.
– Richard Feynman, The Character of Physical Law (1965)
Re-imagine Your Data: An Organizational Health Check Example
As evidence of the value of data-driven decision-making mounts, clients increasingly ask for guidance in mapping out a strategy to better understand, manage, and leverage the data available to them. Beyond the technical questions, we often find the real challenge is to help companies develop a data culture and a data vision that move beyond traditional data resources and metrics. To that end, we’ve prepared this series of three blog posts focused on data sources, metrics, and algorithms. We hope these posts will broaden your data horizons and help you imagine the new questions you might pose, the new insights you might gain, and the new stories you might tell if you had easy access to these kinds of information. As an illustrative example, we describe the data sources, metrics, and algorithms that could be used in an internal “Organizational Health Check” service.
I. Data Sources
All organizations collect data intended to monitor and report their success. All employees have contributed to and engaged with these data at some point, whether filling out application forms and surveys or entering and retrieving data relevant to their job. What has changed in recent years is the ability to integrate data across all aspects of the organization, gather data and insights from beyond the organization’s boundaries, and automate many data retrieval and summary processes. The result is more space and time for decision-makers to integrate data with judgement and intuition, which allows them to pose more interesting questions, gain deeper insights, test assumptions to reduce risk, anticipate emerging opportunities, and tell more compelling stories.
Below we outline some examples of internal, external, and supplemental data resources. As you read through the lists, we challenge you: (1) to think beyond the immediate data you collect – think also of the meta-data (data about data, such as when and where it was collected), (2) to imagine connections or associations among the data that you would wish to explore, if the data were immediately accessible and easily visualized, and (3) to explore how perceptions of data might change if you had not just the data, but also knowledge of the social context and dispersal patterns of that data.
Internal data source examples:
Traditional internal data (HR, marketing, etc.) provide information on employee turnover, sales success, and profit margins. Non-traditional internal data resources provide insight into the social structure and dynamics within the organization, the sentiment expressed by different groups in various informal media, and the tone of leadership expressed through formal media.
- HR data (retention, performance, promotion, raises, benefits, leave, time-to-hire, demographics, addresses, employee surveys)
- Marketing/Accounting data (sales/services dates, values, and responsible individual/team, customer surveys)
- Memos (date of, sender, recipients, format/access, text content)
- Policy documents (date of, format/access, text content)
- Employee performance reports (date of, reviewer, reviewee, scores, query language, text content)
- State of business reports (date of, department, author, audience, format/access, text content)
- Internal email/message systems (date of, sender, recipient, internal social network structure, text content)
- Databases (date of access, data pulled, data updated)
- Website (date of access, pages visited, information downloaded, text content)
- IT logs (log-ins, activity, security, usage patterns)
External data source examples:
External data can serve multiple purposes. Social media data (e.g., Facebook, Twitter, Instagram) provide insight into the behavior, beliefs, motivations, and values of “people like ours”, whether generally, by demographic group, or specifically for your employees. Professional media data (e.g., Glassdoor, LinkedIn, ResearchGate) provide an alternative and complementary perspective on individuals’ beliefs, motivations, and values, as well as on the reach of the organization’s workforce expertise and reputation. News media and weather data provide insight into events, trends, and patterns that could externally influence populations within an organization.
- Social media data (social network structure, text content, preferences, propensities, dates, locations)
- Professional media data (social network structure, text content, dates, locations)
- Blog data (readership network structure, likes, shares, downloads, text content)
- News media data (unanticipated and anticipated high impact event dates, locations, likes, shares, text content)
- Weather data (10-day forecasts, extreme weather events, pollen forecast, flu forecast)
Supplemental data collection:
In some cases, supplemental data collection serves to fill gaps or check assumptions underlying models and analyses. In others, supplemental data provide novel insights and can form the basis for new questions or hypotheses, especially about organizational culture and collective behavior.
- Narrative data – Anecdotal narrative fragments, scored by the individuals who provide them, offer an alternative perspective on (or check of) sentiment analysis. Insights are gained where individuals score their own sentiments differently than algorithms do. Clustering of anecdotal experience also provides a contrast to clustering performed on demographic or behavioral data. Finally, narrative can identify biases and false hypotheses that may have been built into your analytics and business strategy.
- Sensor data – With the Internet of Things, sensor data can provide valuable insight into the use of equipment and space, as well as personal activity, health, and safety.
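The narrative-data check can be sketched in a few lines: compare each fragment’s self-assigned sentiment score with an algorithmic score and flag the fragments where they disagree. This is a minimal sketch, assuming both scores already sit on a common scale from -1 (negative) to +1 (positive); the fragment IDs, scores, and the 0.5 disagreement threshold are hypothetical placeholders, not outputs of any real sentiment tool.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    fragment_id: str
    self_score: float   # sentiment assigned by the narrator, assumed in [-1, 1]
    model_score: float  # sentiment assigned by an algorithm, assumed in [-1, 1]


def flag_disagreements(fragments: list[Fragment], threshold: float = 0.5) -> list[str]:
    """Return IDs of fragments whose self-score and algorithmic score differ
    by more than `threshold` (a hypothetical cutoff)."""
    return [
        f.fragment_id
        for f in fragments
        if abs(f.self_score - f.model_score) > threshold
    ]


# Hypothetical example data.
fragments = [
    Fragment("n1", self_score=0.8, model_score=0.7),   # broadly agree
    Fragment("n2", self_score=-0.6, model_score=0.4),  # algorithm misreads the tone
    Fragment("n3", self_score=0.2, model_score=-0.5),  # narrator is more positive
]

print(flag_disagreements(fragments))  # ['n2', 'n3']
```

Fragments like n2 and n3 are where the interesting questions lie: either the algorithm is missing context, or the narrator’s framing differs from the words used.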
II. Data Metrics
The data sources described in Blog 1 have multiple dimensions. Isolating the data into distinct silos ignores potential relationships among variables. Yet neither is it adequate to simply merge the data and seek correlations among variables. Temporal, spatial, and contextual elements invite the exploration of dynamic, emergent patterns. These patterns could be characteristics of, or relationships among, entities (individuals, groups, programs, departments, etc.). Characterizing these patterns (or the triggers of pattern shifts) in relation to organizational health facilitates early detection of potential opportunities and problems. In the case of an Organizational Health Check, the goal is to strengthen positive patterns and encourage positive change while dampening negative patterns and change. Candidate metrics include:
- Network metrics: number of nodes (individuals), connectedness of individuals and groups, information and opinion flows (direction, volume, speed), influencers vs influenced, geographic (or departmental) distribution
- Cluster metrics: dominant group characteristics or behavior, number of members within group, dissimilarity between groups, similarity within group, correlation (causal or simply probability of joint occurrence), outliers
- Change metrics: transitions, trigger points, thresholds, trajectories, disruption
- Sentiment metrics: beliefs, motivations, values, mood, emotion, judgement
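Several of the network metrics can be computed directly from the sender and recipient metadata of the internal email/message systems listed in Blog 1. The minimal sketch below assumes a hypothetical log of (sender, recipient) pairs and computes the number of nodes, each person’s outgoing and incoming message volume, and a crude influencer ranking by the number of distinct colleagues each person reaches. A real analysis would more likely use a dedicated graph library and richer centrality measures.

```python
from collections import Counter, defaultdict

# Hypothetical message log: (sender, recipient) pairs from internal email metadata.
messages = [
    ("ana", "ben"), ("ana", "cho"), ("ana", "dev"),
    ("ben", "cho"), ("cho", "ana"), ("dev", "ana"), ("ana", "ben"),
]

# Number of nodes: everyone who sent or received at least one message.
nodes = {person for pair in messages for person in pair}

# Information flow volume: messages sent (out) and received (in) per person.
out_volume = Counter(sender for sender, _ in messages)
in_volume = Counter(recipient for _, recipient in messages)

# Reach: the set of distinct colleagues each person sends to.
reach = defaultdict(set)
for sender, recipient in messages:
    reach[sender].add(recipient)

# Rank potential influencers by reach, breaking ties by outgoing volume.
influencers = sorted(
    nodes,
    key=lambda p: (len(reach[p]), out_volume[p]),
    reverse=True,
)

print(len(nodes))         # 4
print(out_volume["ana"])  # 4
print(in_volume["ana"])   # 2
print(influencers[0])     # ana
```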
The response of interest in our example, organizational health, would be defined by a set of metrics based on current theories of successful business practice. For example, if innovation is a key characteristic of organizational health, then the objective is to identify factors and patterns that trigger, maintain, or accelerate innovation (or the reverse). This implies that innovation has been defined in such a way that it can be clearly measured as present versus absent or slow versus fast.
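To make “slow versus fast” concrete, innovation has to be reduced to a number that can be compared against a threshold. The sketch below assumes, purely for illustration, that innovation is measured as the number of new ideas adopted per quarter and that a rate of 5 or more counts as “fast”; both the measure and the threshold are hypothetical and would need to come from your own definition of organizational health. It labels each quarter and reports where the label changes, a simple version of the transition and threshold metrics listed above.

```python
# Hypothetical quarterly counts of adopted ideas (illustrative only).
adopted_ideas = {"Q1": 6, "Q2": 7, "Q3": 3, "Q4": 2, "Q5": 5}

FAST_THRESHOLD = 5  # assumed cutoff: >= 5 adopted ideas per quarter is "fast"


def classify(rate: int, threshold: int = FAST_THRESHOLD) -> str:
    """Label a quarter's innovation rate as 'fast' or 'slow'."""
    return "fast" if rate >= threshold else "slow"


def transitions(series: dict[str, int]) -> list[tuple[str, str, str]]:
    """Return (quarter, old_label, new_label) wherever the label changes."""
    labels = [(q, classify(r)) for q, r in series.items()]
    return [
        (q, prev_label, label)
        for (_, prev_label), (q, label) in zip(labels, labels[1:])
        if label != prev_label
    ]


print(transitions(adopted_ideas))
# [('Q3', 'fast', 'slow'), ('Q5', 'slow', 'fast')]
```

Each reported transition is a candidate trigger point: the next step would be to look at what else changed in the data around Q3 and Q5.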
Good organizational health metrics accurately describe (past), indicate (present), and sometimes anticipate (future) the status of an organization’s performance. Traditionally, distinct sets of organizational health metrics focused separately on various aspects of organizational health, such as employee and leadership culture, financial success, and strategic vision. However, these aspects are closely interconnected; a change in one generally precipitates change in others. Ignoring these linkages limits organizational health checks to the past and present and forces companies to depend on the assumption that past patterns will effectively prepare them for the future. A quick review of current business literature reveals the danger of such assumptions: current theory emphasizes complexity, emergence, and the disruption of patterns. Maintaining organizational health requires not just knowledge of the past and present, but also an understanding of the structure and dynamics that enable the company to innovate and adapt.
Organizations that leverage the technological innovations of Big Data and Artificial Intelligence gain direct insight into the complexity, emergence, and disruption that drive and define organizational health.
III. Algorithms
Algorithms are problem-solving tools. Many algorithms have been developed to handle the large, complex data sets (Blog 1) and metrics (Blog 2) described in this series. When working with Big Data, analysts use algorithms for diverse tasks: some calculate derived variables (e.g., a rate of change, diversity classes), some characterize the nature of things (e.g., clustering, grouping), and some identify cause-and-effect relationships. A single analysis might involve all three kinds of tasks. For example, an organization interested in how diversity influences organizational health could first calculate a diversity index and a health index, then classify time periods or departments according to the health index, and finally test whether changes in the diversity index correlate positively with changes in the health index.
The choice among algorithms that perform similar functions depends on the data (quality, quantity, relevance, and accessibility) and on computing resources. Given suitable data (and the definition of suitable is very broad for Big Data and AI applications), some common procedures and applications include:
- Apply external data to strip out or control for background (external) factors influencing internal behavior and sentiment. This makes patterns within the organization’s direct sphere of influence easier to see and/or lets you leverage knowledge of external factors to inform hiring, marketing, or management strategy.
- Identify external and internal factors that commonly correlate with positive metrics (diagnostic) or precede change (anticipatory) to inform organizational strategy.
- Explore differences among demographic groups, departments, or levels within organization to identify patterns of diversity (social and cognitive), coherence, and divergence.
- Enable continuous evolution of correlative relationships (and health assessments) through incremental learning algorithms. Expectations then need not rest on static assumptions about values, beliefs, and motivations, or on constant weights for the importance of each data resource.
Machine learning algorithms are those most commonly associated with Big Data analysis. Jason Brownlee created a mindmap of machine learning algorithms and offers two perspectives on how to think about types of algorithms. First, he distinguishes algorithms by learning style: supervised, semi-supervised, and unsupervised. Supervision, in this context, means that the data analyst provides a training dataset to constrain or guide the algorithm’s procedure. For example, if organizational data can be sorted a priori into unhealthy versus healthy periods or practices, algorithms can use the distinguishing characteristics of the two groups to classify new data as they arrive. Alternatively, if training data are unavailable or the analyst simply wants to avoid enforcing the assumptions associated with a priori labeling, then an unsupervised algorithm will seek patterns based solely on the structure of the data. Many algorithms have supervised and unsupervised variants, so this classification of algorithm types relates to your assumptions about your data and your knowledge of how the system works. Brownlee’s second classification system organizes (and illustrates!) common algorithms into eleven types based on the procedure the algorithm uses to answer a question. Of course, many algorithms defy easy classification because data analysts find ways to mix and match methods as they seek better accuracy and higher precision.
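The supervised/unsupervised distinction can be illustrated by handling the same toy data both ways. In this hypothetical sketch, each period is described by two made-up features (a turnover rate and a mean survey score). The supervised version uses periods already labeled “healthy” or “unhealthy” to build a nearest-centroid classifier for new periods; the unsupervised version ignores the labels and lets a basic k-means (k = 2) find two groups from the data alone. Both are simple stand-ins chosen for brevity, not recommendations of specific algorithms.

```python
import math

Point = tuple[float, float]

# Hypothetical periods described by (turnover rate, mean survey score).
labeled: list[tuple[Point, str]] = [
    ((0.05, 4.2), "healthy"), ((0.06, 4.0), "healthy"),
    ((0.15, 2.9), "unhealthy"), ((0.18, 3.1), "unhealthy"),
]


def centroid(points: list[Point]) -> Point:
    """Mean of each coordinate."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Supervised: learn one centroid per a priori label, then assign new data to the nearest.
centroids = {
    label: centroid([p for p, lab in labeled if lab == label])
    for label in sorted({lab for _, lab in labeled})
}


def classify(point: Point) -> str:
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Unsupervised: k-means ignores the labels and groups points by structure alone.
def kmeans(points: list[Point], k: int = 2, iterations: int = 10) -> list[int]:
    centers = list(points[:k])  # naive initialization: the first k points
    assignments: list[int] = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignments = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            new_centers.append(centroid(members) if members else centers[i])
        centers = new_centers
    return assignments


print(classify((0.07, 3.9)))            # healthy
print(classify((0.16, 3.2)))            # unhealthy
print(kmeans([p for p, _ in labeled]))  # [0, 0, 1, 1]
```

Note that k-means returns arbitrary cluster numbers rather than “healthy” or “unhealthy”; interpreting the clusters is left to the analyst, which is exactly the trade-off described above. Features on very different scales, like these two, should normally be standardized before distances are computed.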
Re-imagine Your Data
Clients often find the prospect of transitioning to data-driven (or, as I prefer, data-informed) operations daunting. Indeed, the volume of data resources and the diversity of possible algorithms can seem overwhelming. However, qualified data managers and analysts will be responsible for the methodological details of integrating the data and selecting algorithms. With the above material, our goal has been to broaden your knowledge of what data you have and what new questions you might pose with these data. The questions you can pose relate not just to the measured values, but also to their sources, their rates of change, their dispersal as information, and their impacts within and beyond the organization. Furthermore, your data are not limited to the moments when individuals intersect with your organization; they extend beyond your direct interactions in time and space. A common side effect of data integration is the automation of routine tasks, such as periodic reporting. At each level of the organization, individuals can then spend more time investigating the implications of the data or devising new applications for it, rather than assembling it.
Anderson C. 2015. Creating a Data-Driven Organization: Practical Advice from the Trenches. O’Reilly, Sebastopol, CA. http://shop.oreilly.com/product/0636920035848.do
Nudging People Towards Latrines. March 11, 2016. cognitive-edge.com. http://cognitive-edge.com/blog/nudging-people-towards-latrines/
Internet of Things: An Executive’s Organizational Roadmap. January 6, 2016. businessintelligence.com. http://businessintelligence.com/bi-insights/internet-of-things-an-executives-organizational-roadmap/
Machine Learning from Streaming Data. March 12, 2013. blog.bigml.com. https://blog.bigml.com/2013/03/12/machine-learning-from-streaming-data-two-problems-two-solutions-two-concerns-and-two-lessons/
A Tour of Machine Learning Algorithms. November 23, 2013. machinelearningmastery.com. http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/