Bimodal Charts and the Myth of the Average in Six Sigma

Most Six Sigma projects begin with a sensible instinct: collect data, summarize it, and improve the process so the mean performance meets the target. That works when the process behaves like a single, stable system. It fails when the data hide two competing realities: two shifts, two machines, two customer segments, two failure modes. In those cases the mean sits in the no man’s land between two peaks. The organization optimizes the average, and nothing important gets better.

I learned this the hard way on a packaging line that sealed medical pouches. The defect rate swung between 0.2 percent and 4 percent depending on the operator. We did what the handbook suggests: checked measurement systems, tightened maintenance schedules, dialed in sealer temperature, then hunted for special causes. A month later, the average defect rate was down a bit and leadership was happy. Operators were not. Certain batches still blew past the limits. Only when we plotted a simple histogram did the truth show up. It was not a fuzzy normal distribution; it was two humps. Operators on line A had one setup ritual and a habit of rethreading film after breaks; line B did something different. The “process” was really two processes that shared a room and a KPI. No amount of mean-centering was going to fix that.

This article is about recognizing those split realities early: how to diagnose them with a bimodal chart and a few companion tools, and what to do in a DMAIC project when the average is the wrong target.

When averages lie

Six Sigma speaks the language of variation: define it, measure it, analyze it, improve it, control it. But there is a quiet assumption under many tools, especially the t-tests, capability indices, and control charts we reach for first. They expect a single underlying distribution for the metric at hand, or at least a distribution that can be made approximately normal through transformation or subgrouping. When the data have multiple modes, that assumption breaks. The mean can be mathematically correct yet operationally useless.

A mixed population can come from dozens of sources, and I have seen each of these create misleading averages:

    - Two or more equipment types running the same part, each with its own signature of wear, heat response, and alignment.
    - Human variation where training, shift culture, or standard work differs by crew, even if the SOP says otherwise.
    - Seasonal or environmental effects like humidity that flip the dominant failure mode at certain thresholds.
    - Customer groups with different use cases, where the midline setting pleases no one.
    - Supplier variation that arrives in indistinguishable boxes but behaves differently in process.

On a service desk I supported, first-call resolution hovered around 68 percent, which looked respectable. Drill down, and you found two queues blended into one pool of data: simple password resets and gnarly configuration issues. The resets hit 95 percent first-call resolution; the configurations hit 25 percent. Averaging them together obscured the need for a skills-based routing change and a knowledge-base rewrite. We were congratulating ourselves on a meaningless middle.

Reading a bimodal chart without fooling yourself

A bimodal chart is nothing exotic. It is typically a histogram or kernel density plot that shows two peaks. Yet I have watched smart teams overinterpret a hump or ignore one. Small samples and wide bins can conjure or hide modes. Before you declare a process bimodal, run through a quick checklist.

    - Start with at least 100 to 200 observations if practical. You can see real modes with fewer, but the risk of noise creating false peaks drops as n grows.
    - Vary the histogram bin width. If a “second peak” vanishes with reasonable bin choices, be cautious. A kernel density plot with a bandwidth sensitivity check helps (see the sketch after this list).
    - Stratify by obvious factors: machine, operator, shift, product variant, supplier lot. If splitting by a factor collapses the two peaks into one each, you have your source.
    - Check time order. A run chart can show whether modes alternate (shift effects) or come in blocks (supplier lots).
    - Validate the measurement system. Double peaks sometimes trace to two ways of measuring the same thing, or to instrument drift.
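
To make the sensitivity check concrete, here is a minimal Python sketch, assuming numpy, scipy, and matplotlib are available; the seal_strength variable and its values are synthetic stand-ins for your own metric. The point is that a mode which survives several reasonable bin and bandwidth choices is worth believing.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# seal_strength stands in for your metric; the values here are synthetic
rng = np.random.default_rng(1)
seal_strength = np.concatenate([rng.normal(9.95, 0.06, 120),
                                rng.normal(10.05, 0.06, 120)])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# A mode that survives several reasonable bin choices is worth believing
for bins in (10, 20, 40):
    axes[0].hist(seal_strength, bins=bins, histtype="step", label=f"{bins} bins")
axes[0].set_title("Histogram bin-width sensitivity")
axes[0].legend()

# Same idea for the kernel density estimate: sweep the bandwidth
xs = np.linspace(seal_strength.min(), seal_strength.max(), 300)
for bw in (0.1, 0.3, 0.5):
    axes[1].plot(xs, gaussian_kde(seal_strength, bw_method=bw)(xs), label=f"bw={bw}")
axes[1].set_title("KDE bandwidth sensitivity")
axes[1].legend()
plt.show()
```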

If you have a capability analysis on your dashboard that still assumes normality, stop and replot. A single Cp or Cpk on a bimodal system produces feel-good numbers that understate the tails. Capability by stratum, or a nonparametric yield estimate, tells the truth.
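
If you want to see what capability by stratum and a nonparametric yield look like side by side, here is a small sketch. It assumes numpy; the die_a and die_b samples and the spec limits are placeholders, not real data.

```python
import numpy as np

def cpk(x, lsl, usl):
    # Classic Cpk; only meaningful within a single, stable, roughly normal stratum
    mu, sigma = x.mean(), x.std(ddof=1)
    return min(mu - lsl, usl - mu) / (3 * sigma)

def empirical_yield(x, lsl, usl):
    # Nonparametric yield: fraction of observations in spec, no normality assumed
    return np.mean((x >= lsl) & (x <= usl))

# die_a and die_b are placeholder strata; spec limits are illustrative
lsl, usl = 9.8, 10.2
rng = np.random.default_rng(2)
die_a = rng.normal(9.99, 0.06, 200)
die_b = rng.normal(10.01, 0.06, 200)

for name, x in [("Die A", die_a), ("Die B", die_b)]:
    print(f"{name}: Cpk = {cpk(x, lsl, usl):.2f}")
pooled = np.concatenate([die_a, die_b])
print(f"Overall empirical yield = {empirical_yield(pooled, lsl, usl):.4f}")
```

Report the stratum-level indices for action and the empirical yield for the dashboard; do not let a pooled Cpk stand alone.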

The myth of the average target

I meet well-meaning managers who want the team to “bring the mean onto target.” For a single-mode process, that can be the right next step. For a true mixture, bringing the mean to target might worsen both subpopulations. Imagine a call center with two call types: five-minute password resets and 40-minute complex cases. If you set a flat 15-minute target because it matches the average handle time, both groups lose. Agents will rush the complex cases and frustrate clients, and the easy calls get mired in overprocessing to justify time spent.

A more vivid example: a metal stamping operation produced parts from two dies. Die A yielded parts that ran 10 micrometers undersize, Die B 10 micrometers oversize. The average sat dead on nominal. The quality manager showed a triumphant X-bar chart hugging the target line. The customer’s assembly line still jammed. The bimodal chart told the real story. They needed to center each die to nominal, not brag about the average of their mistakes.

The myth survives because the average is seductive. It simplifies dashboards. It gives a single knob for leadership to turn. It fits a spreadsheet worldview where variance is noise and the mean is the signal. In practice, the signal is often the structure under the mean.

What a bimodal chart suggests in DMAIC

You do not need to throw away Six Sigma when you face a bimodal distribution. You need to adapt the project to the structure you discover.

During Define, be precise about the unit of analysis. Are you improving a single process or a portfolio of similar processes? If your charter says “reduce scrap on Line 2,” confirm whether Line 2 is a monolith or three different machine families under one banner. Voice of the Customer data might mask segmentation, so probe for it. Customers sometimes blend feedback the way we blend data.

In Measure, plan to tag each observation with potential stratification variables. If you do not capture machine ID, operator, time of day, product variant, or supplier lot in the data lake, you lose the ability to see modes. I have watched teams spend weeks doing change detection on anonymous data streams that hid the obvious split.
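
One lightweight way to enforce that tagging is to define the observation record up front. The sketch below is a hypothetical schema, not a prescribed format; every field name is illustrative.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Observation:
    value: float            # the measurement itself
    machine_id: str         # every tag below is cheap to capture at the source
    operator_id: str
    shift: str
    product_variant: str
    supplier_lot: str
    measured_at: datetime

obs = Observation(10.02, "LINE2-M3", "OP-117", "B",
                  "POUCH-90", "LOT-4471", datetime(2024, 8, 14, 22, 5))
```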

In Analyze, put a bimodal chart next to your control charts. Use interaction plots and simple side-by-side box plots across strata. Consider finite mixture models if you have the appetite, but do not let math block action. The main goal is to find whether the double peaks tie to identifiable and controllable factors. Sometimes the factor is not on your standard list. In a printing plant, the split tied to ambient dew point. No one had logged it, but maintenance had a wall thermometer from a bygone era. Once we started recording dew point and paper conditioning time, the modes aligned and could be tamed.
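
If you do have the appetite for a finite mixture model, scikit-learn makes the basic comparison cheap. This is a sketch on synthetic data, assuming scikit-learn and numpy; the point is the BIC comparison between one and two components, not the specific numbers.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# values is a placeholder for your metric, reshaped to one column for sklearn
rng = np.random.default_rng(3)
values = np.concatenate([rng.normal(9.95, 0.06, 150),
                         rng.normal(10.05, 0.06, 150)]).reshape(-1, 1)

# Compare one component against two; the lower BIC wins
for k in (1, 2):
    gm = GaussianMixture(n_components=k, random_state=0).fit(values)
    print(f"k={k}: BIC={gm.bic(values):.1f}, means={gm.means_.ravel().round(3)}")

# If two components win, the assignments hint at which stratum each point belongs to
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(values)
```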

In Improve, resist the urge to partially tune each subpopulation until the combined mean looks tight. That choice leaves two broken systems hiding behind a pretty average. Instead, decide whether to segment and optimize each stream separately, or to eliminate one stream by redesign. In food production, we separated allergen and non-allergen runs, with line-clearing protocols that created a clean unimodal process for each. In a claims process, we redesigned the intake form so complex claims were flagged and routed early, and the “easy” claims went to straight-through processing. Average cycle time increased slightly, but SLA compliance for each segment soared.

In Control, keep the segmentation visible. Post-segmentation control charts track stability within each mode. If you collapse them, you will not notice when one stream drifts and the other compensates. Many plants quietly live with a bad machine offset by an overperforming twin, so the consolidated KPI looks fine until the good machine goes down for PM and the other’s defects explode. Explicit stratified monitoring would have forced the discussion earlier.

The trap of capability indices on mixed data

Capability indices like Cp and Cpk assume a single, stable distribution. Suppose your lower spec limit is 9.8, target is 10.0, upper spec is 10.2. Die A: mean 9.99, sigma 0.06. Die B: mean 10.01, sigma 0.06. Each die alone has a perfectly fine Cpk near 1.1 to 1.2, with very few defects. Now imagine Die A drifts to 9.95 and Die B drifts to 10.05. The combined data may still center at 10.00, and an overall Cpk computed naively can look reasonable if you pool the standard deviation. Yet the probability of being out of spec at either tail can jump severalfold. The mixture distribution fattens the tails in ways classic formulas do not capture. If you must report a single capability figure for a mixed process, caveat it and provide stratum-level indices alongside a true overall yield computed from the empirical distribution.
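
Here is a rough numerical check of that scenario, assuming scipy and an equal-weight mixture of the two dies; the figures mirror the example above but are illustrative. The naive Cpk, computed from the mixture mean and the pooled within-die sigma, barely moves while the true out-of-spec rate jumps.

```python
from scipy.stats import norm

lsl, usl = 9.8, 10.2
sigma = 0.06  # within-die sigma from the example above

def out_of_spec(mu):
    return norm.cdf(lsl, mu, sigma) + norm.sf(usl, mu, sigma)

for label, (mu_a, mu_b) in [("before drift", (9.99, 10.01)),
                            ("after drift", (9.95, 10.05))]:
    # Equal-weight mixture of the two dies
    p_bad = 0.5 * out_of_spec(mu_a) + 0.5 * out_of_spec(mu_b)
    # Naive Cpk: mixture mean sits at 10.00, pooled within-die sigma hides the offset
    naive_cpk = min(10.00 - lsl, usl - 10.00) / (3 * sigma)
    print(f"{label}: naive Cpk = {naive_cpk:.2f}, true out-of-spec = {p_bad:.4%}")
```

Under these assumptions the naive figure does not move at all, while the defect rate rises roughly sixfold, from about 0.10 percent to about 0.62 percent.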

I once reviewed a supplier who showed an aggregate Cpk of 1.33 on a critical dimension. The receiving plant complained of an uptick in both low and high rejections. The supplier’s histogram was lovely at first glance. Replotted by cavity number, it showed two clusters 0.06 millimeters apart. The combined Cpk was not a lie, but it was about a system that did not exist on the shop floor. The cavity-level Cpk told the actionable truth, and a pin wear issue got fixed in days.

Mixed populations are not just a manufacturing problem

Service and software teams face the same pattern dressed in different clothes. A healthcare scheduling group claimed average wait time was 12 days for a specialty clinic. The patient advisory council insisted the wait felt longer. Turned out new-patient appointments averaged 28 days and follow-ups averaged 5. Both numbers felt plausible to staff who mainly handled follow-ups. Patients who could not get an initial diagnosis for a month felt abandoned. Plotting the lead time separately was enough to reset priorities and staffing.

In software telemetry, mean response time can look fine while a chunk of users experience awful delays. If you see a bimodal chart of latency, investigate by region, device, time of day, and specific API calls. A Los Angeles data center rack misconfiguration once created a second latency peak for West Coast mobile users only. The overall mean moved a millisecond. The angry tweets did not average out.

Even in finance, blending products hides risk. An operations team that handles both domestic and cross-border payments will show a neat average settlement time that understates regulatory exposure on the slow leg. Segment it by corridor. If you do not, your risk dashboards comfort you right up to the audit finding.

How to detect hidden segments before they bite

Most organizations do not set out to hide segments. They evolve into it as systems grow. A few practices reduce the odds that you will chase the mean instead of the truth.

    - Instrument your process with lightweight stratification variables from the start. Machine ID, operator, time block, product family, supplier lot, and environmental tags like temperature or humidity are cheap data fields that pay for themselves.
    - Standardize metadata in the data warehouse so analysts can easily slice. When tags live in free-text notes, you do not slice, you guess.
    - Build exploratory data analysis into the routine. At project kickoff, insist on histograms, density plots, and run charts before anyone computes a mean.
    - Create habit loops that ask, “If this histogram is bimodal, what plausible splits could explain it?” Make that question part of gate reviews.

Only one of those needs tooling. The rest are cultural. Teams that learn to suspect averages get faster and calmer with gnarly problems.

When the mixture is the point, not the problem

Not every bimodal distribution is a defect. Some strategies rely on dual peaks, and trying to force a single mode would destroy value.

A retailer might run two same-day picking models: hot picks near the front for express orders and deep picks for next-day fill. Cycle time should be bimodal by design. An airline loyalty program can aim for most redemptions to be either nearly free or premium, with few in between. A software platform can intentionally push users into two distinct onboarding tracks: technical and nontechnical. When you see a bimodal chart here, it confirms segmentation is working. The key is to monitor each segment against its own promise, not to compress them into a single dashboard number that punishes both strategies.

Be clear in governance reviews which KPIs are intentionally segmented. Put those segments on the scorecard as separate lines with their own goals and control limits. Leaders who see a lumpy combined chart tend to demand smoothing; leaders who see two healthy lines tend to ask the right questions.

Beware of sample merging in control charts

Classic Shewhart charts, EWMA, and CUSUM all expect data from a stable system. If you merge data from two strata into one chart, you dilute signals, trip false alarms, or both. Two examples illustrate the risk.

On a fill line, one filler head occasionally dribbled and left underfills. The other heads were stable. The combined X-bar chart waved gently within limits. The heads were not charted individually, so a signal that would have fired on Head 3’s own chart got washed out by the other seven. The underfill complaints kept trickling in, and the team shrugged at a “stable” chart. A head-level chart would have caught the drift in a single shift.

In a call center, a new cohort of agents came online. Their AHT started high and declined with learning. Seasoned agents were steady. When you threw them together, the control chart flagged frequent out-of-control points. Supervisors, weary of false alarms, started ignoring red dots. They missed a different special cause that hit both cohorts later. Splitting the charts quieted the noise and preserved attention for real problems.

Use stratified control charts as your default when early analysis hints at modes. If you need an overall chart for leadership, make it a yield or defect chart that reflects customer-relevant outcomes, but keep the diagnostic charts by segment for action.
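
As a sketch of what head-level monitoring might look like, here is an individuals-chart limit calculation per stratum, using the standard moving-range estimate (the 2.66 factor is 3 divided by d2 = 1.128). It assumes numpy; head_data and the drift injected on Head 3 are synthetic stand-ins, and in practice you would set limits from a validated in-control baseline period.

```python
import numpy as np

def imr_limits(baseline):
    # Individuals-chart limits from the average moving range (d2 = 1.128 for n = 2)
    x = np.asarray(baseline, dtype=float)
    mr_bar = np.mean(np.abs(np.diff(x)))
    center = x.mean()
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# head_data maps each filler head to its time-ordered fill weights (synthetic)
rng = np.random.default_rng(4)
head_data = {f"Head {i}": rng.normal(500.0, 1.5, 50) for i in range(1, 9)}
head_data["Head 3"] = head_data["Head 3"] - np.linspace(0, 6, 50)  # slow underfill drift

for head, series in head_data.items():
    lcl, center, ucl = imr_limits(series[:25])  # limits from an early baseline window
    alarms = np.sum((series[25:] < lcl) | (series[25:] > ucl))
    print(f"{head}: LCL={lcl:.1f}, UCL={ucl:.1f}, out-of-limit points={alarms}")
```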

Practical steps when you spot a bimodal chart

If you suspect you are looking at a stale average pasted over two different processes, move fast but methodically. The moves are not complicated, and you can run them within a normal project cadence.

    1. Name the suspected strata as hypotheses. “A and B machines,” “new and returning patients,” “Region 1 and Region 2.”
    2. Replot the data stratified by each hypothesis. If the peaks collapse into one for a stratum, you have a strong lead.
    3. Quantify the difference with simple, defensible stats: means, medians, variances, and a nonparametric test if distributions are messy (a minimal sketch follows this list).
    4. Decide whether to split the workflow and optimize each, or to eliminate the cause of the split. Both options are valid. Choose based on cost, risk, and strategic fit.
    5. Set up ongoing monitoring by segment, including capability or SLA adherence within each.
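
For step 3, here is a minimal sketch of the quantification, assuming scipy; stratum_a and stratum_b are invented samples standing in for the two hypothesized subpopulations.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# stratum_a and stratum_b stand in for the two hypothesized subpopulations
rng = np.random.default_rng(5)
stratum_a = rng.normal(5, 1.2, 80)    # e.g., simple cases, minutes
stratum_b = rng.normal(40, 9.0, 60)   # e.g., complex cases, minutes

for name, x in [("A", stratum_a), ("B", stratum_b)]:
    iqr = np.percentile(x, 75) - np.percentile(x, 25)
    print(f"Stratum {name}: median={np.median(x):.1f}, IQR={iqr:.1f}")

# Nonparametric comparison: no normality assumption, robust to messy shapes
stat, p = mannwhitneyu(stratum_a, stratum_b, alternative="two-sided")
print(f"Mann-Whitney U={stat:.0f}, p={p:.2g}")
```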

Teams that practice this five-step loop get quicker at spotting false averages. They also build credibility with operators and customers because fixes map to lived experience.

Case vignette: heating, ventilation, and customer complaints

A commercial HVAC manufacturer saw warranty claims spike each August and September. The data showed two clusters in compressor failure times: one around 200 hours, another around 2,000 hours. The average was close to 1,100, which told us nothing useful. A bimodal chart made the split visible. Stratifying by bill of materials exposed the culprit. A vendor had started shipping a replacement capacitor with the same part number, same form factor, and a revised dielectric. Units assembled with the new capacitor failed early in high-humidity markets. Field techs wrote “bad compressor” on the paperwork because the unit would not start after cycling. The fix was a supplier corrective action and a field campaign to swap capacitors in certain serial ranges. The compressor warranty and the average failure time were the wrong lenses. The right lens was two distinct failure populations hidden by a shared part number.

That project reminded me of a recurring theme. Bimodality is often a labeling problem as much as a physics problem. If your systems cannot tell one kind of unit from another except by disassembly, you are already paying interest on a data debt.

Training your eye and your culture

The math behind mixture distributions can get sophisticated, and mixture modeling is a good skill to develop. That said, most operational wins come from three habits that do not require advanced statistics.

First, plot your data often, and not just as lines and boxes. Put histograms and density plots on the first page. Invite operators to look at them with you. They notice shapes and will often tell you the story behind the second hump within minutes.

Second, stop worshiping the mean. Use medians, quantiles, and segment-specific metrics. In software, the 95th percentile latency moves customer sentiment more than the mean. In service, the tail of wait times drives backlash even if the middle looks fine.
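
A tiny illustration of why, on synthetic latency data assuming numpy; the mixture proportions are invented, but the pattern is the common one where a slow minority vanishes into the mean.

```python
import numpy as np

# Synthetic latency: a fast majority and a slow minority, a common bimodal shape
rng = np.random.default_rng(6)
latency_ms = np.concatenate([rng.normal(80, 10, 950), rng.normal(900, 120, 50)])

print(f"mean   = {latency_ms.mean():6.0f} ms")              # looks tolerable
print(f"median = {np.median(latency_ms):6.0f} ms")          # what most users see
print(f"p95    = {np.percentile(latency_ms, 95):6.0f} ms")  # what angry users see
```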

Third, make segmentation a default, not an exception. When you present results, say whose reality each number describes. “New patients wait 27 to 31 days, returning patients 3 to 6 days.” That sentence cannot be averaged away, and it prompts better decisions.

A word about power, sample size, and patience

Detecting and acting on bimodality takes enough data to reduce the chance of seeing ghosts. If you have only a handful of points, defer the split decision until you collect more, or use prior knowledge to guide a lightweight test. For example, if two machines have known offsets from past capability runs, assume they are different until proved otherwise. In time-critical settings, I prefer a biased early segmentation with a plan to revisit, rather than months of blended misery.

Also be ready for the patience tax. When you split KPIs into segments, your overall average might get noisier or look worse. Executives who like green dashboards sometimes resist. Hold your ground with customer-centric measures. Say, “For complex cases, our SLA compliance rose from 58 percent to 92 percent. For simple cases, it stayed at 98 percent. The blended average shifted little, but customers felt the change.” Grounding the story in outcomes keeps the room focused.

Where the average still earns its keep

It is easy to swing too far the other way and distrust averages everywhere. They are still valuable in stable, unimodal systems, especially after you have confirmed that structure. The mean is a simple, sensitive signal for drift. If you have standardized one way of working and proved that two machines now behave the same, a single mean is fine. The discipline is to earn the right to average by testing first.

In many continuous processes, like a chemical reactor with tight feed control and uniform mixing, you expect unimodality barring upsets. Here, a mean on a control chart with thoughtful subgrouping and a good measurement system serves well. You do not need to slice ad infinitum to be thorough. You just need to verify homogeneity early and periodically.

The executive conversation

Leaders set the tone on how metrics are used. If you sponsor Six Sigma work, push your teams to show the shape of the data, not only the summary. Ask what the bimodal chart, if any, reveals. Request segment-level targets when customer groups or equipment classes differ. Reward teams that split a problem into two simpler problems and fix both, rather than those who push a blended KPI to green and leave two broken realities intact.

One sentence I use with executives: “Averages are great summaries of single truths. When we have two truths in the same number, the average becomes a mask.” That usually earns a nod and permission to redesign the dashboard.

Final thoughts from the shop floor

The day we stopped chasing the average on that packaging line, we freed two operators from a fight with a statistic. We built two standard work documents, tuned two sets of parameters, and aligned two training rituals. Within a week, the histogram lost its camel’s back silhouette and became a single, tight hill for each line. Defects fell in a way the old dashboard had never predicted because it had never told the truth. People stopped arguing about whose fault it was and started comparing notes across lines.

That is the quiet power of noticing a bimodal chart. It points you to where reality is already segmented and gives you license to respect that segmentation. Six Sigma has always been about seeing the process as it is, not as we wish it to be. When the myth of the average tempts you, put up the histogram and ask whose stories the humps represent. Then fix those stories. The mean will follow.