How Outdoo AI Identifies Speakers

Learn how Outdoo AI identifies speakers on calls using diarization, voice fingerprinting, and machine learning to segment conversations and attribute speech to the right participants.

Outdoo AI analyzes calls to determine who spoke and when. This information is used to calculate stats and is displayed on the call page, helping you navigate and focus on relevant parts of the call.

Dividing Calls into Speaker Segments

The first step in speaker identification is segmenting the call into parts, each associated with a single speaker. Outdoo AI uses different approaches depending on the type of call.
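To make the idea of a speaker segment concrete, here is a minimal sketch of how segments could be represented and turned into talk-time stats. The Segment type, its field names, and the talk_time and talk_ratio helpers are hypothetical illustrations, not Outdoo AI's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical speaker segment: one speaker, one time span."""
    speaker: str   # label of the single speaker in this segment
    start: float   # seconds from the start of the call
    end: float     # seconds from the start of the call


def talk_time(segments):
    """Return total seconds spoken per speaker."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def talk_ratio(segments):
    """Return each speaker's share of total talk time (fractions summing to 1)."""
    totals = talk_time(segments)
    overall = sum(totals.values())
    if not overall:
        return {}
    return {name: seconds / overall for name, seconds in totals.items()}


segments = [
    Segment("Rep", 0.0, 12.0),
    Segment("Customer", 12.0, 20.0),
    Segment("Rep", 20.0, 40.0),
]
print(talk_time(segments))
print(talk_ratio(segments))
```

Once a call is divided this way, stats such as talk time per speaker reduce to simple sums over the segments.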

1. Meeting Calls

Outdoo AI examines the participant list during the call to estimate who is present and when each person speaks. Because conferencing systems often report speaker changes with a delay, Outdoo AI applies a proprietary algorithm to refine speaker identification, and uses voice fingerprinting to attribute speech to the right internal speakers.

2. Telephony Calls

Telephony audio is merged into a single channel. Outdoo AI uses diarization to separate that channel into multiple tracks based on voice differences. Machine learning then assigns each track to the appropriate speaker.
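Diarization in general can be illustrated with a small sketch. The example below is not Outdoo AI's algorithm: it uses a simple greedy clustering as a stand-in, and it assumes that each short audio window has already been converted into a numeric voice embedding by some speaker-embedding model. The diarize function, the window format, and the 0.8 threshold are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def diarize(windows, threshold=0.8):
    """Greedily group (start, end, embedding) windows into speaker tracks.

    A window joins the most similar existing track when the similarity
    reaches the threshold; otherwise it starts a new track. Back-to-back
    windows on the same track are merged into one segment.
    Returns a list of (start, end, track_id) tuples.
    """
    centroids = []  # running mean embedding per track
    counts = []     # number of windows folded into each centroid
    result = []
    for start, end, emb in windows:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], emb)]
            counts[best] = n + 1
        if result and result[-1][2] == best and result[-1][1] == start:
            result[-1] = (result[-1][0], end, best)  # extend previous segment
        else:
            result.append((start, end, best))
    return result


# Toy two-dimensional embeddings; real ones have many more dimensions.
windows = [
    (0.0, 1.0, [1.0, 0.0]),
    (1.0, 2.0, [0.9, 0.1]),
    (2.0, 3.0, [0.0, 1.0]),
    (3.0, 4.0, [0.95, 0.05]),
]
tracks = diarize(windows)
print(tracks)
```

The output tracks are anonymous (track 0, track 1, and so on). Attaching a person's name to each track is a separate step.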

Participant Identification Methods

1. Conference Calls

Participants joining via computer often identify themselves with their full name or a nickname. Those joining via dial-in may display a partial or full phone number.

Outdoo AI matches participant names or phone numbers to the meeting invite and learns nicknames and phone numbers over time for better accuracy.

If several participants share one device, they appear as a single speaker track, labeled with the name of the person who logged into the web conference or dialed in.
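The matching of display names and phone numbers to the meeting invite can be sketched as follows. This is a simplified illustration under stated assumptions: the match_participant function, the shape of the invitees and aliases dictionaries, and the rule that a partial phone number matches on trailing digits are all hypothetical, not a description of Outdoo AI's implementation.

```python
import re


def normalize_name(name):
    """Lowercase and collapse whitespace so 'Dana  SMITH' equals 'dana smith'."""
    return " ".join(name.lower().split())


def normalize_phone(number):
    """Keep digits only."""
    return re.sub(r"\D", "", number)


def match_participant(display, invitees, aliases):
    """Return the invitee a display label refers to, or None.

    invitees: dict of full name -> phone number (or None), from the invite.
    aliases:  dict of learned nickname or phone digits -> full name.
    """
    digits = normalize_phone(display)
    looks_like_phone = len(digits) >= 4 and not re.search(r"[A-Za-z]", display)
    if looks_like_phone:
        if digits in aliases:
            return aliases[digits]
        hits = []
        for name, phone in invitees.items():
            known = normalize_phone(phone) if phone else ""
            # Assumption: a partial number matches on its trailing digits.
            if known and (known.endswith(digits) or digits.endswith(known)):
                hits.append(name)
        return hits[0] if len(hits) == 1 else None  # ambiguous -> no match
    key = normalize_name(display)
    for name in invitees:
        if normalize_name(name) == key:
            return name
    return aliases.get(key)


invitees = {"Dana Smith": "+1 415 555 0134", "Lee Wong": None}
aliases = {"dee": "Dana Smith"}  # nickname learned from an earlier call

print(match_participant("dana  SMITH", invitees, aliases))
print(match_participant("Dee", invitees, aliases))
print(match_participant("555-0134", invitees, aliases))
print(match_participant("Unknown Caller", invitees, aliases))

# "Learning" here is just recording a confirmed label for next time.
aliases[normalize_name("Lee W")] = "Lee Wong"
```

In this sketch, learning over time amounts to adding confirmed nicknames or numbers to the aliases dictionary so that later calls resolve them directly.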

2. Dialer Calls

Audio is merged into a single channel. Outdoo AI uses diarization to separate speakers, then leverages machine learning and voice fingerprinting to assign tracks to the right speakers.

Voice Fingerprint for Dialer Calls

Voice fingerprinting uses samples of previous recordings to identify the Outdoo AI user. Here's how it works:

This feature must be enabled by the administrator and opted into by team members. Voice identification is only stored for subscribed users.

Call introductions and 30-second audio samples are used to build a voice profile. Samples are refreshed over time to maintain accuracy.

Samples are used to identify users in real time without permanently storing any data. Past unidentified calls can be revisited and analyzed using updated voice profiles.

The system adapts to new environments, telephony systems, or equipment such as different headsets.
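The general idea behind matching a voice against stored profiles can be sketched as below. This is an illustration only and not Outdoo AI's method: it assumes each audio sample has already been turned into a numeric embedding, and the build_profile and identify functions, the averaging step, and the 0.75 threshold are hypothetical choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_profile(samples):
    """Average several sample embeddings into one profile vector."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def identify(embedding, profiles, threshold=0.75):
    """Return the enrolled user whose profile best matches, or None.

    A track stays unidentified when no profile reaches the threshold.
    """
    best_user, best_score = None, threshold
    for user, profile in profiles.items():
        score = cosine(embedding, profile)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


# Only users who opted in are enrolled (toy 3-dimensional embeddings).
profiles = {
    "alice": build_profile([[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]),
    "bob": build_profile([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}

print(identify([0.85, 0.0, 0.1], profiles))
print(identify([0.5, 0.5, 0.5], profiles))
```

Rebuilding a profile from newer samples, as in build_profile, is one simple way a system could keep up with a new headset or telephony setup. Calls that were left unidentified could then be rerun through identify with the refreshed profiles.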

Key Notes on Speaker Identification

Outdoo AI doesn't limit the number of identified speakers, but identification depends on input from the conferencing or telephony system.

Participants who are silent or unidentified are excluded from detailed speaker tracks.

For maximum accuracy, voice fingerprinting is recommended for dialer calls.
