What is Audio Editing?

by Alex Cope / Thursday, 20 February 2025 / Published in Articles

Contents

  1. Introduction
  2. Technical elements
    • Background and Electrical Noise
    • Room Sound
    • Clipping Distortion
  3. Performance-related elements
    • Unwanted Artifacts of the Speech Apparatus
    • Rhythmic Inaccuracies
    • Pitch Correction
  4. Conclusion

Introduction

Editing audio tracks is an important preparatory step before the actual mixing of a track. A well-executed recording, combined with thorough editing, forms the foundation of any mix. No matter how great the mixing and mastering of your song is, rhythmic and pitch inaccuracies, extraneous noises, and artifacts of all kinds can completely ruin the experience. If you want your song to truly shine, you can CONTACT US for audio editing, and we will thoroughly address any inaccuracies.

Let’s take a closer look at what elements are subject to editing at this stage.

Broadly speaking, elements requiring correction can be divided into two groups: technical and performance-related. Technical issues include background noise (e.g., the hum of an air conditioner), electrical noise (caused by the use of low-quality cables), room sound (which, for simplicity, can be described as echo when recording in an acoustically untreated room), clipping distortion (resulting from incorrect input gain settings), and, in vocal recordings, harsh sibilants, plosive sounds, and natural noises produced by the vocal apparatus (such as lip smacks and breathing sounds). One might assume that the latter examples belong to the category of performance-related elements, but this would be incorrect: these issues arise from the way recording equipment interprets the signal rather than the way the performer produces the sound. Performance-related elements include rhythmic inaccuracies (deviations from the rhythmic structure of the piece) and pitch inaccuracies (commonly referred to as intonation issues). These lists are not exhaustive, but they cover the most common problems that require correction.

Let’s examine each of these elements in more detail and see how sound engineers handle them.


Technical elements

Background and Electrical Noise

These two elements are grouped together because they often present a similar problem conceptually, even though they occur in different parts of the frequency spectrum. For instance, air conditioner noise typically manifests as a low-frequency hum, whereas electrical noise often takes the form of a hum in the upper midrange of the frequency spectrum.

In the first case, when dealing with low-frequency noise, the problem is often resolved in a fairly straightforward manner—by cutting out the part of the spectrum where the noise is present. This approach is effective because most musical instruments and vocal timbres contain little to no useful information in the frequency range occupied by such hums, so removing it usually does not significantly degrade the quality of the recorded material. However, it is important to remember that not all instruments and voices exist outside the frequency range affected by air conditioners or passing cars, and not all low-frequency noise is purely low-frequency in nature. When such overlaps occur—similar to those found in electrical noise—conventional equalization methods may no longer suffice, and spectral processing is required to more precisely target the unwanted frequencies.
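
To make this concrete, here is a minimal Python sketch of the low-cut approach, assuming the hum lives below roughly 80 Hz; the cutoff, filenames, and the soundfile/scipy tooling are illustrative choices, and in practice an engineer would reach for an equalizer in the DAW rather than a script.

```python
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

def remove_low_hum(infile, outfile, cutoff_hz=80.0):
    """High-pass a recording to attenuate low-frequency hum."""
    audio, sr = sf.read(infile)
    # 4th-order Butterworth high-pass; sosfiltfilt runs it forward and
    # backward so the filter itself adds no phase shift.
    sos = butter(4, cutoff_hz, btype="highpass", fs=sr, output="sos")
    sf.write(outfile, sosfiltfilt(sos, audio, axis=0), sr)

# remove_low_hum("vocal_take.wav", "vocal_take_hp.wav")  # hypothetical names
```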

That being said, compromises are inevitable in such cases. This leads to a dilemma: how much of the useful sound are we willing to sacrifice in order to remove the noise? There is no definitive answer, as each situation must be addressed individually. The best solution, however, is to prevent the problem at the recording stage rather than fix it afterward.

Room Sound

This is a common issue in amateur and semi-professional recording environments. Room sound itself is not inherently bad—major studios invest heavily in hiring acoustic engineers to design spaces with a specific sound character. However, if an artist or band does not have access to an expensive studio or cannot afford to build a properly treated recording space, the recorded material will inevitably suffer from poor room acoustics to some extent.

To understand the problem better, let’s briefly touch on the nature of sound. Sound consists of waves produced by vibrating elastic bodies (strings, vocal cords, drumheads, and so on). These waves carry specific frequency information depending on their source, and the human ear can perceive frequencies roughly between 20 Hz and 20 kHz. Additionally, these waves tend to reflect off objects they encounter, which is where the problems begin.

The issue is that sound waves of different frequencies reflect differently depending on the density of the objects they encounter. Why is this a problem? Imagine someone recording vocals in a regular, untreated room—say, a bedroom, which is a common scenario in today’s music world. While the microphone primarily captures the direct sound from the vocalist’s mouth, it also picks up sound waves that have reflected off the walls and furniture. This results in a twofold issue: first, these reflections reach the microphone with a slight delay, and second, as mentioned earlier, they arrive with an altered frequency profile. This altered frequency profile is the main challenge we need to address.
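
A short numerical sketch illustrates why a reflection is more than a simple echo. Under assumed values (a single reflection, 8 ms of extra path, high frequencies absorbed by the wall), adding a delayed, low-passed copy of a signal to itself reinforces some frequencies and cancels others, which is precisely the coloration the microphone picks up:

```python
import numpy as np
from scipy.signal import butter, sosfilt

sr = 44100
rng = np.random.default_rng(0)
direct = rng.standard_normal(sr)  # one second of noise as a stand-in source

# The wall absorbs highs more than lows, so the reflected copy is low-passed.
sos = butter(2, 4000, btype="lowpass", fs=sr, output="sos")
reflection = 0.5 * sosfilt(sos, direct)

delay = int(0.008 * sr)  # the reflection arrives 8 ms after the direct sound
at_mic = direct.copy()
at_mic[delay:] += reflection[:-delay]
# The sum no longer matches the source's spectrum: the delayed, duller copy
# reinforces some frequencies and cancels others (comb filtering).
```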

Unfortunately, eliminating room sound from the track without degrading the quality of the desired audio is impossible, and the more prominent the room sound is in the recording, the more compromises must be made to remove it completely. As with background noise, the solution lies in balancing the reduction of unwanted room sound while preserving as much of the useful audio as possible.

In practice, the primary tool used to address this issue is iZotope’s De-Reverb. This software utilizes unique algorithms to spectrally separate the desired sound from the unwanted room reflections, helping to mitigate the problem effectively.
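
Those algorithms are proprietary, but the general idea can be caricatured as spectral gating: attenuate time-frequency bins that look like decaying tail energy rather than direct sound. The sketch below is only a toy illustration of that idea, not iZotope's method, and its threshold is an arbitrary assumption:

```python
import numpy as np
from scipy.signal import stft, istft

def crude_dereverb(audio, sr, floor=0.15):
    """Toy spectral gate: duck bins that have fallen well below the
    loudest level their band has reached (crudely, tail energy)."""
    f, t, Z = stft(audio, fs=sr, nperseg=2048)
    mag = np.abs(Z)
    # Running per-band peak as a crude reference for the direct-sound level.
    band_peak = np.maximum.accumulate(mag, axis=1)
    # Bins below floor * peak are attenuated proportionally, not erased,
    # to limit damage to the desired signal.
    mask = np.clip(mag / (floor * band_peak + 1e-12), 0.0, 1.0)
    _, out = istft(Z * mask, fs=sr, nperseg=2048)
    return out
```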

Clipping Distortion

Clipping is a crackling artifact caused by an incorrectly set input gain level during recording. Under normal conditions, sound is represented as a wave following a smooth, sine-like curve. Imagine that at its peak the wave exceeds the limits of the recording equipment’s capabilities, so the waveform is flattened at that point. This disrupts its natural motion and produces the unwanted crackling.

In the past, the only solution to this problem was re-recording, and even today it remains the preferred method. However, if re-recording is not an option, the take was otherwise successful, and the waveform exceeded the threshold only slightly, modern technology allows the wave’s natural shape to be reconstructed, eliminating the unwanted distortion. Nonetheless, this approach should only be used for minor recording errors; significant and persistent crackling requires a complete re-recording of the material.
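
For illustration, here is a naive sketch of such a repair, assuming only a handful of samples per peak hit the ceiling. It finds the flattened samples and rebuilds them from intact neighbours; real de-clippers fit far more plausible curves:

```python
import numpy as np

def declip(audio, ceiling=0.99):
    """Rebuild samples at or above the ceiling from intact neighbours."""
    clipped = np.abs(audio) >= ceiling
    good = np.flatnonzero(~clipped)
    bad = np.flatnonzero(clipped)
    out = audio.copy()
    # Linear interpolation is a crude stand-in for the curve fitting a
    # real de-clipper performs, but it removes the hard flat tops.
    out[bad] = np.interp(bad, good, audio[good])
    return out
```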


Performance-related elements

Unwanted Artifacts of the Speech Apparatus

This group of issues is perhaps the most common and, therefore, the most well-studied in terms of solutions. As previously mentioned, it includes harsh sibilant sounds, plosive sounds from letters like “p” and “b,” as well as so-called “mouth clicks”. These sounds are not inherently problematic or a flaw of the performer, so why do they need correction in recordings?

In live vocal performances without amplification (no microphones), we do not encounter these issues simply because we are at a sufficient distance from the sound source (the vocalist), allowing the energy of sounds like sibilants to dissipate. Even at arm’s length from the performer, the sound does not reach our ears directly or at full intensity. Herein lies the key difference from studio recording.

During vocal or spoken-word recordings, the performer is positioned very close to the capsule—the part of the microphone that captures the sound—and is directly facing it. Even under similar conditions, if we replaced the microphone with a human ear, the latter would have an advantage due to the shape of the outer ear, which softens incoming sound through multiple reflections before reaching the eardrum, thereby reducing its intensity.

There are special tools and recording techniques designed to minimize the negative effects of these sounds. Pop filters help disperse the energy of plosive sounds and partially soften sibilants, while angling the microphone relative to the vocalist reduces the impact of problematic sounds on the capsule. However, these measures do not always fully eliminate the issue, as performers may move during recording, reducing the effectiveness of the precautions.

To address these problems, precise dynamic frequency processing is necessary. This can be achieved using either a dynamic equalizer or a multiband compressor, as these issues are essentially dynamic spikes within a narrow frequency range—higher mid frequencies for sibilants and lower mid frequencies for plosive sounds. The key point here is the emphasis on dynamic processing: we do not need to permanently cut out frequencies containing problematic artifacts but rather suppress them only at the moments when they interfere.
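
As a rough sketch of that dynamic principle, the toy de-esser below isolates an assumed sibilance band (5 to 9 kHz here), follows its energy, and ducks the signal only while that band exceeds a threshold; all the numbers are placeholders, and a dynamic EQ or multiband compressor does this far more gracefully:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def deess(audio, sr, band=(5000.0, 9000.0), threshold=0.05, reduction=0.4):
    # Sidechain: isolate the band where sibilants live.
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    sibilant = sosfilt(sos, audio)
    # Crude envelope follower over ~5 ms windows.
    win = max(1, int(0.005 * sr))
    env = np.convolve(np.abs(sibilant), np.ones(win) / win, mode="same")
    # Duck only while the sibilance band is hot; a real unit would smooth
    # this gain curve to avoid clicks at the transitions.
    gain = np.where(env > threshold, reduction, 1.0)
    return audio * gain
```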

As for so-called “mouth clicks,” we notice them in recordings for the same reason that sibilants become problematic—proximity to the microphone capsule. However, the approach to dealing with them differs. Since these artifacts are not valuable sound information that has simply been exaggerated by the microphone, but rather unwanted noise, they need to be completely removed. Traditional methods such as equalization are ineffective in this case, so spectral processing is used instead. The necessary tools for this task are provided by the RX software from iZotope.

Rhythmic Inaccuracies

Now let’s move on to elements rooted in the performance itself, starting with rhythm adjustment. The problem of inconsistent timing is most apparent in drum parts, which define the rhythmic foundation of a track and set the pulse for other musicians.

For ease of explanation, let’s consider a piece with a fixed meter recorded to a metronome, where strict adherence to the tempo is a creative requirement. Despite the metronome’s guidance, a drummer (in this case) may still deviate from the tempo in various ways. They might be on their hundredth take and making mistakes due to fatigue, they might not be the most experienced musician, or they could have followed an artistic impulse to slightly speed up or slow down certain sections. Regardless of the cause, if the final recording deviates from the intended artistic vision, it requires correction.

Conceptually, solving this problem is straightforward, but in practice, it is a tedious and time-consuming task. Essentially, the process involves listening to the part, identifying areas that need adjustment, and then manually aligning the misplaced notes with the track’s tempo grid.

There are software tools that simplify this process by detecting rhythmic divisions in the performance and even automatically aligning them to the desired tempo. However, automatic corrections are rarely sufficient, and the sound engineer must manually review and adjust any remaining inaccuracies. The more rhythmically complex the performance, the worse automatic correction tends to perform, leaving more manual adjustments for the engineer.
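
As a sketch of the detection step, the snippet below (hypothetical file name; a fixed tempo and a 16th-note grid are assumed) finds hits with librosa's onset detector and reports each one's distance from the grid; the actual moving of audio would then happen in the DAW:

```python
import numpy as np
import librosa

def timing_report(path, bpm=120.0, subdivisions=4):
    y, sr = librosa.load(path, sr=None)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    grid_step = 60.0 / bpm / subdivisions          # seconds per grid slot
    nearest = np.round(onsets / grid_step) * grid_step
    offsets_ms = (onsets - nearest) * 1000.0
    for t, off in zip(onsets, offsets_ms):
        print(f"hit at {t:7.3f} s is {off:+6.1f} ms from the grid")

# timing_report("drums.wav", bpm=100)  # hypothetical file and tempo
```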

In the example above, we assumed that strict adherence to the metronome was required. However, even in other cases—such as when a band records without a metronome and plays together live—rhythmic correction may still be necessary. While there is no click track to align to, musicians must still stay in sync with one another. Depending on the genre, some degree of looseness in timing may be acceptable, but in most modern music, we aim for either precise metronomic timing or tight alignment between musicians. Therefore, rhythmic correction is almost always a necessary step in the editing process.

Pitch Correction

This section is conceptually quite similar to the previous one, but pitch correction presents its own unique challenges, making it a separate category. The most logical way to discuss this issue is through the example of a vocal track, as vocals typically require the most pitch correction in a recording. However, within certain limits, this type of correction can also be applied to tonal instrumental parts.

The core problem that pitch correction addresses is straightforward—singers, to varying degrees, sing off-key. For simplicity, we will exclude performance styles where intentional pitch inaccuracies are used artistically and do not require correction. Instead, we will focus on the most common scenario, where the vocalist must hit every intended note precisely.

Just like with rhythmic correction, the sound engineer’s task is to listen to the track, identify off-pitch sections, and adjust them to match the original intent. However, this process is significantly more complex than rhythmic correction. When adjusting timing, we manipulate the temporal aspect of the sound, which, as discussed in the previous section, generally does not introduce unwanted artifacts or alter the timbre. The same cannot be said for pitch correction.

Every note—whether sung by a vocalist, played on a guitar, or produced by any tonal instrument—has its own unique timbral color. When we alter a note’s pitch after it has been recorded, its timbre remains unchanged, which can result in an unnatural sound, a topic we will discuss shortly. Therefore, if a recording aims for a natural-sounding vocal performance, only minor pitch inaccuracies should be present. Experience shows that pitch errors of up to half a semitone—and in rare cases, a full semitone—can be corrected without significant issues. Any pitch inaccuracies beyond these limits will introduce noticeable digital artifacts. These artifacts arise not only from the mismatch between the sound’s original timbral color and its altered pitch but also from unnatural transitions between unedited and edited notes.
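
To put numbers on those limits, an engineer might first survey a take's pitch accuracy. The sketch below (hypothetical file name; the pyin range and thresholds are assumptions) tracks the fundamental and measures each voiced frame's deviation in cents from the nearest equal-tempered note; deviation from the intended note can be larger still, since a badly missed note gets measured against its nearest neighbour:

```python
import numpy as np
import librosa

y, sr = librosa.load("lead_vocal.wav", sr=None)  # hypothetical file
f0, voiced, _ = librosa.pyin(y, sr=sr,
                             fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"))

midi = librosa.hz_to_midi(f0[voiced])         # continuous MIDI pitch per frame
cents_off = (midi - np.round(midi)) * 100.0   # cents from the nearest note

print(f"median |deviation|: {np.median(np.abs(cents_off)):.1f} cents")
# 100 cents = one semitone; sustained readings past ~20-30 cents are
# clearly audible, and the nearest note may not even be the intended one.
print(f"frames more than 20 cents off: {(np.abs(cents_off) > 20).mean():.1%}")
```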

Examples of these artifacts can be heard from many modern artists who deliberately use excessive pitch correction as a stylistic choice, making the resulting artifacts a defining feature of their sound. An artist aiming for this style may intentionally sing notes that exceed the “safe” correction limits, since in that case the correction artifacts are desirable.

However, if the goal is a natural-sounding vocal, the recording process must be approached with care, allowing the sound engineer to refine the performance to perfection.


Conclusion

This concludes our review of the most common issues that require correction. We hope this article has helped you better prepare for recording and understand the sound engineer’s role in audio editing, the stage that precedes mixing.
