Psychoacoustics: The Science Of How We Hear

Psychoacoustics is, at the most basic level, the study of how our brains perceive sound. Several factors affect the way we hear sounds, including frequency, sound pressure level, subjective auditory variables, and ear anatomy.

In addition to these variables, there are a number of phenomena which are a little more difficult to measure such as pitch, perceived volume, pleasantness of tone, and others.

Loudness, sharpness, tonality, and roughness are four measures used to indicate how a person perceives sound. Each of them captures a single attribute of sound. However, people don’t hear just one of these attributes; they hear them all simultaneously. Combining them gives a clear picture of how a person hears audio.

Basics of Hearing Anatomy


The ear has three sections: the outer ear, the middle ear, and the inner ear. The outer ear consists of the pinna (auricle) and the ear canal. The pinna is the visible outer portion, the part most people picture when they think of the ear.

Additionally, the head and shoulders may be considered a part of the outer ear. The middle ear consists of the eardrum, three small bones (the ossicles), and the spaces in between them. Lastly, the inner ear consists of the cochlea. The cochlea contains four key structures: the basilar membrane, the tectorial membrane, the inner hair cells, and the outer hair cells.

Outer Ear

The most prominent function of the outer ear structures is to provide HRTFs. HRTF stands for “head-related transfer function.” HRTFs describe how the body, head, and auricle shape incoming sound, which gives the brain information about direction.

Another way of representing this information is through HRIRs, or “head-related impulse responses”, which convey the same information as HRTFs but in the time domain. With only two audio inputs, one at each ear, one might wonder how the brain is able to discern direction through binaural hearing.

HRTFs allow us to differentiate between up and down, front and back, and other directions. Furthermore, the ear canal resonates somewhere between 1 and 4 kHz, which means that the ear is especially sensitive in this range.
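
The link between HRIRs and HRTFs described above can be sketched in a few lines of Python: taking the Fourier transform of an impulse response gives the corresponding transfer function. This is only a minimal illustration. The two impulse responses are made-up values rather than measured data, and the helper name dft is an assumption of this sketch.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (fine for very short sequences)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


# Hypothetical 8-sample impulse responses (illustrative values, not measurements).
# The right-ear response is the left-ear one delayed by 2 samples and scaled by 0.6,
# roughly what might happen for a source on the listener's left.
hrir_left = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.6, 0.3, 0.15, 0.0, 0.0, 0.0]

# Time domain (HRIR) -> frequency domain (HRTF).
hrtf_left = dft(hrir_left)
hrtf_right = dft(hrir_right)

# Magnitude per frequency bin: the same information, viewed per frequency.
mag_left = [abs(h) for h in hrtf_left]
mag_right = [abs(h) for h in hrtf_right]

for k, (left, right) in enumerate(zip(mag_left, mag_right)):
    print(f"bin {k}: left {left:.3f}  right {right:.3f}")
```

In this toy case the delay shows up only in the phase of the right-ear HRTF, while the 0.6 scaling shows up in its magnitude.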

Middle Ear

The primary function of the middle ear is to act as a highpass filter with a cutoff around 700 Hz. In addition, the middle ear acts as a mechanical transformer. Muscle activity can change its behavior by altering the amount of space between the three bones in the middle ear. In some extreme cases, the middle ear can offer protection from deafening sounds.
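
To get a feel for what a highpass filter around 700 Hz does, here is a rough sketch. It assumes an idealized first-order filter, which is a simplification chosen for illustration; the real middle ear response is more complicated. The function name highpass_gain_db is hypothetical.

```python
import math

CUTOFF_HZ = 700.0  # corner frequency quoted in the text


def highpass_gain_db(freq_hz, cutoff_hz=CUTOFF_HZ):
    """Gain in dB of an idealized first-order highpass filter (an assumption)."""
    ratio = freq_hz / cutoff_hz
    magnitude = ratio / math.sqrt(1.0 + ratio * ratio)
    return 20.0 * math.log10(magnitude)


# One decade below the cutoff, at the cutoff, and one decade above it.
for freq in (70, 700, 7000):
    print(f"{freq:>5} Hz: {highpass_gain_db(freq):6.1f} dB")
```

Under this assumption, frequencies well below 700 Hz are strongly attenuated, while frequencies well above it pass almost unchanged.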

Inner Ear

Much of the sound that we hear is transduced into information for the brain (known as neural information) in the inner ear. The inner ear also behaves as a physical filter for audio. Sounds pass through the cochlea and become filtered from high to low as they go further into the cochlea.


This filtering arises from the coupled tuning of two highpass filters, together with the inner hair cells, which act as detectors of the difference between the two filters.

Inner Hair Cells

At the base of the inner hair cells, there are calcium channels which open and close as the hair cells move. At this scale, the inner hair cells are more comparable to villi than to the hairs on our heads. These calcium channels are very sensitive, and even the smallest movement is enough to activate an inner hair cell. Inner hair cells are also referred to as detectors.

The inner hair cells detect the leading edge of the waveform at lower frequencies (500 Hz and below). The leading edge of the waveform is characterized by membranes moving closer to one another. At higher frequencies (over 2000 Hz), however, the inner hair cells detect the leading edge of the envelope.


The behavior is more mixed at frequencies between 500 Hz and 2000 Hz. As the level of motion on the basilar membrane increases, more nerve cells fire on the leading edge.

At high frequencies, the first section of the basilar membrane is responsible for all detection above 15-16 kHz. Although certain individuals have displayed remarkable hearing ability (up to 20 kHz), this first section of the basilar membrane is known to be vulnerable to environmental damage.

Audio Perception


Pitch, at its most fundamental level, is a dimension of hearing that is subjective and varies from individual to individual. It is the quality of a sound that correlates with the frequency of a pure tone. High-frequency tones are usually classified as high pitched, while low-frequency tones are classified as low pitched.

This relationship between pitch and frequency is not a straightforward or linear one, however. To capture the difference between the two, a unit called the mel is used to measure perceived pitch. As a reference point, a 1000 Hz tone corresponds to 1000 mel (given that the tone is at 40 dB SPL).
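
The text does not give a formula for the mel scale, so the sketch below assumes one widely used approximation, m = 2595 * log10(1 + f / 700). Other versions of the scale exist, and the function names here are only illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (a common approximation, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


for freq in (250, 1000, 4000):
    print(f"{freq:>5} Hz is about {hz_to_mel(freq):7.1f} mel")
```

With this approximation, 1000 Hz lands very close to 1000 mel, and doubling the frequency above that point adds far fewer than 1000 mel, which illustrates the nonlinear relationship.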


Loudness is related to SPL, or sound pressure level. In addition, loudness depends on the duration of the sound and its frequency content. The relationship between sound pressure and loudness can be approximated by a formula known as Stevens’s power law, in which loudness grows as sound pressure raised to an exponent of about 0.67.
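
As a rough worked example of the power law, the snippet below estimates how much louder a sound seems when its level rises by a given number of decibels. It assumes the 0.67 exponent applies to sound pressure, and the function name loudness_ratio is hypothetical.

```python
def loudness_ratio(level_change_db, exponent=0.67):
    """Perceived loudness ratio for a change in level, per a simple power law.

    Assumes loudness is proportional to (sound pressure) ** exponent.
    """
    # A change of 20 dB corresponds to a tenfold change in sound pressure.
    pressure_ratio = 10.0 ** (level_change_db / 20.0)
    return pressure_ratio ** exponent


for change in (3, 10, 20):
    print(f"+{change} dB: about {loudness_ratio(change):.2f} times as loud")
```

Under these assumptions, a 10 dB increase comes out at a little more than twice as loud.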


Roughness and Localization

Localization is the listener’s ability to identify the location or origin of a detected sound in direction and distance. The mechanisms behind the perception of audio within the human body have been researched to a great extent in the past few decades.

Human sound perception uses various cues to locate where sounds originate. These include differences in arrival time and intensity between the two ears, along with the direction cues from the ear structures mentioned above.
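
The time cue can be made concrete with a small sketch. It assumes a simple spherical-head model (often attributed to Woodworth), a head radius of 8.75 cm, and a speed of sound of 343 m/s. None of these values come from the text, so treat the numbers as illustrative only.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air


def interaural_time_difference(azimuth_deg):
    """Estimated arrival-time difference between the ears, in seconds.

    Spherical-head approximation, intended for azimuths from 0 (straight
    ahead) to 90 degrees (directly to one side).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


for azimuth in (0, 30, 60, 90):
    itd_us = interaural_time_difference(azimuth) * 1e6
    print(f"{azimuth:>2} degrees: about {itd_us:.0f} microseconds")
```

Even for a sound directly to one side, the difference in this model stays well under a millisecond, which gives a sense of how fine the timing cues are.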

Audio Discrimination

Audio discrimination is the ability to identify and distinguish similarities and differences between distinct sounds. More specifically, this ability allows individuals to notice the minute differences between certain spoken words.

For example, the difference between pronouncing “sister” and “sitter” is small, but perceivable for those who are able to discriminate the small differences in sound waves. In a study that looked into the link between the acoustic change complex (ACC) and similar measures of audio discrimination, it was found that “electrophysiological and psychophysical measures for frequency and intensity discrimination were significantly correlated.”

This indicates that the acoustic change complex may be used as a more complete index of audio discrimination for sound intensity and frequency. Moreover, the acoustic change complex amplitude may better serve as an indicator for sound computation than acoustic change complex latency. 

Relationship Between Perceptual Ability and Anatomy

Differences Between Audio Intensity and Loudness

While these two terms may appear interchangeable, there is a fair amount of difference between the two. Audio intensity relates to the actual sound pressure level that a sound registers on recording equipment.

There is a limit to the sounds that our ears can hear, in terms of the frequency of the sound. The range that we can hear is 20 to 20,000 Hertz (Hz). Additionally, sounds vary in how strong they are, which is measured in decibels (dB). Using these two metrics, it is possible to physically measure the actual audio intensity of a sound.

Audio loudness, by contrast, is judged by a different set of criteria. The two metrics used to track audio loudness are audio sensation level and audio perception.
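
For readers who want to see how a decibel figure is obtained, here is a minimal sketch of the standard sound pressure level calculation, which compares a measured pressure against a reference of 20 micropascals. The function name spl_db and the example pressures are illustrative choices.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # standard reference pressure for dB SPL in air


def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for an RMS pressure given in pascals."""
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


# Example RMS pressures in pascals (chosen only for illustration).
for pressure in (20e-6, 0.02, 1.0):
    print(f"{pressure:g} Pa is about {spl_db(pressure):.1f} dB SPL")
```

Because the scale is logarithmic, every tenfold increase in pressure adds 20 dB.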

Clearly, the sound that enters through the outer ear is not at the same volume as the signal that travels on from the inner ear. In fact, by the time it has passed the eardrum and the inner ear, the audio has been reduced to a fraction of what it originally was. The auditory nerve picks up this signal and relays it to the brain.

Role of the Central Nervous System in Audio Perception

The central nervous system has an important role in deciphering the sounds that enter our ears. Most of the time, there is excess noise that must be filtered out. Holding a normal conversation in a crowded restaurant, for example, relies on this ability to cope with the influx of irrelevant audio.

As a matter of fact, the central nervous system has evolved to filter the important audio information from the environment and process it into a final output. 

Role of Audio Periphery in Psychoacoustics

The periphery informs the brain about direction through head-related transfer functions. Next, the cochlea performs a frequency-time analysis. The result of this analysis is converted into loudness through a form of compression.

The auditory periphery analyzes all signals in a time/frequency tiling measured in units called “ERBs” or “Barks”. In addition, the mechanics of the cochlea allow the initial part of a sound to have a remarkable and disproportionate impact on what is actually heard. This emphasis on onsets may well have been useful in the real world over the course of human evolution.
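
To give a sense of the size of one of these tiles, the sketch below estimates the width of an ERB (equivalent rectangular bandwidth) at a few center frequencies. The text gives no formula, so this assumes the commonly cited Glasberg and Moore approximation. The function name is hypothetical.

```python
def erb_bandwidth_hz(center_hz):
    """Approximate ERB width in Hz at a given center frequency.

    Uses the Glasberg and Moore approximation, which is an assumption of
    this sketch rather than something stated in the text.
    """
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


for center in (100, 1000, 10000):
    print(f"{center:>5} Hz: ERB is about {erb_bandwidth_hz(center):.0f} Hz wide")
```

The bands widen as frequency rises, so the ear trades frequency resolution for time resolution at the high end.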

Partial Loudness

Partial loudness, also known as short-term loudness, is usually processed over a brief period of time (less than or equal to 200 ms). The ability to perceive minute variations in loudness is primarily correlated with how long this period of time is.

For instance, the Level Roving Experiments indicate that when there are delays of over 200 ms between more than one distinct audio source, it is more difficult to recognize the variations in loudness. 

Auditory Objects 

Hearing involves the formation of auditory objects in our brains. Auditory objects are defined as “sequences of distinct sounds, or parts of continuous sounds.” These sounds are experienced together so that there is a continuous auditory experience.

Auditory objects can, at times, be converted into long-term memories. Of course, the brain will not preserve every detail. In other words, there will be a significant decrease in the data processing rate of the auditory object. In addition, other factors such as cognition and attention at the time of processing will determine how well the audio is integrated into long-term memory.

The encoding of auditory objects in auditory cortex: Insights from  magnetoencephalography - ScienceDirect

Rates At Which Audio Is Processed

Between the sound reaching the ear and the brain perceiving how loud it is, the audio data travels at a rate of megabits per second, where one megabit is 1 million bits. After loudness is determined, a feature analysis occurs, and the rate for this process is also in megabits per second.

However, as auditory object analysis happens, the data rate dips to kilobytes per second, which is on the order of a thousand bytes per second. To make matters worse, when auditory object data is processed into long-term memory, the data transfer rate is only in bytes per second.
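
The scale of this drop is easier to appreciate with some quick arithmetic. The text gives only orders of magnitude, so the sketch below assumes exactly one unit at each stage (1 megabit, 1 kilobyte, and 1 byte per second), taking a megabit as one million bits. The stage names and the helper reduction_factor are illustrative.

```python
BITS_PER_BYTE = 8

# Assumed representative rates in bits per second (one unit per stage).
stage_rates_bits_per_s = {
    "periphery to loudness": 1_000_000,                 # about 1 megabit per second
    "feature analysis": 1_000_000,                      # about 1 megabit per second
    "auditory object analysis": 1_000 * BITS_PER_BYTE,  # about 1 kilobyte per second
    "long-term memory": 1 * BITS_PER_BYTE,              # about 1 byte per second
}


def reduction_factor(from_stage, to_stage):
    """How many times smaller the rate is at to_stage than at from_stage."""
    return stage_rates_bits_per_s[from_stage] / stage_rates_bits_per_s[to_stage]


print(reduction_factor("feature analysis", "auditory object analysis"))
print(reduction_factor("auditory object analysis", "long-term memory"))
print(reduction_factor("periphery to loudness", "long-term memory"))
```

Whatever the exact figures, each stage passes along only a small fraction of what the previous stage handled.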

Consequences Of Audio Data Transfer Rates

Due to the severe drop in data transfer rates for audio, there are some consequences for everyday hearing. One of these implications is that people are, at least partially, responsible for the audio that stays in their long-term memory. And this will occur whether it is a deliberate choice or not.

In addition, it is not only audio data that will be used to integrate auditory objects into memory. In fact, all sensory inputs will be used to incorporate the data for the brain. This appears to be a feature built into the human body by millions of years of evolution so it occurs unconsciously. 


Psychoacoustics is relevant to the field of audiology, primarily due to its implications for musicians and sound technology developers, among a plethora of other individuals in the field. Researching psychoacoustics will only reveal more information about the human body and its sense of hearing.

As mentioned previously, the different parts of the ear (outer, middle, and inner ear) contribute to the way sound is processed by the auditory nerve. The sounds we hear (known as auditory objects) are then converted into long-term memory within our brain.

The transfer rates at which the data is converted vary, which is why information is lost from the outer ear to the auditory nerve.
