I noticed that the ffmpeg amix filter produces poor output in one specific situation. It works fine when the input files have equal duration: in that case the volume drops by a constant factor, which can be fixed by appending ",volume=2". In my case the files have different durations. The first mixed stream ends up with the lowest volume and the last one with the highest. You can see in the image that the volume increases linearly over time. That could be fixed by padding each clip with silence at the beginning and end, so that all clips have the same duration and the volume stays at the same level. Please suggest how I can use amix to mix many inputs and ensure a constant volume level.

The solution I've found is to specify the volume for each track in descending order and to use no normalization filter afterwards. I use this example, where I place the same audio file at different positions in the mix:

ffmpeg -vn -i test.mp3 -i test.mp3 -i test.mp3 -filter_complex "[0:a]adelay=0|0,volume=3[a0];[1:a]adelay=2000|2000,volume=2[a1];[2:a]adelay=4000|4000,volume=1[a2];[a0][a1][a2]amix=inputs=3:dropout_transition=0" -q:a 1 -acodec libmp3lame -y amix-volume.mp3

The first track is the normal mix, the second is the one with volumes specified, and the third is the original track.
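
For more than three inputs, writing the filter string by hand gets tedious. Below is a minimal sketch of a helper that generates it, assuming every input starts a fixed number of milliseconds after the previous one and that the volumes simply count down from N to 1, as in the three-input command above. The function name build_amix_filter and the labels a0, a1, ... are made up for illustration, and whether the N..1 volume pattern suits inputs other than this test case is an assumption, not something I have verified.

```shell
# Hypothetical helper: print a filter_complex string for N inputs.
# Input i is delayed by i * STEP_MS milliseconds (both stereo channels)
# and given volume N - i, so volumes run N, N-1, ..., 1.
build_amix_filter() {
  n=$1
  step_ms=$2
  chains=""
  labels=""
  i=0
  while [ "$i" -lt "$n" ]; do
    delay=$((i * step_ms))
    vol=$((n - i))
    # One chain per input: delay it, scale it, and label the result.
    chains="${chains}[${i}:a]adelay=${delay}|${delay},volume=${vol}[a${i}];"
    labels="${labels}[a${i}]"
    i=$((i + 1))
  done
  # Feed all labelled streams into a single amix.
  printf '%s%samix=inputs=%s:dropout_transition=0\n' "$chains" "$labels" "$n"
}

# Example: print the filter string for three inputs spaced 2000 ms apart.
build_amix_filter 3 2000
```

The printed string can be passed to ffmpeg with -filter_complex "$(build_amix_filter 3 2000)", keeping the rest of the command unchanged.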