Q format representation
Many developers working in the multimedia domain are not familiar with the answers to the questions below 😏.
What is Q format?
Why do DSP processors need Q formats?
What are the advantages of Q format?
What are the disadvantages of Q format?
Once you understand Q format, the fixed-point code you work with every day becomes much easier to follow.
When a processor has no floating-point unit (FPU), Q format lets it process fractional (rational) numbers with the integer arithmetic logic unit (ALU).
Without an FPU, floating-point operations have to be emulated in software, which is much slower than native fixed-point arithmetic. That is why many companies prefer fixed-point processors.
Fixed-point numbers are commonly written in Qn format,
where n is the number of fractional bits.
Another notation is Qm.n, where:
m - number of integer bits, including the sign bit
n - number of fractional bits
The total word length is typically 8, 16, 24, or 32 bits, and m + n equals that word length.
The largest positive value a Q31 (Q1.31, 32-bit) number can hold is 0x7FFFFFFF, which corresponds to 1 - 2^-31 ≈ 0.9999999995.
The most negative value a Q31 number can hold is 0x80000000, which corresponds to exactly -1.0.
So the range of Q31 format is [-1, 1 - 2^-31], i.e. [-1, 1).
Example: float-to-fixed conversion
Take 5.76. The integer part 5 needs 3 bits (101 in binary), plus one sign bit, so m = 4 and the value fits in Q4.28 format (4 + 28 = 32 bits).
5.76 * 2^28 = 1,546,188,226.56, truncated to 1,546,188,226 (0x5C28F5C2)
Let's do some basic operations
1. Addition
a = 5.76, b = 0.76
Two floating-point numbers can be added directly without any conversion, but two fixed-point numbers must be in the same Q format before they are added.
a = 5.76 -> Q4.28
b = 0.76 -> Q1.31
a = 5.76 * (2 ^ 28) = 1,546,188,226 (0x5C28F5C2)
b = 0.76 * (2 ^ 31) = 1,632,087,572 (0x6147AE14)
Here a and b are in different formats, so we must first bring them to the same Q format.
a = Q4.28
b = (Q1.31 >> 3) = Q4.28
result = a + b
= 0x5C28F5C2 + (0x6147AE14 >> 3)
= 0x5C28F5C2 + 0x0C28F5C2
= 0x6851EB84
Fixed-to-float conversion:
result = 0x6851EB84 / (2 ^ 28) ≈ 6.52 (exactly 6.5199999958...), which matches 5.76 + 0.76 up to the truncation error.
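Shifting right by k bits turns Qm.n into Q(m+k).(n-k): it adds k integer bits of headroom but discards the k lowest fractional bits. Shifting left by k bits turns Qm.n into Q(m-k).(n+k): it adds k fractional bits but overflows if the value needs more than m-k integer bits.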
2. Multiplication
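Unlike addition, multiplication does not require the two operands to share a Q format. Multiplying a Qm1.n1 value by a Qm2.n2 value gives a product in Q(m1+m2).(n1+n2); for two 32-bit operands that is a 64-bit result, which is then shifted right to the desired output format.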
Suggestions are welcome...🙂🙂🙂🙂
Naveen Naik
Perfect article, and it helps me to understand Q format.
Thanks a lot!
Thanks for your feedback