AltiVec register usage ( was Re: Merced not a consumer
Ronald C.F. Antony
rcfa at cubiculum.com
Tue Jul 14 01:01:44 PDT 1998
> AltiVec is the next logical extension of CPU architecture. Early CPUs had
> only integer units. MMUs were separate support chips, but then moved into
> the main CPU. FPUs were separate support chips, but then moved into the
> main CPU. The original black NeXT hardware shipped with a Moto 56001 DSP,
> and now DSP-like functionality is moving into the main CPU.
Sure. On the other hand, the chip and instruction set become ever more
CISC-ish. Part of the RISC idea is to keep things small and clock the
hell out of chips. So in many cases, the question will boil down to how
much clock speed the more complex chip design costs.
> Heck, Moto should do the same register mask trick with the FPU -- how often
> are you using all 32 FP registers?
That would certainly be a good idea.
> MMX, like almost all Intel "advances", sacrifices elegance to the evil deity
> of backwards compatibility. In the end, it hurts you more than it helps.
I wasn't condoning their choice per se. Intel, for marketing reasons,
lives off compatibility. It helped them build the empire they have now
(even if in many cases the backwards compatibility was only conceptual
and not realistically useful, but it sold chips). So management came up
with the specs and the engineers did a pretty decent job of designing
MMX around these constraints. For all practical purposes, I like AltiVec better
(e.g. because it's vectorized and does FP computations, while MMX currently
is integer only). Hey, I like Mac hardware better, but choice is still good.
But all of AltiVec's merits don't rule out that a similar chip design w/o
AltiVec could be clocked considerably higher and be cheaper, so that you could
get more CPUs at a higher clock speed for the same price.
In such a context, the advantages or disadvantages are probably very
application specific.
> >I just wonder then if we will need fat binaries for AltiVec and non-AltiVec
> >CPUs....
>
> Shouldn't be necessary. There's also a status bit defined that reports
> whether or not the CPU supports AltiVec instructions. That bit could be
> tested and shared libraries with/without the AltiVec instructions loaded, or
> a function pointer could be set to one of two values based on that register.
> Not particularly difficult.
Well, if you load different libraries, then you are essentially using fat
binaries. It's a technologically superior solution to have one library/binary
that's MAB, i.e. has AltiVec and non-AltiVec versions, than to have two
distinct versions of the library and load them conditionally. That's the whole
point of MABs. As long as you can't get away with identical code, you will
need MABs or some equivalent solution (at least on a network server, where
you don't know which CPU the client executing the binary/framework/
library/plug-in has).
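
For illustration, the function-pointer variant suggested in the quoted text
could look roughly like the sketch below. Everything in it is an assumption
made for the example: cpu_has_altivec() is a hypothetical probe (stubbed to
report "no AltiVec" so the code runs anywhere), and add_vector() is plain C
standing in for a real AltiVec routine. Note that both versions still ship in
the same binary, which is the point made above.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Hypothetical probe. Real code would read the status bit or ask the OS;
   it is stubbed to "no AltiVec" here so the sketch runs on any machine. */
static int cpu_has_altivec(void)
{
    return 0;
}

/* Plain scalar version, safe on every CPU. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Stand-in for an AltiVec version: portable C that walks the data four
   floats at a time, which is where the vector instructions would go. */
static void add_vector(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = a[i]     + b[i];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        dst[i] = a[i] + b[i];
}

/* Pick one of the two implementations based on the capability flag. */
static add_fn select_add(int has_altivec)
{
    return has_altivec ? add_vector : add_scalar;
}

/* The pointer is set once at startup; callers only ever use vec_add(). */
static add_fn current_add = NULL;

static void init_dispatch(void)
{
    current_add = select_add(cpu_has_altivec());
}

static void vec_add(float *dst, const float *a, const float *b, size_t n)
{
    assert(current_add != NULL);    /* init_dispatch() must run first */
    current_add(dst, a, b, n);
}
```

The test happens once in init_dispatch(), so the per-call cost is one indirect
call rather than a branch on every use.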
> I think the unrecognized potential of the AltiVec unit is huge. With no
> setup penalty, it might be worthwhile to code string manipulation functions
> in AltiVec ( index() or rindex(), for example ). Who knows? Some of the
> cache preload instructions could be useful for parsers. The mind boggles (
> well, my mind, anyways <
Surely true. But here again, you are working on data streams and blocks of
data, so it makes just as much sense to use MMX there, even if it may not
provide the same spectacular gains.
> But often the way you really want to code it is generate1, transform1,
> generate2, transform2, because you want to keep the flow of transformed
> coordinates constant ( like, say, to a VooDoo2 card ). With MMX, you have to
> code it as generate1, generate2, transform1, transform2 to minimize the mode
> switch penalty, but now the geometry pipe to the 3D card is stalled. Not
> only that, since the FPU and VPU are separate execution units, you can code
> it like this: generate1, transform1/generate2, transform2, where the second
> coordinate generation and first coordinate transformation are processed
> simultaneously. Even if the MMX registers were just as wide, the AltiVec
> would still be twice as fast. MMX sucks, AltiVec rules.
It would be interesting to know how large the penalty is for doing this in
multiple "piped" threads on an SMP system. Then one CPU could be in MMX
mode and one in FP mode, and the data is piped from one thread to another,
each living on a different CPU which is in the proper mode.
I wonder if any current Intel-based SMP-capable OS keeps track of the
mode the various CPUs are in, and allocates threads accordingly...
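
A rough sketch of the "piped threads" idea, using POSIX threads and a small
bounded queue. This is only an illustration under stated assumptions: the
names (pipe_q, generate_stage, transform_stage, run_pipeline) are made up for
the example, the "generate" and "transform" work is trivial placeholder
arithmetic, and nothing here pins a thread to a particular CPU or touches the
MMX/FP mode. It only shows the hand-off between the two stages.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QCAP 8                      /* queue capacity, arbitrary */

typedef struct {
    float items[QCAP];
    size_t head, tail, count;
    int closed;                     /* set once the producer is done */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} pipe_q;

static void q_init(pipe_q *q)
{
    q->head = q->tail = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Blocks while the queue is full. */
static void q_push(pipe_q *q, float v)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static void q_close(pipe_q *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 and fills *out, or 0 once the queue is closed and drained. */
static int q_pop(pipe_q *q, float *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

typedef struct { pipe_q *q; int n; } gen_args;
typedef struct { pipe_q *q; float sum; } xform_args;

/* Stage 1: "generate" n placeholder coordinates 0, 1, ..., n-1. */
static void *generate_stage(void *p)
{
    gen_args *a = p;
    for (int i = 0; i < a->n; i++)
        q_push(a->q, (float)i);
    q_close(a->q);
    return NULL;
}

/* Stage 2: "transform" each value (here: double it) and accumulate. */
static void *transform_stage(void *p)
{
    xform_args *a = p;
    float v;
    a->sum = 0.0f;
    while (q_pop(a->q, &v))
        a->sum += 2.0f * v;
    return NULL;
}

static float run_pipeline(int n)
{
    pipe_q q;
    gen_args g = { &q, n };
    xform_args x = { &q, 0.0f };
    pthread_t tg, tx;
    int rc;

    q_init(&q);
    rc = pthread_create(&tg, NULL, generate_stage, &g);
    assert(rc == 0);
    rc = pthread_create(&tx, NULL, transform_stage, &x);
    assert(rc == 0);
    (void)rc;
    pthread_join(tg, NULL);
    pthread_join(tx, NULL);
    return x.sum;
}
```

Every hand-off goes through a mutex and a condition variable, so the penalty
asked about above would largely come from that synchronization plus moving
the data between the two CPUs' caches.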
> As I said above, MAB isn't necessary. I imagine ( pure speculation ) that
> MacOS 8.6 and MacOS X PR1 will both support AltiVec instructions, and simply
> won't save the VPU mask register if AltiVec isn't enabled.
Hm, that would require a conditional each time you push things onto
the stack, unless there is some combined store-VPU-if-bit-set type of
instruction.
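
To make the conditional concrete, here is a minimal sketch of a mask-driven
save in C. It is not taken from any AltiVec document: the vreg type,
save_live_vregs(), restore_live_vregs() and the bit ordering (bit i, counted
from the least significant bit, stands for register i) are all assumptions
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VREGS 32

/* Image of one 128-bit vector register. */
typedef struct { uint8_t bytes[16]; } vreg;

/* Copies only the registers whose mask bit is set into save_area and
   returns how many were saved. Bit i (LSB = bit 0) marks register i as
   in use; that ordering is an assumption of this sketch. */
static size_t save_live_vregs(const vreg regs[NUM_VREGS], uint32_t mask,
                              vreg *save_area)
{
    size_t saved = 0;

    if (mask == 0)              /* task never touched the vector unit */
        return 0;
    for (int i = 0; i < NUM_VREGS; i++)
        if (mask & (UINT32_C(1) << i))
            save_area[saved++] = regs[i];
    return saved;
}

/* Inverse of save_live_vregs(): writes the saved images back into the
   registers named by the same mask. */
static size_t restore_live_vregs(vreg regs[NUM_VREGS], uint32_t mask,
                                 const vreg *save_area)
{
    size_t restored = 0;

    for (int i = 0; i < NUM_VREGS; i++)
        if (mask & (UINT32_C(1) << i))
            regs[i] = save_area[restored++];
    return restored;
}
```

For a task with an all-zero mask the cost is the single test at the top, paid
per context save rather than per instruction.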
Is there a public AltiVec spec anywhere that goes into the details of
the enhanced instruction set and the backwards compatibility issues?
So far I have seen only more or less detailed conceptual overviews of the
architecture...
Ronald
==============================================================================
"The reasonable man adapts himself to the world; the unreasonable one persists
in trying to adapt the world to himself. Therefore all progress depends on the
unreasonable man." G.B. Shaw | rcfa at cubiculum.com | NeXT-mail welcome