Programmable Sound Generator

This is based on MikeJ's VHDL implementation of the AY-3-8910 / YM2149, which copies the behaviour of the original chip as closely as possible. The chip's processor interface was designed for the CP1610 and is rather unconventional: it is better suited to processors such as the 8051, where A7-0 and D7-0 are multiplexed.

The 6502 and Z80 do not multiplex these buses, and adding extra chips to multiplex them would be inconvenient, so systems built around them write the register address to one location and then access the selected register at another.

It is possible to modify the interface to suit 6502/Z80 systems better by selecting the register directly from the CPU address lines. This removes the need for the index register latch. It does not save much logic, but it might make data register access faster, because each access then takes one bus cycle instead of two (address write, then data access).
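To make the difference between the two schemes concrete, here is a minimal C model, assuming the PSG's 16 registers sit either behind an address latch or behind four CPU address lines. The type and function names are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the PSG as seen from the CPU; names are illustrative. */
typedef struct {
    uint8_t reg[16];   /* the 16 PSG registers */
    uint8_t index;     /* address latch, only needed by the indexed scheme */
} psg_t;

/* Indexed scheme, bus cycle 1: write the register number to the address location. */
static void psg_latch_write(psg_t *p, uint8_t regnum) {
    assert(regnum < 16);
    p->index = regnum;
}

/* Indexed scheme, bus cycle 2: write the data location; the latch selects the register. */
static void psg_data_write(psg_t *p, uint8_t value) {
    p->reg[p->index] = value;
}

/* Direct scheme: the low four CPU address lines select the register,
 * so one bus cycle is enough and no address latch is required. */
static void psg_direct_write(psg_t *p, uint16_t cpu_addr, uint8_t value) {
    p->reg[cpu_addr & 0x0F] = value;
}
```

Writing register 7 takes two calls in the indexed scheme (psg_latch_write, then psg_data_write) but only one in the direct scheme, e.g. psg_direct_write(&p, 0xC007, value) with a hypothetical base address of 0xC000.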

The maximum clock of the real AY-3-8910 is 2 MHz, so I will be using 1.7898 MHz (half the NTSC colour carrier of 3.5795 MHz).

The real chip's three analogue outputs can be mixed simply by shorting the pins together. If each output drove current into a virtual ground (such as an op-amp summing node), the channels would be independent of one another. In practice most machines do not go to such trouble, so the channels do affect each other, and the mixed output is not simply the linear arithmetic sum of the three DAC values.

MikeJ found that modelling this interaction was no simple matter. Some software relies on it, combining the three 4-bit DACs into one higher-resolution DAC to replay 8-bit sound samples. A linear model simply did not behave like the real chip, and sampled sounds played through it did not sound good.

It would be nice to have a mathematical model of the mixer (I am trying to deduce one), but in lieu of that the output voltages were measured and MikeJ built a look-up table from them.

The problem is that this table has a 12-bit address (4K entries, one 4-bit level per channel) and a 10-bit data output. That is 4096 x 10 = 40 kbits (5 Kbytes), a significant portion of the target FPGA's block RAM.
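The table size follows directly from the address and data widths. A minimal C sketch, assuming the three 4-bit channel levels a, b and c are simply concatenated into the 12-bit address (the bit order is an assumption, and the table is a zero-filled placeholder because the measured data is not reproduced here):

```c
#include <assert.h>
#include <stdint.h>

enum {
    LEVEL_BITS  = 4,                       /* per-channel DAC resolution */
    ADDR_BITS   = 3 * LEVEL_BITS,          /* 12-bit ROM address */
    DATA_BITS   = 10,                      /* 10-bit mixed output */
    ROM_ENTRIES = 1 << ADDR_BITS,          /* 4096 entries */
    ROM_BITS    = ROM_ENTRIES * DATA_BITS  /* 40960 bits = 40 kbits */
};

/* Form the ROM address from the three channel levels (assumed order a:b:c). */
static unsigned mixer_addr(unsigned a, unsigned b, unsigned c) {
    assert(a < 16 && b < 16 && c < 16);
    return (a << 8) | (b << 4) | c;
}

/* Placeholder table: would hold the measured voltages, scaled to 10 bits. */
static uint16_t mixer_rom[ROM_ENTRIES];

static uint16_t mixer_lookup(unsigned a, unsigned b, unsigned c) {
    return mixer_rom[mixer_addr(a, b, c)] & ((1u << DATA_BITS) - 1);
}
```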

There may be room for optimisation because the ROM tables are symmetrical about their diagonals: the data at (a,b) is the same as (b,a). One could compare a and b, like so:

if (a<=b) lookup(a,b)
else lookup(b,a)

This means you can omit the table entries where a>b (first index greater than the second). A square table becomes a triangle.
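The triangle can be packed into a contiguous table. A minimal C sketch, assuming 16 levels per channel and a row-by-row layout where row b holds the entries a = 0..b (the layout and the name tri_index are assumptions for illustration); it needs 16 x 17 / 2 = 136 entries instead of 256:

```c
#include <assert.h>

/* Packed index into a triangular table storing only entries with a <= b. */
static unsigned tri_index(unsigned a, unsigned b) {
    assert(a < 16 && b < 16);
    if (a > b) {                 /* symmetry: (a,b) gives the same data as (b,a) */
        unsigned t = a; a = b; b = t;
    }
    /* rows 0..b-1 hold 1 + 2 + ... + b = b(b+1)/2 entries before row b */
    return b * (b + 1) / 2 + a;
}
```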

The symmetry applies to all three channels, so the same trick can be applied to another pair, like so:

if (b<=c) lookup(b,c)
else lookup(c,b)

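This means you can also omit the entries where b>c. However, swapping b and c can leave a greater than the new b, so a third compare of a and b is needed; together the three compares sort the levels so that a<=b<=c.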

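Therefore we only need to store the sorted triples with a<=b<=c. There are C(18,3) = 816 of them (choosing three of the 16 levels with repetition) out of 4096, just under a fifth of the original 3-D lookup table, i.e. about 8 kbits instead of 40 kbits.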
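If the mixer output really is unchanged by any permutation of the three channels, the compares must fully sort the levels before the lookup. A minimal C sketch under that assumption: a three-step compare-and-swap sorting network orders the levels, and the combinatorial number system packs each sorted triple into a dense index. The function names are hypothetical.

```c
#include <assert.h>

/* Compare-and-swap: afterwards *x <= *y. */
static void cswap(unsigned *x, unsigned *y) {
    if (*x > *y) { unsigned t = *x; *x = *y; *y = t; }
}

/* Packed index of (a,b,c) in a table holding only sorted triples.
 * After sorting, a < b+1 < c+2 are strictly increasing values in 0..17,
 * so a + C(b+1,2) + C(c+2,3) is a bijection onto 0..C(18,3)-1 = 0..815. */
static unsigned tet_index(unsigned a, unsigned b, unsigned c) {
    assert(a < 16 && b < 16 && c < 16);
    cswap(&a, &b);
    cswap(&b, &c);
    cswap(&a, &b);               /* needed: the previous swap can break a <= b */
    return a + (b + 1) * b / 2 + (c + 2) * (c + 1) * c / 6;
}
```

Every sorted triple maps to a distinct index in 0..815, so at 10 bits per entry the table needs 816 x 10 = 8160 bits.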

A last idea might be to convert some of the tables into logic. The most significant bits have the most regular data patterns and are therefore the best candidates; the least significant bits are the most detailed and are perhaps best left as tables. Resource consumption is roughly as follows:

table_mixer:
Number of Slices:                     309  out of   3072    10%  
Number of Slice Flip Flops:           252  out of   6144     4%  
Number of 4 input LUTs:               492  out of   6144     8%  
Number of BRAMs:                       10  out of     16    62%

Linear:
Number of Slices:                     334  out of   3072    10%  
Number of Slice Flip Flops:           244  out of   6144     3%  
Number of 4 input LUTs:               600  out of   6144     9%  
Number of BRAMs:                        1  out of     16     6%

Modified, and also changed to be fully clocked (no gated clock to mimic the real chip):
Number of Slices:                     347  out of   3072    11%  
Number of Slice Flip Flops:           244  out of   6144     3%  
Number of 4 input LUTs:               611  out of   6144     9%  
Number of BRAMs:                        1  out of     16     6%

DAC values are updated at the audio clock (1.79 MHz) divided by 16, which gives 111.8607954 kHz. The serial DAC can cope with that rate.
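The clock figures can be checked with a few lines of C, assuming the NTSC colour subcarrier is exactly 315/88 MHz (its standard definition). The helper names are illustrative.

```c
#include <assert.h>

/* NTSC colour subcarrier: 315/88 MHz, about 3.579545 MHz. */
#define NTSC_COLOUR_CARRIER_HZ (315.0e6 / 88.0)

/* Divide a clock frequency by an integer ratio. */
static double divided_hz(double clock_hz, unsigned divider) {
    assert(divider > 0);
    return clock_hz / divider;
}

/* PSG clock: half the colour carrier, about 1.789773 MHz (below the 2 MHz limit). */
static double psg_clock_hz(void) { return divided_hz(NTSC_COLOUR_CARRIER_HZ, 2); }

/* DAC update rate: PSG clock / 16, about 111.860795 kHz. */
static double dac_update_hz(void) { return divided_hz(psg_clock_hz(), 16); }

/* Compare floating-point results within a tolerance, without <math.h>. */
static int approx_eq(double x, double y, double tol) {
    double d = x - y;
    return d < tol && d > -tol;
}
```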

Eventually I modified the logic so that the PSG presents its registers to the CPU as an array of 16 bytes. I also added a module that triggers the serial DAC only when the output value changes, which removes unnecessary serial traffic while the DAC value is constant.
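The change-detection idea can be modelled in software. A minimal C sketch, assuming one call per DAC update tick and a 10-bit mixer output; the struct and function names are hypothetical, and the valid flag stands in for whatever reset behaviour the real module has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a module that starts a serial DAC transfer only on change. */
typedef struct {
    uint16_t last_sent;   /* value most recently shifted out to the DAC */
    bool     valid;       /* false until the first transfer after reset */
    unsigned transfers;   /* number of serial transfers started */
} dac_trigger_t;

/* Called once per DAC update tick; returns true if a transfer is started. */
static bool dac_trigger_tick(dac_trigger_t *t, uint16_t sample) {
    assert(sample < 1024);                 /* assumed 10-bit mixer output */
    if (t->valid && sample == t->last_sent)
        return false;                      /* unchanged: no serial traffic */
    t->last_sent = sample;
    t->valid = true;
    t->transfers++;
    return true;
}
```

A constant output therefore costs one transfer, however many ticks it lasts.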

The PSG now produces plausible sounds when its registers are programmed. I do not have time to write a good test program for it, but since I have not changed the sound generator sections, I assume they work as well as they did for MikeJ.

This module consumes about 10% of the FPGA logic, pushing the whole design to 96% utilisation, so I will remove it while developing other aspects. AFAIK the YM2149 / AY-3-8910 is still obtainable, and by the time it is not, FPGAs may well have room for more logic. Meanwhile, driving the control signals of a real YM2149 chip takes very little logic and only a few pins.

BDIR	BC2	BC1	PSG FUNCTION
0	1	0	INACTIVE
0	1	1	READ FROM PSG
1	1	0	WRITE TO PSG
1	1	1	LATCH ADDRESS

From processor  ------>  BDIR  \
+5V             ------>  BC2    >  PSG
From processor  ------>  BC1   /

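-- psg_cs: hypothetical active-high select from the CPU address decoder
BDIR <= psg_cs and not cpu_rnw;             -- high only for CPU writes to the PSG, easy!
BC1  <= psg_cs and (cpu_a(0) xor cpu_rnw);  -- xor puts the data reg at A0=0 for both read and write; a write to A0=1 latches the address
-- BC2 is tied to +5V; the decoding could perhaps be done with a dual 4-to-1 multiplexer?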
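The decode can be checked against the BDIR/BC2/BC1 table with a small C model. This is a sketch, assuming both outputs are gated by a hypothetical chip-select signal psg_cs so the PSG stays inactive for other addresses; only the BC2 = 1 rows of the table are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* PSG bus functions from the table above (BC2 tied high). */
typedef enum { PSG_INACTIVE, PSG_READ, PSG_WRITE, PSG_LATCH } psg_func_t;

static psg_func_t psg_function(bool bdir, bool bc2, bool bc1) {
    assert(bc2);                           /* only the BC2 = 1 rows are modelled */
    if (!bdir)
        return bc1 ? PSG_READ : PSG_INACTIVE;
    return bc1 ? PSG_LATCH : PSG_WRITE;
}

/* C model of the decode equations; psg_cs is a hypothetical chip select. */
static psg_func_t psg_decode(bool psg_cs, bool cpu_rnw, bool cpu_a0) {
    bool bdir = psg_cs && !cpu_rnw;
    bool bc1  = psg_cs && (cpu_a0 != cpu_rnw);   /* != on bools acts as xor */
    return psg_function(bdir, true, bc1);
}
```

With the PSG selected, A0=0 is the data register for both reads and writes, a write to A0=1 latches the register address, and a read from A0=1 leaves the PSG inactive.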