Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines
Roman Genov and Gert Cauwenberghs
Department of Electrical and Computer Engineering
Johns Hopkins University, Baltimore, MD 21218
{roman,gert}@jhu.edu
Abstract
A mixed-signal paradigm is presented for high-resolution parallel inner-product computation in very high dimensions, suitable for efficient implementation of kernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing to random statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5 µm CMOS.
1 Introduction
Analog computational arrays [1, 2, 3, 4] for neural information processing offer very large integration density and throughput as needed for real-time tasks in computer vision and pattern recognition [5]. Despite the success of adaptive algorithms and architectures in reducing the effect of analog component mismatch and noise on system performance [6, 7], the precision and repeatability of analog VLSI computation under process and environmental variations are inadequate for some applications. Digital implementation [10] offers absolute precision limited only by wordlength, but at the cost of significantly larger silicon area and power dissipation compared with dedicated, fine-grain parallel analog implementation, e.g., [2, 4].
The purpose of this paper is twofold: to present an internally analog, externally digital architecture for dedicated VLSI kernel-based array processing that outperforms purely digital approaches by a factor of 100 to 10,000 in throughput, density and energy efficiency; and to provide a scheme for digital resolution enhancement that exploits Bernoulli random statistics of binary vectors. The largest gains in system precision are obtained for high input dimensions. The framework allows operation at full digital resolution with relatively imprecise analog hardware, and with minimal cost in implementation complexity to randomize the input data.
The computational core of inner-product based kernel operations in image processing and pattern recognition is that of vector-matrix multiplication (VMM) in high dimensions:
$$Y_m = \sum_{n=0}^{N-1} W_{mn} X_n \tag{1}$$
with $N$-dimensional input vector $X_n$, $M$-dimensional output vector $Y_m$, and $M \times N$ matrix elements $W_{mn}$. In artificial neural networks, the matrix elements $W_{mn}$ correspond to weights, or synapses, between neurons. The elements $W_{mn}$ also represent templates in a vector quantizer [8], or support vectors in a support vector machine [9]. In what follows we concentrate on VMM computation, which dominates inner-product based¹ kernel computations for high vector dimensions.
2 The Kerneltron: A Massively Parallel VLSI Computational Array
2.1 Internally Analog, Externally Digital Computation
The approach combines the computational efficiency of analog array processing with the precision of digital processing and the convenience of a programmable and reconfigurable digital interface.
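The digital representation is embedded in the analog array architecture, with inputs presented in bit-serial fashion, and matrix elements stored locally in bit-parallel form:
$$W_{mn} = \sum_{i=0}^{I-1} 2^{-i-1} w_{mn}^{(i)} \tag{2}$$
$$X_n = \sum_{j=0}^{J-1} 2^{-j-1} x_n^{(j)} \tag{3}$$
decomposing (1) into:
$$Y_m = \sum_{i=0}^{I-1} \sum_{j=0}^{J-1} 2^{-i-j-2} Y_m^{(i,j)} \tag{4}$$
with binary-binary VMM partials:
$$Y_m^{(i,j)} = \sum_{n=0}^{N-1} w_{mn}^{(i)} x_n^{(j)} \tag{5}$$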
The key is to compute and accumulate the binary-binary partial products (5) using an analog VMM array, and to combine the quantized results in the digital domain according to (4). Digital-to-analog conversion at the input interface is inherent in the bit-serial implementation, and row-parallel analog-to-digital converters (ADCs) are used at the output interface to quantize $Y_m^{(i,j)}$. A 512 × 128 array prototype using CID/DRAM cells is shown in Figure 1(a).
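The bit-level decomposition and digital recombination described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes unsigned fractional operands in [0, 1) with the most significant bit first (bit k weighted by 2^-(k+1)), ignores ADC quantization of the partials, and the helper names `to_bits`, `binary_partial` and `vmm_bitwise` are hypothetical.

```python
from fractions import Fraction

def to_bits(value, nbits):
    """Binary digits b_0..b_{nbits-1} of a value in [0, 1), MSB first,
    so that value == sum(b_k * 2**-(k+1)) when exactly representable."""
    bits = []
    for _ in range(nbits):
        value *= 2
        bit = int(value >= 1)
        bits.append(bit)
        value -= bit
    return bits

def binary_partial(w_bits_row, x_bits, i, j):
    """Binary-binary partial: number of positions n where bit i of W[m][n]
    and bit j of X[n] are both 1 (the quantity the analog array accumulates)."""
    return sum(w[i] & x[j] for w, x in zip(w_bits_row, x_bits))

def vmm_bitwise(W, X, I, J):
    """Recombine the I*J partials per output with weights 2**-(i+j+2)."""
    x_bits = [to_bits(x, J) for x in X]
    Y = []
    for row in W:
        w_bits = [to_bits(w, I) for w in row]
        acc = Fraction(0)
        for i in range(I):          # matrix bits: stored in parallel
            for j in range(J):      # input bits: presented serially
                acc += Fraction(binary_partial(w_bits, x_bits, i, j),
                                2 ** (i + j + 2))
        Y.append(acc)
    return Y

# Illustrative operands, exactly representable in 3 and 2 bits respectively.
W = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(1, 2), Fraction(5, 8)]]
X = [Fraction(1, 2), Fraction(3, 4)]
direct = [sum(w * x for w, x in zip(row, X)) for row in W]
assert vmm_bitwise(W, X, I=3, J=2) == direct
```

Exact rational arithmetic is used so that the recombined result can be compared for equality with the direct product; in hardware each partial would first pass through an ADC of limited resolution.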
2.2 CID/DRAM Cell and Array
The unit cell in the analog array combines a CID computational element [12, 13] with a DRAM storage element. The cell stores one bit of a matrix element $w_{mn}^{(i)}$, performs a one-quadrant binary-binary multiplication of $w_{mn}^{(i)}$ and $x_n^{(j)}$ in (5), and accumulates the result across cells with common $m$ and $i$ indices. The circuit diagram and operation of the cell are given in Figure 1(b). An array of cells thus performs (unsigned) binary multiplication (5) of matrix $w_{mn}^{(i)}$ and vector $x_n^{(j)}$ yielding $Y_m^{(i,j)}$, for $I$ values of $i$ in parallel across the array, and $J$ values of $j$ in sequence over time.
Figure 1: (a) Micrograph of the Kerneltron prototype, containing an array of 512 × 128 CID/DRAM cells, and a row-parallel bank of flash ADCs. Die size is [...] in 0.5 µm CMOS technology. (b) CID computational cell with integrated DRAM storage. Circuit diagram, and charge transfer diagram for active write and compute operations.
The cell contains three MOS transistors connected in series as depicted in Figure 1(b). Transistors M1 and M2 comprise a dynamic random-access memory (DRAM) cell, with switch M1 controlled by Row Select signal $RS_m^{(i)}$. When activated, the binary quantity $w_{mn}^{(i)}$ is written in the form of charge (either $\Delta Q$ or 0) stored under the gate of M2. Transistors M2 and M3 in turn comprise a charge injection device (CID), which by virtue of charge conservation moves electric charge between two potential wells in a non-destructive manner [12, 13, 14].
The charge left under the gate of M2 can only be redistributed between the two CID transistors, M2 and M3. An active charge transfer from M2 to M3 can only occur if there is non-zero charge stored, and if the potential on the gate of M2 drops below that of M3 [12]. This condition implies a logical AND, i.e., unsigned binary multiplication, of $w_{mn}^{(i)}$ and $x_n^{(j)}$. The multiply-and-accumulate operation is then completed by capacitively sensing the amount of charge transferred onto the electrode of M3, the output summing node. To this end, the voltage on the output line, left floating after being pre-charged to $V_{dd}/2$, is observed. When the charge transfer is active, the cell contributes a change in voltage $\Delta V_{out} = \Delta Q / C_{M3}$, where $C_{M3}$ is the total capacitance on the output line across cells. The total response is thus proportional to the number of actively transferring cells. After deactivating the input $x_n^{(j)}$, the transferred charge returns to the storage node M2. The CID computation is non-destructive and intrinsically reversible [12], and DRAM refresh is only required to counteract junction and subthreshold leakage.
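The charge-mode multiply-and-accumulate described above can be mimicked with a simple behavioral model. The sketch below is idealized and makes several assumptions not taken from the paper: cells are perfectly matched, there is no noise or leakage, only the magnitude of the voltage change is modeled, and the charge packet and line capacitance values are purely illustrative. The function names are hypothetical.

```python
def cell_transfers_charge(w_bit, x_bit):
    """A cell transfers charge from M2 to M3 only if charge is stored
    (w_bit == 1) AND the input is active (x_bit == 1)."""
    return w_bit & x_bit

def output_voltage_change(w_bits, x_bits, delta_q, c_total):
    """Return (number of transferring cells, magnitude of the voltage change
    on the floating output line). Each transferring cell is assumed to
    contribute delta_q / c_total."""
    active = sum(cell_transfers_charge(w, x) for w, x in zip(w_bits, x_bits))
    return active, active * delta_q / c_total

# One row of stored bits and one input bit-plane (illustrative values).
w_row = [1, 0, 1, 1, 0, 1]
x_in = [1, 1, 0, 1, 0, 1]

# Hypothetical charge packet (1 fC) and output-line capacitance (1 pF).
count, dv = output_voltage_change(w_row, x_in, delta_q=1e-15, c_total=1e-12)
```

In this idealized model the output is exactly linear in the count of transferring cells, which is the quantity a row-parallel ADC would then quantize.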
The bottom diagram in Figure 1(b) depicts the charge transfer timing diagram for write