Most of these terms are fairly well understood by knowledgeable audioheads. Others may take some exception, but here are my brief descriptions:
bright - Usually a mid/treble emphasis, with attendant lack of bass fill. Think metal tweeters.
dark - Recessed mid/treble response, comparatively prominent bass response. Think ported boxes.
delicate - Good microdetail reproduction, possibly at the loss of scale and macrodynamics. Think planars.
grainy - Classic digital. Think 1985 CDP playing your favorite 1983 "DDD" pop recording.
harsh - Could mean many things. Usually a poorly designed speaker is to blame, and the problem is probably in the mid-to-treble range.
mellow - Probably rolled in the extremes. Upper bass to lower treble probably reproduced fairly accurately.
punchy - Good force in the midbass region. Old, sealed Advent boxes could properly be called punchy.
silky - This one's a little tough. It probably means the tweeter isn't making itself known, though treble emphasis or recession isn't necessarily involved. Probably associated with good upstream electronics, i.e. not grainy or unbalanced. I can imagine a decent analog rig being described thus.
warm - Minor emphasis in the midbass region, probably without undue punchiness. Reproduction in this area is probably honest but without tremendous impact.
zippy - Refers to excellent transient speed, and is probably associated with delicacy. Light drivers tend to excel at speed and nuance, possibly at the expense of macrodynamics and ultimate power. Think Lowther or Fostex drivers.
The comment about "the lingo" is spot-on. These terms arise from necessity. Describing sound is exceedingly difficult. Having more-or-less commonly understood terms is crucial to communicating about audio, or nearly anything. These terms are a small part of the best efforts of the "audio collective" to relay what they perceive to others.
Speakers probably get the most credit or blame when these factors are out of balance, as it probably should be, but electronics play a role. And a given piece might react differently in your system than in mine. It can be maddening.
So, mock if you will. I think you'd spend your time better learning what these terms mean by listening to different systems, though. You might learn something.