Rails stores internationalization strings in large YAML files, and we often shadowed entries by using the same key twice, without noticing that another part of the application had lost its translations.

  feature:
    name: The name
  feature: The feature
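
To make the shadowing concrete, here is a minimal sketch. The YAML content and the `en` root key are made up for illustration. With the Psych versions I know of, loading a mapping with a repeated key raises nothing and simply keeps the last value:

```ruby
require "yaml"

# Hypothetical locale content: `feature` is defined twice under `en`.
yaml = <<~YAML
  en:
    feature:
      name: The name
    feature: The feature
YAML

translations = YAML.safe_load(yaml)

# The second `feature` silently replaces the first one, so the nested
# `name` translation is gone and no error is raised.
p translations["en"]["feature"] # prints "The feature"
```

This is why the problem goes unnoticed until someone sees a missing translation on a page.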

This could be solved with a YAML linter, but I could not find a good one. Instead I went with a test that reads the YAML AST using Psych and checks that there are no duplicate keys. The object returned by Psych.parse is a Document that one can navigate like a tree, with children being either scalars (cannot contain duplicates), sequences (no keys of their own, but their elements may contain mappings), aliases (which I do not use) or mappings (the ones to check). A mapping has an even number of children that alternate between keys and values: counting from zero, the even-indexed children are keys and the odd-indexed ones are values.
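
As a small illustration of that tree, the sketch below parses a made-up three-line document and lists the key/value pairs of its root mapping. Unlike loading, parsing keeps both occurrences of the repeated key, along with their positions:

```ruby
require "psych"

# Made-up document with the key `a` repeated in the root mapping.
doc = Psych.parse("a: 1\nb: 2\na: 3\n")
root = doc.children[0] # the root node, here a Psych::Nodes::Mapping

# Children alternate key, value, key, value, ... so take them in pairs.
# Scalar values are still raw strings at this level, and start_line is
# zero-based, hence the + 1.
pairs = root.children.each_slice(2).map do |key, value|
  [key.value, value.value, key.start_line + 1]
end

p pairs # prints [["a", "1", 1], ["b", "2", 2], ["a", "3", 3]]
```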

The logic is thus to walk the tree. Since this runs as a test, I made sure to return all duplicates at once (I don’t want to re-run it to discover yet another violation), together with the path and line number of each violation (which makes it easy to find and fix).

require "psych"
require "set"

# Sequences have no keys of their own, but their elements may contain mappings.
def find_sequence_duplicates(sequence, prefix)
  sequence.children.each_with_index.flat_map do |child, index|
    find_node_duplicates(child, "#{prefix}[#{index}]")
  end
end

# Children alternate key, value, key, value, ... so walk them in pairs,
# carrying along the keys seen so far and the duplicates found.
def find_mapping_duplicates(mapping, prefix)
  _all_keys, all_duplicates = mapping.children.each_slice(2).reduce([Set[], []]) do |(keys, duplicates), (key, value)|
    child_prefix = "#{prefix}/#{key.value}"
    # start_line is zero-based, hence the + 1
    current_duplicates = keys.include?(key.value) ? ["#{child_prefix} (line #{key.start_line + 1})"] : []
    current_keys = keys + [key.value]
    child_duplicates = find_node_duplicates(value, child_prefix)
    [current_keys, duplicates + current_duplicates + child_duplicates]
  end
  all_duplicates
end

def find_doc_duplicates(doc, prefix)
  find_node_duplicates(doc.children[0], prefix)
end

def find_node_duplicates(node, prefix)
  if node.document?
    find_doc_duplicates(node, prefix)
  elsif node.sequence?
    find_sequence_duplicates(node, prefix)
  elsif node.mapping?
    find_mapping_duplicates(node, prefix)
  elsif node.scalar?
    []
  else
    # Aliases and any other unexpected node end up here
    raise "Unhandled node at #{prefix}: #{node}"
  end
end

def find_file_duplicates(file)
  find_doc_duplicates(Psych.parse(File.read(file)), "")
end

The spec then lists the translation files and checks each one, reporting all of a file's violations at once:

Dir.glob("config/locales/**/*.yml").each do |translation_file|
  describe translation_file do
    it "should have no duplicate" do
      duplicates = find_file_duplicates(translation_file)

      expect(duplicates).to be_empty
    end
  end
end

There could be other tests as well, for example to check that the same keys are defined in every language.
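
As a rough sketch of that last idea (not something from the project above), one could flatten each locale's tree into dotted key paths and compare the sets. The helper names `flatten_keys` and `missing_keys_by_locale` and the sample translations are all made up for illustration:

```ruby
require "yaml"
require "set"

# Flatten a nested translation hash into a set of dotted key paths,
# e.g. {"a" => {"b" => "x"}} gives a set containing "a.b".
def flatten_keys(hash, prefix = nil)
  hash.each_with_object(Set.new) do |(key, value), keys|
    path = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      keys.merge(flatten_keys(value, path))
    else
      keys << path
    end
  end
end

# For each locale, list the keys that at least one other locale defines
# but this one lacks.
def missing_keys_by_locale(translations)
  keys_by_locale = translations.transform_values { |tree| flatten_keys(tree) }
  all_keys = keys_by_locale.values.reduce(Set.new, :|)
  keys_by_locale.transform_values { |keys| (all_keys - keys).to_a.sort }
end

# Made-up translations, keyed by locale as in a Rails locale file.
translations = YAML.safe_load(<<~YAML)
  en:
    feature:
      name: The name
      hint: A hint
  fr:
    feature:
      name: Le nom
YAML

p missing_keys_by_locale(translations)
```

Here `en` comes back with an empty list and `fr` with `feature.hint`. In a real spec the input would be the merged content of all locale files, and the expectation would be that every list is empty.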