OpenRefine

An introduction to this free, open source tool for working with "messy" data.

Facets

 

Faceting groups together identical cell values within a particular column and indicates the number of rows in each group. Facets can be used to:

  • identify and resolve inconsistencies
  • conduct batch edits similar to find and replace
  • gather count data

Access Facet options by clicking the down arrow next to a column title and selecting "Facet" > "Text facet". This creates a box on the left-hand side of the screen, under the Facet/Filter tab, that lets you browse the values in that column along with their row counts. By default, this box is sorted by name (alphabetical order). Keeping the box in alphabetical order often places variant spellings next to each other, which makes initial data-entry mistakes easier to catch.
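
To make the idea concrete, the grouping and counting that a text facet performs can be sketched in a few lines of Python. This is only an illustration of the concept, not OpenRefine's own code: the function name text_facet, the sample rows, and the "timezone" column are all made up for the example, and Python's default string ordering (uppercase before lowercase) may differ from how OpenRefine sorts by name.

```python
from collections import Counter


def text_facet(rows, column):
    """Return (value, row_count) pairs for one column, sorted by value.

    Hypothetical helper for illustration; it mimics what a text facet
    displays, it is not part of OpenRefine.
    """
    counts = Counter(row[column] for row in rows)
    # sorted() on (value, count) tuples orders by value first.
    return sorted(counts.items())


# Made-up sample data with an inconsistent entry ("est").
rows = [
    {"timezone": "EST"},
    {"timezone": "est"},
    {"timezone": "EST"},
    {"timezone": "PST"},
]

facet = text_facet(rows, "timezone")
for value, count in facet:
    print(f"{value}: {count}")
```

Seeing "EST" and "est" listed as separate groups, each with its own count, is the kind of inconsistency a facet is meant to surface.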

An overview of working with facets can be found here. 

Clusters

Clustering enables you to group together different cell values that may be alternative representations of the same thing, for example, "EST", "e.s.t." and "Eastern Standard time". Selecting "Cluster" in the top right corner of the facet box opens a large pop-up window. Depending on the keying function you choose, OpenRefine automatically clusters your data based on syntax, that is, on the characters in each value rather than on its meaning.
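
As a rough sketch of how a keying function drives clustering, the Python below reduces each value to a normalized key and groups values that share a key. It is a simplified approximation in the spirit of a "fingerprint" key (trim, lowercase, strip punctuation, sort and de-duplicate words), not OpenRefine's exact implementation; the function names fingerprint and cluster and the sample values are assumptions for the example.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified keying function (an approximation, not OpenRefine's code)."""
    cleaned = value.strip().lower()
    # Remove ASCII punctuation so "e.s.t." becomes "est".
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate words so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by key; keep only groups with more than one distinct value."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Made-up sample values.
values = ["EST", "e.s.t.", " est ", "Eastern Standard time", "PST"]
clusters = cluster(values)
print(clusters)
```

Because this kind of key only looks at characters, the sketch groups "EST" with "e.s.t." but leaves "Eastern Standard time" on its own; an expansion like that would need a different method or a manual edit.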

An in-depth overview of clustering can be found here.