<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>LocalGovPL</title><revhistory><revision><revnumber>17</revnumber><date>2026-06-20 14:49:52</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>16</revnumber><date>2026-06-16 10:30:35</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>15</revnumber><date>2026-06-14 07:18:21</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>14</revnumber><date>2026-03-03 15:24:44</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>13</revnumber><date>2026-03-03 15:19:45</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>12</revnumber><date>2026-03-03 15:13:34</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>11</revnumber><date>2026-03-03 15:13:18</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>10</revnumber><date>2026-03-03 15:12:46</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>9</revnumber><date>2026-03-03 15:12:02</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>8</revnumber><date>2026-03-03 15:10:34</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>7</revnumber><date>2026-03-03 15:10:06</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>6</revnumber><date>2026-03-03 15:06:24</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>5</revnumber><date>2026-03-03 15:04:39</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>4</revnumber><date>2026-03-03 
15:04:19</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>3</revnumber><date>2026-03-03 15:04:00</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>2</revnumber><date>2026-03-03 15:03:55</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision><revision><revnumber>1</revnumber><date>2026-03-03 15:03:32</date><authorinitials>MaciejOgrodniczuk</authorinitials></revision></revhistory></articleinfo><section><title>LocalGovPL (Korpus Debat Samorządowych)</title><para>LocalGovPL is a large-scale, speaker-annotated corpus of Polish local government meeting transcripts processed using an automatic two-stage LLM pipeline. The corpus consists of 31,899 sessions from 749 councils recorded between 2018 and 2025 (approximately 363M words). It is released in TEI P5 format with explicit links between utterances and registered participants. </para><para>The corpus covers various levels of local administration – municipalities (PL <emphasis>gminy</emphasis>), counties (PL <emphasis>powiaty</emphasis>), cities (PL <emphasis>miasta</emphasis>), and regional assemblies (PL <emphasis>sejmiki województw</emphasis>) – including both plenary sessions and committee meetings. </para><para>The primary goal of the resource is to facilitate research on the language of local governance, including studies of argumentation, interactional patterns, policy framing, and social dynamics within institutional dialogue. Beyond linguistic research, the corpus supports applications in speech-to-text alignment, automatic summarization, speaker role identification, and computational social science. 
</para><section><title>Data Sources</title><para>The raw transcripts were collected from two main publicly available sources: </para><orderedlist numeration="arabic"><listitem><para>Websites maintained by local administrative bodies – a set of specialised HTML extraction parsers was implemented to retrieve and normalise transcripts. </para></listitem><listitem><para><ulink url="https://esesja.tv/">eSesja.tv</ulink> – a meeting streaming platform used by Polish local governments, from which transcription files in WebVTT format were downloaded. </para></listitem></orderedlist><para>The dataset covers meetings from November 2018 to June 2025 and amounts to roughly 71,000 hours of deliberation (31,899 sessions at an average of 2.23 hours each). Because local institutions publish their records in a decentralised way, the source transcripts vary substantially in format, structure, and language conventions. Preprocessing included normalisation of document encoding, removal of irrelevant metadata (e.g., agenda headers or timestamps), and segmentation into individual utterance candidates. </para></section><section><title>Processing Pipeline</title><para>The automatic structuring pipeline consists of two main stages, both powered by large language models (LLMs). </para><section><title>Stage 1: Speaker Extraction</title><para>Candidate speaker names are identified using a combination of rule-based name recognition and contextual inference performed by LLMs. The models are prompted to detect person names and administrative roles, e.g., Chairperson (PL <emphasis>Przewodniczący</emphasis>), Mayor (PL <emphasis>Burmistrz</emphasis>), Councillor (PL <emphasis>Radny</emphasis>), with the aim of achieving high recall while correctly disambiguating repeated titles and partial name mentions. </para></section><section><title>Stage 2: Utterance Attribution</title><para>The LLMs are then used to assign each utterance segment to one of the previously extracted speakers. 
This stage relies on discourse cues such as forms of address, turn transitions, and speaker introductions. The output is a fully structured transcript in which each utterance is linked to a speaker identifier (speaker name, role, and meeting session). </para></section><section><title>Processing Configuration</title><para>For the public release, both stages were run end-to-end with DeepSeek-chat-v3-0324. Transcripts longer than 1,500 lines (approximately 60,000 characters) were split into chunks, processed chunk by chunk, and merged back into a single transcript using global line numbers. </para></section><section><title>Throughput and Cost</title><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Metric</emphasis>                     </para></entry><entry colsep="1" rowsep="1"><para>    <emphasis role="strong">Value</emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Transcripts processed            </para></entry><entry colsep="1" rowsep="1"><para>         31,899 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Total input tokens               </para></entry><entry colsep="1" rowsep="1"><para> ~1,100,000,000 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Total output tokens              </para></entry><entry colsep="1" rowsep="1"><para>    ~55,000,000 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Total processing time (days)     </para></entry><entry colsep="1" rowsep="1"><para>          16.82 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Total cost (USD)                 </para></entry><entry colsep="1" rowsep="1"><para>         373.18 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Avg input tokens per transcript  </para></entry><entry colsep="1" rowsep="1"><para>       34,038.3 </para></entry></row><row 
rowsep="1"><entry colsep="1" rowsep="1"><para> Avg output tokens per transcript </para></entry><entry colsep="1" rowsep="1"><para>        1,742.3 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Avg generation time (s)          </para></entry><entry colsep="1" rowsep="1"><para>         41.964 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Avg cost per transcript (USD)    </para></entry><entry colsep="1" rowsep="1"><para>        0.01078 </para></entry></row></tbody></tgroup></informaltable></section></section><section><title>Corpus Statistics</title><para>The LocalGovPL corpus comprises 31,899 meeting transcripts from 749 councils, covering more than six and a half years of local government proceedings (November 2018 to June 2025). </para><informaltable><tgroup cols="3"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Category</emphasis>                 </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Count</emphasis>        </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Average per Session (unless stated otherwise)</emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Basic Statistics</emphasis>         </para></entry><entry colsep="1" rowsep="1"/><entry colsep="1" rowsep="1"/></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Total transcripts              </para></entry><entry colsep="1" rowsep="1"><para>       31,899       </para></entry><entry colsep="1" rowsep="1"><para>             –             </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Date range                     </para></entry><entry colsep="1" rowsep="1"><para> 2018-11 to 2025-06 </para></entry><entry colsep="1" rowsep="1"><para>             –             </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 
Number of councils             </para></entry><entry colsep="1" rowsep="1"><para>        749         </para></entry><entry colsep="1" rowsep="1"><para>             –             </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Transcripts per council        </para></entry><entry colsep="1" rowsep="1"><para>         –          </para></entry><entry colsep="1" rowsep="1"><para>           42.59           </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Duration Statistics</emphasis>      </para></entry><entry colsep="1" rowsep="1"/><entry colsep="1" rowsep="1"/></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Average session duration       </para></entry><entry colsep="1" rowsep="1"><para>         –          </para></entry><entry colsep="1" rowsep="1"><para>        2.23 hours         </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Content Statistics</emphasis>       </para></entry><entry colsep="1" rowsep="1"/><entry colsep="1" rowsep="1"/></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Total words                    </para></entry><entry colsep="1" rowsep="1"><para>    362,664,794     </para></entry><entry colsep="1" rowsep="1"><para>          11,369           </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Total characters               </para></entry><entry colsep="1" rowsep="1"><para>   2,468,439,776    </para></entry><entry colsep="1" rowsep="1"><para>          77,383           </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Speaker Statistics</emphasis>       </para></entry><entry colsep="1" rowsep="1"/><entry colsep="1" rowsep="1"/></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Average speakers per session   </para></entry><entry colsep="1" rowsep="1"><para>         –          </para></entry><entry colsep="1" rowsep="1"><para>           
12.77           </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Average utterances per session </para></entry><entry colsep="1" rowsep="1"><para>         –          </para></entry><entry colsep="1" rowsep="1"><para>           80.2            </para></entry></row></tbody></tgroup></informaltable></section><section><title>Corpus Format</title><para>Corpus files are made available in <emphasis role="strong">XML TEI P5</emphasis> format, following the same design choices as the <ulink url="https://clip.ipipan.waw.pl/PPC">Polish Parliamentary Corpus (PPC)</ulink>, ensuring interoperability with existing tools and facilitating cross-corpus comparisons. Each meeting transcription is represented by a pair of XML files: </para><section><title>Session Header (header.xml)</title><para>The <code>header.xml</code> file contains the TEI header with document-level metadata and the participant registry, including: </para><itemizedlist><listitem><para><emphasis role="strong"><code>title</code></emphasis> – meeting title used as the document name (e.g., <emphasis>Sesja Rady 30 stycznia 2019</emphasis> / Council Session on January 30, 2019) </para></listitem><listitem><para><emphasis role="strong"><code>publisher</code></emphasis> – the organising body responsible for the session (e.g., <emphasis>Rada Miejska Nowego Miasta Lubawskiego</emphasis> / Municipal Council of Nowe Miasto Lubawskie) </para></listitem><listitem><para><emphasis role="strong"><code>system</code></emphasis> – source system label for provenance tracking (e.g., <emphasis>Sesja Rady Lokalnej</emphasis> / Local Council Session) </para></listitem><listitem><para><emphasis role="strong"><code>house</code></emphasis> – assembly or chamber type (e.g., <emphasis>Rada Powiatu</emphasis> / County Council) </para></listitem><listitem><para><emphasis role="strong"><code>sitting ID</code></emphasis> – numeric identifier of the sitting </para></listitem><listitem><para><emphasis 
role="strong"><code>type</code></emphasis> – content type of the source (e.g., <emphasis>Transkrypcja sesji</emphasis> / Session transcript) </para></listitem><listitem><para><emphasis role="strong"><code>total rows</code></emphasis> – number of input transcript rows prior to structuring </para></listitem><listitem><para><emphasis role="strong"><code>speaker count</code></emphasis> – number of distinct speakers recognised in the session </para></listitem><listitem><para><emphasis role="strong"><code>date</code></emphasis> – session date in ISO 8601 format (e.g., 2019-01-30) </para></listitem></itemizedlist><para>Each <emphasis role="strong"><code>person</code></emphasis> in the participant list is uniquely identified and carries a normalised name and role: </para><itemizedlist><listitem><para><code>person[@xml:id]</code> provides a stable identifier (e.g., <code>chairman_of_municipal_council</code>) </para></listitem><listitem><para><code>persName</code> holds the display name (e.g., <emphasis>Przewodniczący Rady Miejskiej</emphasis> / Chairman of the Municipal Council) </para></listitem><listitem><para><code>@role</code> encodes the role (e.g., <emphasis>Burmistrz Gminy</emphasis> / Mayor of the Municipality) </para></listitem></itemizedlist></section><section><title>Utterance Structure (text_structure.xml)</title><para>The <code>text_structure.xml</code> file contains the speech content, segmented into divisions (<code>&lt;div&gt;</code>) and utterances (<code>&lt;u&gt;</code>). 
Each utterance carries: </para><itemizedlist><listitem><para><emphasis role="strong"><code>xml:id</code></emphasis> – a unique utterance identifier (e.g., <code>u-1.1</code>) </para></listitem><listitem><para><emphasis role="strong"><code>who</code></emphasis> – a pointer to the speaking participant using a TEI cross-reference to <code>header.xml</code> </para></listitem><listitem><para><emphasis role="strong"><code>start</code></emphasis> / <emphasis role="strong"><code>end</code></emphasis> – timestamps delimiting the utterance span in the source recording </para></listitem></itemizedlist><para>Documents may be wrapped in a <code>&lt;teiCorpus&gt;</code> element that includes <code>header.xml</code> via XML Inclusions (<code>xi:include</code>). The link between utterances (<code>&lt;u&gt;/@who</code>) and declared speakers (<code>&lt;listPerson&gt;/person[@xml:id]</code>) is preserved whether or not a document is wrapped. </para></section></section><section><title>Evaluation</title><section><title>Test Dataset</title><para>A subset of <emphasis role="strong">30 transcripts</emphasis> from <emphasis role="strong">23 councils</emphasis> (spanning <emphasis role="strong">June 2022 to January 2025</emphasis>) was manually annotated to create a reference benchmark for evaluating both speaker identification (Stage 1) and utterance attribution (Stage 2). On average, a benchmark session lasts <emphasis role="strong">2.36 hours</emphasis> and contains <emphasis role="strong">13,682 words</emphasis>, with <emphasis role="strong">17.27 speakers</emphasis> contributing <emphasis role="strong">102.87 utterances</emphasis>. 
</para></section><section><title>Speaker Identification (Stage 1)</title><para>Macro-averaged precision, recall, and F1 over the 30-session benchmark (with relaxed identity equivalence): </para><informaltable><tgroup cols="4"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><colspec colname="col_3"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Model configuration</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Macro P</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Macro R</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Macro F1</emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Gemini-2.5-pro            </para></entry><entry colsep="1" rowsep="1"><para>     0.9058    </para></entry><entry colsep="1" rowsep="1"><para>     0.8814    </para></entry><entry colsep="1" rowsep="1"><para>      0.8786    </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Gemini-2.5-flash          </para></entry><entry colsep="1" rowsep="1"><para>     0.9071    </para></entry><entry colsep="1" rowsep="1"><para>     0.8800    </para></entry><entry colsep="1" rowsep="1"><para>      0.8783    </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> DeepSeek-chat-v3-0324    </para></entry><entry colsep="1" rowsep="1"><para>     0.8287    </para></entry><entry colsep="1" rowsep="1"><para>     0.8375    </para></entry><entry colsep="1" rowsep="1"><para>      0.8169    </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> DeepSeek-r1-0528         </para></entry><entry colsep="1" rowsep="1"><para>     0.6281    </para></entry><entry colsep="1" rowsep="1"><para>     0.5887    </para></entry><entry colsep="1" rowsep="1"><para>      0.5904    </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> 
Llama-3.3-70b-instruct    </para></entry><entry colsep="1" rowsep="1"><para>     0.3537    </para></entry><entry colsep="1" rowsep="1"><para>     0.3673    </para></entry><entry colsep="1" rowsep="1"><para>      0.3491    </para></entry></row></tbody></tgroup></informaltable></section><section><title>Speaker Attribution (Stage 2)</title><para>Speaker-aware word error rate (sWER; lower is better), averaged over the 30 benchmark sessions under each of the three evaluation protocols described below: </para><informaltable><tgroup cols="4"><colspec colname="col_0"/><colspec colname="col_1"/><colspec colname="col_2"/><colspec colname="col_3"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Model configuration</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Abstract speakers</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">Ground-truth participants</emphasis> </para></entry><entry colsep="1" rowsep="1"><para> <emphasis role="strong">End-to-end (relaxed names)</emphasis> </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Gemini-2.5-pro            </para></entry><entry colsep="1" rowsep="1"><para>      0.0393    </para></entry><entry colsep="1" rowsep="1"><para>             0.0460    </para></entry><entry colsep="1" rowsep="1"><para>           0.0592    </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Gemini-2.5-flash          </para></entry><entry colsep="1" rowsep="1"><para>      0.0907    </para></entry><entry colsep="1" rowsep="1"><para>             0.1287    </para></entry><entry colsep="1" rowsep="1"><para>           0.1257    </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> DeepSeek-chat-v3-0324    </para></entry><entry colsep="1" rowsep="1"><para>      0.2061    </para></entry><entry colsep="1" rowsep="1"><para>             0.2094    </para></entry><entry colsep="1" rowsep="1"><para>           0.2381    </para></entry></row><row rowsep="1"><entry colsep="1" 
rowsep="1"><para> DeepSeek-r1-0528         </para></entry><entry colsep="1" rowsep="1"><para>      0.4582    </para></entry><entry colsep="1" rowsep="1"><para>             0.2498    </para></entry><entry colsep="1" rowsep="1"><para>           0.4684    </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Llama-3.3-70b-instruct    </para></entry><entry colsep="1" rowsep="1"><para>      0.6969    </para></entry><entry colsep="1" rowsep="1"><para>             0.7945    </para></entry><entry colsep="1" rowsep="1"><para>           0.7378    </para></entry></row></tbody></tgroup></informaltable><para>The three evaluation protocols are: </para><itemizedlist><listitem><para><emphasis role="strong">Abstract speaker attribution</emphasis> – speakers are treated as abstract entities (e.g., speaker-1, speaker-2), and the Hungarian algorithm finds the optimal one-to-one mapping between predicted and reference speakers. This isolates utterance attribution from name recognition. </para></listitem><listitem><para><emphasis role="strong">Ground-truth participants</emphasis> – the system receives the gold-standard list of participants and only needs to determine which known speaker is talking at each point. </para></listitem><listitem><para><emphasis role="strong">End-to-end with relaxed name matching</emphasis> – both stages run without any gold-standard input. A predicted speaker matches a reference speaker if their surnames match, their titles or roles match, or the Levenshtein similarity between their names is at least 0.8. 
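</para><para>The relaxed matching rule of the end-to-end protocol can be sketched as a simple predicate. The Python below is a minimal illustration under stated assumptions, not the evaluation code behind the reported scores: the hypothetical <code>Speaker</code> record with optional <code>surname</code> and <code>role</code> fields, the case- and whitespace-insensitive normalisation, and the definition of similarity as 1 minus the edit distance divided by the length of the longer name are all choices made for this example.</para><programlisting><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Speaker:
    """Hypothetical speaker record: display name plus optional surname and role."""
    name: str
    surname: Optional[str] = None
    role: Optional[str] = None


def normalise(text: str) -> str:
    """Assumed normalisation: case-insensitive, with runs of whitespace collapsed."""
    return " ".join(text.casefold().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters are equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def relaxed_match(pred: Speaker, ref: Speaker, threshold: float = 0.8) -> bool:
    """True if surnames match, roles match, or the names are similar enough."""
    if pred.surname and ref.surname and normalise(pred.surname) == normalise(ref.surname):
        return True
    if pred.role and ref.role and normalise(pred.role) == normalise(ref.role):
        return True
    return similarity(normalise(pred.name), normalise(ref.name)) >= threshold


# Example with made-up speakers: the surnames agree after normalisation.
print(relaxed_match(Speaker("Jan Kowalski", surname="Kowalski"),
                    Speaker("Radny Jan Kowalski", surname="kowalski", role="Radny")))
# prints True
```
]]></programlisting><para>Applied to every pair of predicted and reference speakers, such a predicate yields candidate correspondences for scoring; the sketch leaves open how to resolve cases where one predicted speaker matches several reference speakers, which a full evaluation would need to decide.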
</para></listitem></itemizedlist></section></section><section><title>Download</title><para><remark>* <ulink url="https://kdp.ipipan.waw.pl/static/localgov/locgovpl-tei.tar.gz">Full corpus (TEI XML)</ulink></remark> </para><itemizedlist><listitem><para><ulink url="https://kdp.ipipan.waw.pl/static/localgov/locgovpl-sample.tar.gz">Sample subset (30 annotated sessions)</ulink> </para></listitem></itemizedlist></section><section><title>Searching the Corpus</title><itemizedlist><listitem><para><ulink url="https://locgovpl.ipipan.waw.pl/">LocalGovPL Search Engine</ulink> (in Polish) </para></listitem></itemizedlist></section><section><title>Legal Basis</title><para>All data used in this corpus originate from official public records published by public institutions. These materials are collected and redistributed in compliance with the <emphasis role="strong">Polish Act of 11 August 2021 on Open Data and the Re-use of Public Sector Information</emphasis> (Dz.U. 2021 poz. 1641), which provides for the openness of public sector information and its re-use. </para><para>The corpus does not include any personal data beyond the names of public officials acting in their professional capacity. </para><para>The resource is intended for <emphasis role="strong">research and educational purposes</emphasis>, and all derivative uses must comply with applicable open-data regulations. </para><para><emphasis role="strong">Risk of Misattribution.</emphasis> As an automatically processed resource, the corpus may contain attribution errors: on the evaluation benchmark, the DeepSeek-chat-v3-0324 configuration used for the public release has an sWER of approximately 0.21–0.24, depending on the evaluation protocol. Users should exercise caution when attributing controversial or sensitive statements to individual public officials based solely on this automatically processed dataset. 
</para></section><section><title>See Also</title><itemizedlist><listitem><para><ulink url="https://clip.ipipan.waw.pl/PPC">Polish Parliamentary Corpus (PPC)</ulink> – the corpus of Polish parliamentary (Sejm and Senate) proceedings encoded in TEI P5 format, which served as the design model for LocalGovPL. </para></listitem><listitem><para><ulink url="https://www.clarin.eu/parlamint">ParlaMint</ulink> – a project providing speaker- and role-annotated parliamentary proceedings across many countries. </para></listitem><listitem><para><ulink url="https://councildataproject.org/">Council Data Project (CDP)</ulink> – an open infrastructure for collecting and curating municipal governance data. </para></listitem><listitem><para><ulink url="https://esesja.tv/">eSesja.tv</ulink> – a meeting streaming platform used by Polish local governments and one of the two primary data sources for this corpus. </para></listitem></itemizedlist></section><section><title>Funding</title><para>The corpus was financed by the European Regional Development Fund as part of the 2014–2020 Smart Growth Operational Programme, CLARIN (Common Language Resources and Technology Infrastructure), project no. POIR.04.02.00-00C002/19; by the Polish Ministry of Education and Science grant 2022/WK/09, continued as part of the investment CLARIN ERIC (European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure, period: 2024–2026), funded by the Polish Ministry of Science and Higher Education (Programme: “Support for the participation of Polish scientific teams in international research infrastructure projects”), agreement number 2024/WK/01; and by CLARIN-PL, the European Regional Development Fund, FENG programme, agreement number FENG.02.04-IP.040004/24. 
</para></section><section><title>Licence</title><para><inlinemediaobject><imageobject><imagedata fileref="http://i.creativecommons.org/l/by/4.0/88x31.png"/></imageobject><textobject><phrase>CC BY 4.0 licence badge</phrase></textobject></inlinemediaobject> </para><para><ulink url="http://creativecommons.org/licenses/by/4.0/deed.en_US">Creative Commons Attribution 4.0 International License</ulink> </para></section><section><title>Please cite</title><para>Czerski D., Ogrodniczuk M. (2026). <emphasis>LocalGovPL: A Corpus of Speaker-Attributed Polish Local Government Transcripts</emphasis>. Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026). Palma de Mallorca, 2026. European Language Resources Association (ELRA). </para></section></section></article>