A question that felt personal
I'm an international student, and when I arrived in Halifax I had no idea what the city was really like. I'd heard rumours that Halifax was a human trafficking hotspot, and I wanted to find out if there was any data to back that up.
When I went looking, I couldn't find human trafficking data — but I found something else: the HRM Open Data Hub publishes a rolling 7-day dataset of general occurrence crimes. The catch? No historical archive. Only the last 7 days are ever available.
So I built one myself. I manually downloaded the CSV every other day starting January 14, 2026, and loaded each file through my SSIS pipeline into a SQL Server data warehouse. What started as a personal safety question turned into a full data engineering and analytics project.
Five categories, one dominant
Assault accounts for more than half of all reported incidents, and it appears in every geographic cluster, not just downtown.
A sustained climb, not a spike
Reported incidents rose in each of the three months observed. The rate of growth is slowing, but volume is still rising, and March brought the highest severity rate yet.
January
Baseline: Assault already at 50% of total. Gloria McCluskey Ave and Gottingen St established as hotspots immediately. High severity at 6.6%.
February
+40.9%: Sharpest jump in dataset. High severity actually dropped to 6.2%: more crimes, but proportionally fewer serious ones. Spring Garden Rd entered top 5.
March
+30.5%: The most concerning month, with the highest volume and the highest severity (8.3%) at the same time. Mumford Rd surged from 2 to 17 incidents.
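The month-over-month percentages above follow directly from the monthly totals reported in the findings (242, 341, 445). A minimal Python sketch of that arithmetic is below. The function name is illustrative only and is not taken from the Power BI report, which computes the same thing in DAX.

```python
# Monthly incident counts as reported in this write-up (January to March 2026).
monthly_counts = {"January": 242, "February": 341, "March": 445}


def month_over_month(counts):
    """Return {month: percent change versus the previous month}."""
    months = list(counts)  # dicts keep insertion order, so this is Jan, Feb, Mar
    changes = {}
    for prev, curr in zip(months, months[1:]):
        changes[curr] = (counts[curr] - counts[prev]) / counts[prev] * 100
    return changes


growth = month_over_month(monthly_counts)
for month, pct in growth.items():
    print(f"{month}: {pct:+.1f}%")
```

The absolute increase is similar in both steps (99, then 104 incidents), which is why the percentage growth slows while volume keeps climbing.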
How I built it
A full-stack pipeline from manual CSV collection through to interactive Power BI reporting, with unsupervised ML in between.
Data Engineering
SSIS Foreach Loop pipeline with staging, incremental dimension loading, fact table, and pre-aggregated reporting tables. Error rows redirected, not dropped.
Machine Learning
K-Means clustering on coordinates only (k=4). Optimal k found via the elbow method with KneeLocator (from the kneed package), validated with silhouette score. The clusters showed distinct crime mixes even though crime type was never an input.
Business Intelligence
5-page Power BI report with DAX measures for KPIs, MoM comparison, 7-day moving average, severity percentages, and Q2 forecast with confidence intervals.
Data Warehouse
Star schema with IncidentID grain. Staging → Dimensions → Fact → Reporting layers. Designed for rerunning without duplicates and future event type expansion.
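Because each download covers a rolling 7-day window and files were collected every other day, consecutive files overlap heavily. The sketch below illustrates the general idea of keying on the incident identifier so that reloading the same file changes nothing. It is a simplified Python illustration, not the warehouse logic itself, and the `incident_id` field name and sample rows are made up for the example.

```python
def merge_snapshots(snapshots):
    """Merge overlapping snapshots (oldest first) into one archive.

    Rows are keyed on incident_id, so an incident that appears in several
    downloads is stored once, and the most recent copy wins.
    """
    archive = {}
    for rows in snapshots:
        for row in rows:
            archive[row["incident_id"]] = row
    return list(archive.values())


# Two made-up snapshots that share incident A2.
jan_14 = [{"incident_id": "A1", "crime": "Assault"},
          {"incident_id": "A2", "crime": "Robbery"}]
jan_16 = [{"incident_id": "A2", "crime": "Robbery"},
          {"incident_id": "A3", "crime": "Assault"}]

archive = merge_snapshots([jan_14, jan_16])
once_more = merge_snapshots([jan_14, jan_16, jan_16])  # rerun with a repeated file
print(len(archive), "unique incidents")
```

Loading the same snapshot twice leaves the archive unchanged, which is the property a rerunnable load needs.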
Six-step data flow
From CSV files to Power BI — every step designed to be re-runnable, fault-tolerant, and transparent about errors.
01 — Enumerate Crime Data
Foreach Loop container iterates over all CSVs in source folder. File path dynamically bound to package variable — drop any number of new files and run once.
02 — Staging
Flat File Source → OLE DB Destination into stg.HalifaxCrime. Error rows redirected (not failed) — pipeline keeps running and logs bad records for review.
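To make the "redirect, don't fail" idea concrete outside SSIS, here is a rough Python equivalent of steps 01 and 02: loop over every CSV in a folder, keep the rows that parse, and set the bad ones aside with a reason. The column names, the validation rules, and the sample coordinates are all assumptions for illustration, since the real file layout is not shown here.

```python
import csv
import io
from pathlib import Path

# Hypothetical column names; the real HRM export may use different ones.
REQUIRED = ("incident_id", "date", "crime_type", "x", "y")


def stage_rows(lines):
    """Split CSV rows into (good, rejected) instead of stopping at the first bad row."""
    good, rejected = [], []
    reader = csv.DictReader(lines)
    for row in reader:
        try:
            if any(not (row.get(col) or "").strip() for col in REQUIRED):
                raise ValueError("missing required field")
            good.append({**row, "x": float(row["x"]), "y": float(row["y"])})
        except ValueError as exc:
            # Redirect: record where and why, then carry on with the next row.
            rejected.append({"line": reader.line_num, "reason": str(exc), "row": row})
    return good, rejected


def stage_folder(folder):
    """Yield (file name, (good, rejected)) for every CSV in the folder."""
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            yield path.name, stage_rows(handle)


# Made-up sample: one clean row, one bad coordinate, one missing crime type.
sample = io.StringIO(
    "incident_id,date,crime_type,x,y\n"
    "1001,2026-01-14,ASSAULT,-63.5752,44.6488\n"
    "1002,2026-01-14,ROBBERY,not-a-number,44.6501\n"
    "1003,2026-01-15,,-63.5800,44.6600\n"
)
good, rejected = stage_rows(sample)
print(len(good), "staged;", len(rejected), "redirected")
```

Keeping the line number and reason with each rejected row is what makes the later review practical.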
03 — Dimension Loading
Three dimensions loaded incrementally: dim.Crime (type, category, severity), dim.Event (reporting type), dim.Location (street address). NOT IN subquery prevents duplicates.
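The NOT IN pattern can be tried out with Python's built-in `sqlite3` module. This is only a sketch of the technique: SQLite has no schemas, so `dim.Crime` becomes a hypothetical `dim_crime` table, and the column names, categories, and severity values are placeholders rather than the real warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_crime (crime_type TEXT, category TEXT, severity TEXT);
    CREATE TABLE dim_crime (
        crime_key  INTEGER PRIMARY KEY AUTOINCREMENT,
        crime_type TEXT NOT NULL UNIQUE,
        category   TEXT,
        severity   TEXT
    );
""")

# Insert only the crime types that the dimension does not already hold.
INSERT_NEW = """
    INSERT INTO dim_crime (crime_type, category, severity)
    SELECT DISTINCT s.crime_type, s.category, s.severity
    FROM stg_crime AS s
    WHERE s.crime_type IS NOT NULL
      AND s.crime_type NOT IN (SELECT crime_type FROM dim_crime)
"""


def load_dimension(rows):
    """Refresh staging, run the incremental insert, return the dimension row count."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM stg_crime")
        conn.executemany("INSERT INTO stg_crime VALUES (?, ?, ?)", rows)
        conn.execute(INSERT_NEW)
    return conn.execute("SELECT COUNT(*) FROM dim_crime").fetchone()[0]


# Placeholder attribute values, not the real category or severity mapping.
batch = [("Assault", "Person", "placeholder"),
         ("Robbery", "Person", "placeholder"),
         ("Assault", "Person", "placeholder")]

first_run = load_dimension(batch)
second_run = load_dimension(batch)  # rerun: nothing new to insert
third_run = load_dimension(batch + [("Theft of Vehicle", "Property", "placeholder")])
print(first_run, second_run, third_run)
```

One thing to watch with NOT IN in any SQL dialect: if the subquery can return a NULL, the predicate never evaluates to true and no rows are inserted. Declaring the key column NOT NULL, as in this sketch, avoids that.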
04 — Fact Table
fact.CrimeIncidents — one row per incident with foreign keys to all dimensions plus date key, coordinates (x, y), and source record identifier.
05 — Reporting Tables
Pre-aggregated tables: rpt.ByCategory, rpt.ByLocation, rpt.BySeverity. They give Power BI a lighter query layer, so summary visuals do not hit the fact table every time.
06 — Calendar Table (DAX)
Generated in Power BI — spans ~6 years with year, month, week number, and weekday/weekend flags for time intelligence measures and forecasting.
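The calendar table itself lives in DAX, but its shape is easy to show in Python. The sketch below builds one row per day with the same kinds of attributes. It covers a single year rather than about six to keep the example short, the column names are illustrative, and Python's `isocalendar()` uses ISO week numbers, which may not match the week numbering the DAX table uses.

```python
from datetime import date, timedelta


def build_calendar(start, end):
    """Return one dict per day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date": day,
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20260114
            "year": day.year,
            "month": day.month,
            "week_number": day.isocalendar()[1],      # ISO week number
            "is_weekend": day.weekday() >= 5,         # Saturday=5, Sunday=6
        })
        day += timedelta(days=1)
    return rows


calendar = build_calendar(date(2026, 1, 1), date(2026, 12, 31))
print(len(calendar), "days;", calendar[13])
```

A continuous date table with no gaps is what lets time-intelligence measures such as moving averages treat a day with zero incidents as a zero rather than skipping it.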
Four zones, distinct crime profiles
K-Means was fed only x/y coordinates — no crime type data at all. Yet the clusters came back criminologically coherent. Geography alone predicts crime type in HRM.
| Cluster | Assault | Break & Enter | Robbery | Theft from Vehicle | Theft of Vehicle | Total |
|---|---|---|---|---|---|---|
| Urban Core | 406 | 75 | 47 | 189 | 69 | 786 |
| Suburban | 107 | 16 | 2 | 17 | 6 | 148 |
| Rural North | 13 | 3 | 0 | 2 | 3 | 21 |
| Remote East | 6 | 2 | 0 | 0 | 4 | 12 |
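To compare profiles across clusters of very different sizes, it helps to turn each row of the table into percentages. The short sketch below does that using the counts from the table above. The variable and function names are illustrative.

```python
CRIME_TYPES = ["Assault", "Break & Enter", "Robbery",
               "Theft from Vehicle", "Theft of Vehicle"]

# Counts copied from the cluster table, in CRIME_TYPES order.
cluster_counts = {
    "Urban Core":  [406, 75, 47, 189, 69],
    "Suburban":    [107, 16, 2, 17, 6],
    "Rural North": [13, 3, 0, 2, 3],
    "Remote East": [6, 2, 0, 0, 4],
}


def crime_mix(counts):
    """Return each crime type's share of a cluster's incidents, in percent."""
    total = sum(counts)
    return {crime: n / total * 100 for crime, n in zip(CRIME_TYPES, counts)}


mix = {cluster: crime_mix(counts) for cluster, counts in cluster_counts.items()}
for cluster, shares in mix.items():
    summary = ", ".join(f"{crime} {pct:.1f}%" for crime, pct in shares.items())
    print(f"{cluster}: {summary}")
```

The two rural clusters contain only 21 and 12 incidents, so their percentages move a lot with each additional report and are best read as rough indications.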
What the data reveals
Ten findings from 68 days of manually collected crime data across Halifax Regional Municipality.
Assault dominates everywhere
54.5% of all incidents and present in every geographic cluster — not a downtown-only problem. It follows people regardless of location.
Unbroken monthly growth
242 → 341 → 445 across Jan–Mar. The rate of growth is slowing, but volume is still climbing every single month.
March raised both alarms
Highest volume and highest severity (8.3%) simultaneously. That's the combination worth watching most closely.
72% happen on weekdays
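Property crimes get discovered and reported during business hours, so the distribution reflects reporting behaviour as much as occurrence. For context, weekdays make up five of every seven days (about 71%), so a 72% share is close to what an even daily spread would give.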
Urban Core concentrates crime
72% of assaults, 85% of theft from vehicle, and 96% of robberies all happen in the downtown peninsula.
Robbery is urban-exclusive
Zero robbery incidents in Rural North or Remote East. It needs victim proximity that rural areas don't provide.
Mumford Road surged
2 incidents in February → 17 in March. The biggest single location shift in the dataset — possible geographic displacement.
Top 5 are corridors, not neighbourhoods
All hotspot locations are busy urban thoroughfares: Gloria McCluskey Ave, Gottingen St, Mumford Rd, Spring Garden Rd, Barrington St.
2–3 week cyclical pattern
The 7-day moving average shows rhythmic peaks and troughs, with rising floors — even quiet weeks are getting busier.
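For readers unfamiliar with the measure, a trailing 7-day moving average is just the mean of the current day and the six days before it. The report computes it in DAX. The Python sketch below shows the same calculation on made-up daily counts, and the function name is illustrative.

```python
from collections import deque


def trailing_average(daily_counts, window=7):
    """Trailing moving average; None until a full window is available."""
    buffer = deque(maxlen=window)  # automatically drops the oldest day
    averages = []
    for count in daily_counts:
        buffer.append(count)
        averages.append(sum(buffer) / window if len(buffer) == window else None)
    return averages


# Made-up daily incident counts, for illustration only.
daily = [8, 12, 9, 15, 11, 7, 8, 14, 10, 13]
ma7 = trailing_average(daily)
print(ma7)
```

Returning None for the first six days avoids plotting averages built from a partial window, which would otherwise exaggerate early swings.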
Geography predicts crime type
K-Means used only coordinates — no crime type input — yet produced criminologically distinct clusters. Geography is a strong natural predictor.
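The clustering workflow described earlier (coordinates only, an elbow check on inertia, then a silhouette check) can be sketched with the standard library alone. This is not the notebook's code, which presumably relies on scikit-learn and kneed. It is a plain Lloyd's algorithm with a deterministic farthest-point seeding that I have swapped in for reproducibility, run on synthetic points rather than real incident coordinates.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then keep adding the point
    farthest from its nearest chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on (x, y) tuples. Returns (labels, centroids, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid alone
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


def silhouette_score(points, labels):
    """Mean silhouette coefficient (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in for incident coordinates: four tight, well-separated groups.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
centres = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

results = {}
for k in range(2, 7):
    labels, _, inertia = kmeans(points, k)
    results[k] = (inertia, silhouette_score(points, labels))
    print(f"k={k}  inertia={inertia:8.2f}  silhouette={results[k][1]:.3f}")
```

On data like this, inertia drops sharply up to the true number of groups and then flattens, which is the bend the elbow method looks for. The silhouette score gives a second opinion on the same choice. Crime type never enters the function, so any difference in crime mix between clusters has to come from location.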
What's ahead
The forecast is built entirely in Power BI using DAX. It extends from observed data (~397–434 in late March/early April) through June 2026. The March baseline projection, which I trust most, suggests a possible plateau in the 400–480 crimes/month range through Q2.
The confidence interval widens deliberately as it projects further out. With only 68 days of training data, showing a narrow cone would be dishonest. The lower bound never drops below March's observed floor, meaning the model treats March's activity level as the new baseline minimum.
Data-informed suggestions
Not policy directives — these come directly from the numbers, intended for different audiences who might engage with this project.
Be informed, not afraid
The five highest-incident locations are busy urban corridors. Assault accounts for about 54% of incidents and appears in every cluster, so awareness matters in every part of the city. 96% of robberies occur in the Urban Core, and weekdays see 72% of reported crimes.
Watch the suburban shift
Property crime is creeping into suburbs — Bedford, Clayton Park, suburban Dartmouth. Mumford Road's 2→17 surge is the sharpest location-level change in the dataset. Even rural clusters have assault incidents.
Urban Core, weekday-weighted
The Urban Core is the highest-leverage target. Data supports weekday-weighted deployment. The 2–3 week cycle suggests behavioural rhythms worth investigating. March is a warning sign for resource calibration.
Publish historical data
The 7-day rolling window is a gap in HRM's own analytical capacity. Extending to 90 days would enable evidence-based planning at minimal cost. Open data enables community-led analysis like this project.
Limitations & known gaps
Every gap documented. Every dip in the moving average checked against this list before drawing conclusions.
Download & explore
Start with the PDF for a quick visual overview, then open the .pbix in Power BI Desktop to explore interactive slicers and tooltips.
Crime_Analytics.pbix
Interactive Power BI report — 5 pages with cross-filtering slicers
Crime_Analytics.pdf
Static export of all dashboard pages — quick visual overview
Halifax_Crime.dtsx
SSIS ETL package — full pipeline in Visual Studio format
HRM_Clustering.ipynb
K-Means clustering notebook — elbow, silhouette, cluster export